You can read our blog post about it here: https://getindata.com/blog/how-to-build-continuously-processing-for-24-7-real-time-data-streaming-platform/
How to build continuously processing for 24/7 real-time data streaming platform?
Good observability is essential for modern software. It gives us confidence that our systems are working properly and lets us debug issues efficiently. In this talk, we’ll explore everything you need to know to start applying good observability to your projects, along with the most common pitfalls to be aware of. We will start with the tools and basic concepts of monitoring and go over the three most common mistakes people make with it. Then we’ll see how to set up automatic alerts to detect issues and touch on the principles behind good alerts. As a final step, we’ll see how to build our logging system and how to use it efficiently to debug issues.
Nanog75, Network Device Property as Code - Damien Garros
Device configuration templates have simplified a lot of things for the network industry, but many networks still manage their device properties (aka variables) manually, which is tedious and error-prone. This talk will present a new approach to generating and managing network device properties easily using infrastructure as code principles.
DCEU 18: From Monolith to Microservices - Docker, Inc.
Jeff Nickoloff - Co-founder, Topple
Growth can be challenging to address once monolithic systems begin to fail under strain or internal software development processes begin to slow the release cadence. Many organizations are looking to microservices architecture to solve these application issues, whether they plan to write new applications or rewrite the monoliths into microservices. This talk will highlight the common technical and cultural issues that will make microservice architectures a challenge to adopt and maintain. Issues include impact of Dunbar's Number and Conway's Law, build-time vs runtime continuous integration, evolution of testability, API versioning impact, logistics overhead, artifact management, and strategies for iteration in a distributed environment.
Attendees will learn:
- How and why microservice architectures and ownership end up falling along organizational lines (and why that is a good thing)
- How we can learn from monolith tooling to inform our tooling in a microservice environment
- How you can achieve operational excellence at scale taking a logistical approach with Docker.
Infrastructure as Code, tools, benefits, paradigms and more.
Presentation from DigitalOnUs DevOps: Infrastructure as Code Meetup (September 20, 2018 - Monterrey Nuevo Leon MX)
Uncover the mysteries of infrastructure as code (IaC)! - Prashant Kalkar
In the era of cloud and containerisation, infrastructure as code (IaC) is invaluable. In this talk, we will explore the evolution of infrastructure practices and tools. We will look at the practices and tools used before the emergence of the cloud. Then we will explore how the rise of the cloud changed infrastructure automation practices and made IaC a mainstream practice.
We will also explore what it means to treat infrastructure as code. We will talk about Code vs Configuration, versioning, Configurability vs Standardisation, Modularity and code organisation for infrastructure code.
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G... - GetInData
Did you like it? Check out our E-book: Apache NiFi - A Complete Guide
https://ebook.getindata.com/apache-nifi-complete-guide
Apache NiFi is one of the most popular services for running ETL pipelines, although it is not the youngest technology. The talk describes in detail how to migrate pipelines from the old Hadoop platform to Kubernetes, manage everything as code, monitor all the corner cases of NiFi and make it a robust solution that is user-friendly even for non-programmers.
Author: Albert Lewandowski
Linkedin: https://www.linkedin.com/in/albert-lewandowski/
___
GetInData is a company founded in 2014 by ex-Spotify data engineers. From day one our focus has been on Big Data projects. We bring together a group of the best and most experienced experts in Poland, working with cloud and open-source Big Data technologies to help companies build scalable data architectures and implement advanced analytics over large data sets.
Our experts have vast production experience in implementing Big Data projects for Polish as well as foreign companies including i.a. Spotify, Play, Truecaller, Kcell, Acast, Allegro, ING, Agora, Synerise, StepStone, iZettle and many others from the pharmaceutical, media, finance and FMCG industries.
https://getindata.com
DCEU 18: 5 Patterns for Success in Application Transformation - Docker, Inc.
Elton Stoneman - Developer Advocate, Docker
Legacy applications weren't designed to run in a modern distributed platform like Docker. They have their own ideas about logging, configuration and health which don't translate to the world of containers and make transformation projects hard.
This session shows you how to bring your old applications into the modern world, and integrate them with Docker - without changing code. We'll cover patterns for all the core application concerns:
* logging
* configuration
* monitoring
* health
* dependency management
The sample applications will be in .NET and Java, and will show you how to turn your existing applications into good Docker citizens.
OSMC 2021 | Use OpenSource monitoring for an Enterprise Grade Platform - NETWAYS
There are many tools and frameworks for monitoring. Usually, when you think of an open source solution, you don’t think of implementing it in a COTS product. Nevertheless, this session will show how you can implement tools such as Prometheus, Grafana and ELK in such an enterprise application platform. Monitoring performance, throughput and error rate is important for staying in control of your transactions. If you use a Service Bus or SOA/BPM suite product, there are a lot of out-of-the-box diagnostics waiting for you. The puzzle is how to get them out in a useful way. Besides the many commercial solutions, open source tools can also help with this. You can export runtime diagnostics from the Diagnostics framework, monitor your SOA Composites and track down Service Bus statistics using Prometheus and Grafana. The session will elaborate on how to set up proper monitoring using these tools, including in a proactive way, where automated monitoring is a must for every application environment.
Maxime Petazzoni, Software Engineer at SignalFx, presents how we use Docker and how we monitor containers in production.
SignalFx has been using Docker since November 2013. We have been running Docker in prod ever since we’ve had a “prod”, back when Docker’s README said “DO NOT RUN IN PRODUCTION”.
Ansiblefest 2018 Network automation journey at Roblox - Damien Garros
In December 2017, Roblox’s network was managed in a traditional way without automation.
To sustain its growth, the team had to deploy 2 datacenters, a global network and multiple points of presence around the world in a few months. The only way to achieve that was to automate everything.
6 months later, the team has made tremendous progress, and many aspects of the network lifecycle have been automated, from the routers and switches to the load balancers.
Synopsis
This talk is a retrospective of Roblox’s journey into Network automation:
How we got started and how we automated an existing network.
How we organized the project around GitHub and a DCIM/IPAM solution (NetBox).
How Docker helped us to package Ansible and create a consistent environment.
How we managed many roles and variations of our design in a single project.
How we have automated the provisioning of our F5 Load Balancers.
For each point, we’ll cover what was successful, what was more challenging and what limitations we had to deal with.
Bringing DevOps to Routing with evolved XR: an overview - Cisco DevNet
A session in the DevNet Zone at Cisco Live, Berlin. This session is a fresh perspective on the routing world, focused on the growing influence of DevOps-style workflows in routing deployments across web-scale service providers. With the adoption of a 64-bit Linux OS, support for Linux containers (LXC/Docker) and an open architecture that enables automated configuration management off the bat, the evolution of IOS-XR has placed it right in the midst of DevOps and SDN. In this session we dive deep into the application-hosting infrastructure, modular software delivery techniques and support for zero touch provisioning and configuration management tools that integrate seamlessly with the M2M interfaces exposed by IOS XR. We look at the deployment techniques of web-scale service providers that are gradually influencing the rest of the market, and introduce a variety of use cases around automated NetOps, traffic engineering, telemetry and data-center cluster schedulers that showcase the power of an open, automatable network operating system.
Netflix Open Source: Building a Distributed and Automated Open Source Program - aspyker
Netflix has been using and contributing to open source for several years. Over the years, Netflix has released over one hundred Netflix Open Source (aka NetflixOSS) libraries, servers, and technologies. Netflix engineers benefit by accepting contributions and gathering feedback with key collaborators around the world. Users of NetflixOSS from many industries benefit from our solutions including Big Data, Build and Delivery Tools, Runtime Services and Libraries, Data Persistence, Insight, Reliability and Performance, Security and User Interface. With such a large and mature open source program, Netflix has worked on approaches and tools that help manage and improve the NetflixOSS source offerings and communities. Netflix has taken a different approach to building support for open source as compared to other Internet scale companies. Come to this session to learn about the unique approaches Netflix has taken to both distribute and automate the responsibilities of building a world-class open source program.
Video: https://youtu.be/T0L0JxDaPkc
RSVP Here: https://www.eventbrite.com/e/full-day-workshop-kubeflow-kerastensorflow-20-tf-extended-tfx-kubernetes-pytorch-xgboost-airflow-tickets-63362929227
Description
In this workshop, we build real-world machine learning pipelines using TensorFlow Extended (TFX), KubeFlow, Airflow, and MLflow.
Described in the 2017 paper, TFX is used internally by thousands of Google data scientists and engineers across every major product line within Google.
KubeFlow is a modern, end-to-end pipeline orchestration framework that embraces the latest AI best practices including hyper-parameter tuning, distributed model training, and model tracking.
Airflow is the most widely used pipeline orchestration framework in machine learning and data engineering.
MLflow is a lightweight experiment-tracking system recently open-sourced by Databricks, the creators of Apache Spark. MLflow supports Python, Java/Scala, and R - and offers native support for TensorFlow, Keras, and Scikit-Learn.
Pre-requisites
Modern browser - and that's it!
Every attendee will receive a cloud instance
Nothing will be installed on your local laptop
Everything can be downloaded at the end of the workshop
Location
Online Workshop
The link will be sent a few hours before the start of the workshop.
Only registered users will receive the link.
If you do not receive the link a few hours before the start of the workshop, please send your Eventbrite registration confirmation to support@pipeline.ai for help.
Agenda
1. Create a Kubernetes cluster
2. Install KubeFlow, Airflow, TFX, and Jupyter
3. Set Up ML Training Pipelines with KubeFlow and Airflow
4. Transform Data with TFX Transform
5. Validate Training Data with TFX Data Validation
6. Train Models with Jupyter, Keras/TensorFlow 2.0, PyTorch, XGBoost, and KubeFlow
7. Run a Notebook Directly on Kubernetes Cluster with KubeFlow
8. Analyze Models using TFX Model Analysis and Jupyter
9. Perform Hyper-Parameter Tuning with KubeFlow
10. Select the Best Model using KubeFlow Experiment Tracking
11. Run Multiple Experiments with MLflow Experiment Tracking
12. Reproduce Model Training with TFX Metadata Store
13. Deploy the Model to Production with TensorFlow Serving and Istio
14. Save and Download your Workspace
Key Takeaways
Attendees will gain experience training, analyzing, and serving real-world Keras/TensorFlow 2.0 models in production using model frameworks and open-source tools.
NetflixOSS Meetup S3 E1, covering latest components in Distributed Databases, Telemetry systems, Big Data tools and more. Speakers from Netflix, IBM Watson, Pivotal and Nike Digital
OSDC 2018 | Three years running containers with Kubernetes in Production by T... - NETWAYS
The talk gives a state-of-the-art update on experiences with deploying applications in Kubernetes at scale. Whether in the cloud or on premises, Kubernetes has taken over the leading role as a container operating system. The central paradigm of stateless containers connected to storage and services is the core of Kubernetes. However, it can be extended to distributed databases, machine learning and Windows VMs in Kubernetes. All of these applications were considered edge cases a few years ago, but they are becoming more and more mainstream today.
Storage Spaces Direct - the new Microsoft SDS star - Carsten Rachfahl - ITCamp
Storage Spaces Direct will provide new, previously unseen possibilities for the Microsoft hypervisor Hyper-V. On the one hand, there is a highly performant, highly available Scale-Out File Server with the option to use internal, non-shared disks such as SATA HDDs and SSDs, and even NVMe devices. On the other hand, you can build a hyper-converged Hyper-V cluster where the VMs and their storage run on the same servers. And let’s not forget Azure Stack! The first version of Microsoft's private/hosted cloud solution will only be supported on the hyper-converged S2D infrastructure. Join this session to learn about this great new technology that will have its role in future private and hosted cloud infrastructure implementations.
Supporting Digital Media Workflows in the Cloud with Perforce Helix - Perforce
Walk through a distributed, non-destructive digital media workflow with graphics, audio and video media from start to finish. Learn the pain points and challenges of versioning increasingly large and varied formats, and see various strategies and best practices for configuring and managing depots in Perforce Helix that facilitate collaborative creative work while minimizing large data transfers. You’ll leave this session with the insights and skills needed to securely support automated digital media workflows in your organization using the Perforce Helix platform with the latest cloud services.
Devops Columbia October 2020 - Gabriel Alix: A Discussion on Terraform - Drew Malone
Wonder why you would want to use Terraform vs its competitors? Why not stick with CFNs, you ask? CDK should do the trick, right? Come enjoy an opinionated take on using Terraform, for the betterment of your sanity. Also includes a light intro to Terraform for those who are new to it.
Gabriel is a Cloud Technologist and accomplished Cyber practitioner who has led & built complex workloads across the IC for 20+ years. He's a native New Yorker from Washington Heights, with a boisterous laugh and calm demeanor. Gabriel has built a strong career starting in Federal service and has evolved into CTO and now VP of IC at Applied Insight. In addition to his technical accolades, he's a social leader who believes in building and growing strong teams.
Netflix: From Zero to Production-Ready in Minutes (QCon 2017) - Tim Bozarth
Slides from Tim Bozarth's (@timbozarth) QCon 2017 presentation (https://qconnewyork.com/ny2017/presentation/zero-production-ready-minutes)
Abstract:
The fabric of Netflix's approach to building new highly-available services is evolving. The Runtime Platform Team is focused on improving developer productivity while simultaneously making it simpler to build and maintain the high-availability services that Netflix expects. Starting with application generation, and leveraging a new approach to communication between services (RPC), we're simplifying what's needed to build a fast, reliable, and optimized service capable of delivering a fantastic customer experience.
We'll be sharing how Netflix is enabling engineers to go from "zero" to "production ready" in minutes - incorporating best-practices learned through years in the cloud. We will also share the story of transitioning from our home-grown RPC machinery to open-source standards, how we recognized when it was the right time to walk away from our own creations, and how our new approach is improving team velocity across Netflix engineering.
Series of Unfortunate Netflix Container Events - QConNYC17 - aspyker
Project Titus is Netflix's container runtime on top of Amazon EC2. Titus powers algorithm research through massively parallel model training, media encoding, data research notebooks, ad hoc reporting, NodeJS UI services, stream processing and general microservices. As an update from last year's talk, we will focus on the lessons learned operating one of the largest container runtimes on a public cloud. We'll cover the migration we've seen of applications and frameworks from VMs to containers. We will cover the operational issues with containers that only surfaced once we reached the large scale we are currently supporting (thousands of container hosts, hundreds of thousands of containers launched weekly). We'll touch on the unique features we have added to help both batch and microservices run across a variety of runtimes (Java, R, NodeJS, Python, etc.) and how higher-level frameworks have taken advantage of Titus's scheduling capabilities.
All the fundamental concepts and tools for understanding performance tuning in Java. Garbage collection, memory management and collector types and tools for profiling Java applications.
In any Cloud Native architecture there’s a seemingly endless stream of events that happen at each layer. These events can be used to detect abnormal activity and possible security incidents, as well as providing an audit trail of activity.
In this talk we’ll cover how we extended Falco to ingest events beyond just host system calls, such as Kubernetes audit events or even application level events. We will also show how to create Falco rules to detect behaviors in these new event streams. We show how we implemented Kubernetes audit events in Falco, and how to configure the event stream.
In this training webinar, Samantha Wang will walk you through the basics of Telegraf. Telegraf is the open source server agent which is used to collect metrics from your stacks, sensors and systems. It is InfluxDB’s native data collector that supports nearly 300 inputs and outputs. Learn how to send data from a variety of systems, apps, databases and services in the appropriate format to InfluxDB. Discover tips and tricks on how to write your own plugins. The know-how learned here can be applied to a multitude of use cases and sectors. This one-hour session will include the training and time for live Q&A.
Get DevOps training in Chennai with real-time experts at Besant Technologies, OMR. We believe that combining practical and theoretical learning is the easiest way to understand the technology quickly. We designed this DevOps course to go from the basic level to the latest advanced level.
http://www.traininginsholinganallur.in/devops-training-in-chennai.html
OSMC 2021 | Use OpenSource monitoring for an Enterprise Grade PlatformNETWAYS
There are many tools and frameworks for monitoring. Usually when you think of an Open Source solution, you don’t think to implement it in a COTS product. Nevertheless, this session will tell you how you can implement tools such as Prometheus, Grafana and ELK into such an Enterprise application platform. Monitoring performance, throughput and error rate is important to be in control of your transactions. If you use a Service Bus or SOA/BPM suite product there are a lot out of the box diagnostics waiting for you. The puzzle here is how to get it out in a useful way. Besides of the many commercial solutions also Open Source tools can help you out with it. You can export runtime diagnostics out of the Diagnostics framework, monitor your SOA Composites and trace down Service Bus statistics using Prometheus and Grafana. The session will elaborate how to set up a proper monitoring using these tools, also in a proactive way where automated monitoring is a must for every application environment.
Maxime Petazzoni, Software Engineer at SignalFx, presents how we use Docker and how we monitor containers in production.
SignalFx has been using using Docker since November 2013. We have running Docker in prod ever since we’ve had a “prod” and back when Docker’s README said “DO NOT RUN IN PRODUCTION”.
Ansiblefest 2018 Network automation journey at robloxDamien Garros
In December 2017, Roblox’s network was managed in a traditional way without automation.
To sustained its growth, the team had to deploy 2 datacenters, a global network and multiple point of presence around the world in few months, the only solution to be able to achieve that was to automate everything.
6 months later, the team has made tremendous progress and many aspects of the network lifecycle has been automated from the routers, switches to the load balancers.
Synopsis
This talk is a retrospective of Roblox’s journey into Network automation:
How we got started and how we automated an existing network.
How we organized the project around Github and an DCIM/IPAM solution (netbox),
How Docker helped us to package Ansible and create a consistent environment.
How we managed many roles and variations of our design in single project
How we have automated the provisioning of our F5 Load Balancers.
For each point, we’ll cover what was successful, what was more challenging and what limitations we had to deal with.
Bringing DevOps to Routing with evolved XR: an overviewCisco DevNet
A session in the DevNet Zone at Cisco Live, Berlin. This session is a fresh perspective on the routing world, focused on the growing influence of DevOps style workflows in routing deployments across Web scale service providers. With the adoption of a 64-bit linux OS, support for Linux containers (LXC/Docker) and an open architecture that enables automated configuration management off the bat, the evolution of IOS-XR has placed it right in the midst of DevOps and SDN. In this session we dive deep into the application-hosting infrastructure, Modular software delivery techniques and support for zero touch provisioning and configuration management tools that integrate seamlessly with the M2M interfaces exposed by IOS XR. We look at deployment techniques of web scale service providers that is gradually influencing the rest of the market and introduce a variety of use cases around automated NetOps, traffic-engineering, Telemetry and data-center cluster schedulers that showcase the power of an open, automatable network operating system.
Netflix Open Source: Building a Distributed and Automated Open Source Programaspyker
Netflix has been using and contributing to open source for several years. Over the years, Netflix has released over one hundred Netflix Open Source (aka NetflixOSS) libraries, servers, and technologies. Netflix engineers benefit by accepting contributions and gathering feedback with key collaborators around the world. Users of NetflixOSS from many industries benefit from our solutions including Big Data, Build and Delivery Tools, Runtime Services and Libraries, Data Persistence, Insight, Reliability and Performance, Security and User Interface. With such a large and mature open source program, Netflix has worked on approaches and tools that help manage and improve the NetflixOSS source offerings and communities. Netflix has taken a different approach to building support for open source as compared to other Internet scale companies. Come to this session to learn about the unique approaches Netflix has taken to both distribute and automate the responsibilities of building a world-class open source program.
Video: https://youtu.be/T0L0JxDaPkc
RSVP Here: https://www.eventbrite.com/e/full-day-workshop-kubeflow-kerastensorflow-20-tf-extended-tfx-kubernetes-pytorch-xgboost-airflow-tickets-63362929227
Description
In this workshop, we build real-world machine learning pipelines using TensorFlow Extended (TFX), KubeFlow, Airflow, and MLflow.
Described in the 2017 paper, TFX is used internally by thousands of Google data scientists and engineers across every major product line within Google.
KubeFlow is a modern, end-to-end pipeline orchestration framework that embraces the latest AI best practices including hyper-parameter tuning, distributed model training, and model tracking.
Airflow is the most-widely used pipeline orchestration framework in machine learning and data engineering.
MLflow is a lightweight experiment-tracking system recently open-sourced by Databricks, the creators of Apache Spark. MLflow supports Python, Java/Scala, and R - and offers native support for TensorFlow, Keras, and Scikit-Learn.
Pre-requisites
Modern browser - and that's it!
Every attendee will receive a cloud instance
Nothing will be installed on your local laptop
Everything can be downloaded at the end of the workshop
Location
Online Workshop
The link will be sent a few hours before the start of the workshop.
Only registered users will receive the link.
If you do not receive the link a few hours before the start of the workshop, please send your Eventbrite registration confirmation to support@pipeline.ai for help.
Agenda
1. Create a Kubernetes cluster
2. Install KubeFlow, Airflow, TFX, and Jupyter
3. Setup ML Training Pipelines with KubeFlow and Airflow
4. Transform Data with TFX Transform
5. Validate Training Data with TFX Data Validation
6. Train Models with Jupyter, Keras/TensorFlow 2.0, PyTorch, XGBoost, and KubeFlow
7. Run a Notebook Directly on Kubernetes Cluster with KubeFlow
8. Analyze Models using TFX Model Analysis and Jupyter
9. Perform Hyper-Parameter Tuning with KubeFlow
10. Select the Best Model using KubeFlow Experiment Tracking
11. Run Multiple Experiments with MLflow Experiment Tracking
12. Reproduce Model Training with TFX Metadata Store
13. Deploy the Model to Production with TensorFlow Serving and Istio
14. Save and Download your Workspace
Key Takeaways
Attendees will gain experience training, analyzing, and serving real-world Keras/TensorFlow 2.0 models in production using model frameworks and open-source tools.
https://youtu.be/T0L0JxDaPkc
NetflixOSS Meetup S3 E1, covering latest components in Distributed Databases, Telemetry systems, Big Data tools and more. Speakers from Netflix, IBM Watson, Pivotal and Nike Digital
OSDC 2018 | Three years running containers with Kubernetes in Production by T...NETWAYS
The talk gives a state-of-the-art update on experiences with deploying applications in Kubernetes at scale. Whether in the cloud or on premises, Kubernetes has taken over the leading role as a container operating system. The central paradigm of stateless containers connected to storage and services is the core of Kubernetes. However, it can be extended to distributed databases, machine learning, and Windows VMs in Kubernetes. All of these applications were considered edge cases a few years ago but are becoming more and more mainstream today.
Storage Spaces Direct - the new Microsoft SDS star - Carsten RachfahlITCamp
Storage Spaces Direct will provide new, previously unseen possibilities for Microsoft's hypervisor, Hyper-V. On the one hand, there is a highly performant, highly available Scale-Out File Server that can use internal, non-shared disks such as SATA HDDs and SSDs, and even NVMe devices. On the other hand, you can build a hyper-converged Hyper-V cluster where the VMs and their storage run on the same servers. And let’s not forget Azure Stack! The first version of Microsoft's private/hosted cloud solution will only be supported on the hyper-converged S2D infrastructure. Join this session to learn about this great new technology that will have its role in future private and hosted cloud infrastructure implementations.
Supporting Digital Media Workflows in the Cloud with Perforce HelixPerforce
Walk through a distributed, non-destructive digital media workflow with graphics, audio and video media from start to finish. Learn the pain points and challenges of versioning increasingly large and varied formats, and see various strategies and best practices for configuring and managing depots in Perforce Helix that facilitate collaborative creative work while minimizing large data transfers. You’ll leave this session with the insights and skills needed to securely support automated digital media workflows in your organization using the Perforce Helix platform with the latest cloud services.
Devops Columbia October 2020 - Gabriel Alix: A Discussion on TerraformDrew Malone
Wonder why you would want to use Terraform vs its competitors? Why not stick with CFNs, you ask? CDK should do the trick, right? Come enjoy an opinionated take on using Terraform, for the betterment of your sanity. It also includes a light intro to Terraform for those who are new to it.
Gabriel is a Cloud Technologist and accomplished Cyber practitioner who has led & built complex workloads across the IC for 20+ years. He's a native New Yorker from Washington Heights, with a boisterous laugh and calm demeanor. Gabriel has built a strong career starting in Federal service and has evolved into CTO and now VP of IC at Applied Insight. In addition to his technical accolades, he's a social leader who believes in building and growing strong teams.
Netflix: From Zero to Production-Ready in Minutes (QCon 2017)Tim Bozarth
Slides from Tim Bozarth's (@timbozarth) QCon 2017 presentation (https://qconnewyork.com/ny2017/presentation/zero-production-ready-minutes)
Abstract:
The fabric of Netflix's approach to building new highly-available services is evolving. The Runtime Platform Team is focused on improving developer productivity while simultaneously making it simpler to build and maintain the high-availability services that Netflix expects. Starting with application generation, and leveraging a new approach to communication between services (RPC), we're simplifying what's needed to build a fast, reliable, and optimized service capable of delivering a fantastic customer experience.
We'll be sharing how Netflix is enabling engineers to go from "zero" to "production ready" in minutes - incorporating best-practices learned through years in the cloud. We will also share the story of transitioning from our home-grown RPC machinery to open-source standards, how we recognized when it was the right time to walk away from our own creations, and how our new approach is improving team velocity across Netflix engineering.
Series of Unfortunate Netflix Container Events - QConNYC17aspyker
Project Titus is Netflix's container runtime on top of Amazon EC2. Titus powers algorithm research through massively parallel model training, media encoding, data research notebooks, ad hoc reporting, NodeJS UI services, stream processing and general microservices. As an update from last year's talk, we will focus on the lessons learned operating one of the largest container runtimes on a public cloud. We'll cover the migration we've seen of applications and frameworks from VMs to containers. We will cover the operational issues with containers that only showed up after we reached the large scale (thousands of container hosts, hundreds of thousands of containers launched weekly) we are currently supporting. We'll touch on the unique features we have added to help both batch and microservices run across a variety of runtimes (Java, R, NodeJS, Python, etc.) and how higher-level frameworks have taken advantage of Titus's scheduling capabilities.
All the fundamental concepts and tools for understanding performance tuning in Java. Garbage collection, memory management and collector types and tools for profiling Java applications.
In any Cloud Native architecture there’s a seemingly endless stream of events that happen at each layer. These events can be used to detect abnormal activity and possible security incidents, as well as providing an audit trail of activity.
In this talk we’ll cover how we extended Falco to ingest events beyond just host system calls, such as Kubernetes audit events or even application level events. We will also show how to create Falco rules to detect behaviors in these new event streams. We show how we implemented Kubernetes audit events in Falco, and how to configure the event stream.
In this training webinar, Samantha Wang will walk you through the basics of Telegraf. Telegraf is the open source server agent which is used to collect metrics from your stacks, sensors and systems. It is InfluxDB’s native data collector that supports nearly 300 inputs and outputs. Learn how to send data from a variety of systems, apps, databases and services in the appropriate format to InfluxDB. Discover tips and tricks on how to write your own plugins. The know-how learned here can be applied to a multitude of use cases and sectors. This one-hour session will include the training and time for live Q&A.
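As a rough illustration of the "appropriate format" mentioned above, the sketch below builds a single record in InfluxDB line protocol (`measurement,tags fields timestamp`) using only the Python standard library. It is not Telegraf code and is not taken from the webinar; the function names `to_line_protocol` and `format_field` and the sample measurement are hypothetical, and the escaping is a simplified subset of what the protocol allows.

```python
def _escape(value: str, chars: str) -> str:
    """Backslash-escape every character of `chars` found in `value`."""
    for ch in chars:
        value = value.replace(ch, "\\" + ch)
    return value


def format_field(value) -> str:
    """Render one field value (hypothetical helper, simplified rules)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    # Strings are double-quoted; escape backslashes first, then quotes.
    return '"' + _escape(str(value), '\\"') + '"'


def to_line_protocol(measurement, tags, fields, timestamp_ns=None) -> str:
    """Build one line: measurement[,tag=value...] field=value[,...] [timestamp]."""
    if not fields:
        raise ValueError("at least one field is required")
    head = _escape(measurement, ", ")
    for key in sorted(tags):  # sorted tag keys keep the output deterministic
        head += "," + _escape(key, ",= ") + "=" + _escape(str(tags[key]), ",= ")
    body = ",".join(
        _escape(key, ",= ") + "=" + format_field(value)
        for key, value in fields.items()
    )
    line = head + " " + body
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


if __name__ == "__main__":
    print(to_line_protocol(
        "cpu",
        {"host": "server 01", "region": "eu"},
        {"usage": 0.5, "cores": 4},
        1700000000000000000,
    ))
```

In practice Telegraf's output plugins produce this format for you; a hand-written formatter like this is mostly useful for understanding what ends up on the wire.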
Get DevOps training in Chennai with real-time experts at Besant Technologies, OMR. We believe that combining practice with theory is the easiest way to understand the technology quickly. We designed this DevOps course to run from the basic level to the latest advanced level.
http://www.traininginsholinganallur.in/devops-training-in-chennai.html
The challenge of application distribution - Introduction to Docker (2014 dec ...Sébastien Portebois
Live recording with the demos: https://www.youtube.com/watch?v=0XRcmJEiZOM
Contents
- The application distribution challenge
- The current solutions
- Introduction to Docker, Containers, and the Matrix from Hell
- Why people care: Separation of Concerns
- Technical Discussion
- Ecosystem, momentum
- How to build Docker images
- How to make containers talk to each other, how to handle data persistence
- Demo 1: isolation
- Demo 2: real case - installing Go Math! Academy, tail -f containers, unit tests
Docker is an open platform for developers and system administrators to build, ship and run distributed applications. Using Docker, companies in Jordan have been able to build powerful system architectures that allow speeding up delivery, easing deployment processes and at the same time cutting major hosting costs.
George Khoury shares his experience at Salalem in building flexible and cost effective architectures using Docker and other tools for infrastructure orchestration. The result allows them to easily and quickly move between different cloud providers.
Triangle Devops Meetup covering Netflix open source, cloud architecture, and what Andrew did in his first year working as a senior software engineer in the cloud platform group.
Microservices in action at the Dutch National Police - Bert Jan Schrijver - C...Codemotion
At the Cloud, Big Data and Internet division of the Dutch National Police, 4 DevOps teams use the latest open source technology to build high tech, cloud native web applications using Spring Boot, Angular 5, Spark, Kafka and Jenkins 2. I'll share our experiences and real-world use cases for microservices. I’ll show how 4 teams work together on one product and I’ll talk about how we apply the principles of DevOps and Continuous Delivery. I’ll show how we handle security, build pipelines, test automation, performance tests, service discovery, automated deployments, monitoring and more!
My college presentation on Docker. From these slides, you will learn: What is a container? What is Docker? Why is it important for developers? And much more!
Scripting experts from Inductive Automation cover general best practices that will help you add flexibility and customization to HMI, SCADA, IIoT, and other industrial applications. Some specific tips about using scripting in the Ignition platform will be included as well.
In this webinar, learn more about:
• Common scripting pitfalls and how to avoid them
• The best programming languages to use
• Things to consider before using scripting
• How scripting environments work
• Scripting timesavers
• And more
Time Series Anomaly Detection with Azure and .NETTMarco Parenzan
If you have any device or source that generates values over time (including a log from a service), you want to determine whether the time series in a given time frame is correct or whether you can detect anomalies. What can you do as a developer (not a data scientist) with .NET or Azure? Let's see how in this session.
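To make the idea of detecting anomalies in a time frame concrete, here is a minimal sketch of one common baseline, a rolling z-score check. It is written in Python with the standard library only and is not the .NET or Azure approach from the session; the function name `rolling_zscore_anomalies`, the window size and the threshold are all assumptions chosen for illustration.

```python
from collections import deque
from statistics import mean, pstdev


def rolling_zscore_anomalies(values, window=5, threshold=3.0):
    """Return indexes whose value deviates strongly from the preceding window.

    A point is flagged when its distance from the mean of the previous
    `window` points exceeds `threshold` population standard deviations.
    """
    history = deque(maxlen=window)  # keeps only the most recent points
    anomalies = []
    for index, value in enumerate(values):
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            # Skip flat windows to avoid dividing by zero.
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                anomalies.append(index)
        history.append(value)
    return anomalies


if __name__ == "__main__":
    readings = [10, 11, 10, 12, 11, 10, 50, 11, 10, 12]
    print(rolling_zscore_anomalies(readings))
```

Note that a flagged value still enters the window afterwards and inflates the standard deviation for the next few points, so a production detector would usually exclude or down-weight outliers, or use a dedicated library or service.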
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfGetInData
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is a growth in interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLM context and prompt augmentation for building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built a robust Data Copilot upon these three concepts, one that can help democratize access to company data assets and boost the performance of everyone working with data platforms.
Why do we need yet another (open-source) Copilot?
How can we build one?
Architecture and evaluation
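The retrieval step of a RAG-style SQL copilot can be sketched in a few lines. The example below is only an illustration under simplifying assumptions: it ranks schema descriptions by bag-of-words cosine similarity instead of using an embedding model or a vector store, it stops at building the augmented prompt (no LLM call), and the table descriptions and function names are hypothetical rather than taken from the presentation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word-like tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question."""
    query = Counter(tokenize(question))
    ranked = sorted(
        documents,
        key=lambda doc: cosine(query, Counter(tokenize(doc))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, documents, k=1):
    """Augment the question with the retrieved context (prompt layout is an assumption)."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nSQL:"


if __name__ == "__main__":
    # Hypothetical schema notes standing in for a company's data catalog.
    schema_docs = [
        "Table orders has columns order_id, customer_id, total_amount.",
        "Table customers has columns customer_id, name, country.",
        "Table page_views has columns url, viewed_at.",
    ]
    print(build_prompt("What is the total_amount of all orders?", schema_docs))
```

The resulting prompt would then be sent to whichever model the copilot uses; swapping the word-count similarity for embeddings changes only the `retrieve` function.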
How do we work with customers on Big Data / ML / Analytics Projects using Scr...GetInData
How do we work with our customers? How does it look? What do the meetings look like? How do we structure the cooperation? Who does what and when?
We receive these kinds of questions quite often. They are very important questions, as the customer should know the details before we start the project, and it’s important for GetInData to be transparent about this so that the client is well informed.
During the webinar, our Project Lead, Rafał Zalewski, talked about the Scrum framework we use in cooperation with our customers.
Watch here:
https://www.youtube.com/watch?v=uOWrgcaKwWo&t=32s
Speaker: Rafał Zalewski, GetInData: https://www.linkedin.com/in/rafalzalewski/
Getindata is a company founded in 2014 by ex-Spotify data engineers. From day one our focus has been on Big Data projects. We bring together a group of best and most experienced experts in Poland, working with cloud and open-source Big Data technologies to help companies build scalable data architectures and implement advanced analytics over large data sets.
Our experts have vast production experience in implementing Big Data projects for Polish as well as foreign companies including i.a. Spotify, Play, Truecaller, Kcell, Acast, Allegro, ING, Agora, Synerise, StepStone, iZettle and many others from the pharmaceutical, media, finance and FMCG industries.
https://getindata.com
Data-Driven Fast Track: Introduction to data-drivenness with Piotr MenclewiczGetInData
Watch video here: https://youtu.be/sfowpU90zFM
Piotr's presentation about GetInData’s Data-Driven Fast Track, the 3-step framework for data transformation.
You will learn:
➡ How to assess how data-driven your company is
➡ How to generate ideas for new initiatives to push your company towards better decisions
➡ How to think about implementing these initiatives to increase your chances of success
If you miss it live, don't despair. Watch the video and feel free to diagnose your company by filling out the survey prepared by our team here: https://bit.ly/3fKcRrb! After completing the survey, you will receive a tailored summary report with insights from one of our experts.
Below you'll find links to all the materials mentioned in the workshop needed for exercises.
LINKS TO MATERIALS ABOUT DATA-DRIVEN:
Data-driven fast-track: 3 steps to make your company more data-driven: https://getindata.com/blog/data-drive...
Is my company data-driven? Here’s how you can find out: https://getindata.com/blog/is-my-comp...
If you:
➡ have questions about the webinar topic,
➡ want to talk about your data-driven transformation,
➡ want to become more data-driven and you need consultations,
don't hesitate to write to us: hello@getindata.com
If you want to stay up to date, subscribe to our newsletter here: https://bit.ly/3tiw1I8
Slides from the talk given by Piotr Chaberski and Adrian Dembek at the Data Science Summit ML Edition.
Authors: Piotr Chaberski, Adrian Dembek
Linkedin: https://www.linkedin.com/in/piotrchaberski/
https://www.linkedin.com/in/adriandembek/
How to become good Developer in Scrum Team? GetInData
Speaker:
Rafał Zalewski, GetInData: https://www.linkedin.com/in/rafalzalewski/
Abstract:
To become a good Developer in a Scrum Team, you need to understand not only the Scrum Events but also Scrum fundamentals such as the Scrum Pillars and Scrum Values. In this presentation you will learn what mindset is expected of you as a Developer in a Scrum Team. You will also learn how the Scrum mindset helps to achieve better development results.
OpenLineage & Airflow - data lineage has never been easierGetInData
If you want to stay up to date, subscribe to our newsletter here: https://bit.ly/3tiw1I8
Slides from the talk given by Paweł at Airflow Summit 2022.
Author: Paweł Leszczyński
Linkedin: https://www.linkedin.com/in/pawel-leszczynski/
Did you like it? Check out our blog to stay up to date: https://getindata.com/blog
Building your own platform is often frowned upon these days. Everyone is encouraged to reuse existing solutions for well-known reasons. But using a ready-made platform or tool should not be a mindless process. Reusability is an art. During this presentation, you will learn why we decided to build our own MLOps platform without reinventing the wheel, using ready-made components with a touch of custom ones. We will cover the benefits of this approach, as well as the limitations and hurdles we have encountered. We hope that our experience will help you make the right decisions in your projects, sometimes perhaps riskier ones.
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInDataGetInData
If you want to stay up to date, subscribe to our newsletter here: https://bit.ly/3tiw1I8
Slides from the talk given by Mariusz at the Data Science Summit ML Edition.
Author: Mariusz Strzelecki
Linkedin: https://www.linkedin.com/in/mariusz-strzelecki/
Creating Real-Time Data Streaming powered by SQL on Kubernetes - Albert Lewan...GetInData
Did you like it? Check out our blog to stay up to date: https://getindata.com/blog
This workshop focuses on creating a data streaming platform from scratch using an empty Kubernetes (or even Minikube) cluster. During the workshop, we go through the installation process, deploy the basic components for the platform, start Apache Flink, and monitor the process, using SQL to query available data.
Author: Albert Lewandowski
Linkedin: https://www.linkedin.com/in/albert-lewandowski/
MLOps implemented - how we combine the cloud & open-source to boost data scie...GetInData
Check out more about this presentation here: https://www.youtube.com/watch?v=nSsssYHiylQ&t=17s
Slides from the talk given by our team at the NSML Summit.
Authors: Krzysztof Zarzycki, Marek Wiewiórka
Linkedin: https://www.linkedin.com/in/kzarzycki/
https://www.linkedin.com/in/marekwiewiorka/
Read more here: https://getindata.com/blog/machine-learning-features-discovery-feast-amundsen
Author: Mariusz Strzelecki
Linkedin: https://www.linkedin.com/in/mariusz-strzelecki/
Kubernetes and real-time analytics - how to connect these two worlds with Apa...GetInData
Did you like it? Check out our blog to stay up to date: https://getindata.com/blog
More and more services are running in Kubernetes, which means that we can migrate our current data pipelines to the new environment. In the case of Flink, we have multiple ways to do real-time data streaming: use the Lyft or GCP operator, go with the official deployment and customize it, choose the Ververica Platform, or create something of your own. The presentation shows how to choose the right solution for the technical requirements and business needs of running Flink in Kubernetes at great scale with no issues.
Author: Albert Lewandowski
Linkedin: https://www.linkedin.com/in/albert-lewandowski/
Big data trends - Krzysztof Zarzycki, GetInDataGetInData
If you want to stay up to date, subscribe to our newsletter here: https://bit.ly/3tiw1I8
Get more info here: https://getindata.com/blog/6-big-data-trends-2021-bigdata-blog/
Author: Krzysztof Zarzycki
Linkedin: https://www.linkedin.com/in/kzarzycki/
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...GetInData
Did you like it? Check out our blog to stay up to date: https://getindata.com/blog
The talk focuses on administering, developing and monitoring a platform built on Apache Spark, Apache Flink and Kubeflow, with a monitoring stack based on Prometheus.
Author: Albert Lewandowski
Linkedin: https://www.linkedin.com/in/albert-lewandowski/
Analytics 101 - How to build a data-driven organisation? - Rafał Małanij, Get...GetInData
Check out more about this presentation here: https://www.youtube.com/watch?v=eqNToHn4yB0
The webinar was organized by GetInData in 2020. During the webinar we explained what it means to build a data-driven company.
Watch more here: https://www.youtube.com/watch?v=eqNToHn4yB0
Speaker: Rafał Małanij
Monitoring in Big Data Platform - Albert Lewandowski, GetInDataGetInData
Did you like it? Check out our blog to stay up to date: https://getindata.com/blog
The webinar was organized by GetInData in 2020. During the webinar we explained the concept of monitoring and observability, with a focus on data analytics platforms.
Watch more here: https://www.youtube.com/watch?v=qSOlEN5XBQc
Whitepaper - Monitoring and Observability for Data Platform: https://getindata.com/blog/white-paper-big-data-monitoring-observability-data-platform/
Speaker: Albert Lewandowski
Linkedin: https://www.linkedin.com/in/albert-lewandowski/
Complex event processing platform handling millions of users - Krzysztof Zarz...GetInData
If you want to learn more about it, check out our webinar here: https://www.youtube.com/watch?v=EfGPY_NyYQ8&t=77s
The webinar was organized by GetInData in 2020. During the webinar, we shared the lessons we learnt from building and running a stream processing platform in production for over 2 years.
Watch more here: https://www.youtube.com/watch?v=EfGPY_NyYQ8
Author: Krzysztof Zarzycki
Linkedin: https://www.linkedin.com/in/kzarzycki/
Predicting Startup Market Trends based on the news and social media - Albert ...GetInData
Did you like it? Check out our blog to stay up to date: https://getindata.com/blog
Nowadays, one tweet can have an impact on the value of a company or cryptocurrency. It is becoming important for companies to know everything that is happening in the market, especially for startups or when entering a new market. The presentation covers a complex platform used for creating and verifying the strategy of a startup in the wellbeing market. We go through web scraping-based data ingestion into ElasticSearch, NLP pipelines that help understand what people write, and a PySpark job that predicts the possible future of each market.
Author: Albert Lewandowski
Linkedin: https://www.linkedin.com/in/albert-lewandowski/
Managing Big Data projects in a constantly changing environment - Rafał Zalew...GetInData
Watch our full performance given by our team during the Big Data Technology Warsaw Summit: https://www.youtube.com/watch?v=CBrq7z8ikaM
Big Data projects today are one of a kind: they are not like the data warehousing initiatives of the old days, nor like cloud-native application projects, at least not yet. A variety of technologies, complicated architectures and a rapidly changing landscape are just a few of the challenges an IT department faces in such projects. Add the number of stakeholders involved from different departments, and the fact that a Big Data project is sometimes more like R&D with an unpredictable outcome, and you get a mix in which the objectives can easily be lost. It is no surprise that up to 85% of Big Data projects were pure failures (Gartner 2016).
In this talk we will share our experience in planning and executing Big Data initiatives in organisations, with some use cases and good practices in mind.
Speakers:
Rafał Małanij
Rafał Zalewski
Linkedin: https://www.linkedin.com/in/rafalzalewski/
___
NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...GetInData
Did you like it? Check out our blog to stay up to date: https://getindata.com/blog
More and more videos are being created and distributed via multiple social media channels, and it is becoming increasingly important for companies to monitor them all in order to track their customers' feedback, reviews and opinions. In this talk, we cover extracting text from videos, analysing the language and preparing a robust, scalable infrastructure for it. The idea behind the platform is to mix managed and self-managed services for Big Data processing. The keynote presents a case study of the platform's MVP for marketing companies.
Author: Albert Lewandowski
Linkedin: https://www.linkedin.com/in/albert-lewandowski/
___
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology pushes into IT, I was wondering, as an “infrastructure container Kubernetes guy”, how this fancy AI technology can be managed from an infrastructure operations point of view. Is it possible to apply our beloved cloud native principles as well? What benefits could the two technologies bring to each other?
Let me take these questions and give you a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to make it work on our own infrastructure from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into which approaches I already have working for real.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
I have heard many times that architecture is not important for the front-end. Also, many times I have seen how developers implement features on the front-end just following the standard rules for a framework and think that this is enough to successfully launch the project, and then the project fails. How to prevent this and what approach to choose? I have launched dozens of complex projects and during the talk we will analyze which approaches have worked for me and which have not.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
2. Who am I?
• Big Data DevOps Engineer at Getindata
• Smart City Consultant at Almine
• Editor at Antyweb
• Focused on infrastructure, cloud, Internet of Things and Big Data
4. Infrastructure as Code
• Code everything because it means readable infrastructure for everyone
• Do not rely on local copies of scripts; someone invented code repositories, and it was a great day for IT staff
• Cloud-ready scripts – all public cloud vendors provide their own tools for deploying infrastructure
5. Infrastructure as Code in practice
• Amazon Web Services
  • CloudFormation
• Google Cloud Platform
  • Deployment Manager
• Microsoft Azure
  • Azure Templates
  • Azure DevOps
Or use one tool for all environments, like Terraform
7. Documentation
• Add a README if you don’t want to be disliked by your team.
• You will forget about important things faster than you think.
• Add descriptions and comments to your tasks and merge requests. Make work easier, not harder.
8. Monitoring
• Scraping metrics is done everywhere.
• Getting information about CPU utilisation, disk space usage, amount of free RAM, etc.
• Multiple tools are available, so which one should you choose?
• Do not forget to keep learning about new tools.
• Understand which metrics are valuable and provide information about the status of data pipelines or data ingestion.
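As a rough illustration of the host-level metrics mentioned above, the following minimal sketch collects disk usage and load average with the Python standard library and renders them in the Prometheus text exposition format. The metric names (`demo_*`), the `instance` label value and both function names are assumptions made for this example; in practice an existing exporter would normally provide these values.

```python
import os
import shutil


def collect_host_metrics(path="/"):
    """Collect a few host-level gauges using only the standard library."""
    usage = shutil.disk_usage(path)
    metrics = {
        "demo_disk_used_ratio": usage.used / usage.total,
        "demo_disk_free_bytes": float(usage.free),
    }
    # os.getloadavg() is only available on Unix-like systems.
    if hasattr(os, "getloadavg"):
        metrics["demo_load1"] = os.getloadavg()[0]
    return metrics


def render_exposition(metrics, labels):
    """Render gauges in the Prometheus text exposition format."""
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    lines = []
    for name, value in sorted(metrics.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical instance label, for illustration only.
print(render_exposition(collect_host_metrics(), {"instance": "worker-1"}))
```

Host metrics like these say little about whether a data pipeline is healthy, so they are best combined with pipeline-level values such as ingestion lag or processed record counts.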
9. Prometheus’ stories
• Use service discovery, it’s great
• Discover where the Flink JobManager (JM) and TaskManagers (TM) expose their metrics.
• How to provide HA?
• Think of using long-term storage like Thanos, M3 or Cortex
• Do you need archived data?
• Monitor Prometheus even if it’s a monitoring tool.
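One simple form of service discovery is Prometheus file-based discovery, where a JSON file lists target groups. The sketch below builds such a list for Flink JobManager and TaskManager hosts. The host names, the `component` label and the function name are hypothetical, and port 9249 is only an assumption about where the Flink Prometheus reporter listens; check your own Flink configuration.

```python
import json


def flink_file_sd(jobmanagers, taskmanagers, port=9249):
    """Build Prometheus file_sd target groups for Flink components.

    The port is an assumption; use whatever your Flink metrics reporter exposes.
    """
    groups = []
    for component, hosts in (("jobmanager", jobmanagers), ("taskmanager", taskmanagers)):
        if hosts:
            groups.append({
                "targets": [f"{host}:{port}" for host in hosts],
                "labels": {"component": component},
            })
    return groups


# Hypothetical host names, for illustration only.
groups = flink_file_sd(["flink-jm-1"], ["flink-tm-1", "flink-tm-2"])
print(json.dumps(groups, indent=2))
```

The printed JSON can be written to a file referenced from a `file_sd_configs` entry in the Prometheus scrape configuration, so targets change without restarting Prometheus.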
11. How to monitor and visualize metrics?
All available services for monitoring visualisation are quite similar.
• Think about it as a part of a complex solution that has to be compatible with metrics exporters and log exporters.
• Think of security, ease of use for the operations team and the ability to add any custom modules that may be required.
• Simpler visualisation = more readable (often, not always).
• Understand the value of metrics.
12. Don’t forget about alerts
Alerts signify that a human needs to take action immediately in response to something that is either happening or about to happen, in order to improve the situation.
Do not overuse alerts; some issues should be fixed by automation scripts.
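The idea of paging a human only when automation cannot fix the issue can be sketched as follows. This is an illustrative model, not the behaviour of any particular alerting tool: the `Rule` class, the rule names, the thresholds and the "consecutive samples" condition are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Rule:
    name: str
    threshold: float
    for_samples: int  # consecutive breaching samples required before acting
    remediation: Optional[Callable[[], bool]] = None  # returns True if it fixed the issue


def evaluate(rule: Rule, samples: Sequence[float]) -> str:
    """Return 'ok', 'remediated' or 'page' for the most recent samples."""
    recent = list(samples)[-rule.for_samples:]
    breaching = len(recent) == rule.for_samples and all(v > rule.threshold for v in recent)
    if not breaching:
        return "ok"
    # Try the automation script first; page a human only if it is missing or fails.
    if rule.remediation is not None and rule.remediation():
        return "remediated"
    return "page"


# Hypothetical rules: a full disk can be cleaned automatically, consumer lag cannot.
disk_rule = Rule("disk_almost_full", threshold=0.9, for_samples=3, remediation=lambda: True)
lag_rule = Rule("consumer_lag_high", threshold=10_000, for_samples=3)

print(evaluate(disk_rule, [0.95, 0.96, 0.97]))
print(evaluate(lag_rule, [500, 20_000, 30_000]))
print(evaluate(lag_rule, [20_000, 30_000, 40_000]))
```

Requiring several consecutive breaching samples keeps short spikes from waking anyone up, which is one way to avoid overusing alerts.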
14. Exercise: One to rule them all
• Demo from Alerta’s team: https://try.alerta.io/
git clone https://github.com/alerta/docker-alerta
cd docker-alerta
docker-compose up
15. Log analytics
• Discover what is inside your log files.
• Useful for the operations team and for developers to understand what happened with their applications.
• Read log files in a dedicated tool; do not use less or tail when you have several machines to check.
• You can take wise actions based on the log content when you see what is going on with your services.
18. Make it simple with Loki
Like Prometheus, but for logs!
• Write simple queries with LogQL, which is similar to PromQL
Examples:
• {instance=~"kafka-[23]",name="kafka"} != "kafka.server:type=ReplicaManager"
• topk(10,sum(rate({region="us-east1"}[5m])) by (name))
• Ingest log files with Promtail, Fluentd or Fluent Bit
• Relabel log files if needed
• Designed for clusters in Kubernetes
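To make the first LogQL example easier to read, here is a small Python emulation of what it does: the part in braces selects log streams by label (regex matchers are fully anchored), and `!=` then drops lines containing the given text. This is only a sketch of the semantics on in-memory data, not how Loki is implemented; the sample streams and function names are invented for the example, and Python's `re` stands in for Loki's regex engine.

```python
import re


def select_streams(streams, equals=None, regex=None):
    """Keep streams whose labels satisfy = and =~ matchers (regex fully anchored)."""
    equals = equals or {}
    regex = regex or {}
    selected = []
    for labels, lines in streams:
        if all(labels.get(k) == v for k, v in equals.items()) and all(
            re.fullmatch(p, labels.get(k, "")) for k, p in regex.items()
        ):
            selected.append((labels, lines))
    return selected


def line_not_contains(streams, needle):
    """Emulate the `!= "text"` line filter: drop lines containing the text."""
    return [(labels, [line for line in lines if needle not in line]) for labels, lines in streams]


# Invented sample data: (labels, log lines) per stream.
streams = [
    ({"instance": "kafka-1", "name": "kafka"}, ["started"]),
    ({"instance": "kafka-2", "name": "kafka"},
     ["kafka.server:type=ReplicaManager shrink ISR", "controller elected"]),
    ({"instance": "kafka-3", "name": "zookeeper"}, ["session opened"]),
]

result = line_not_contains(
    select_streams(streams, equals={"name": "kafka"}, regex={"instance": "kafka-[23]"}),
    "kafka.server:type=ReplicaManager",
)
print(result)
```

Only the `kafka-2` stream survives the label selection, and only its line without the `ReplicaManager` text is returned.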
19. Our experience with Loki
• Two environments: development and production stages.
• Migration from the ELK stack, which didn’t provide good enough performance and had issues with scraping log files.
• Loki in Grafana: metrics and logs can be verified in one tool.
• Stable solution that provides all metrics and enables counting interesting values from jobs’ logs.
22. Exercise: Glance at Loki
helm repo add loki https://grafana.github.io/loki/charts
helm repo update
helm upgrade --install loki --namespace=loki-stack loki/loki-stack
helm install stable/grafana -n loki-grafana
kubectl get secret --namespace <YOUR-NAMESPACE> loki-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
kubectl port-forward --namespace <YOUR-NAMESPACE> service/loki-grafana 3000:80
Go to: http://localhost:3000
Add Loki as a data source.
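Besides Grafana, Loki can also be queried directly over its HTTP API, which is handy for checking that logs are arriving. The sketch below only builds a request URL for the `query_range` endpoint. It assumes Loki is reachable at http://localhost:3100 (for example through a separate port-forward, which the exercise above does not set up); the base URL, the query and the function name are illustrative assumptions.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def loki_query_range_url(base_url, logql, start_ns, end_ns, limit=100):
    """Build a URL for Loki's query_range endpoint (times are nanosecond Unix epochs)."""
    params = urlencode({"query": logql, "start": start_ns, "end": end_ns, "limit": limit})
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{params}"


# Assumed address and query, for illustration only.
url = loki_query_range_url("http://localhost:3100", '{name="kafka"}', 0, 60 * 10**9)
print(url)

# Round-trip check: the LogQL query survives URL encoding unchanged.
decoded_query = parse_qs(urlsplit(url).query)["query"][0]
print(decoded_query)
```

The resulting URL can be opened with any HTTP client once Loki is reachable at the assumed address.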
23. SRE & DevOps
If a human operator needs to touch your system during normal operations, you have a bug. The definition of normal changes as your systems grow.
Carla Geisser, Google SRE
24. DevOps vs. SRE
DevOps | SRE
Reduce organization silos | Share ownership with developers by using the same tools and techniques across the stack
Accept failure as normal | Have a formula for balancing accidents and failures against new releases
Implement gradual change | Encourage moving quickly by reducing costs of failure
Leverage tooling & automation | Encourages "automating this year's job away" and minimizing manual systems work to focus on efforts that bring long-term value to the system
Measure everything | Believes that operations is a software problem, and defines
25. CI/CD pipelines
Besides black art, there is only automation and mechanization.
Federico García Lorca (1898–1936), Spanish poet and playwright
Source: AWS
26. Improve, commit, test, deploy
• Define which applications or jobs can be deployed automatically to the production environment.
• Test everything.
• Remember about CI tools that are really useful.
• Teach others how they should use automation tools.
• Discuss, improve, make
27. Automate and drink coffee
• Automate boring stuff.
• Make Everything as Code.
• Run tested Ansible playbooks and forget about manual changes.
• More well-thought-out automation, fewer problems.
28. Useful tools
• Ansible
• Jenkins
• Rundeck
• Automate all ops tasks
• Run Ansible from one place where you can set up variables easily.
• It supports LDAP.
29. Test your Ansible
• Use Molecule.
• Molecule provides support for testing with multiple instances, operating systems and distributions, virtualization providers, test frameworks and testing scenarios.
• Use Tox.
• Tox is a generic virtualenv management and test command-line tool. Tox can be used in conjunction with Factors and Molecule to perform scenario tests.
• Use them in your CI tool.
30. Molecule tests
Prerequisites: a working Docker installation, plus Molecule and Tox installed with pip.
Scenarios – they are a test suite for your newly created role.
What is inside the scenario directory?
• Dockerfile.j2
• INSTALL.rst – it contains instructions on what additional software or setup steps are required.
• molecule.yml – here you add dependencies, etc.
• playbook.yml – it’ll be invoked by Molecule.
• Tests – here you can add specific tests.