This document discusses using Terraform to manage cloud infrastructure as code. Terraform allows infrastructure to be defined using declarative configuration files that can be treated as code and versioned. It uses a provider model to interact with different cloud APIs to deploy and manage resources. Key features discussed include idempotency, the Terraform graph, modules for abstraction, variables, and linking dependent resources.
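The features listed above (variables, linked resources, the dependency graph, idempotency) can be illustrated with a minimal Terraform sketch. This is an illustrative fragment, not taken from the deck itself; the resource types are standard AWS-provider resources, but all names and values are hypothetical:

```hcl
# A variable that parameterizes the configuration.
variable "environment" {
  type    = string
  default = "staging"
}

# Two resources; the reference aws_s3_bucket.logs.id links them,
# so Terraform's graph creates the bucket before the versioning config.
resource "aws_s3_bucket" "logs" {
  bucket = "example-logs-${var.environment}"
}

resource "aws_s3_bucket_versioning" "logs" {
  bucket = aws_s3_bucket.logs.id
  versioning_configuration {
    status = "Enabled"
  }
}
```

Running `terraform apply` twice illustrates idempotency: the second run compares the configuration against recorded state and makes no changes.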
Low latency scalable web crawling on Apache Storm - Julien Nioche
In this talk I will introduce StormCrawler (https://github.com/DigitalPebble/storm-crawler), a collection of resources for building low-latency, large-scale web crawlers on Apache Storm. We will compare it with similar projects such as Apache Nutch and present several use cases where StormCrawler is being used. In particular, we will see how StormCrawler can be used with Elasticsearch and Kibana for crawling and indexing web pages.
adaptTo() 2014 - Integrating Open Source Search with CQ/AEM - therealgaston
A presentation by Gaston Gonzalez at adaptTo() 2014 describing several approaches for integrating Apache Solr with AEM. It starts with an introduction to various pull and push indexing strategies (e.g., Sling Eventing, content publishing and web crawling). The topic of content ingestion is followed by an approach for delivering rapid search front-end experiences using AEM Solr Search.
Kubernetes is fast becoming the operating system for the cloud, and its ubiquity has the potential for massive benefits for technology organizations. Applications and microservices are moved to orchestration tools like Kubernetes to leverage features such as horizontal autoscaling, fault tolerance, CI/CD, and more. Apache Solr is an open-source search platform built on the Apache Lucene library; it offers Lucene's search capabilities in a user-friendly way. Lucidworks runs over a thousand distributed-mode Apache Solr clusters spread across many machines for a plethora of search and analytics use cases. The traffic demands massive scale, which creates scenarios requiring in-depth micro-management (operating system upgrades, scaling clusters dynamically, etc.) that affect the overall search experience. This talk focuses on Lucidworks' journey in scaling clusters horizontally and vertically, based on query traffic load, data ingestion throughput, or other relevant metrics, by extending the capabilities of Kubernetes and Apache Solr to achieve true physical and logical autoscaling that satisfies modern-era SLAs and infrastructure costs. The talk concludes with how the adopted solution opens up future scope for fine-grained scaling of search clusters.
Leveraging the Power of Solr with Spark - QAware GmbH
Lucene Revolution 2016, Boston: Talk by Johannes Weigend (@JohannesWeigend, CTO at QAware).
Abstract: Solr is a distributed NoSQL database with impressive search capabilities. Spark is the new megastar in the distributed computing universe. In this code-intense session we show you how to combine both to solve real-time search and processing problems. We will show you how to set up a Solr/Spark combination from scratch and develop first jobs that run distributed over shared Solr data. We will also show you how to use this combination for your next-generation BI platform.
Leveraging Hadoop in Heterogeneous environments - I will share our experience in leveraging the power of Hadoop to reach multiple business goals. The talk will also focus on the tools that help in addressing concerns related to polyglot architectures, such as interoperability, multi-tenancy, schema evolution and standardization. I will also talk about some frameworks and packages that help in codifying best patterns and practices for integrating Hadoop with other systems, such as traditional Business Intelligence systems, Web Analytics, and other distributed computing technologies like Apache Spark.
(Bill Bejeck, Confluent) Kafka Summit SF 2018
Apache Kafka added a powerful stream processing library in mid-2016, Kafka Streams, which runs on top of Apache Kafka. The community has embraced Kafka Streams with many early adopters, and the adoption rate continues to grow. Large to mid-size organizations have come to rely on Kafka Streams in their production environments. Kafka Streams has many advanced features to make applications more robust.
The point of this presentation is to show users of Kafka Streams some of the latest and greatest features, as well as some that may be advanced, that can make streams applications more resilient. The target audience for this talk are those users already comfortable writing Kafka Streams applications and want to go from writing their first proof-of-concept applications to writing robust applications that can withstand the rigor that running in a production environment demands.
The talk will be a technical deep dive covering topics like:
-Best practices on configuring a Kafka Streams application
-How to meet production SLAs by minimizing failover and recovery times: configuring standby tasks and the pros and cons of having standby replicas for local state
-How to improve resiliency and 24×7 operability: the use of different configurable error handlers, callbacks and how they can be used to see what’s going on inside the application
-How to achieve efficient scalability: a thorough review of the relationship between the number of instances, threads and state stores and how they relate to each other
While this is a technical deep dive, the talk will also present sample code so that attendees can see the concepts discussed in practice. Attendees of this talk will walk away with a deeper understanding of how Kafka Streams works, and how to make their Kafka Streams applications more robust and efficient.
Big data lambda architecture - Streaming Layer Hands On - hkbhadraa
This presentation is a hands-on guide to building a big data streaming pipeline on the AWS cloud platform using Apache Kafka, Apache Hadoop, Apache Spark and Apache Cassandra.
Keeping Spark on Track: Productionizing Spark for ETL - Databricks
ETL is the first phase when building a big data processing platform. Data is available from various sources and formats, and transforming the data into a compact binary format (Parquet, ORC, etc.) allows Apache Spark to process it in the most efficient manner. This talk will discuss common issues and best practices for speeding up your ETL workflows, handling dirty data, and debugging tips for identifying errors.
Speakers: Kyle Pistor & Miklos Christine
This talk was originally presented at Spark Summit East 2017.
Mobility insights at Swisscom - Understanding collective mobility in Switzerland - François Garillot
Swisscom is the leading mobile-service provider in Switzerland, with a market share high enough to enable us to model and understand the collective mobility in every area of the country. To accomplish that, we built an urban planning tool that helps cities better manage their infrastructure based on data-based insights, produced with Apache Spark, YARN, Kafka and a good dose of machine learning. In this talk, we will explain how building such a tool involves mining a massive amount of raw data (1.5E9 records/day) to extract fine-grained mobility features from raw network traces. These features are obtained using different machine learning algorithms. For example, we built an algorithm that segments a trajectory into mobile and static periods and trained classifiers that enable us to distinguish between different means of transport. As we sketch the different algorithmic components, we will present our approach to continuously run and test them, which involves complex pipelines managed with Oozie and fuelled with ground truth data. Finally, we will delve into the streaming part of our analytics and see how network events allow Swisscom to understand the characteristics of the flow of people on roads and paths of interest. This requires making a link between network coverage information and geographical positioning in the space of milliseconds and using Spark streaming with libraries that were originally designed for batch processing. We will conclude on the advantages and pitfalls of Spark involved in running this kind of pipeline on a multi-tenant cluster. Audiences should come back from this talk with an overall picture of the use of Apache Spark and related components of its ecosystem in the field of trajectory mining.
From Big to Fast Data. How #kafka and #kafka-connect can redefine your ETL and... - Landoop Ltd
Presentation on "Big Data and Kafka, Kafka-Connect and the modern days of stream processing" For @Argos - @Accenture Development Technology Conference - London Science Museum (IMAX)
Terraform is an Infrastructure Automation tools. This can work equally good for on-premises, public cloud, private cloud, hybrid-cloud and multi-cloud infrastructure.
Visit us for more at www.zekeLabs.com
Natural Language Query and Conversational Interface to Apache Spark - Databricks
Apache Spark has been a great technology for processing and analyzing Big Data. However, it is not accessible to business users, who don’t have technical or programming skills. In this talk, I’ll talk about recent efforts in the space of “Conversational analytics”. This paradigm allows any user to ask text and voice questions, in natural language, of their data to a bot and receive back a natural language and visual result. A key technology is natural language to SQL translation, where we translate natural language queries from a user into Spark SQL queries that can go against a Databricks system, and that can be easily trained on different schemas and databases.
This NLP technology needs to be further combined with dialog management, natural-language generation/narration, data understanding and modeling, augmented analytics and automated visualization generation in order to achieve the goal of “Conversational Analytics”. Using such a technology, a user can ask, in plain English, “How many cases of Covid were there in the last 2 months in states that had no social distancing mandates by type of transmission”, and then dig deeper into the results in a conversational manner to uncover hidden insights from Covid datasets in a Spark instance. We believe that having access to such data and insights at their fingertips can help users make appropriate decisions quickly, improve data literacy and even overcome the scourge of fake news for the general public.
Jim Dowling - Multi-tenant Flink-as-a-Service on YARN - Flink Forward
http://flink-forward.org/kb_sessions/multi-tenant-flink-as-a-service-on-yarn/
Since June 2016, Flink-as-a-service has been available to researchers and companies in Sweden from the Swedish ICT SICS Data Center at www.hops.site using the Hopsworks platform. Flink applications can be either deployed as jobs (batch or streaming) or written and run directly from Apache Zeppelin on YARN. Flink applications are run within a project on a YARN cluster with the novel property that Flink applications are metered and charged to projects. Projects are also securely isolated from each other and include support for project-specific Kafka topics that are protected from access by users that are not members of the project. Hopsworks is entirely UI-driven, is open-source, and Flink applications that include Kafka topics can be created in a few mouse clicks. In this talk we will discuss the challenges in building a metered version of Flink-as-a-Service for YARN, experiences with Flink-on-YARN, and some of the possibilities that Hopsworks opens up for building secure, multi-tenant applications.
APIs are a must nowadays. We'll see how API Platform can help us bring functional API platforms into production quickly. We will identify the key concepts of the framework, understand how to configure it to our needs, and see how it integrates naturally into the Symfony ecosystem.
Cloud Native Night, April 2018, Mainz: Workshop led by Jörg Schad (@joerg_schad, Technical Community Lead / Developer at Mesosphere)
Join our Meetup: https://www.meetup.com/de-DE/Cloud-Native-Night/
PLEASE NOTE:
During this workshop, Jörg showed many demos and the audience could participate on their laptops. Unfortunately, we can't provide these demos. Nevertheless, Jörg's slides give a deep dive into the topic.
DETAILS ABOUT THE WORKSHOP:
Kubernetes has been one of the topics in 2017 and will probably remain so in 2018. In this hands-on technical workshop you will learn how best to deploy, operate and scale Kubernetes clusters from one to hundreds of nodes using DC/OS. You will learn how to integrate and run Kubernetes alongside traditional applications and fast data services of your choice (e.g. Apache Cassandra, Apache Kafka, Apache Spark, TensorFlow and more) on any infrastructure.
This workshop best suits operators focused on keeping their apps and services up and running in production, and developers focused on quickly delivering internal and customer-facing apps into production.
In this workshop you will:
- Get an introduction to Kubernetes and DC/OS (including the differences between the two)
- Deploy Kubernetes on DC/OS in a secure, highly available, and fault-tolerant manner
- Solve operational challenges of running large or multiple Kubernetes clusters
- Deploy big data stateful and stateless services alongside a Kubernetes cluster with one click
Session talk presented at Innosoft, 2022-11-11, University of Sevilla.
Presented the concept of Infrastructure as Code and its practical approach using HashiCorp Terraform as a tool to provision in the cloud. Examples with AWS are provided in a GitHub repository.
MongoDB World 2019: Terraform New Worlds on MongoDB Atlas - MongoDB
MongoDB Atlas, MongoDB's database-as-a-service platform, has made it faster and easier than ever to use MongoDB, and as teams find their Atlas "flow" they smartly want to automate it to increase developer velocity. Many are creating this kind of automation with HashiCorp's Terraform, so let's bring these two great platforms together! We'll look at the resources provided by the Atlas API, and then I'll show how to automate a flow securely with a Terraform provider for Atlas. We will end by covering how MongoDB is making this experience even better going forward.
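As a rough sketch of what such automation can look like (this is not the speaker's code; resource and attribute names follow the third-party `mongodbatlas` provider and may differ across provider versions, and all values are illustrative):

```hcl
provider "mongodbatlas" {
  # API keys are typically supplied via environment variables,
  # e.g. MONGODB_ATLAS_PUBLIC_KEY / MONGODB_ATLAS_PRIVATE_KEY.
}

resource "mongodbatlas_project" "demo" {
  name   = "terraform-demo"
  org_id = var.atlas_org_id
}

resource "mongodbatlas_cluster" "demo" {
  project_id                  = mongodbatlas_project.demo.id
  name                        = "demo-cluster"
  provider_name               = "AWS"
  provider_region_name        = "US_EAST_1"
  provider_instance_size_name = "M10"
}
```

The project reference links the two resources, so Terraform creates the project before the cluster and destroys them in the reverse order.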
Sebastien Thomas, System Architect at Coyote Amerique, gave a presentation on operator frameworks. His talk covered how Operator SDK can be used to create Kubernetes Operators with Go.
https://www.youtube.com/watch?v=IeweKUdHJc4
My presentation from HashiConf 2017, discussing our use of Terraform and our techniques to help make it safe and accessible.
Tear It Down, Build It Back Up: Empowering Developers with Amazon CloudFormation - James Andrew Vaughn
As a product grows and the infrastructure becomes more complex, the Operations team traditionally shoulders the burden of maintaining this infrastructure while deploying code from Software Engineers. Code is sometimes given to Operations with little to no information about how it should run or what the criteria for successful deployment are. This is not due to a lack of caring; Software Engineers often lack the context themselves to provide production deployment instructions. To Software Engineers, production can be like a walled-off city, filled with pathways and rooms not to be explored, guarded by Operations.
This presentation aims to provide a solution to this problem. We will address how the traditional separation of Operations and Software Engineers slows innovation, and redefine their relationship -- blending responsibilities. We will examine the transition of two real teams, an Operations team and Engineering team, from complete isolation, to closer environments through virtual machines, to one cloud environment shared by all and managed with CloudFormation.
Lessons Learnt from Running Thousands of On-demand Spark Applications - Itai Yaffe
Ada Sharoni (Software Engineering Architect) @ Hunters:
Imagine you had to manage thousands of Spark applications that are automatically spinning up on-demand upon every customer interaction.
Our unique constraints in Hunters have led us to adopt an architecture and concepts that we believe many other companies will find useful.
In this lecture we will share our solutions and insights in running many lightweight, cheap Spark applications on Kubernetes, that can easily survive frequent restarts and smartly share resources on Spot EC2 instances.
Building and deploying LLM applications with Apache Airflow - Kaxil Naik
Behind the growing interest in generative AI and LLM-based enterprise applications lies an expanded set of requirements for data integration and ML orchestration. Enterprises want to use proprietary data to power LLM-based applications that create new business value, but they face challenges in moving beyond experimentation. The pipelines that power these models need to run reliably at scale, bringing together data from many sources and reacting continuously to changing conditions.
This talk focuses on the design patterns for using Apache Airflow to support LLM applications created using private enterprise data. We’ll go through a real-world example of what this looks like, as well as a proposal to improve Airflow and to add additional Airflow Providers to make it easier to interact with LLMs such as the ones from OpenAI (such as GPT4) and the ones on HuggingFace, while working with both structured and unstructured data.
In short, this shows how these Airflow patterns enable reliable, traceable, and scalable LLM applications within the enterprise.
https://airflowsummit.org/sessions/2023/keynote-llm/
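The pipeline pattern the talk describes (extract enterprise data, embed it with an LLM, load it into a vector store) can be sketched without any framework. This is a hypothetical illustration, not the speaker's code: the function names are invented, the "embedding" is a stand-in for a real model call, and a production version would wrap each step in an Airflow task or operator.

```python
def extract(documents):
    """Pull raw text records from a source system, dropping empty ones."""
    return [d.strip() for d in documents if d.strip()]

def embed(texts):
    """Stand-in for an LLM embedding call (e.g. an OpenAI or HuggingFace
    model); here we just turn leading characters into fake 'vectors'."""
    return [[float(ord(c)) for c in t[:4]] for t in texts]

def load(vectors, store):
    """Append vectors to a vector store (a plain dict standing in for
    Elasticsearch, Vertex AI, etc.)."""
    for i, v in enumerate(vectors):
        store[i] = v
    return store

# One pipeline run: extract -> embed -> load.
store = {}
texts = extract(["  contract A ", "", "invoice B"])
vectors = embed(texts)
load(vectors, store)
print(len(store))  # 2
```

In Airflow, each of the three functions would become a task, and the scheduler would rerun the chain continuously as new enterprise data arrives.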
Speaker: Jacob Aae Mikkelsen
Once you have successfully developed your application in Grails, Ratpack or your other favorite framework, you would like to see it deployed as quickly and painlessly as possible, right?
This talk will cover some of the supporting cast members of a successful modern infrastructure that developers can understand and use efficiently, with good DevOps practices.
Key elements are
Docker
Infrastructure as Code
Container Orchestration
The demo gods will hopefully be on our side, as this talk includes quite a few live demos!
MLOps pipelines using MLFlow - From training to production - Fabian Hadiji
This talk was given at the Cologne AI and Machine Learning Meetup on April 13, 2023 (https://www.meetup.com/de-DE/cologne-ai-and-machine-learning-meetup/events/291513393/) by Dr. Andreas Weiden, Co-Lead Cloud / Data Engineering at skillbyte: MLOps pipelines using MLFlow - From training to production
In this talk we explore the world of MLOps pipelines and how MLFlow can be used to facilitate workflows for getting your machine learning models from training to production. We will briefly delve into the tracking aspects of MLFlow and how to store experiments and runs. Next, we will move on to an actual use case that involves managing artefacts generated by multiple training pipelines running on a daily schedule. These artefacts are used in prediction services but also in managed vector search engines such as Elasticsearch and Google Vertex AI. A simple microservice that polls the MLFlow registry is used both to update REST APIs running in Kubernetes and to ingest the models into the vector search services. Finally, we will compare different alternatives that were considered.
OroCRM Partner Technical Training: September 2015Oro Inc.
OroCRM Partner Technical Training
September 2015
Schedule:
Day 1 - Monday 9/14
Define your Entities
--Environment and Project Setup
--Packages Management
--Entities and DB Schema Management
--Entity CRUD Implementation
Day 2 - Tuesday 9/15
Security and Productivity
--ACL
--Entity Activities
--System Configuration
Day 3 - Wednesday 9/16
User Interface
--Layouts and Templates
--CSS and JavaScript
--Widgets
--Navigation
--Localizations
Day 4 - Thursday 9/17
Integrate your Solution
--Job Queue
--Import and Export
--Integrations
--Automated Processes
--WEB API
Day 5 - Friday 9/18
Work with Data
--Workflow
--Reports
--Analytics and Marketing
--Tests
9. “codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.”
terraform.io
13. Topology: HashiCorp Configuration Language files
“The goal of HCL is to build a structured configuration language that is both human and machine friendly for use with command-line tools, but specifically targeted towards DevOps tools, servers, etc.”
terraform.io
14. Resource
- Unitary element deployed through the Provider API

resource "aws_instance" "web" {
  ami           = "ami-12345"
  instance_type = "t2.micro"
  ...
  tags {
    Name = "HelloWorld"
  }
}

Here "aws_instance" is the resource type, "web" is the resource name, and the block body contains the parameters.
18. Terraform apply and idempotency
(terraform-cli talks to the Provider and the tfstate)
1. Get the resource IDs existing in the tfstate
2. Fetch the current data from the Provider for those IDs
3. Generate the graph from the code and the data from step 2: what needs to be created, modified, or deleted?
4. Apply
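On the CLI, this refresh-plan-apply cycle is the standard workflow (a sketch; assumes a working directory with a configured provider):

```shell
terraform init               # install the provider plugins
terraform plan -out=tfplan   # steps 1-3: refresh state, build the graph, show the diff
terraform apply tfplan       # step 4: apply exactly the reviewed plan
```

Running apply again with unchanged code produces an empty plan: that is the idempotency the slide describes.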
29. If the resource was created before Terraform, you can reference it!
30. Datasource

# topology.tf
data "aws_vpc" "my_vpc" {
  tags {
    Name = "My VPC"
  }
}

resource "aws_subnet" "example" {
  vpc_id            = "${data.aws_vpc.my_vpc.id}"
  availability_zone = "us-west-2a"
  cidr_block        = "${cidrsubnet(data.aws_vpc.my_vpc.cidr_block, 4, 1)}"
}

- The number of datasource types depends on the provider
- Multiple fields can be available
- Datasources are refreshed on each apply
38. Abstraction with modules
Module "Application"
- Parameters: ami, LoadBalancer name
- Resources: an instance, plus the attachment of the instance to the LoadBalancer
- Output: instance ID
Both Topology "Application A" and Topology "Application B" instantiate this same module.
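The module from the slide could look roughly like this (a sketch in the deck's HCL1 syntax; the module path, variable names, and resource arguments are illustrative, not from the deck):

```hcl
# modules/application/main.tf
variable "ami" {}
variable "elb_name" {}

resource "aws_instance" "app" {
  ami           = "${var.ami}"
  instance_type = "t2.micro"
}

# attach the instance to the given LoadBalancer
resource "aws_elb_attachment" "app" {
  elb      = "${var.elb_name}"
  instance = "${aws_instance.app.id}"
}

output "instance_id" {
  value = "${aws_instance.app.id}"
}

# topology.tf -- each application instantiates the same module
module "application_a" {
  source   = "./modules/application"
  ami      = "ami-12345"
  elb_name = "application-a-elb"
}
```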
43. A resource is never alone
Example: a VM needs a Security Group ID as a parameter. On the deployment timeline the Security Group is created first (1), then the VM (2); Terraform derives this order from the reference between them.
44. Linked resources

resource "aws_security_group" "allow_all" {
  name = "allow_all"
  ingress {
    from_port   = 0
    to_port     = 65535
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_instance" "web" {
  ami                    = "ami-12345"
  instance_type          = "t2.micro"
  vpc_security_group_ids = ["${aws_security_group.allow_all.id}"]
  tags {
    Name = "HelloWorld"
  }
}

The referenced resource exposes output values such as:
● Id
● DNS name
● IP
● ...
45. Variable

# topology.tf
variable "ubuntu_ami" {
  default = "ami_123456"
  type    = "string"
}

resource "aws_instance" "web" {
  ami           = "${var.ubuntu_ami}"
  instance_type = "t2.micro"
  tags {
    Name = "HelloWorld"
  }
}

A variable can be set through:
- a default value
- a tfvars file
- an environment variable
- a command-line option

# terraform.tfvars
ubuntu_ami = "ami_23456"
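The last two mechanisms look like this on the command line (a sketch; `ubuntu_ami` is the variable from the slide and the AMI value is illustrative):

```shell
# environment variable: TF_VAR_<name> maps to variable <name>
TF_VAR_ubuntu_ami="ami_34567" terraform plan

# command-line option
terraform plan -var 'ubuntu_ami=ami_34567'
```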
46. HCL language features
- Count
- Conditionals through the ternary operator
- Functions for maps, lists, and strings
- CIDR range manipulation
- Math functions
- ...
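A few of these features combined (a sketch in the HCL1 interpolation syntax used throughout the deck; variable names and values are illustrative):

```hcl
variable "env" { default = "dev" }
variable "azs" { default = ["us-west-2a", "us-west-2b"] }

resource "aws_instance" "web" {
  # count + ternary conditional: 2 instances in prod, 1 otherwise
  count         = "${var.env == "prod" ? 2 : 1}"
  ami           = "ami-12345"
  instance_type = "t2.micro"
  # list function: spread instances across availability zones
  availability_zone = "${element(var.azs, count.index)}"
}

# CIDR manipulation: carve a /20 (netnum 2) out of a /16
output "subnet_cidr" {
  value = "${cidrsubnet("10.0.0.0/16", 4, 2)}"
}
```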
47. 1. tfstate output
Topology A writes its outputs to tfstate A; Topology B reads those outputs from tfstate A and injects them into its own resources.
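Reading another topology's tfstate is done with the `terraform_remote_state` data source (a sketch; assumes Topology A declares an `output "vpc_id"` and that its state lives in the S3 bucket and key shown, which are illustrative):

```hcl
# topology-a/topology.tf
output "vpc_id" {
  value = "${aws_vpc.main.id}"
}

# topology-b/topology.tf
data "terraform_remote_state" "a" {
  backend = "s3"
  config {
    bucket = "my-tfstates"
    key    = "topology-a.tfstate"
    region = "us-west-2"
  }
}

resource "aws_subnet" "b" {
  # inject the output of Topology A into a Topology B resource
  vpc_id     = "${data.terraform_remote_state.a.vpc_id}"
  cidr_block = "10.0.1.0/24"
}
```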