Machine learning is becoming one of the biggest trends in modern system development, offering businesses strategic understanding, predictions, and deep insights. However, building and integrating a machine learning system is not always easy, especially for large and distributed systems, where machine learning development practices are not yet as mature as those of software engineering.
In this talk, we will look at how Amazon Web Services (AWS) designed and built one of the most widely adopted MLOps platforms in the world: Amazon SageMaker.
- About the speaker: My Nguyễn is currently a Solutions Architect at AWS Vietnam, specializing in supporting solutions for building machine learning systems.
Introduction to DataOps and AIOps (or MLOps) by Adrien Blind
This presentation introduces the audience to DataOps and AIOps practices. It deals with organizational and tech aspects, and provides hints to start your data journey.
MLOps with serverless architectures (October 2018) by Julien SIMON
Talk @ AWS Loft Stockholm, 23/10/2018
But why?
A quick recap on Amazon SageMaker
A quick recap on serverless architectures
Open Source tools: AWS Chalice, Serverless Framework
Demos
Resources
MLOps Bridging the gap between Data Scientists and Ops by Knoldus Inc.
In this session we introduce the MLOps lifecycle and discuss the hidden pitfalls that can affect an ML project. We then cover the ML model lifecycle and the problems with training, and introduce the MLflow Tracking module for tracking experiments.
Deploying and managing machine learning models at scale introduces new complexities. Fortunately, there are tools that simplify this process. In this talk we walk you through an end-to-end, hands-on example showing how you can go from research to production without much complexity by leveraging the Seldon Core and MLflow frameworks. We will train a set of ML models and showcase a simple way to deploy them to a Kubernetes cluster through sophisticated deployment methods, including canary and shadow deployments, and we'll touch upon richer ML graphs such as explainer deployments.
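The canary idea mentioned in this abstract can be sketched as a weighted router: a small share of requests goes to the new model while the rest stays on the stable one. This is a plain-Python illustration only, not Seldon Core's API; `CanaryRouter`, its arguments, and the traffic weights are hypothetical names chosen for the sketch.

```python
import random
from typing import Callable, Dict

Model = Callable[[float], float]


class CanaryRouter:
    """Send a fixed fraction of requests to a canary model (illustrative only)."""

    def __init__(self, stable: Model, canary: Model,
                 canary_weight: float, seed: int = 0) -> None:
        if not 0.0 <= canary_weight <= 1.0:
            raise ValueError("canary_weight must be between 0 and 1")
        self.stable = stable
        self.canary = canary
        self.canary_weight = canary_weight
        # Per-variant request counters, useful for comparing the two models.
        self.counts: Dict[str, int] = {"stable": 0, "canary": 0}
        self._rng = random.Random(seed)

    def predict(self, x: float) -> float:
        # random() is in [0, 1), so weight 0 never routes to the canary
        # and weight 1 always does.
        if self._rng.random() < self.canary_weight:
            self.counts["canary"] += 1
            return self.canary(x)
        self.counts["stable"] += 1
        return self.stable(x)
```

A shadow deployment would differ in that both models receive every request and only the stable model's answer is returned; the router above deliberately does not do that.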
MLOps and Data Quality: Deploying Reliable ML Models in Production by Provectus
Looking to build a robust machine learning infrastructure to streamline MLOps? Learn from Provectus experts how to ensure the success of your MLOps initiative by implementing Data QA components in your ML infrastructure.
For most organizations, the development of multiple machine learning models, their deployment and maintenance in production are relatively new tasks. Join Provectus as we explain how to build an end-to-end infrastructure for machine learning, with a focus on data quality and metadata management, to standardize and streamline machine learning life cycle management (MLOps).
Agenda
- Data Quality and why it matters
- Challenges and solutions of Data Testing
- Challenges and solutions of Model Testing
- MLOps pipelines and why they matter
- How to expand validation pipelines for Data Quality
The catalyst for the success of automobiles came not through the invention of the car but rather through the establishment of an innovative assembly line. History shows us that the ability to mass produce and distribute a product is the key to driving adoption of any innovation, and machine learning is no different. MLOps is the assembly line of Machine Learning and in this presentation we will discuss the core capabilities your organization should be focused on to implement a successful MLOps system.
Robust MLOps with Open-Source: ModelDB, Docker, Jenkins, and Prometheus by Manasi Vartak
These are slides from Manasi Vartak's Strata Talk in March 2020 on Robust MLOps with Open-Source.
* Introduction to talk
* What is MLOps?
* Building an MLOps Pipeline
* Real-world Simulations
* Let’s fix the pipeline
* Wrap-up
Wondering what this data mesh stuff is all about? What are the principles of data mesh? Can you, or should you, consider data mesh as the approach for your analytics platform? And most importantly, how can Snowflake help?
Given in Montreal on 14-Dec-2021
Grokking Techtalk #42: Engineering challenges on building data platform for M... by Grokking VN
At Techtalk #42, you will hear how to design and implement a platform that serves machine learning problems, through a case study on analyzing user comments.
The talk covers several challenges encountered while building it, including technical and analytical difficulties when:
+ A large volume of user comments has to be collected
+ Data storage and processing must be organized to scale easily and to be convenient to monitor and operate
+ System components must be designed for high reusability and to avoid wasting resources
Language: Vietnamese
---
Speakers:
- Hiền Hoàng - Principal Big Data Engineer & TPP
- Hiếu Hoàng - Data Scientist & TPP
DataOps: An Agile Method for Data-Driven Organizations by Ellen Friedman
DataOps expands DevOps philosophy to include data-heavy roles (data engineering & data science). DataOps uses better cross-functional collaboration for flexibility, fast time to value and an agile workflow for data-intensive applications including machine learning pipelines. (Strata Data San Jose March 2018)
Architect’s Open-Source Guide for a Data Mesh Architecture by Databricks
Data Mesh is an innovative concept addressing many data challenges from an architectural, cultural, and organizational perspective. But is the world ready to implement Data Mesh?
In this session, we will review the importance of core Data Mesh principles, what they can offer, and when it is a good idea to try a Data Mesh architecture. We will discuss common challenges with implementation of Data Mesh systems and focus on the role of open-source projects for it. Projects like Apache Spark can play a key part in standardized infrastructure platform implementation of Data Mesh. We will examine the landscape of useful data engineering open-source projects to utilize in several areas of a Data Mesh system in practice, along with an architectural example. We will touch on what work (culture, tools, mindset) needs to be done to ensure Data Mesh is more accessible for engineers in the industry.
The audience will leave with a good understanding of the benefits of Data Mesh architecture, common challenges, and the role of Apache Spark and other open-source projects for its implementation in real systems.
This session is targeted at architects, decision-makers, data engineers, and system designers.
MLflow is an MLOps tool that enables data scientists to quickly productionize their machine learning projects. To achieve this, MLflow has four major components: Tracking, Projects, Models, and Registry. MLflow lets you train, reuse, and deploy models with any library and package them into reproducible steps. It is designed to work with any machine learning library and requires minimal changes to integrate into an existing codebase. In this session, we will cover the common pain points of machine learning developers, such as experiment tracking, reproducibility, deployment tooling, and model versioning. Get ready to get your hands dirty with a quick ML project built with MLflow and released to production, to understand the MLOps lifecycle.
What’s New with Databricks Machine Learning by Databricks
In this session, the Databricks product team provides a deeper dive into the machine learning announcements. Join us for a detailed demo that gives you insights into the latest innovations that simplify the ML lifecycle — from preparing data, discovering features, and training and managing models in production.
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp... by Ed Fernandez
Adoption of ML at scale in the Enterprise, Machine Learning Platforms & AutoML
[1] Definitions & Context
• Machine Learning Platforms, Definitions
• ML models & apps as first class assets in the Enterprise
• Workflow of an ML application
• ML Algorithms, overview
• Architecture of a ML platform
• Update on the Hype cycle for ML & predictive apps
[2] Adopting ML at Scale
• The Problem with Machine Learning - Scaling ML in the Enterprise
• Technical Debt in ML systems
• How many models are too many models
• The need for ML platforms
[3] The Market for ML Platforms
• ML platform Market References - from early adopters to mainstream
• Custom Build vs Buy: ROI & Technical Debt
• ML Platforms - Vendor Landscape
[4] Custom Built ML Platforms
• ML platform Market References - a closer look
Facebook - FBlearner
Uber - Michelangelo
AirBnB - BigHead
• ML Platformization Going Mainstream: The Great Enterprise Pivot
[5] From DevOps to MLOps
• DevOps <> ModelOps
• The ML platform driven Organization
• Leadership & Accountability (labour division)
[6] Automated ML - AutoML
• Scaling ML - Rapid Prototyping & AutoML:
• Definition, Rationale
• Vendor Comparison
• AutoML - OptiML: Use Cases
[7] Future Evolution for ML Platforms
Appendix I: Practical Recommendations for ML onboarding in the Enterprise
Appendix II: List of References & Additional Resources
Enterprise Architecture vs. Data Architecture by DATAVERSITY
Enterprise Architecture (EA) provides a visual blueprint of the organization, and shows key interrelationships between data, process, applications, and more. By abstracting these assets in a graphical view, it’s possible to see key interrelationships, particularly as they relate to data and its business impact across the organization. Join us for a discussion on how data architecture is a key component of an overall enterprise architecture for enhanced business value and success.
Presentation on Data Mesh: this paradigm shift is a new type of ecosystem architecture, a shift left towards a modern distributed architecture that allows for domain-specific data and views data as a product, enabling each domain to handle its own data pipelines.
Intuit's Data Mesh - Data Mesh Learning Community meetup 5.13.2021 by Tristan Baker
Past, present and future of data mesh at Intuit. This deck describes a vision and strategy for improving data worker productivity through a Data Mesh approach to organizing data and holding data producers accountable. Delivered at the inaugural Data Mesh Learning meetup on 5/13/2021.
WhereML a Serverless ML Powered Location Guessing Twitter Bot by Randall Hunt
Learn how we designed, built, and deployed the @WhereML Twitter bot that can identify where in the world a picture was taken using only the pixels in the image. We'll dive deep on artificial intelligence and deep learning with the MXNet framework and also talk about working with the Twitter Account Activity API. The bot is entirely autoscaling and powered by Amazon API Gateway and AWS Lambda which means, as a customer, you don't manage any infrastructure. Finally we'll close with a discussion around custom authorizers in API Gateway and when to use them.
Integrate Machine Learning into Your Spring Application in Less than an Hour by VMware Tanzu
SpringOne 2020
Hermann Burgmeier, Senior Software Engineer at Amazon
Qing Lan, Software Development Engineer at AWS
Mikhail Shapirov, Senior Partner Solutions at Amazon Web Services, Inc
Vaibhav Goel, Sr. Software Development Engineer at Amazon
Modern Applications Development on AWS by Boaz Ziniman
Modern Application Development, using Microservices and Serverless, allows you to build and run simpler and more efficient applications while improving your agility and saving a lot of money.
The ability to deploy your applications without the need for provisioning or managing servers opens new opportunities to build web, mobile, and IoT backends; run stream processing or big data workloads; run chatbots, and more, without the investment in hardware or professional manpower to run this hardware.
In this session, we will learn how to get started with Microservices and Serverless computing with AWS Lambda, which lets you run code without provisioning or managing servers.
Supercharge your Machine Learning Solutions with Amazon SageMaker by Amazon Web Services
Amazon SageMaker is a fully managed service that enables data scientists and developers to quickly and easily build, train, and deploy machine learning models at scale. This session will introduce you to the features of Amazon SageMaker, including a one-click training environment, highly optimized machine learning algorithms with built-in model tuning, and deployment without engineering effort. With zero setup required, Amazon SageMaker significantly decreases your training time and the overall cost of building production machine learning systems. You'll also hear how and why Intuit is using Amazon SageMaker on AWS for real-time fraud detection.
Build Modern Applications that Align with Twelve-Factor Methods (API303) - AW... by Amazon Web Services
Twelve-Factor designs improve component reuse and resilience for developers building large-scale software-as-a-service (SaaS) applications. In recent years, the Twelve-Factor guidelines have become a source of best practices for both developers and operations engineers, regardless of the application’s use case and at nearly any scale. In this workshop, create a modern app to see how the Twelve-Factor Application guidelines align with serverless best practices. Learn how to address those Twelve-Factor guidelines that don’t directly align with serverless architectures or are interpreted differently, and practice by implementing examples using AWS Lambda, AWS Step Functions, Amazon API Gateway, and the AWS Code services. Bring a laptop (Windows/OSX/Linux all supported). Tablets are not appropriate. We also recommend installing the current version of Chrome or Firefox.
Serverless AI with Scikit-Learn (GPSWS405) - AWS re:Invent 2018 by Amazon Web Services
Take advantage of serverless technologies for artificial intelligence (AI) by making a prediction on the fly. There is no model hosting and no servers to maintain. In this session, we show how to train a model in scikit-learn, an open source machine learning library for Python. Then we load and call the trained model from an AWS Lambda function, and finally we demonstrate how to load the library and send the data for prediction.
How can you accelerate the delivery of new, high-quality services? How can you be able to experiment and get feedback quickly from your customers? To get the most out of the agility afforded by serverless and containers, it is essential to build CI/CD pipelines that help teams iterate on code and quickly release features. In this talk, we demonstrate how developers can build effective CI/CD release workflows to manage their serverless or containerized deployments on AWS. We cover infrastructure-as-code (IaC) application models, such as AWS Serverless Application Model (AWS SAM) and new imperative IaC tools. We also demonstrate how to set up CI/CD release pipelines with AWS CodePipeline and AWS CodeBuild, and we show you how to automate safer deployments with AWS CodeDeploy.
Mainframe Modernization with AWS: Patterns and Best Practices by Amazon Web Services
In this webinar, learn common mainframe migration patterns and best practices for a successful migration to AWS. Hear experiences and lessons learned based on real-world customer modernization projects to AWS.
Driving Innovation with Serverless Applications (GPSBUS212) - AWS re:Invent 2018 by Amazon Web Services
Serverless computing enables you to build and run applications without the need to provision, manage servers, or worry about the availability or scalability of your solutions. With serverless computing, you can build web, mobile, and IoT backends, run stream processing or big data workloads, run chatbots, and more. In this session, learn how to get started with serverless computing with AWS Lambda, Amazon API Gateway, Amazon DynamoDB, and more.
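[AWS Innovate Online Conference] Using Kubernetes and SageMaker to manage Machine Learning workloads... by Amazon Web Services Korea
Watch the presentation again: https://youtu.be/6sogVHw9jZ4
Many tools and methods are being tried in order to run Machine Learning workloads in real production environments. In this session, we look at which tools are used for ML operations and, using Kubernetes and Kubeflow, which are widely chosen in enterprise environments, walk through a demo of how to manage Machine Learning preprocessing and training jobs and deploy them to production.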
How_to_build_your_cloud_enablement_engine_with_the_people_you_already_have by Amazon Web Services
One of the biggest misconceptions we hear from IT leaders is the belief that not having the right people on staff stops you from moving faster, saving money, and expanding your business on the cloud. You already have the people you need to succeed in the cloud, and these highly skilled, experienced, and dedicated employees have the ability to learn AWS cloud skills and become certified experts. Transforming your talent has a profound impact on workforce productivity and satisfaction, and in this session we will walk through best practices and AWS capabilities to help you along the way.
Similar to Grokking Techtalk #40: AWS’s philosophy on designing MLOps platform (20)
Grokking Techtalk #46: Lessons from years hacking and defending Vietnamese banks by Grokking VN
In recent years, Vietnam has consistently been among the countries with the highest rates of malware infection and cyberattacks in the world. In addition, the use of computers and smart devices in Vietnam has surged due to the impact of COVID-19, which is also an ideal environment for viruses to break out and spread rapidly. This raises concerns about security in cyberspace, an issue that few Vietnamese pay attention to but that is highly important and far-reaching.
For that reason, in Techtalk #46, Grokking Vietnam is pleased to introduce the topic "Lessons on penetrating and defending Vietnamese network systems", presented by Dương Ngọc Thái. Thái currently works at Google and is often known through his personal blog vnhacker@blogspot.
"Since 2016, together with a few friends, I have penetrated the computer networks of many banks, hospitals, and startups in Vietnam (with their consent). With the banks, we were able to steal large amounts of money and a lot of sensitive data. With the hospitals, we were able to steal all customer data and could even alter medical records.
In this talk, I share what we have learned and provide information on the current state of cybersecurity in Vietnam. I also offer a handbook to help businesses and organizations protect their assets and data and create products that customers trust." - Thái on the purpose of the talk.
Grokking Techtalk #45: First Principles Thinking by Grokking VN
Have you ever heard someone talk about First Principles Thinking? What is it, and how can we engineers use it in our work?
---
First Principles Thinking is one of the methods we can apply to break complex problems down into smaller, more fundamental problems that can be solved, and then combine the results into a solution to the original complex problem.
Continuing the Problem Solving theme, in this Techtalk Grokking Vietnam and Gambaru bring you another perspective on problem-solving thinking. We will meet Hùng Đoàn, ex-Facebook and currently a Software Engineer at Coda, and discuss First Principles Thinking in more depth.
Talk content:
* Analogy thinking
* Breaking a problem space down to its building blocks
* Techniques to arrive at first principles thinking
* Application in Programming
---
Language: Vietnamese
---
Speaker:
- Hùng Đoàn - Software Engineer @ Coda.io, Ex-Facebook SWE
Hùng has many years of experience across software engineering fields. He competed in informatics olympiads at the national and international level and won a medal in 2007.
For e-commerce systems, integrating with an online payment gateway is the most basic requirement. Beyond being accurate, the payment service must also deliver a good user experience, handle incidents that may occur during processing, and, above all, be secure. Designing and building this effectively is a hard technical problem!
In Techtalk #43, attendees will learn about the components of a payment gateway, how a transaction is processed, how payment information is stored, how refunds are handled, and other issues encountered when building an online payment gateway. The talk covers the following:
- Payment Domain Knowledge
- Payment Gateway Integration
+ Create Order
+ Check Order Amount (Optional)
+ Browser Redirect
+ Instant Payment Notification (IPN)
+ Payment Query (QueryDR)
- Advanced Concepts
+ Tokenization
+ Credit Card Authorization/Reversal/Settle
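As an illustration of the Instant Payment Notification (IPN) step in the outline above, here is a minimal sketch of a handler that checks a signature and ignores repeated notifications. Everything in it is an assumption made for the example: the HMAC-SHA256 scheme over sorted `key=value` pairs, the `order_id` field, and the return strings are hypothetical and do not describe any particular gateway or the talk's own design.

```python
import hashlib
import hmac
from typing import Dict, Set


def sign(params: Dict[str, str], secret: bytes) -> str:
    """Hex HMAC-SHA256 over sorted key=value pairs (hypothetical scheme)."""
    payload = "&".join(f"{key}={params[key]}" for key in sorted(params))
    return hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).hexdigest()


class IpnHandler:
    """Verify and deduplicate payment notifications (illustrative only)."""

    def __init__(self, secret: bytes) -> None:
        self._secret = secret
        # In-memory stand-in for a persistent record of settled orders.
        self._paid_orders: Set[str] = set()

    def handle(self, params: Dict[str, str], signature: str) -> str:
        expected = sign(params, self._secret)
        # Constant-time comparison avoids leaking how many characters matched.
        if not hmac.compare_digest(expected.encode("utf-8"),
                                   signature.encode("utf-8")):
            return "invalid_signature"
        order_id = params["order_id"]  # hypothetical field name
        if order_id in self._paid_orders:
            # Gateways may retry notifications; processing must be idempotent.
            return "already_processed"
        self._paid_orders.add(order_id)
        return "ok"
```

The signature is checked before the duplicate check so that a forged notification cannot learn whether an order has already been paid.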
---
Language: Vietnamese
---
Speakers:
- Nguyễn Văn Lợi - Technical Architect @ Vexere
Nguyễn Văn Lợi is a software engineer with more than 10 years of hands-on experience at companies with large systems in VoIP, e-commerce, big data, and logistics. At Vexere, he champions a spirit of self-learning, growth, and sharing so that team members continually build knowledge and skills, improving work efficiency and delivering the best product experience for users.
Grokking Techtalk #40: Consistency and Availability tradeoff in database cluster by Grokking VN
In recent years, with the boom in startups and technologies such as machine learning, the volume of data that systems need to collect and process keeps growing.
As a result, for large systems, storing and processing data on a single database node is no longer sufficient, which requires multiple interconnected nodes forming a database cluster.
For database clusters in particular and distributed systems in general, there are many interesting topics to dig into. In this talk, we limit ourselves to surveying how three systems, Redis, Elasticsearch, and Cassandra, organize their clusters, as well as the trade-off each makes between consistency and availability.
- Speaker: Lộc Võ - Lead Software Engineer @ Grab
Grokking Techtalk #39: Gossip protocol and applications - Grokking VN
Gossip is a widely used information-exchange protocol in distributed systems that helps servers keep their state consistent with one another and carry out purpose-built tasks. Its strengths are fast information dissemination and the absence of any single point of failure. In this talk, Nguyễn Anh Tú, a member of Grokking, introduces the Gossip protocol and walks through a few of its practical applications.
- About the speaker: Nguyễn Anh Tú is currently a Staff Software Engineer at Axon Vietnam and a member of Grokking Vietnam.
Grokking Techtalk #39: How to build an event driven architecture with Kafka ... - Grokking VN
This talk shares two years of experience using Kafka and Kafka Connect to move Vexere's system from a monolith to an event-driven microservice architecture:
+ What is event-driven architecture?
+ How to build an event-driven architecture effectively with Kafka and Kafka Connect
+ Useful use cases for Kafka & Kafka Connect
+ Practical experience and lessons learned
- About the speaker: Nguyễn Văn Lợi is a software engineer with more than 9 years of hands-on experience at companies running large systems in VoIP, e-commerce, big data, and logistics. At Vexere, he champions self-learning, growth, and sharing so that team members keep building knowledge and skills, work more effectively, and deliver products with the best possible user experience.
Grokking Techtalk #38: Escape Analysis in Go compiler - Grokking VN
When analyzing performance, a solid understanding of the programming language and its design is very useful. Go is one of the languages commonly used in high-performance distributed systems. To better understand how the Go compiler analyzes memory allocation when compiling a program, listen to Cường's talk on escape analysis in the Go compiler.
About the speaker:
Lê Mạnh Cường is a software engineer with 8 years of in-depth experience in backend development and Linux system administration. An active OSS contributor, Cường has contributed extensively to the open-source community, especially Go and its ecosystem.
Grokking Techtalk #37: Data intensive problem - Grokking VN
At some point in your software engineering career, you will have to deal with data, and your success will depend on how much data your software can handle. Starting from a simple problem that requires processing a large amount of data, this talk presents how to approach this kind of issue and how to design and choose an efficient solution.
About speaker:
Hồ is a Senior Software Engineer at AXON, where he helps design and develop complex distributed systems, including image and video encoding and a distributed file conversion system. Besides coding, Hồ likes to read manga and meet friends in his free time.
Grokking Techtalk #37: Software design and refactoring - Grokking VN
Even though software engineering has been around for decades, there is still no clear way to assess the strengths and weaknesses of a software design.
This talk introduces a framework to assess the strength of any specific software design and steps to refactor and improve it. Both object-oriented and functional programming will be discussed as ways to improve the design.
In the talk, the speaker also proposes a software architecture that incorporates all the ideas presented as the conclusion.
About speaker:
Thành currently works at Holistics Software as Co-founder and Chief Engineer architecting the next generation DataOps driven BI platform.
Before joining Holistics as co-founder, Thanh had 8 years of experience as a software engineer and big-data consultant from multiple companies, notably Revolution Analytics which was acquired by Microsoft in 2015.
Thanh graduated from National University of Singapore in 2009 majoring in Computer Engineering with a minor in Technopreneurship.
- Speaker: Sergey Bochenkov - Head of Search @ TIKI
Search is one of the most important features of an e-commerce website, helping customers easily find the products they want. But building a high-quality search system while keeping performance and resource usage (RAM, CPU) under control is no small challenge.
In TechTalk #35, Sergey Bochenkov, who spent more than 7 years at Cốc Cốc and is now Head of Search @ Tiki, shares the ideas and difficulties behind building a language model from Tiki's product data and search queries, together with data crawled from other websites, to build the Tiki spellchecker and autocorrection. Highlights include:
- Quality optimization ideas
- Performance optimization problems
- A 3-9% increase in purchases.
Speaker: Châu Nguyễn Nhật Thanh - Head of MEP @ ZaloPay
When building systems on a monolithic architecture, we often run into difficulties that affect feature delivery speed and the scaling of resources such as databases, along with risks when changing or upgrading the product.
Microservices are a popular choice today for addressing these difficulties as the system grows more complex, needs faster delivery, and needs the freedom to choose and deploy several technologies side by side.
But does adopting microservices mean we avoid all of those problems?
- We often hear about scaling the API (compute) layer by running microservices in Docker on k8s, but how do we scale databases (storage) and avoid a SPOF?
- How do we deploy microservices on physical (on-premise) machines using existing infrastructure?
- How do we set up CI/CD for the system effectively?
- How do we trace and debug when incidents happen?
- And how do we monitor the deployed system?
In Grokking TechTalk #34, Châu Nguyễn Nhật Thanh, Head of MEP @ ZaloPay, shares the experience, problems, and pain points of using microservices for the ZaloPay Merchant Platform on on-premise Kubernetes.
Grokking TechTalk #33: High Concurrency Architecture at TIKI - Grokking VN
- Speaker: Nguyễn Hoàng Bách - Senior Principal Engineer @ TIKI
Over 9 years of building and growing its system, TIKI's engineering team has had to solve one hard technical problem after another to keep pace with business growth. A defining challenge of e-commerce systems is guaranteeing data accuracy while still serving heavy traffic. High Concurrency Architecture therefore plays an important role in TIKI's overall architecture. It is also a major step forward made by TIKI's engineers over the past 6 months.
Grokking TechTalk #33: Architecture of AI-First Systems - Engineering for Big... - Grokking VN
- Speaker: Hervé Vũ Roussel - CEO & Co-founder @ QuodAI
- About the speaker: Hervé Vũ Roussel was previously CTO of a software company in Silicon Valley, USA. He has been an advisor and mentor for organizations such as IBM AI XPRIZE, PlatoHQ (YC'16), RMIT, AngelHack, ... He is also a frequent speaker on AI and software engineering, and has advised universities and companies on computer science and software engineering training programs. Hervé is currently CEO of Quod AI, a platform that explains source code in natural language.
In this talk he shares his experience designing a highly scalable architecture for AI platforms, including:
- Foundational principles of software architecture
- How to choose data storage technologies
- Building asynchronous data pipelines
Design patterns are a "hard to swallow" topic for most programmers when they first encounter them. The reason is that design patterns are built on abstract concepts and must follow the principles of object-oriented programming.
In TechTalk #32: SOLID & Design Patterns, you will be introduced to these principles and to how familiar design patterns can be applied to solve problems concisely and effectively, through practical examples.
Speaker: Khôi Nguyễn - Senior Software Engineer @ KMS Technology
In this Grokking talk, Huy discusses the benefits and harms of chat culture in the workplace and offers alternatives better suited to specific situations. It is aimed at teams facing the following problems:
1. You feel you put in a lot of time but get little done, because colleagues keep asking for help or answers whenever something urgent comes up
2. At the end of the day you cannot remember doing anything important
3. You spend time in enthusiastic team discussions to reach a decision, but 3 months later nobody remembers why that decision was made.
The talk covers asynchronous communication habits, the habit of writing down what you need to say, and how to build a team wiki, with the goal of limiting the negative effects of chat.
Speaker: Huy Nguyen - CTO @ Holistics
Grokking TechTalk #30: From App to Ecosystem: Lessons Learned at Scale - Grokking VN
When we were faced with the challenge of going from one to multiple apps, we had to make significant changes to the way we did frontend development. Learn about the tooling and architecture we use to manage a suite of apps, and how you can apply the same principles to your own frontend.
Speaker: Kristian Randall - Frontend Engineering Manager @ Axon
Grokking TechTalk #29: Building Realtime Metrics Platform at LinkedIn - Grokking VN
Khải Trần's talk covers LinkedIn's data pipeline, which collects tens of billions of messages per day, and how LinkedIn runs real-time processing to aggregate this data for metrics monitoring.
Some of the points covered:
- An introduction to LinkedIn's unified metrics platform
- How LinkedIn set up its big data pipeline using Kafka, HDFS, Apache Calcite, and Apache Samza.
- The concept of nearline storage, and how LinkedIn moved from an offline architecture to a nearline architecture.
Speaker: Khai Tran, Staff Software Engineer - LinkedIn.
- Currently a staff software engineer at LinkedIn, responsible for the metrics monitoring system. Previously worked at Amazon AWS and Oracle.
- PhD, University of Wisconsin-Madison, with research on database systems.
The binary search tree is a familiar data structure, with a great deal of research and many algorithms built around it. This talk introduces a technique for optimizing a binary search tree based on search frequency, thereby reducing search cost to a minimum.
- Speaker: Phong Vu - Software Engineer
Grokking TechTalk #26: Kotlin, Understand the Magic - Grokking VN
- Discuss and understand how Kotlin's core features work, compared with its ancestor.
Speaker: Ngô Minh Hiền
- Senior Android Developer
- Android Mobile Team lead @ Wizeline
Grokking TechTalk #26: Compare ios and android platform - Grokking VN
- It's quite common these days for a mobile app to be built for both iOS and Android. Although hybrid technologies are becoming more popular, they are still built on the core components of each platform, which is why understanding those core components is required for building mobile apps today.
- In this talk, I will discuss the similarities and differences between the two platforms in several aspects: application life cycles, view animation mechanisms, security, push notification mechanisms, ...
Speaker: Lộc Võ
- Freelance Mobile Developer
Code versioning controls
Shared environments, IDE – Jupyter Notebook/Lab
Infrastructure as code
Self-service environment
SaaS
Most importantly: training & processing
Separation of source, environments, etc.
Security
Experiment lifecycles
Pricing
Efficiency
Reproducibility is hard
End-to-end traceability
Dashboard ->
Netflix built Metaflow
Lyft built Flyte
Kubeflow
Apache Airflow
Important factor: skill set & enforce
Metaflow
Netflix built Metaflow
Netflix is a huge customer of AWS
In production since 2018
Made open source by Netflix & AWS in 2019
What is it?
Basic concepts of Metaflow
Deploying to AWS is easy
Flyte
A K8s native distributed workflow orchestrator used at Lyft for:
Data science
Pricing
Fraud detection
Locations
ETA and more
Enables highly concurrent, scalable workflows for ML and data processing
Core concepts of Flyte – task, DAG, workflows, control flow specification.
Actual task can be in any language – tasks executed as containers.
Provisions necessary resources dynamically, executes tasks as docker containers, and de-provisions resources when tasks are complete to control costs.
Supports execution across 100s of machines e.g. production model training
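To make the task, DAG, and workflow concepts above concrete, here is a minimal sketch in plain Python. It does not use the Flyte SDK: `run_workflow`, the task names, and the dictionary layout are all hypothetical, and the tasks run in-process, not as containers. It only shows how a workflow can be expressed as tasks plus a dependency graph, with each task executed once its upstream tasks have finished.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, dependencies):
    """Run each task after its upstream tasks, passing their results along."""
    results = {}
    order = []
    # `dependencies` maps a task name to the set of tasks it depends on.
    for name in TopologicalSorter(dependencies).static_order():
        upstream = {dep: results[dep] for dep in dependencies.get(name, ())}
        results[name] = tasks[name](upstream)
        order.append(name)
    return order, results


# Hypothetical tasks; a real orchestrator would run each one as a container.
tasks = {
    "load": lambda up: [1.0, 2.0, 3.0, 4.0],
    "features": lambda up: [x * 2 for x in up["load"]],
    "train": lambda up: sum(up["features"]) / len(up["features"]),
    "report": lambda up: f"model mean = {up['train']:.1f}",
}
dependencies = {
    "load": set(),
    "features": {"load"},
    "train": {"features"},
    "report": {"train"},
}

order, results = run_workflow(tasks, dependencies)
print(order)
print(results["report"])
```

A real orchestrator adds what this sketch leaves out: per-task resource provisioning, container isolation, and de-provisioning when tasks complete.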
Kubeflow, Airflow are fairly popular
Airflow
Amazon SageMaker works with Apache Airflow 1.10.1. If you use Airflow, you can use SageMaker Workflow in Apache Airflow
More details from https://sagemaker.readthedocs.io/en/stable/using_workflow.html
Many customers want to use the fully managed capabilities of Amazon SageMaker for machine learning, but also want platform and infrastructure teams to continue using Kubernetes for orchestration and managing pipelines. SageMaker addresses this requirement by letting Kubernetes users train and deploy models in SageMaker using SageMaker-Kubeflow operations and pipelines. With operators and pipelines, Kubernetes users can access fully managed SageMaker ML tools and engines, natively from Kubeflow. This eliminates the need to manually manage and optimize ML infrastructure in Kubernetes while still preserving control of overall orchestration through Kubernetes. Using SageMaker operators and pipelines for Kubernetes, you can get the benefits of a fully managed service for machine learning in Kubernetes, without migrating workloads.
If you use Kubernetes, you can use SageMaker Operators for Kubernetes
You can install the SageMaker Operators for Kubernetes using the provided Helm chart
Once the operator is installed, Kubernetes users can natively invoke SageMaker features like model training, hyperparameter tuning, and batch transform jobs
They can also set up model serving using SageMaker Model Hosting Services
https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_operators_for_kubernetes.html#what-is-an-operator
https://eksworkshop.com/advanced/420_kubeflow/pipelines/
We see customers build serverless ML workflows using AWS Step Functions
Open source - Step Functions Data Science SDK for SageMaker
Create workflows to pre-process data, train/deploy models using SageMaker
Data pre-processing can be done using AWS Glue
SageMaker functionality like model training, HPO and end point creation is accessible
Use the SDK to create and visualize the workflows
Scale workflows without having to worry about infrastructure
https://aws.amazon.com/about-aws/whats-new/2019/11/introducing-aws-step-functions-data-science-sdk-amazon-sagemaker/
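As a rough illustration of the kind of workflow described above, the sketch below builds an Amazon States Language definition as a plain Python dictionary, without the Data Science SDK. The `Resource` values are the Step Functions service-integration identifiers for Glue and SageMaker; everything else (state names, job and endpoint names, the helpers `task` and `execution_order`) is a hypothetical placeholder, and the `Parameters` are deliberately incomplete: real training, model, and endpoint requests need more required fields.

```python
import json


def task(resource, parameters, next_state=None):
    """Build one Amazon States Language Task state."""
    state = {"Type": "Task", "Resource": resource, "Parameters": parameters}
    if next_state is None:
        state["End"] = True
    else:
        state["Next"] = next_state
    return state


# Placeholder names only; required request fields are omitted for brevity.
definition = {
    "Comment": "Illustrative ML workflow: pre-process, train, deploy",
    "StartAt": "PreprocessData",
    "States": {
        "PreprocessData": task(
            "arn:aws:states:::glue:startJobRun.sync",
            {"JobName": "example-preprocess-job"},
            "TrainModel",
        ),
        "TrainModel": task(
            "arn:aws:states:::sagemaker:createTrainingJob.sync",
            {"TrainingJobName": "example-training-job"},
            "CreateModel",
        ),
        "CreateModel": task(
            "arn:aws:states:::sagemaker:createModel",
            {"ModelName": "example-model"},
            "CreateEndpointConfig",
        ),
        "CreateEndpointConfig": task(
            "arn:aws:states:::sagemaker:createEndpointConfig",
            {"EndpointConfigName": "example-endpoint-config"},
            "CreateEndpoint",
        ),
        "CreateEndpoint": task(
            "arn:aws:states:::sagemaker:createEndpoint",
            {
                "EndpointName": "example-endpoint",
                "EndpointConfigName": "example-endpoint-config",
            },
        ),
    },
}


def execution_order(definition):
    """Follow the Next pointers from StartAt to the terminal state."""
    order, name = [], definition["StartAt"]
    while name is not None:
        order.append(name)
        name = definition["States"][name].get("Next")
    return order


print(json.dumps(definition, indent=2))
print(execution_order(definition))
```

The Data Science SDK mentioned above generates definitions of this general shape from Python objects, so the JSON rarely needs to be written by hand.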
Many good tools exist. You can run any of the tools we saw earlier on AWS.
Remember - Tools are meant to make your life easier
Don’t get fixated on the tools.
Work backwards from the problem you are trying to solve.
So think about your existing software engineering workflows and tools
Ask yourself which tools will best augment what you already have
Ask yourself which tools your people are most comfortable with
The AWS approach is to use the tools that work for you
It is easy to think of SageMaker as just a notebook.
The key thing to remember is that the notebook UI we see a lot in the demos is just a part of the SageMaker platform – and an optional part at that!
The notebook is the front-end environment in which we’ll experiment with our data and code.
Keep that instance a low-cost resource. Value of separation…
When we’re ready to try and train or deploy a model, we’ll be spinning up separate, dedicated infrastructure in the SageMaker container runtime – which means we have lots of flexibility to choose resources cost-effectively and only pay for what we need.
All managed
The orchestration that SageMaker gives us to make this happen is closely integrated to these other two services:
The images defining our containers will need to be stored in Amazon ECR (there’s not currently an integration for external registries like DockerHub – but if you have a particular technology in mind our service team would appreciate the feedback!).
…And the preferred storage platform for not just our input data but also model artifacts and other stuff generated in the workflow will be Amazon S3. Why? <The generic S3 pitch – it’s got everything you need for a data lake> Most integrated service, arguably most mature, tiers, security models, high durability
Recapping: 4 things
…So let’s look at how that end-to-end process works.
To start with I have:
The data that I want to train on (prepared and loaded to S3): already pre-processed in the notebook, though other services such as Glue or Processing Jobs are also an option to …
The training script I’d like to run (e.g. defining neural network shape and fitting routine, on the notebook instance where I’m working): minimal code
One of the pre-prepared SageMaker framework container images somewhere in Amazon ECR, maybe TensorFlow, PyTorch, or MXNet: repeatable, controlled, reproducible
So what’s happening when we start a training job by calling “estimator.fit()” in those examples from before?
We’re gonna start seeing a lot of arrows here, so the cool thing to remember is that all of the arrows are things *SageMaker is doing for you* - not things you need to do yourself!
First, assuming you provide a custom code script (or folder of code), the SageMaker SDK is going to zip that up and upload it to a new location in S3. So you can’t forget to check your working version in to git, and you won’t lose track of that version that worked well in the middle of your experiments: The results are going to be traceable to the code that created them.
Next, SageMaker is going to spin up whatever infrastructure you asked for in the fit() request, and pull down the docker image to run on it
SageMaker will also start downloading your source data from S3 into the container – no messing about with S3 API calls in your script – your code can read it from folder, just as if you were running locally. Env params…
As the container fires up, that framework application does a load of helpful prep but one particularly important thing: It installs any additional inline dependencies specified for your custom code, then starts it up and passes in the parameters of the training job.
Your code runs, prints status to the console, and saves the trained model to disk just like you normally would… But SageMaker takes care of zipping and uploading that final model to S3 – and also other output mechanisms like sending the logs to CloudWatch and collecting metrics. Pay only for …
So the benefit we’ve gained here is that our custom code can be quite simple: load a CSV from file, make a random forest, save it to file, etc. We can even specify additional dependencies via a requirements.txt file, and SageMaker plus the framework container will orchestrate these overhead tasks to give us this nice lineage-traceable workflow with all of the cool features we talked about earlier, with no extra code complexity required on our part.
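To show how small such a training script can be, here is a minimal sketch using only the Python standard library. It assumes the framework-container convention of passing locations through environment variables (`SM_CHANNEL_TRAIN` for a channel named `train`, `SM_MODEL_DIR` for the output folder). To stay dependency-free it swaps the random forest for a trivial mean predictor; the column name `target` and the file `model.json` are hypothetical. The last part is a local dry run that fakes those folders in a temporary directory.

```python
import csv
import json
import os
import tempfile
from pathlib import Path


def train(train_dir, model_dir):
    """Read every CSV in train_dir, fit a mean predictor, save it to model_dir."""
    values = []
    for csv_path in sorted(Path(train_dir).glob("*.csv")):
        with open(csv_path, newline="") as f:
            for row in csv.DictReader(f):
                values.append(float(row["target"]))
    if not values:
        raise ValueError(f"no training rows found in {train_dir}")
    model = {"mean": sum(values) / len(values), "n_rows": len(values)}
    Path(model_dir).mkdir(parents=True, exist_ok=True)
    out_path = Path(model_dir) / "model.json"
    out_path.write_text(json.dumps(model))
    return out_path


def main():
    # Inside the container these variables are set by the training toolkit;
    # the defaults are the conventional /opt/ml locations.
    train_dir = os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train")
    model_dir = os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
    return train(train_dir, model_dir)


# Local dry run: emulate the container's folders with a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "train"
    data_dir.mkdir()
    (data_dir / "data.csv").write_text("feature,target\n1,1.0\n2,2.0\n3,3.0\n")
    os.environ["SM_CHANNEL_TRAIN"] = str(data_dir)
    os.environ["SM_MODEL_DIR"] = str(Path(tmp) / "model")
    saved = json.loads(main().read_text())

print(saved)
```

The script never touches S3: it reads from a local folder and writes to a local folder, which is what lets the same file run unchanged on a laptop and in a training job.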
When it’s time to deploy that model to an inference endpoint, we simply reference:
Our model artifact tarball from S3
An inference container (which might be the same one as for training, or might be a different image because the dependencies could be differently optimized for run-time)
And maybe some custom code again: This time just defining some helper functions that we might want to customize from the built-in inference flow, such as how to de/serialize requests and responses, or how the model file(s) need to be loaded from disk into memory if the process is different from standard. How it’s optimized
As in training, SageMaker will handle the creation of infrastructure and loading of these components for us. If we used the ‘estimator’ pattern from the high-level SageMaker SDK, all we need to call is a single estimator.deploy(…) function to make it happen.
Again here the intent is that any custom code needed can be small: Just providing a few optional functions for serialization, model loading, etc… Rather than writing and having to maintain a model server, integrations with TorchServe or TensorFlow Serving, etc.
Custom input format (JSON)…
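The optional helper functions mentioned above can be sketched as follows. The names `model_fn`, `input_fn`, `predict_fn`, and `output_fn` follow the convention used by the SageMaker framework containers, but exact signatures vary between frameworks, so treat these as an assumption to check against the container you use. The model format (`model.json` holding a mean) and the `{"instances": [...]}` request shape are hypothetical, and the final lines only simulate one request locally.

```python
import json
import tempfile
from pathlib import Path


def model_fn(model_dir):
    """Load the model artifact that was unpacked into model_dir."""
    return json.loads((Path(model_dir) / "model.json").read_text())


def input_fn(request_body, request_content_type):
    """Deserialize a JSON request into a list of instances."""
    if request_content_type != "application/json":
        raise ValueError(f"unsupported content type: {request_content_type}")
    return json.loads(request_body)["instances"]


def predict_fn(input_data, model):
    """Return one prediction per instance (here: always the stored mean)."""
    return [model["mean"] for _ in input_data]


def output_fn(prediction, accept):
    """Serialize predictions into the response format the client asked for."""
    if accept != "application/json":
        raise ValueError(f"unsupported accept type: {accept}")
    return json.dumps({"predictions": prediction})


# Local simulation of a single request passing through the four hooks.
with tempfile.TemporaryDirectory() as model_dir:
    (Path(model_dir) / "model.json").write_text(json.dumps({"mean": 2.0}))
    model = model_fn(model_dir)

body = json.dumps({"instances": [[1, 2], [3, 4]]})
response = output_fn(
    predict_fn(input_fn(body, "application/json"), model), "application/json"
)
print(response)
```

The model server, HTTP handling, and scaling stay with the container; only these few functions are custom.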
Not today, but…
In SageMaker, batch transform jobs function pretty much identically to real time inference endpoints from a user code point of view: The batch transform engine handles reading your source data from S3, feeding it through your model, storing the results back to S3, and shutting down the resources again as soon as the job is done.
Pay only for…
Mechanism: how easiest for different personas?
Skillset dependency – learning curve
…So that’s our overview picture for framework containers:
You write pretty minimal code just as you usually would for experimenting in your notebook. But instead of running that code locally, which can make things like infrastructure optimization, experiment tracking, and inference deployment tricky… SageMaker provides some nice streamlined, high-level APIs to trigger containerized training and inference jobs (or deploy endpoints) on separate infrastructure.
At the fundamental level, the system is super flexible because you can make fully custom container images and model artifact tarballs… But the framework container images together with the SageMaker SDK library (for your notebook) enable this higher-level, container-plus-custom-code workflow.
Same as this morning, just a different drawing
Solves problems with experimenting, tracking, etc.
Also lessons learned & best practices
The Repeatable stage is generally focused on applying automation as the number of machine learning workloads running in production increases. In general, at this stage many of the activities in building, training and deploying machine learning models are automated. The introduction of automation reduces manual hand-offs between teams and reduces the operational overhead of previously manual/ad-hoc tasks. The ability to orchestrate machine learning workflows into automated machine learning pipelines also depends on having a data strategy and automated data processing tasks.
Queue Management: Ability to manage, schedule, and prioritize tasks
Resource Management: Access to horizontally scalable compute that can scale based on workflow task requirements
Workflow Operators: Error handling, retry and conditional logic functions
Workflow Logs: Centralized logs and configuration parameters for execution and task level logs
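Three of the four capabilities listed above can be illustrated with a toy, in-process runner: a priority queue for scheduling, retry and conditional logic as workflow operators, and a central log of task outcomes. This is a sketch under stated assumptions, not any particular orchestrator's API: the `Workflow` class, its methods, and the task names are all hypothetical, and resource management (scaling compute per task) is not modeled.

```python
import heapq
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("workflow")


class Workflow:
    def __init__(self, max_retries=2):
        self._queue = []      # (priority, sequence, name, fn, condition)
        self._sequence = 0    # tie-breaker so functions are never compared
        self.max_retries = max_retries
        self.history = []     # central record: (task, status, attempts)

    def submit(self, name, fn, priority=10, condition=None):
        """Queue a task; lower priority numbers run first."""
        heapq.heappush(self._queue, (priority, self._sequence, name, fn, condition))
        self._sequence += 1

    def run(self, context):
        while self._queue:
            _, _, name, fn, condition = heapq.heappop(self._queue)
            # Conditional logic: skip the task if its condition is not met.
            if condition is not None and not condition(context):
                log.info("%s skipped", name)
                self.history.append((name, "skipped", 0))
                continue
            # Error handling: retry up to max_retries times after a failure.
            for attempt in range(1, self.max_retries + 2):
                try:
                    context[name] = fn(context)
                    log.info("%s succeeded on attempt %d", name, attempt)
                    self.history.append((name, "succeeded", attempt))
                    break
                except Exception as exc:
                    log.warning("%s failed on attempt %d: %s", name, attempt, exc)
            else:
                self.history.append((name, "failed", attempt))
        return context


calls = {"train": 0}


def flaky_train(context):
    """Fail once to exercise the retry path, then return an accuracy."""
    calls["train"] += 1
    if calls["train"] == 1:
        raise RuntimeError("transient capacity error")
    return 0.91


wf = Workflow(max_retries=2)
wf.submit("deploy", lambda ctx: "endpoint-v1", priority=3,
          condition=lambda ctx: ctx.get("train", 0) >= 0.9)
wf.submit("prepare", lambda ctx: "dataset-ready", priority=1)
wf.submit("train", flaky_train, priority=2)
context = wf.run({})
print(wf.history)
```

The deploy step runs only because training cleared the accuracy threshold; lowering the returned accuracy would record it as skipped instead.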
The Reliable stage builds on the automation from the Repeatable stage but aims to ensure automation is balanced with practices aimed to increase quality, enable end-to-end traceability, increase reliability through automatic rollbacks, increase visibility into development and operational health, and ensure repeatability. In general, at this stage MLOps practices of Infrastructure-as-Code/Configuration-as-Code, Continuous Integration, Continuous Delivery/Deployment, and Continuous Monitoring are introduced.