Machine Learning in Production: Manu Mukerji, Strata CA March 2018


Manu Mukerji walks you through Acme Corporation’s machine learning example for universal catalogs, explaining how the training and test sets are generated and annotated; how they were created when there is no public training data available; how the model is pushed to production, automatically evaluated, and used; how Acme Corporation built a Hadoop/Spark pipeline using different types of models predicting various values; production issues that arise when applying ML at scale in production; and lessons learned along the way.

Machine Learning in Production
Manu Mukerji

What is this talk about?
• Agenda:
- Introduction to the business problem
- Normal ML Flow
- Training Data
- Test Data
- Model Creation
- Testing
- How it ties together: The production flow
- Team Setup
- ML Production Examples
- Questions?
- Live Demo!

Typical ML Example
• This is the "hello world" equivalent of what you find online:

Generalized ML Flow
1. Gather Data
2. Train Model
3. Test for accuracy (most examples end here)
4. Save model for external consumption
5. Use saved model for prediction
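
The five steps above can be sketched end to end in a few lines. This is a toy illustration only, not the pipeline from the talk: the dataset is made up, the nearest-centroid classifier is a stand-in for a real model, and the model is "saved" to an in-memory pickle rather than to shared storage.

```python
import pickle
import random

# 1. Gather data: a made-up toy dataset of ((x, y), label) pairs.
random.seed(0)

def make_points(center, label, n=50):
    return [((center[0] + random.gauss(0, 0.5),
              center[1] + random.gauss(0, 0.5)), label) for _ in range(n)]

data = make_points((0.0, 0.0), "a") + make_points((3.0, 3.0), "b")
random.shuffle(data)
split = int(len(data) * 0.8)          # hold out 20% for testing
train, test = data[:split], data[split:]

# 2. Train model: nearest-centroid classifier (one mean point per label).
def train_centroids(rows):
    sums = {}
    for (x, y), label in rows:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}

def predict(model, point):
    # Pick the label whose centroid is closest to the point.
    return min(model, key=lambda label: (model[label][0] - point[0]) ** 2
                                         + (model[label][1] - point[1]) ** 2)

model = train_centroids(train)

# 3. Test for accuracy (most online examples end here).
accuracy = sum(predict(model, p) == label for p, label in test) / len(test)

# 4. Save model for external consumption (serialized bytes in this sketch).
blob = pickle.dumps(model)

# 5. Use the saved model for prediction.
loaded = pickle.loads(blob)
print("accuracy:", accuracy)
print("prediction:", predict(loaded, (2.9, 3.1)))
```

Steps 4 and 5 are where production work starts: the saved artifact, not the training script, is what downstream systems depend on.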

Our use case
• Categorization of products
• Categorization of products into ~4800 categories
• Categorization of 6B products into ~4800 categories across 30 countries
• Categorization of 6B products into ~4800 categories across 30 countries every day!

Gathering Training Data
Diagram components: Annotation UI, HDFS, Elasticsearch
• Annotate at category level (fewer than 10K per country)
• Try to map each customer category to a Google category
• Expand the training data by inferring the Google category (GCAT) from the mapping (this brings it into the millions of products)
• What else did we try:
• Mechanical Turk
• External companies that provide data
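
The expansion step can be sketched as a lookup: annotators label a few thousand customer categories, and every product in a labeled category inherits that label. All names, shops, and product titles below are hypothetical, and the Google category strings are only illustrative.

```python
# Hypothetical annotations: (customer, customer category) -> Google category.
category_to_gcat = {
    ("shop_a", "Phones > Smartphones"):
        "Electronics > Communications > Telephony > Mobile Phones",
    ("shop_a", "Shoes/Running"): "Apparel & Accessories > Shoes",
}

# Hypothetical product feed; real feeds would hold millions of rows.
products = [
    {"customer": "shop_a", "category": "Phones > Smartphones", "title": "Phone X 64GB"},
    {"customer": "shop_a", "category": "Phones > Smartphones", "title": "Phone Y 128GB"},
    {"customer": "shop_a", "category": "Shoes/Running", "title": "Trail runner, size 42"},
    {"customer": "shop_a", "category": "Misc", "title": "Gift card"},
]

def expand_training_data(products, mapping):
    """Label each product with the GCAT inferred from its customer category."""
    labeled, unlabeled = [], []
    for product in products:
        gcat = mapping.get((product["customer"], product["category"]))
        if gcat is None:
            unlabeled.append(product)      # category not annotated yet
        else:
            labeled.append({**product, "gcat": gcat})
    return labeled, unlabeled

labeled, unlabeled = expand_training_data(products, category_to_gcat)
print(len(labeled), "labeled,", len(unlabeled), "still unlabeled")
```

Two annotations here label three products; at scale the same leverage is what turns a small number of category-level annotations into millions of training rows.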

More About Training Data
• Bad data, bad predictions
• Overfitting: if you have a hammer, everything looks like a nail
• When to retrain?
• When to add more data?

Test Set
• Test set selection
- Normal method: split 10-20% from training data
- Production method: custom test set based on business value
• Scoring of test set
- Normal method: correct/total
- Problem that can occur with test set scoring
- Advanced version: negative points for negative customer value
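
A minimal sketch of the two scoring methods, assuming each test example carries a hypothetical business value (points earned when correct) and a penalty (points lost when wrong). The weights and labels are made up; the talk does not give the actual scoring formula.

```python
def accuracy(predictions, truth):
    """Normal method: correct / total."""
    correct = sum(p == t for p, t in zip(predictions, truth))
    return correct / len(truth)

def business_score(predictions, truth, values, penalties):
    """Advanced version: add value when right, subtract penalty when wrong."""
    total = 0.0
    for p, t, value, penalty in zip(predictions, truth, values, penalties):
        total += value if p == t else -penalty
    return total

truth     = ["shoes", "phones", "toys", "phones"]
values    = [1, 10, 1, 10]     # hypothetical value of a correct answer
penalties = [1, 20, 1, 20]     # hypothetical cost of a wrong answer

model_a = ["shoes", "phones", "shoes", "phones"]   # wrong on a low-value item
model_b = ["shoes", "toys", "toys", "phones"]      # wrong on a high-value item

print(accuracy(model_a, truth), accuracy(model_b, truth))
print(business_score(model_a, truth, values, penalties),
      business_score(model_b, truth, values, penalties))
```

Both models score 0.75 on correct/total, yet the weighted score separates them (20.0 versus -8.0), which is one way plain accuracy can hide a problem.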

Manual Overrides… Why?
• Prediction will never be 100% accurate
• When it's wrong, it impacts the business
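
One simple way to implement overrides is a lookup that is consulted before the model. This is a sketch under assumptions: the override table, the product IDs, and the stand-in model are all hypothetical.

```python
def categorize(product_id, title, model_predict, overrides):
    """Return (category, source); a manual override always wins over the model."""
    if product_id in overrides:
        return overrides[product_id], "override"
    return model_predict(title), "model"

# Hypothetical override table maintained by humans after a bad prediction.
overrides = {"sku-123": "Apparel & Accessories > Shoes"}

# Stand-in for a trained model: always predicts the same category.
def fake_model(title):
    return "Toys & Games"

print(categorize("sku-123", "Trail runner, size 42", fake_model, overrides))
print(categorize("sku-999", "Board game", fake_model, overrides))
```

Returning the source alongside the category makes it possible to count how often overrides fire, which is itself a signal for when to retrain.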

Testing your model
• Training takes time; this is why your test set is really important
• Automate the build pipeline to run evaluation and deploy your model only if it's better than the existing one
• Canary test the whole pipeline
• Advanced Resource:
- Chase Roberts: https://medium.com/@keeper6928/how-to-unit-test-machine-learning-code-57cf6fd81765
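
The "deploy only if it's better" rule can be sketched as a gate in the build pipeline. The function names, the `min_gain` threshold, and the toy models are assumptions for illustration; a real pipeline would load models and the test set from storage and use whatever score the team has agreed on.

```python
from typing import Callable, Sequence, Tuple

Model = Callable[[str], str]
Example = Tuple[str, str]   # (input text, expected category)

def evaluate(model: Model, test_set: Sequence[Example]) -> float:
    """Score a model on the fixed test set (plain accuracy in this sketch)."""
    return sum(model(x) == y for x, y in test_set) / len(test_set)

def promote_if_better(candidate: Model, production: Model,
                      test_set: Sequence[Example],
                      min_gain: float = 0.0) -> Tuple[Model, float]:
    """Return the model that should be live, and its score."""
    cand_score = evaluate(candidate, test_set)
    prod_score = evaluate(production, test_set)
    if cand_score > prod_score + min_gain:
        return candidate, cand_score
    return production, prod_score      # keep the existing model

# Toy stand-ins for the current and the newly trained model.
test_set = [("running shoe", "shoes"), ("smartphone", "phones"), ("sneaker", "shoes")]

def production_model(text):
    return "shoes"

def candidate_model(text):
    return "phones" if "phone" in text else "shoes"

live, score = promote_if_better(candidate_model, production_model, test_set)
print(live.__name__, score)
```

Because both models are scored on the same test set in the same run, the comparison stays valid even when the test set itself changes between builds.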

Scaling Out
• Now we need to do this in 30 countries
• Easier with Latin-alphabet languages
• Deep learning to the rescue: this isn't just a cool thing that folks are talking about
4.7 英寸 Retina HD 显示器。一款采用 64 位桌面级架构的 A8 芯片。带焦点像素的 8MP iSight 相机。触摸 ID ...
4.7-inch Retina HD display. An A8 chip with a 64-bit desktop-class architecture. 8MP iSight camera with Focus Pixels. Touch ID ...

Accuracy over time
• 70% accuracy: 1 month
• 80% accuracy: 3-6 months
• ~90% accuracy: 1 year
• The last mile: the rest of your life!

Team setup for ML
• Team setup
- UI/API Team
- ML/AI Engineers
- ML/AI Research

Knowing when to stop?
• This is hard!
• Get something out!
• Don't work in a vacuum
• Get the circular data flow working
• Remember the business value; don't over-engineer it

Examples of interesting AI in production
• Self-driving cars
- Some components:
- Lots of sensors, cameras
- Object detection, and distinguishing what can move vs. what cannot
- Lane detection
- Red light vs. green

Self-Driving, continued
• If I trained a self-driving car model with just 50 hours of data, would you trust it?
• Probably not…
• According to the DMV, in order to get a license you must "Have completed 50 hours of practice with an adult 25 years of age or older." [1]
• 50 hours at 60 mph is 3,000 miles…
• Would you trust 1,300,000,000 miles? [2]
[1] https://www.dmv.ca.gov/portal/dmv/detail/teenweb/permit_btn1/permit
[2] https://www.bloomberg.com/news/articles/2016-12-20/the-tesla-advantage-1-3-billion-miles-of-data
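
The arithmetic behind the comparison, worked out explicitly. The constant 60 mph is the slide's own simplification; the derived ratio and fleet hours are not on the slide and follow only from that assumption.

```python
practice_hours = 50                 # DMV practice requirement quoted above
speed_mph = 60                      # the slide's assumed average speed

learner_miles = practice_hours * speed_mph        # 3,000 miles
fleet_miles = 1_300_000_000                       # figure from the Bloomberg headline

ratio = fleet_miles / learner_miles               # roughly 433,000 times more data
fleet_hours = fleet_miles / speed_mph             # roughly 21.7 million hours at 60 mph

print(f"learner: {learner_miles:,} miles")
print(f"fleet:   {ratio:,.0f}x the learner's miles, {fleet_hours:,.0f} hours")
```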

Live Demo: Help me with training data
• Remember how I said training data is hard to get?

Thank You
• TensorFlow
• Pandas, Scikit-learn
• Andrew Ng!


ML Based Model for NIDS MSc Updated Presentation.v2.pptx

Engine Lubrication performance System.pdf

basic-wireline-operations-course-mahmoud-f-radwan.pdf

132/33KV substation case study Presentation

Computational Engineering IITH Presentation

Generative AI leverages algorithms to create various forms of content

Harnessing WebAssembly for Real-time Stateless Streaming Pipelines

哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样

官方认证美国密歇根州立大学毕业证学位证书原版一模一样

Engineering Drawings Lecture Detail Drawings 2014.pdf

Eric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball play

Manufacturing Process of molasses based distillery ppt.pptx

Casting-Defect-inSlab continuous casting.pdf

学校原版美国波士顿大学毕业证学历学位证书原版一模一样

Understanding Inductive Bias in Machine Learning

A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS

Textile Chemical Processing and Dyeing.pdf

Electric vehicle and photovoltaic advanced roles in enhancing the financial p...

Machine Learning in Production: Manu Mukerji, Strata CA March 2018

1. Machine Learning in Production Manu Mukerji

2. What is this talk about?  Agenda: - Introduction to the business problem - Normal ML Flow - Training Data - Test Data - Model Creation - Testing - How it ties together: The production flow - Team Setup - ML Production Examples - Questions? - Live Demo!

3. Typical ML Example  This is the “hello world” equivalent of what you find online:

4. Generalized ML Flow 1. Gather Data 2. Train Model 3. Test for accuracy (most examples end here) 4. Save model for external consumption 5. Use saved model for prediction
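The five steps above can be sketched end to end in a few lines. This is a minimal illustration, not the pipeline from the talk: the data is made up, the "model" is a plain word-count table standing in for a real classifier, and `pickle` stands in for whatever model format a production system would use.

```python
import pickle
from collections import Counter, defaultdict

# 1. Gather data: (product title, category) pairs. Toy, made-up examples.
TRAIN = [
    ("red cotton sweater", "apparel"),
    ("wool winter sweater", "apparel"),
    ("blue denim jeans", "apparel"),
    ("usb external hard drive", "electronics"),
    ("gaming laptop 16gb", "electronics"),
    ("wireless laptop mouse", "electronics"),
]
TEST = [
    ("cotton jeans", "apparel"),
    ("external laptop drive", "electronics"),
]


def train(examples):
    """2. Train: count how often each word appears under each category."""
    word_counts = defaultdict(Counter)
    for title, category in examples:
        word_counts[category].update(title.split())
    return {category: dict(counts) for category, counts in word_counts.items()}


def predict(model, title):
    """Score each category by summed word counts; the highest score wins."""
    scores = {
        category: sum(counts.get(word, 0) for word in title.split())
        for category, counts in model.items()
    }
    # Sorting first makes tie-breaking deterministic (alphabetical).
    return max(sorted(scores), key=scores.get)


def accuracy(model, examples):
    """3. Test: fraction of examples predicted correctly."""
    correct = sum(predict(model, title) == category for title, category in examples)
    return correct / len(examples)


model = train(TRAIN)
test_accuracy = accuracy(model, TEST)

# 4. Save the model for external consumption (the bytes could go to a file).
saved = pickle.dumps(model)

# 5. Use the saved model for prediction, as a separate consumer would.
loaded = pickle.loads(saved)
prediction = predict(loaded, "winter sweater")
```

Most online tutorials stop after step 3; steps 4 and 5 are where a second process, often owned by a different team, starts depending on the model.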

5. Our use case  Categorization of products  Categorization of products into ~4800 categories  Categorization of 6B products into ~4800 categories across 30 countries  Categorization of 6B products into ~4800 categories across 30 countries every day!

6. Training Data Data is the asset!

7. Training data …lack of training data

8. Gathering Training Data Annotation UI, HDFS, Elasticsearch • Annotate at category level (fewer than 10K per country) • Try to map customer category to Google category (GCAT) • Expand training data and infer GCAT from the mapping (this brings it into the millions of products) • What else did we try: • Mechanical Turk • External companies that provide data
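The expansion step can be pictured as a join: a small number of annotated category mappings labels every product that sits in a mapped category. The sketch below assumes hypothetical retailer names, category strings, and a dictionary-based mapping; the talk does not describe the actual data structures.

```python
# Hypothetical annotated mappings: (retailer, retailer category) -> Google category.
category_mapping = {
    ("retailer_a", "Phones > Cases"): "Electronics > Phone Accessories",
    ("retailer_a", "Knitwear"): "Apparel > Sweaters",
}

# Hypothetical raw catalog rows; a real catalog would have millions of these.
products = [
    {"retailer": "retailer_a", "category": "Phones > Cases",
     "title": "Clear case for 4.7 inch phone"},
    {"retailer": "retailer_a", "category": "Knitwear",
     "title": "Wool crew-neck sweater"},
    {"retailer": "retailer_a", "category": "Misc",
     "title": "Gift card"},
]


def expand_training_data(products, mapping):
    """Label every product whose retailer category has an annotated mapping."""
    labeled = []
    for product in products:
        key = (product["retailer"], product["category"])
        gcat = mapping.get(key)
        if gcat is not None:  # products in unmapped categories are left out
            labeled.append((product["title"], gcat))
    return labeled


training_rows = expand_training_data(products, category_mapping)
```

Annotating a few thousand categories is cheap compared with annotating products one by one, which is why a single wrong mapping is costly: it mislabels every product under it.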

9. More About Training Data  Bad data, bad predictions  Overfitting: if you have a hammer, everything looks like a nail  When to retrain?  When to add more data?

10. Test Set  Test set selection - Normal method: split 10-20% from training data - Production method: custom test set based on business value  Scoring of test set - Normal method correct/total - Problem that can occur with test set scoring - Advanced version: negative points for negative customer value
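To make the two scoring methods concrete, here is a small sketch that compares plain correct/total accuracy with a score that subtracts points for costly mistakes. The penalty table, its values, and the category names are assumptions for illustration only.

```python
def plain_accuracy(results):
    """Normal method: correct / total."""
    return sum(r["predicted"] == r["expected"] for r in results) / len(results)


def business_score(results, penalties, default_penalty=0.0):
    """+1 per correct prediction; subtract a penalty for costly confusions.

    `penalties` maps (expected, predicted) pairs to the points to subtract.
    """
    total = 0.0
    for r in results:
        if r["predicted"] == r["expected"]:
            total += 1.0
        else:
            total -= penalties.get((r["expected"], r["predicted"]), default_penalty)
    return total / len(results)


results = [
    {"expected": "Apple iPhone", "predicted": "Apple iPhone"},
    {"expected": "Fruit > Apples", "predicted": "Fruit > Apples"},
    {"expected": "Fruit > Apples", "predicted": "Fruit > Pears"},  # low-cost miss
    {"expected": "Samsung Galaxy", "predicted": "Apple iPhone"},   # costly miss
]

# Hypothetical penalty: confusing competing brands costs two points.
penalties = {("Samsung Galaxy", "Apple iPhone"): 2.0}

accuracy = plain_accuracy(results)
score = business_score(results, penalties)
```

Both mistakes count the same under plain accuracy (0.5 here), while the penalized score drops to 0.0 because one of them carries negative customer value.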

11. Manual Overrides… Why?  Prediction will never be 100% accurate  When it’s wrong, it impacts the business
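One simple way to implement overrides is a lookup that runs before the model. The function names, the product IDs, and the override table below are hypothetical; the talk does not say how overrides were stored.

```python
def categorize(product_id, title, model_predict, overrides):
    """Return (category, source); a manual override always wins over the model."""
    if product_id in overrides:
        return overrides[product_id], "override"
    return model_predict(title), "model"


def toy_model(title):
    # Stand-in for a trained classifier; deliberately naive.
    return "Food > Soup" if "soup" in title.lower() else "Uncategorized"


# Hypothetical override table keyed by product ID.
manual_overrides = {"sku-123": "Home > Cookware"}

overridden = categorize("sku-123", "Soup pot, 5 qt", toy_model, manual_overrides)
predicted = categorize("sku-456", "Tomato soup, 12 oz", toy_model, manual_overrides)
```

Returning the source alongside the category makes it possible to track how often overrides fire, which is a useful signal for where the model needs more training data.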

12. Testing your model  Training takes time; this is why your test set is really important  Automate the build pipeline to run evaluation, and deploy your model only if it’s better than the existing one  Canary test the whole pipeline  Advanced resource: - Chase Roberts: https://medium.com/@keeper6928/how-to-unit-test-machine-learning-code-57cf6fd81765
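The "deploy only if it is better" rule can be reduced to a small gate in the build pipeline. The sketch below assumes accuracy as the metric and uses plain functions as stand-in models; `min_gain` is a hypothetical knob for requiring a margin, not something the talk specifies.

```python
def evaluate(predict, test_set):
    """Fraction of test examples the model gets right."""
    return sum(predict(x) == y for x, y in test_set) / len(test_set)


def should_deploy(candidate_score, production_score, min_gain=0.0):
    """Promote the candidate only if it beats production by more than min_gain."""
    return candidate_score > production_score + min_gain


# Toy test set and two stand-in "models" (plain functions) for illustration.
test_set = [
    ("wool sweater", "apparel"),
    ("gaming laptop", "electronics"),
    ("denim jeans", "apparel"),
    ("usb hard drive", "electronics"),
]


def production_model(title):
    return "apparel"  # always guesses the same category


def candidate_model(title):
    electronics_words = {"laptop", "usb", "drive"}
    return "electronics" if electronics_words & set(title.split()) else "apparel"


production_score = evaluate(production_model, test_set)  # 2 of 4 correct
candidate_score = evaluate(candidate_model, test_set)    # 4 of 4 correct
deploy = should_deploy(candidate_score, production_score)
```

Because both models are scored on the same fixed test set, the comparison is only as meaningful as that test set, which is the point the slide makes.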

13. Scaling Out  Now we need to do this in 30 countries  Easier with Latin alphabet languages  Deep learning to the rescue: this isn't just a cool thing that folks are talking about  Example: 4.7 英寸 Retina HD 显示器。一款采用 64 位桌面级架构的 A8 芯片。带焦点像素的 8MP iSight 相机。触摸 ID ...  Translation: 4.7 inch Retina HD display. A 64-bit desktop architecture with the A8 chip. 8MP iSight with focus pixel. Touch ID

14. Accuracy over time  70% accuracy ….1 month  80% accuracy ….3-6 months  ~90% accuracy …. 1 year  The last mile……. Rest of your life!

15. Team setup for ML  Team setup - UI/API Team - ML/AI Engineers - ML/AI Research

16. Knowing when to stop?  This is hard!  Get something out!  Don’t work in a vacuum  Get the circular data flow working  Remember business value, don’t over engineer it

17. How it all ties in together

18. Examples of interesting AI in production  Self driving cars - Some Components: - Lots of sensors, cameras - Object detection, and distinguishing what can move vs not - Lane detection - Red light vs green

19. Self Driving, continued  If I trained a self-driving car model with just 50 hours of data, would you trust it?  Probably not…  According to the DMV, in order to get a license: “Have completed 50 hours of practice with an adult 25 years of age or older.” [1]  50 hours at 60 MPH is 3000 miles…  Would you trust 1,300,000,000 miles? [2]  [1] https://www.dmv.ca.gov/portal/dmv/detail/teenweb/permit_btn1/permit  [2] https://www.bloomberg.com/news/articles/2016-12-20/the-tesla-advantage-1-3-billion-miles-of-data

20.

21. The future

22. Questions?

23. Live Demo: Help me with training data  Remember how I said training data is hard to get?

24. Thank You  TensorFlow  Pandas, Scikit-learn  Andrew Ng!

Editor's Notes

About me
This is a typical ML tutorial you can find online… there is no problem with this per se, but this isn’t what happens in production. My goal in this talk is to bridge the gap between ML in production and ML on paper.
Data is the asset! In the world of online advertising we have a saying: “If you are not buying a product online, you are the product being sold,” and nowhere is this more true than in machine learning. Alexa is a good example of this: Apple had a head start on the personal assistant with Siri, and Amazon was able to come in and break that monopoly. Now products say “Works with Alexa,” not “Works with Siri.” Now imagine you have a great idea for a text-to-speech engine using deep learning. Your algorithm is way better than Amazon’s, but without that data your model will never perform as well.
These are the kind of labels we need. This is hard to get correct: for example, if you see a photo of a phone case, how do you know whether “case” is its own category or falls under phone accessories?
When a user came in, they would see 50 random products from the category, and we would ask them to confirm that everything looks fine. We would do this 3 times per category. If something is not correct, they can remove a product or change a category.
Bad data, bad predictions: wrong categories from customers would mess up category mappings. Overfitting: Fry’s example, hard drive vs. laptop. When to retrain: seasonality example, sweaters. When to add more data: category coverage is more important than total volume; if we could label all 6B products we wouldn’t need ML. If you realize that you are not getting good results in a particular category, you can try adding more data, but this doesn’t always work and it’s expensive.
Test set selection: the normal method is a random sample; what we did was base it on revenue. You still want a good mix, but with more products from the high-revenue group. Scoring of test set: example with an earthquake warning app… 4/5 points. Advanced version: Apple fruit vs. Apple products, Apple won’t care… Samsung/Apple: but if you categorize a Galaxy S8 as an iPhone, they will care.
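One way to read "base it on revenue but keep a good mix" is to give every category a small floor of test examples and split the remaining slots by revenue share. The sketch below is an interpretation, not the method from the talk; the revenue figures (treated as integers), the floor, and the function names are all assumptions.

```python
import random


def allocate_test_slots(revenue_by_category, total_slots, min_per_category=1):
    """Give each category a floor, then split the remaining slots by revenue share."""
    categories = sorted(revenue_by_category)
    remaining = total_slots - min_per_category * len(categories)
    if remaining < 0:
        raise ValueError("total_slots is too small for the per-category floor")
    total_revenue = sum(revenue_by_category.values())
    # Integer division rounds down, so a few slots may go unused.
    return {
        category: min_per_category
        + (remaining * revenue_by_category[category]) // total_revenue
        for category in categories
    }


def pick_test_set(products_by_category, slots, seed=0):
    """Randomly sample the allotted number of products from each category."""
    rng = random.Random(seed)  # fixed seed keeps the test set stable between runs
    test_set = []
    for category in sorted(slots):
        pool = products_by_category.get(category, [])
        k = min(slots[category], len(pool))
        test_set.extend((item, category) for item in rng.sample(pool, k))
    return test_set


# Hypothetical revenue per category, in whole currency units.
revenue = {"electronics": 700, "apparel": 200, "garden": 100}
slots = allocate_test_slots(revenue, total_slots=13)

catalog = {
    "electronics": [f"electronics-{i}" for i in range(20)],
    "apparel": [f"apparel-{i}" for i in range(20)],
    "garden": [f"garden-{i}" for i in range(20)],
}
test_set = pick_test_set(catalog, slots)
```

The floor keeps low-revenue categories visible in the score, while the revenue split makes errors in high-value categories move the number the most.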
Click to example: buying soup
Training: takes a long time; if you are using DL it can take days. The code doesn’t break or complain, you just see poor results. Automate the pipeline. Canary test: in our case we made a fake retailer with known products and some overrides, and we make sure that when we run those we get the expected results. Advanced resource: I won’t get into it here, but Chase talks about some very good ways of adding unit tests.
I don’t really have data points for these numbers… Walking is an example. Driving is an example.
UI/API: front-end work, annotation UI, Elasticsearch, getting data in and out. ML engineers: working on the model pipeline, scale issues, data engineering. Research: feature engineering, trying new weights, new types of models, new papers… image learning is an example. Pizza box team… with a side of salad. The important distinction is that their work doesn’t fit into a sprint.
Get something out. Don’t work in a vacuum: researchers working on the same dataset for years. Iterate after. Get the circular data flow working.
Self-driving is getting a lot of press… it’s overhyped on one end, and on the other end people are scared of it. This is my attempt to generalize it.
Child walking. Experience is training your neural net with data. Matrix helicopter example.
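The fake-retailer canary described here can be sketched as a fixed set of products with known expected categories, run through the whole pipeline and compared against expectations. Everything named below (the pipeline function, product IDs, categories) is hypothetical and only illustrates the shape of such a check.

```python
def run_pipeline(products, predict, overrides):
    """Stand-in for the full categorization pipeline: overrides first, then the model."""
    return {
        p["id"]: overrides.get(p["id"], predict(p["title"]))
        for p in products
    }


def canary_failures(actual, expected):
    """Return {product_id: (expected, actual)} for every mismatch."""
    return {
        pid: (want, actual.get(pid))
        for pid, want in expected.items()
        if actual.get(pid) != want
    }


def toy_predict(title):
    # Stand-in for the deployed model.
    return "Electronics > Phones" if "phone" in title.lower() else "Apparel"


# Fake retailer with known products and one manual override.
fake_retailer = [
    {"id": "canary-1", "title": "Phone 64GB"},
    {"id": "canary-2", "title": "Wool sweater"},
    {"id": "canary-3", "title": "Phone case"},
]
overrides = {"canary-3": "Electronics > Phone Accessories"}
expected = {
    "canary-1": "Electronics > Phones",
    "canary-2": "Apparel",
    "canary-3": "Electronics > Phone Accessories",
}

failures = canary_failures(run_pipeline(fake_retailer, toy_predict, overrides), expected)

# Simulate a broken deployment where the override table failed to load.
failures_without_overrides = canary_failures(
    run_pipeline(fake_retailer, toy_predict, {}), expected
)
```

An empty `failures` dictionary means the canary passed. The second run shows what a failure looks like: the check catches a problem in the override step even though the model itself is unchanged.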
