This document provides an introduction to boosted trees. It reviews key concepts in supervised learning such as loss functions, regularization, and the bias-variance tradeoff. Regression trees are described as a model that partitions the data and assigns a prediction score to each partition. Gradient boosting is presented as a method for learning an ensemble of regression trees additively to minimize a given loss function. The learning process is formulated as optimizing an objective function that balances training loss and model complexity.
1. Introduction to Boosted Trees
Tianqi Chen
2. Outline
• Review of key concepts of supervised learning
• Regression Tree and Ensemble (What are we Learning)
• Gradient Boosting (How do we Learn)
• Summary
3. Elements in Supervised Learning
• Notations: $x_i \in \mathbb{R}^d$ is the i-th training example
• Model: how to make the prediction $\hat{y}_i$ given $x_i$
   Linear model: $\hat{y}_i = \sum_j w_j x_{ij}$ (includes linear/logistic regression)
   The prediction score $\hat{y}_i$ can have different interpretations depending on the task
   Linear regression: $\hat{y}_i$ is the predicted score
   Logistic regression: $1/(1 + e^{-\hat{y}_i})$ is the predicted probability of the instance being positive
   Others… for example, in ranking $\hat{y}_i$ can be the rank score
• Parameters: the things we need to learn from data
   Linear model: $\Theta = \{w_j \mid j = 1, \ldots, d\}$
4. Elements continued: Objective Function
• The objective function that is everywhere: $Obj(\Theta) = L(\Theta) + \Omega(\Theta)$
• Loss on training data: $L = \sum_{i=1}^n l(y_i, \hat{y}_i)$
   Square loss: $l(y_i, \hat{y}_i) = (y_i - \hat{y}_i)^2$
   Logistic loss: $l(y_i, \hat{y}_i) = y_i \ln(1 + e^{-\hat{y}_i}) + (1 - y_i) \ln(1 + e^{\hat{y}_i})$
• Regularization: how complicated is the model?
   L2 norm (ridge): $\Omega(w) = \lambda \|w\|^2$
   L1 norm (lasso): $\Omega(w) = \lambda \|w\|_1$
• The training loss $L$ measures how well the model fits the training data; the regularization $\Omega$ measures the complexity of the model (a small code sketch follows)
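As a concrete illustration (not from the slides), here is a minimal NumPy sketch of this objective for a linear model with square loss and L2 regularization; all names are illustrative:

```python
import numpy as np

def objective(w, X, y, lam=1.0):
    """Obj(w) = training loss + regularization, for a linear model."""
    yhat = X @ w                          # linear model prediction
    train_loss = np.sum((y - yhat) ** 2)  # square loss L
    reg = lam * np.sum(w ** 2)            # L2 norm regularization Omega
    return train_loss + reg

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + rng.normal(scale=0.1, size=100)
print(objective(np.zeros(5), X, y))  # poor fit: large objective
print(objective(w_true, X, y))       # good fit: small objective
```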
5. Putting known knowledge into context
• Ridge regression: $\sum_{i=1}^n (y_i - w^\top x_i)^2 + \lambda \|w\|^2$
   Linear model, square loss, L2 regularization
• Lasso: $\sum_{i=1}^n (y_i - w^\top x_i)^2 + \lambda \|w\|_1$
   Linear model, square loss, L1 regularization
• Logistic regression: $\sum_{i=1}^n [y_i \ln(1 + e^{-w^\top x_i}) + (1 - y_i) \ln(1 + e^{w^\top x_i})] + \lambda \|w\|^2$
   Linear model, logistic loss, L2 regularization
• The conceptual separation between model, parameters, and objective also gives you engineering benefits
   Think of how you can implement SGD for both ridge regression and logistic regression (see the sketch below)
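A hedged sketch of that engineering point: one SGD routine can serve both objectives once the per-instance loss gradient is swapped in. The function names are illustrative, not from any library:

```python
import numpy as np

def sgd(X, y, grad_loss, lam=0.1, lr=0.01, epochs=50, seed=0):
    """One generic SGD loop; the objective is plugged in via grad_loss."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            g = grad_loss(w, X[i], y[i]) + 2 * lam * w  # loss grad + L2 grad
            w -= lr * g
    return w

# Square loss gradient: d/dw (y - w.x)^2 = -2 (y - w.x) x   -> ridge regression
ridge_grad = lambda w, x, y: -2 * (y - x @ w) * x
# Logistic loss gradient (y in {0,1}): (sigmoid(w.x) - y) x -> logistic regression
logistic_grad = lambda w, x, y: (1 / (1 + np.exp(-(x @ w))) - y) * x
```

Plugging `ridge_grad` or `logistic_grad` into the same `sgd` call is the whole point: the optimizer never needs to know which objective it is minimizing.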
6. Objective and Bias Variance Trade-off
• Why do we want the objective to contain two components?
• Optimizing the training loss encourages predictive models
   Fitting well on the training data at least gets you close to the training data, which is hopefully close to the underlying distribution
• Optimizing the regularization encourages simple models
   Simpler models tend to have smaller variance in future predictions, making the prediction stable
• Again: the training loss measures how well the model fits the training data, and the regularization measures the complexity of the model
7. Outline
• Review of key concepts of supervised learning
• Regression Tree and Ensemble (What are we Learning)
• Gradient Boosting (How do we Learn)
• Summary
8. Regression Tree (CART)
• Regression tree (also known as classification and regression tree):
   Decision rules are the same as in a decision tree
   Contains one score in each leaf
• Example: does the person like computer games?
   Input: age, gender, occupation, …
   age < 15?
   ├─ Y: is male?
   │   ├─ Y: +2
   │   └─ N: +0.1
   └─ N: -1
   (a prediction score in each leaf)
9. Regression Tree Ensemble
   tree1:                      tree2:
   age < 15?                   uses computer daily?
   ├─ Y: is male?              ├─ Y: +0.9
   │   ├─ Y: +2                └─ N: -0.9
   │   └─ N: +0.1
   └─ N: -1
   f(young boy) = 2 + 0.9 = 2.9        f(grandfather) = -1 + 0.9 = -0.1
• The prediction for an instance is the sum of the scores predicted by each of the trees (see the toy sketch below)
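The same toy ensemble, hard-coded as plain functions purely for illustration:

```python
def tree1(person):
    if person["age"] < 15:
        return 2.0 if person["is_male"] else 0.1
    return -1.0

def tree2(person):
    return 0.9 if person["uses_computer_daily"] else -0.9

def ensemble(person):
    # ensemble prediction = sum of the scores of the individual trees
    return tree1(person) + tree2(person)

boy = {"age": 10, "is_male": True, "uses_computer_daily": True}
grandpa = {"age": 70, "is_male": True, "uses_computer_daily": True}
print(ensemble(boy))      # 2 + 0.9 = 2.9
print(ensemble(grandpa))  # -1 + 0.9 = -0.1
```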
10. Tree Ensemble methods
• Very widely used; look for GBM, random forest, …
   Almost half of data mining competitions are won using some variant of tree ensemble methods
• Invariant to scaling of the inputs, so you do not need to do careful feature normalization
• Learn higher-order interactions between features
• Scalable, and used in industry
11. Put into context: Model and Parameters
• Model: assuming we have K trees, $\hat{y}_i = \sum_{k=1}^K f_k(x_i)$, $f_k \in \mathcal{F}$
   Think: a regression tree is a function that maps the attributes to a score
• Parameters: $\Theta = \{f_1, f_2, \ldots, f_K\}$
   Including the structure of each tree, and the scores in the leaves
   Or simply use the functions as parameters
   Instead of learning weights in $\mathbb{R}^d$, we are learning functions (trees)
   $\mathcal{F}$ is the space of functions containing all regression trees
12. Learning a tree on a single variable
• How can we learn functions?
• Define an objective (loss, regularization), and optimize it!!
• Example:
   Consider a regression tree on a single input t (time)
   I want to predict whether I like romantic music at time t
   t < 2011/03/01?
   ├─ Y: t < 2010/03/20?
   │   ├─ Y: 0.2
   │   └─ N: 1.2
   └─ N: 1.0
   Equivalently, the model is a regression tree that splits on time: a piecewise step function over time
13. Learning a step function
• Things we need to learn: the splitting positions, and the height in each segment
• Objective for a single-variable regression tree (step function)
   Training loss: how well does the function fit the points?
   Regularization: how do we define the complexity of the function?
   (Number of splitting points? L2 norm of the height in each segment?)
15. Coming back: Objective for Tree Ensemble
• Model: assuming we have K trees, $\hat{y}_i = \sum_{k=1}^K f_k(x_i)$, $f_k \in \mathcal{F}$
• Objective: $Obj = \sum_{i=1}^n l(y_i, \hat{y}_i) + \sum_{k=1}^K \Omega(f_k)$
   (training loss + complexity of the trees)
• Possible ways to define $\Omega$?
   Number of nodes in the tree, depth
   L2 norm of the leaf weights
   … detailed later
16. Objective vs Heuristic
• When you talk about (decision) trees, it is usually in terms of heuristics
   Split by information gain
   Prune the tree
   Maximum depth
   Smooth the leaf values
• Most heuristics map well to objectives; taking the formal (objective) view lets us know what we are learning
   Information gain -> training loss
   Pruning -> regularization defined by #nodes
   Max depth -> constraint on the function space
   Smoothing leaf values -> L2 regularization on leaf weights
17. Regression Tree is not just for regression!
• The regression tree ensemble defines how you make the prediction score; it can be used for
   classification, regression, ranking, …
• It all depends on how you define the objective function!
• So far we have learned:
   Using square loss results in the common gradient boosted machine
   Using logistic loss results in LogitBoost
18. Take Home Message for this section
• The bias-variance tradeoff is everywhere
• The loss + regularization objective pattern applies to regression tree learning (function learning)
• We want predictive and simple functions
• This defines what we want to learn (objective, model)
• But how do we learn it? Next section
19. Outline
• Review of key concepts of supervised learning
• Regression Tree and Ensemble (What are we Learning)
• Gradient Boosting (How do we Learn)
• Summary
20. So How do we Learn?
• Objective: $Obj = \sum_{i=1}^n l(y_i, \hat{y}_i) + \sum_{k=1}^K \Omega(f_k)$
• We cannot use methods such as SGD to find f (since they are trees, instead of just numerical vectors)
• Solution: Additive Training (Boosting)
   Start from a constant prediction, and add a new function each time:
   $\hat{y}_i^{(0)} = 0$
   $\hat{y}_i^{(1)} = \hat{y}_i^{(0)} + f_1(x_i)$
   $\hat{y}_i^{(2)} = \hat{y}_i^{(1)} + f_2(x_i)$
   …
   $\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)$: the model at training round t keeps the functions added in previous rounds and adds one new function $f_t$
21. Additive Training
• How do we decide which f to add? Optimize the objective!!
• The prediction at round t is $\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)$, where $f_t(x_i)$ is what we need to decide in round t
• Goal: find $f_t$ to minimize $\sum_{i=1}^n l(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)) + \Omega(f_t)$
• Consider square loss:
   $\sum_{i=1}^n (y_i - (\hat{y}_i^{(t-1)} + f_t(x_i)))^2 = \sum_{i=1}^n [2(\hat{y}_i^{(t-1)} - y_i) f_t(x_i) + f_t(x_i)^2] + const$
   $y_i - \hat{y}_i^{(t-1)}$ is usually called the residual from the previous round (a numeric check follows)
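A quick numeric check of that square-loss expansion, purely illustrative; it confirms that the dropped constant does not depend on $f_t$:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(size=5)
yhat_prev = rng.normal(size=5)        # prediction after round t-1
f_t = rng.normal(size=5)              # outputs of a candidate new tree

lhs = np.sum((y - (yhat_prev + f_t)) ** 2)
const = np.sum((y - yhat_prev) ** 2)  # constant with respect to f_t
rhs = np.sum(2 * (yhat_prev - y) * f_t + f_t ** 2) + const
print(np.isclose(lhs, rhs))           # True: the expansion is exact
```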
22. Taylor Expansion Approximation of Loss
• Goal: $\sum_{i=1}^n l(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)) + \Omega(f_t)$
   Seems still complicated, except for the case of square loss
• Take a Taylor expansion of the objective
   Recall $f(x + \Delta x) \simeq f(x) + f'(x) \Delta x + \frac{1}{2} f''(x) \Delta x^2$
   Define $g_i = \partial_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)})$ and $h_i = \partial^2_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)})$, so that
   $Obj^{(t)} \simeq \sum_{i=1}^n [l(y_i, \hat{y}_i^{(t-1)}) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)] + \Omega(f_t)$
• If you are not comfortable with this, think of square loss: $g_i = 2(\hat{y}_i^{(t-1)} - y_i)$ and $h_i = 2$
• Compare what we get to the previous slide (a small sketch of $g_i$, $h_i$ follows)
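For concreteness, a small sketch (names illustrative) of the per-instance statistics $g_i$ and $h_i$ for the two losses discussed above:

```python
import numpy as np

def grad_hess_square(y, yhat):
    """g, h for square loss l = (y - yhat)^2, differentiated w.r.t. yhat."""
    g = 2 * (yhat - y)
    h = 2 * np.ones_like(yhat)    # second derivative is a constant
    return g, h

def grad_hess_logistic(y, yhat):
    """g, h for logistic loss; y in {0,1}, yhat is a raw (pre-sigmoid) score."""
    p = 1 / (1 + np.exp(-yhat))   # predicted probability
    g = p - y
    h = p * (1 - p)
    return g, h
```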
23. Our New Goal
• Objective, with constants removed: $\sum_{i=1}^n [g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)] + \Omega(f_t)$
   where $g_i = \partial_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)})$ and $h_i = \partial^2_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)})$
• Why spend so much effort to derive the objective, why not just grow trees…
   Theoretical benefit: know what we are learning, convergence
   Engineering benefit: recall the elements of supervised learning
      $g_i$ and $h_i$ come from the definition of the loss function
      The learning of the function only depends on the objective via $g_i$ and $h_i$
      Think of how you can separate the modules of your code when you are asked to implement boosted trees for both square loss and logistic loss
24. Refine the definition of tree
• We define a tree by a vector of scores in the leaves, and a leaf index mapping function that maps an instance to a leaf:
   $f_t(x) = w_{q(x)}$, with $w \in \mathbb{R}^T$ the leaf weights of the tree and $q: \mathbb{R}^d \to \{1, \ldots, T\}$ the structure of the tree
• Example:
   age < 15?
   ├─ Y: is male?
   │   ├─ Y: Leaf 1, w1 = +2
   │   └─ N: Leaf 2, w2 = +0.1
   └─ N: Leaf 3, w3 = -1
   q(young boy) = 1, q(grandfather) = 3
25. Define the Complexity of Tree
• Define the complexity as (this is not the only possible definition)
   $\Omega(f_t) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^T w_j^2$
   (number of leaves, plus the L2 norm of the leaf scores)
• For the example tree above, with leaves w1 = +2, w2 = 0.1, w3 = -1:
   $\Omega = 3\gamma + \frac{1}{2} \lambda (4 + 0.01 + 1)$
26. Revisit the Objectives
• Define the instance set in leaf j as $I_j = \{i \mid q(x_i) = j\}$
• Regroup the objective by each leaf:
   $Obj^{(t)} \simeq \sum_{i=1}^n [g_i w_{q(x_i)} + \frac{1}{2} h_i w_{q(x_i)}^2] + \gamma T + \frac{1}{2} \lambda \sum_{j=1}^T w_j^2 = \sum_{j=1}^T [(\sum_{i \in I_j} g_i) w_j + \frac{1}{2} (\sum_{i \in I_j} h_i + \lambda) w_j^2] + \gamma T$
• This is a sum of T independent quadratic functions
27. The Structure Score
• Two facts about a single-variable quadratic function (with H > 0):
   $\arg\min_x [Gx + \frac{1}{2} H x^2] = -\frac{G}{H}$ and $\min_x [Gx + \frac{1}{2} H x^2] = -\frac{1}{2} \frac{G^2}{H}$
• Let us define $G_j = \sum_{i \in I_j} g_i$ and $H_j = \sum_{i \in I_j} h_i$
• Assuming the structure of the tree (q(x)) is fixed, the optimal weight in each leaf, and the resulting objective value, are
   $w_j^* = -\frac{G_j}{H_j + \lambda}$, $Obj = -\frac{1}{2} \sum_{j=1}^T \frac{G_j^2}{H_j + \lambda} + \gamma T$
   This measures how good a tree structure is! (a code transcription follows)
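A direct transcription of these closed forms into code, a sketch under the slide's notation rather than any library API:

```python
import numpy as np

def leaf_weight(G, H, lam):
    """Optimal leaf weight w_j* = -G_j / (H_j + lambda)."""
    return -G / (H + lam)

def structure_score(Gs, Hs, lam, gamma):
    """Obj = -1/2 * sum_j G_j^2 / (H_j + lambda) + gamma * T; smaller is better."""
    T = len(Gs)  # number of leaves
    return -0.5 * sum(G * G / (H + lam) for G, H in zip(Gs, Hs)) + gamma * T
```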
28. The Structure Score Calculation
• Example: five instances (index 1 to 5), each carrying gradient statistics $(g_i, h_i)$, are mapped by the tree (age < 15, then is male?) into its three leaves
• Sum the statistics in each leaf j to get $G_j$ and $H_j$, then score the tree:
   $Obj = -\frac{1}{2} \sum_{j=1}^3 \frac{G_j^2}{H_j + \lambda} + 3\gamma$
• The smaller the score is, the better the structure is
29. Searching Algorithm for Single Tree
• Enumerate the possible tree structures q
• Calculate the structure score for each q, using the scoring equation
• Find the best tree structure, and use the optimal leaf weights
• But… there can be infinitely many possible tree structures…
30. Greedy Learning of the Tree
• In practice, we grow the tree greedily
   Start from a tree with depth 0
   For each leaf node of the tree, try to add a split. The change of objective after adding the split is
   $Gain = \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma$
   (the score of the left child, plus the score of the right child, minus the score if we do not split, minus the complexity cost of introducing an additional leaf)
   Remaining question: how do we find the best split?
31. Efficient Finding of the Best Split
• What is the gain of a split rule $x_j < a$? Say $x_j$ is age
• All we need is the sum of g and h on each side; then calculate
   $Gain = \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma$
   (picture the instances sorted by age, with the candidate split point a between two of them)
• A left-to-right linear scan over the sorted instances is enough to decide the best split along the feature
32. An Algorithm for Split Finding
• For each node, enumerate over all features
   For each feature, sort the instances by feature value
   Use a linear scan to decide the best split along that feature
   Take the best split solution over all the features
• Time complexity of growing a tree of depth K:
   It is $O(d K n \log n)$: for each level, we need $O(n \log n)$ time to sort; there are d features, and we do it for K levels
   This can be further optimized (e.g. use approximation, or cache the sorted features)
   Can scale to very large datasets (a single-feature sketch follows)
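A minimal single-feature version of the linear-scan split search described above, assuming the $g_i$, $h_i$ statistics are already computed; this is illustrative, not XGBoost's actual implementation:

```python
import numpy as np

def best_split(x, g, h, lam=1.0, gamma=0.0):
    """Scan splits of one feature; returns (best gain, split threshold)."""
    order = np.argsort(x)
    xs, gs, hs = x[order], g[order], h[order]
    G, H = gs.sum(), hs.sum()               # totals for the unsplit node
    GL = HL = 0.0
    best_gain, best_pos = -np.inf, None
    for i in range(len(xs) - 1):            # candidate split between i and i+1
        GL += gs[i]; HL += hs[i]            # running prefix sums
        GR, HR = G - GL, H - HL
        gain = 0.5 * (GL**2 / (HL + lam) + GR**2 / (HR + lam)
                      - G**2 / (H + lam)) - gamma
        if gain > best_gain:
            best_gain, best_pos = gain, (xs[i] + xs[i + 1]) / 2
    return best_gain, best_pos
```

Each candidate threshold is scored in O(1) thanks to the running prefix sums, so the cost per feature is dominated by the O(n log n) sort, matching the complexity analysis above.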
33. What about Categorical Variables?
• Some tree learning algorithms handle categorical variables and continuous variables separately
   We could easily use the scoring formula we derived to score splits based on categorical variables
• Actually, it is not necessary to handle categorical variables separately
   We can encode the categorical variables into a numerical vector using one-hot encoding: allocate a vector of length #categories (see the sketch below)
   The vector will be sparse if there are lots of categories; the learning algorithm should preferably handle sparse data
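A toy one-hot encoding in NumPy, just to make the idea concrete; a real pipeline would use a sparse representation when there are many categories:

```python
import numpy as np

cats = np.array(["red", "green", "red", "blue"])
levels, idx = np.unique(cats, return_inverse=True)  # category -> integer index
onehot = np.eye(len(levels))[idx]                   # one row per instance
print(levels)   # ['blue' 'green' 'red']
print(onehot)   # 4 x 3 matrix, with a single 1 per row
```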
34. Pruning and Regularization
• Recall the gain of a split; it can be negative!
   This happens when the training loss reduction is smaller than the regularization
   Trade-off between simplicity and predictiveness
• Pre-stopping
   Stop splitting if the best split has negative gain
   But maybe a split can benefit future splits…
• Post-pruning
   Grow a tree to maximum depth, then recursively prune all the leaf splits with negative gain
35. Recap: Boosted Tree Algorithm
• Add a new tree in each iteration
• At the beginning of each iteration, calculate $g_i = \partial_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)})$ and $h_i = \partial^2_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)})$
• Use the statistics to greedily grow a tree $f_t$ that minimizes $Obj = -\frac{1}{2} \sum_{j=1}^T \frac{G_j^2}{H_j + \lambda} + \gamma T$
• Add $f_t$ to the model: $\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)$
   Usually, instead we do $\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + \epsilon f_t(x_i)$
   $\epsilon$ is called the step-size or shrinkage, usually set to around 0.1
   This means we do not do the full optimization in each step, and reserve a chance for future rounds; it helps prevent overfitting (an end-to-end sketch follows)
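Tying the pieces together, an end-to-end sketch for one numeric feature and square loss, reusing the `grad_hess_square` and `best_split` helpers sketched earlier; depth-1 trees (stumps) keep it short, and everything here is illustrative rather than XGBoost's real code:

```python
import numpy as np

def fit_stump(x, g, h, lam=1.0):
    """Grow a depth-1 tree: one split, with the closed-form leaf weights."""
    _, pos = best_split(x, g, h, lam)
    left = x < pos
    w_left = -g[left].sum() / (h[left].sum() + lam)
    w_right = -g[~left].sum() / (h[~left].sum() + lam)
    return lambda xq: np.where(xq < pos, w_left, w_right)

def boost(x, y, rounds=100, eps=0.1, lam=1.0):
    """Additive training: each round fits a stump to the current g, h."""
    yhat = np.zeros_like(y)               # y is assumed to be a float array
    trees = []
    for _ in range(rounds):
        g, h = grad_hess_square(y, yhat)
        tree = fit_stump(x, g, h, lam)
        yhat = yhat + eps * tree(x)       # shrinkage step with step-size eps
        trees.append(tree)
    return trees
```

A real implementation grows deeper trees with the recursive split search over many features, but the control flow of each boosting round is the same.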
36. Outline
• Review of key concepts of supervised learning
• Regression Tree and Ensemble (What are we Learning)
• Gradient Boosting (How do we Learn)
• Summary
37. Questions to check if you really get it
• How can we build a boosted tree model to do a weighted regression problem, such that each instance has an importance weight?
• Back to the time series problem: if I want to learn step functions over time, are there other ways to learn the time splits, other than the top-down split approach?
38. Questions to check if you really get it
• How can we build a boosted tree model to do a weighted regression problem, such that each instance has an importance weight?
   Define the objective, calculate $g_i$ and $h_i$, and feed them to the same tree learning algorithm we have for the unweighted version
   Again, think of the separation of model and objective: the theory can help you better organize the machine learning toolkit
39. Questions to check if you really get it
• The time series problem
• All that matters is the structure score of the splits
   Top-down greedy, same as for trees
   Bottom-up greedy: start from individual points, each as its own group, and greedily merge neighbors
   Dynamic programming: can find the optimal solution for this case
40. Summary
• The separation between model, objective, and parameters can be helpful for us to understand and customize learning models
• The bias-variance trade-off applies everywhere, including learning in functional space
• We can be formal about what we learn and how we learn; a clear understanding of the theory can be used to guide a cleaner implementation
41. Reference
• Greedy function approximation: a gradient boosting machine. J. H. Friedman. The first paper about gradient boosting.
• Stochastic gradient boosting. J. H. Friedman. Introduces the bagging trick into gradient boosting.
• The Elements of Statistical Learning. T. Hastie, R. Tibshirani and J. H. Friedman. Contains a chapter about gradient boosted trees.
• Additive logistic regression: a statistical view of boosting. J. H. Friedman, T. Hastie and R. Tibshirani. Uses second-order statistics for tree splitting, which is closer to the view presented in these slides.
• Learning nonlinear functions using regularized greedy forest. R. Johnson and T. Zhang. Proposes doing a fully corrective step, as well as regularizing the tree complexity; the regularization trick is closely related to the view presented in these slides.
• Software implementing the model described in these slides: https://github.com/tqchen/xgboost