Data Analysis Made Simple and Accessible - Lesson 1
Igor Kleiner
Lecture course: Data analysis made simple and accessible
Lecture 1
What is data?
What is data analysis?
Motivation: Challenger, eHarmony, Jeopardy! («Своя игра»), Moneyball
Examples of working with Orange
Coronavirus analysis
This document compares several machine learning algorithms on a binary classification problem using a census dataset:
1. It builds logistic regression, decision tree, random forest, and boosted tree models on an 80% training set and evaluates their performance on a 10% test set.
2. The decision tree and random forest models are tuned, which improves their AUC.
3. The best-performing models are boosted trees (AUC 0.922) and logistic regression (AUC 0.91), as measured on the held-out test set.
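The AUC figures above can be read as the probability that a randomly chosen positive example receives a higher model score than a randomly chosen negative one. A minimal sketch in Python, assuming hypothetical labels and scores (not the census data), computes this directly by comparing every positive-negative pair and counting ties as one half:

```python
from itertools import product


def auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of ROC AUC for 0/1 labels."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0   # positive ranked above negative
        elif p == n:
            wins += 0.5   # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical held-out labels and model scores, for illustration only
y_test = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.3, 0.6, 0.4, 0.4, 0.1]
print(round(auc(y_test, scores), 3))  # → 0.944
```

The quadratic pairwise loop is kept for readability; on large test sets a rank-based formula, or a library routine such as scikit-learn's `roc_auc_score`, gives the same value far more efficiently.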
This document contains code for analyzing several datasets using regression and classification models in R. For the PISA dataset, it performs linear regression to predict reading scores and evaluates the model. For an automotive dataset, it builds linear and polynomial regression models to relate sales to economic factors. It also explores logistic regression models for healthcare and loan default datasets.
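The difference between the linear and polynomial regression models mentioned above is only the set of features: a polynomial model regresses on powers of the input. A minimal sketch in Python (not the R code the summary refers to), assuming small hypothetical data rather than the PISA or automotive datasets, fits both by least squares via the normal equations and compares them with R²:

```python
def fit_poly(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (X^T X) b = X^T y.
    Returns [b0, b1, ..., b_degree]; degree=1 is ordinary linear regression."""
    n = degree + 1
    # X^T X and X^T y built directly from power sums
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution
    coef = [0.0] * n
    for i in reversed(range(n)):
        s = b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = s / A[i][i]
    return coef


def predict(coef, x):
    return sum(c * x ** i for i, c in enumerate(coef))


def r_squared(xs, ys, coef):
    """Share of the variance in ys explained by the fitted polynomial."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - predict(coef, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data generated from y = 1 + 2x + 0.5x^2
xs = [0, 1, 2, 3, 4, 5]
ys = [1 + 2 * x + 0.5 * x ** 2 for x in xs]
linear = fit_poly(xs, ys, 1)
quadratic = fit_poly(xs, ys, 2)
print("linear R^2:", r_squared(xs, ys, linear))
print("quadratic R^2:", r_squared(xs, ys, quadratic))
```

Solving the normal equations by hand keeps the sketch dependency-free; for real work a QR-based solver is numerically safer, especially at higher polynomial degrees.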