528 presentation-26 feb

Bayan Alghuraybi
Krishna Marvaniya
Guojun Xia
Advisor: Prof. Jongwook Woo
24th Annual Student Symposium
on Research, Scholarship and Creative Activity
Analyze NYC Taxi Data using Hive
and Machine Learning
Friday, February 26, 2016
California State University, Los Angeles

Table of Contents
1. Hadoop and Microsoft
2. Our Road Map
3. Demonstration
• Ingest data
• HDInsight Query console
• Visualize Data
• Predictive Analytics

1. Hadoop and Microsoft
• Big data is not only about volume
• Improve analytics and statistics
• Extract business value
• Efficient architecture
Hadoop

2. Road Map
Steps: Load Data -> Explore Sample Data -> Build Model -> Deploy Model
Tools: Hive & Power BI -> Azure Machine Learning

3. Demonstration: Ingest data
■ Data set source (URL): http://chriswhong.com/open-data/foil_nyc_taxi/
■ Data set size: 18.7 GB
■ Specification of experimental equipment:
Number of nodes: 4 (Worker node: A7 & Head node: A3)
Disk size: 605 GB (Worker node) & 285 GB (Head node)
Memory size: 56 GB (Worker node) & 7 GB (Head node)
■ GitLab/GitHub information:
■ https://gitlab.com/kmarvan/Analysis.git
■ https://github.com/kmarvan/Analysis-on-NYC-taxi-Data-using-Hive-and-Machine-Learning

Algorithms for Microsoft Azure Machine Learning
A. Two-Class Logistic Regression
The Two-Class Logistic Regression module creates a logistic regression model that can be used to
predict one of two states of the target variable. Logistic regression is a well-known statistical
technique that is used for modeling many kinds of outcomes.
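As a rough illustration of the two-state prediction described above, the following is a minimal sketch of logistic regression trained by batch gradient descent in plain Python. It is not the Azure Machine Learning module's implementation; the function names, the toy data, and the "tipped / not tipped" labels are assumptions made up for this example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(rows, labels, learning_rate=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n = len(rows)
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # derivative of the log loss w.r.t. the linear score
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict_proba(weights, bias, x):
    """Probability that the target variable is in state 1."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Hypothetical toy data: one feature (trip distance in miles);
# label 1 = "tipped", label 0 = "not tipped". Not taken from the real data set.
toy_rows = [[0.5], [1.0], [1.5], [2.0], [4.0], [5.0], [6.0], [7.0]]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
lr_weights, lr_bias = train_logistic_regression(toy_rows, toy_labels)
```

Calling `predict_proba(lr_weights, lr_bias, [6.0])` returns a probability that can be thresholded at 0.5 to choose between the two states.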
B. Boosted Decision Tree Regression
The Boosted Decision Tree Regression module creates an ensemble of regression trees using
boosting. Boosting means that each tree depends on the prior trees: it is fit to the residual error
they leave behind. Boosting in a decision tree ensemble tends to improve accuracy with some small
risk of less coverage.
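To make the dependence of each tree on the prior trees concrete, here is a minimal sketch of boosting for regression with one-split trees (stumps) and squared error, in plain Python. It is a simplified stand-in, not the algorithm used inside the Azure module; the helper names, the learning rate, and the distance-to-fare toy data are all assumptions for illustration.

```python
def fit_stump(rows, targets):
    """Find the single-feature threshold split that minimizes squared error."""
    best = None
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2.0
            left = [t for r, t in zip(rows, targets) if r[j] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[j] > threshold]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((t - left_mean) ** 2 for t in left)
                   + sum((t - right_mean) ** 2 for t in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, left_mean, right_mean)
    if best is None:  # no split possible: predict the mean everywhere
        mean = sum(targets) / len(targets)
        return (0, float("inf"), mean, mean)
    return best[1:]


def stump_predict(stump, x):
    j, threshold, left_value, right_value = stump
    return left_value if x[j] <= threshold else right_value


def fit_boosted_trees(rows, targets, n_trees=200, learning_rate=0.1):
    """Each new stump is fit to the residuals left by the stumps before it."""
    base = sum(targets) / len(targets)
    predictions = [base] * len(rows)
    stumps = []
    for _ in range(n_trees):
        residuals = [t - p for t, p in zip(targets, predictions)]
        stump = fit_stump(rows, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump_predict(stump, x)
                       for p, x in zip(predictions, rows)]
    return base, stumps, learning_rate


def boosted_predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


def mean_squared_error(predicted, actual):
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical toy data: trip distance (miles) -> fare amount. Illustrative only.
fare_rows = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
fare_targets = [5.0, 7.5, 10.0, 12.5, 15.0, 17.5]
fare_model = fit_boosted_trees(fare_rows, fare_targets)

baseline_mse = mean_squared_error([fare_model[0]] * len(fare_rows), fare_targets)
boosted_mse = mean_squared_error(
    [boosted_predict(fare_model, x) for x in fare_rows], fare_targets)
```

Comparing `boosted_mse` with `baseline_mse` shows how much the ensemble improves on simply predicting the mean fare.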

Evaluating a Multiclass Classification Model:
Creating the Experiment Reader module

References
■ Microsoft Azure Machine Learning, https://studio.azureml.net
■ Process Data with Hive, http://hortonworks.com/hadoop-tutorial/how-to-process-data-with-apache-hive/
■ How to use HDInsight to create a cluster, https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-tutorial-get-started-windows/
