BIG DATA FINAL
PRESENTATION
Boris Menshikov
Part One: Data Analytics
Case Study: Used Car Agency
1. Introduction:
   1. Manager's need for an online system to facilitate work in the agency.
   2. Current use of spreadsheets for data collection.
   3. Objective: Perform Business Intelligence and Data Analytics using R.
2. Data Description:
   1. Dataset containing information about used cars.
   2. Fields include Name, Model, Year, Selling Price, Km Driven, Fuel Type, Seller Type, Transmission Type, Owner, Mileage, Engine Capacity, Horsepower, Torque, Number of Seats, Dealer Name.
3. Project Requirements:
   1. Data analysis using R programming and Excel.
4. Step 1: Statistical Analytics using Excel:
   1. Clean dataset and analyze Car Agency data using Excel.
   2. Results presented in graphs and tables.
5. Step 2: Perspective Analysis using R:
   1. Clean dataset and analyze with R using the ggplot2 package.
   2. Compare results with Excel analysis.
6. Step 3: Predictive Analytics using R:
   1. Perform predictive analysis on Car Business using Linear Prediction or Logistic Regression.
   2. Compare results with previous steps.
7. Step 4: Prepare a PowerPoint Presentation:
   1. Create a presentation summarizing findings (10 points).
8. Additional Requirements:
   1. Cleaning and normalization of data.
   2. Detailed report on analysis.
   3. Provide R code and Excel files separately.
9. Answer Questions using R:
   1. Identify popular car models, top-selling dealers, average prices, newest/oldest cars, etc.
10. Predict Car Price:
   1. Predict car price based on odometer.
11. Predict Car Price with Year:
   1. Predict car price based on odometer and year of manufacture.
ANNY IS THE MOST EFFICIENT AGENT;
DAVID AND HENRY ARE FAR BEHIND, AND
HENRY IS EXTREMELY INEFFICIENT ->
CONCLUSION: ANNY DESERVES A RAISE OR
BONUS, WHILE HENRY (AND PROBABLY DAVID
AS WELL) COULD BE SENT TO TRAINING OR
DISMISSED.
PERSPECTIVE ANALYSIS
USING R
The code:
• THE VAST MAJORITY OF CARS RUN ON
EITHER DIESEL OR PETROL; OTHER FUEL
TYPES MAKE UP A SIGNIFICANTLY
SMALLER SHARE. PETROL CARS ARE
GAINING POPULARITY EACH YEAR ->
CONCLUSION: ACQUIRE MORE PETROL
CARS TO MATCH ONGOING DEMAND.
TABLE 2:
The code:
• THE MOST POPULAR CAR IS THE
MARUTI ERTIGA VDI, AND THE
MARUTI BRAND HOLDS A
SUBSTANTIAL SHARE OF THE TOP
10 CARS -> CONCLUSION:
PURCHASE MORE "MARUTI"
BRAND CARS TO MEET
DEMAND.
TABLE 3:
The code
• THE MANUAL CAR STOCK IS
MORE DEVELOPED THAN THE
AUTOMATIC STOCK; CARS
WITH MORE ENGINE POWER
ARE MOSTLY MANUAL AND
MORE AFFORDABLE ->
CONCLUSION: PURCHASE
CARS WITH POWERFUL
ENGINES ONLY WITH MANUAL
TRANSMISSION.
TABLE 4:
The code:
• ALMOST EVERY SALE IS MADE
BY AN INDIVIDUAL SELLER
RATHER THAN THROUGH A
DEALER -> CONCLUSION:
THE NEED FOR SEVERAL
EMPLOYEES SHOULD BE
QUESTIONED AND REVIEWED.
TABLE 5:
The code:
TABLE 6:
• MOST OF THE CARS WE SELL FALL
IN THE LOW TO MIDDLE PRICE CLASS,
WITH A SMALL NUMBER OF HIGH-END
VEHICLES THAT PRODUCE SPIKES IN
THE GRAPH -> CONCLUSION: FOCUS
ON LOW- TO MID-CLASS CARS.
The code:
FOR OUR FIRST 3 OBJECTIVES WE
HAVE ONLY 2 VARIABLES IN THE
MODEL (Y, X). WE USE THE R
FUNCTION LM TO FIT A LINEAR
MODEL BY ORDINARY LEAST
SQUARES, WHICH MINIMIZES THE
SUM OF SQUARED ERRORS (LM
SOLVES THIS DIRECTLY RATHER
THAN BY GRADIENT DESCENT).
WE USE OUR TARGET FEATURES (N_SALES,
AVG_SALE, TOT_SALES) AS THE RESPONSE
VARIABLE (Y), ONE AT A TIME, AND THE
YEAR AS X.
Here we can see the p-values and the R-squared value of
our model, along with the fitted coefficients
(the weights of our variables in the equation).
We can visualize our model by plotting it into a graph.
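The fit, summary, and plot described above could look roughly like the sketch below. The `yearly` data frame and its numbers are made up for illustration; only the names `n_sales` and `year` echo the slide text, and the real script may differ.

```r
# Hypothetical yearly summary; the values are invented for this sketch.
yearly <- data.frame(
  year    = 2010:2019,
  n_sales = c(12, 15, 19, 22, 30, 34, 41, 47, 58, 66)
)

# Fit n_sales = a + b * year by ordinary least squares.
fit <- lm(n_sales ~ year, data = yearly)

# The summary holds the coefficients, their p-values, and R-squared.
fit_summary <- summary(fit)
print(coef(fit_summary))      # estimates, std. errors, t values, p-values
print(fit_summary$r.squared)  # share of variance explained by the line

# Plot the observations and overlay the fitted line.
plot(yearly$year, yearly$n_sales, xlab = "Year", ylab = "Number of sales")
abline(fit, col = "red")
```

The same pattern applies to the other two targets by swapping the left-hand side of the formula.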
Predictive Analytics:
WE CAN SEE THAT OUR
MODEL APPEARS TO BE
UNDERFITTING, AS THE
DATA INDICATES A CURVED
TREND OVER TIME, BUT WE
ARE NOT USING A
POLYNOMIAL MODEL.
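One way to check the underfitting remark is to fit a quadratic term next to the straight line and compare the two. This is a sketch on invented data, not the deck's own analysis; `poly()` and `anova()` are standard base R functions.

```r
# Hypothetical yearly summary with a visibly curved trend.
yearly <- data.frame(
  year    = 2010:2019,
  n_sales = c(12, 15, 19, 22, 30, 34, 41, 47, 58, 66)
)

# Straight-line model and a second-degree polynomial model.
linear_fit    <- lm(n_sales ~ year, data = yearly)
quadratic_fit <- lm(n_sales ~ poly(year, 2), data = yearly)

# Adjusted R-squared penalizes the extra term, so it is a fairer comparison.
print(summary(linear_fit)$adj.r.squared)
print(summary(quadratic_fit)$adj.r.squared)

# F-test for whether the quadratic term improves the fit.
print(anova(linear_fit, quadratic_fit))
```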
THERE IS ALSO A MODEL IN
WHICH WE ESTIMATE THE
NUMBER OF FIRST-OWNER
CAR SALES FROM THE YEAR
AND THE AVERAGE YEARLY
SALE AMOUNT. LET’S PLOT
THE CORRELATION BETWEEN
THE 3 VARIABLES:
• OUR MODEL IS CLEARLY A
CONTINUOUS PLANE GOING
THROUGH OUR
3-DIMENSIONAL SPACE.
NEXT STEP IS TO
CREATE THE MODEL
AND PLOT THE
RESULT.
The way of building the model is always the same for
each of them: choose the target variable and the X
features. It is possible to include as many predictor
features as we want, from 1 to n. Our model formula is
PREDICTIVE ANALYTICS USING R
The data table:
HERE WE CAN
SEE THE
YEARLY CARS
STATS
IN THIS PART
WE IMPORT THE
.CSV FILE INTO
RSTUDIO AND
SAVE IT AS A
VARIABLE.
THEN VIEW IS
USED TO DISPLAY
THE DATA IN
TABULAR
FORMAT
NEXT, WE CONVERT THE
IMPORTED DATA TO A
DATA FRAME, BECAUSE
THIS GIVES US MORE
POSSIBILITIES AND MAKES
THE DATA EASIER TO
MANIPULATE. WE THEN
VIEW THE RESULT AGAIN
IN TABULAR FORM
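The import and conversion steps can be sketched as follows. The file contents and column names are stand-ins so the snippet runs on its own; a real script would pass the path of the agency's exported CSV to `read.csv()`.

```r
# Write a tiny stand-in CSV (hypothetical columns and values).
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "name,year,selling_price,km_driven",
  "Maruti Swift VDI,2014,450000,70000",
  "Hyundai i20,2017,600000,30000"
), csv_path)

# Read the file and keep the result in a variable.
cars_raw <- read.csv(csv_path, stringsAsFactors = FALSE)

# read.csv() already returns a data frame; as.data.frame() makes the
# conversion explicit, which matters if another reader was used.
cars <- as.data.frame(cars_raw)

# View() opens the spreadsheet-style viewer, so call it only interactively.
if (interactive()) View(cars)

# str() gives a quick text overview that also works in scripts.
str(cars)
```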
FOR EACH
EXERCISE, THE
LIBRARY DPLYR
WILL BE USED,
BECAUSE OF ITS
GREAT
FUNCTIONALITY
IN HANDLING
THE DATA.
THE FIRST EXERCISE
HERE WE CHOOSE THE DATA FRAME TO WORK
WITH, COUNT THE NUMBER OF CARS OF EACH
MODEL, SORT FROM LARGEST TO SMALLEST,
AND DISPLAY THE MOST POPULAR MODEL BY
TAKING A SLICE OF 1.
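A minimal dplyr sketch of this exercise, assuming a column called `model` (the actual column name in the dataset may differ) and invented rows:

```r
library(dplyr)

# Hypothetical listing; only the model column matters here.
cars <- data.frame(
  model = c("Maruti Ertiga VDI", "Maruti Swift VDI", "Maruti Ertiga VDI",
            "Hyundai i20", "Maruti Ertiga VDI", "Maruti Swift VDI"),
  stringsAsFactors = FALSE
)

most_popular <- cars %>%
  count(model) %>%      # one row per model, with the count in column n
  arrange(desc(n)) %>%  # largest count first
  slice(1)              # keep only the top row

print(most_popular)
```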
THE SECOND EXERCISE
WE SELECT OUR DATA FRAME, GROUP THE DATA BY DEALER, COUNT THE
NUMBER OF CARS FOR EACH DEALER, AND TAKE SLICE TWO TO GET THE TOP RESULT
THAT DOES NOT RELATE TO INDIVIDUAL SALES
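The sketch below reproduces the slice-two idea on invented data, assuming a `dealer` column in which private sales are labelled "Individual". Taking row 2 only works while "Individual" has the highest count, so a filter-based variant is shown as a more robust alternative; that variant is a suggestion, not the deck's code.

```r
library(dplyr)

# Hypothetical dealer column; "Individual" marks private sales.
cars <- data.frame(
  dealer = c("Individual", "Individual", "Individual", "Individual",
             "Anny", "Anny", "Anny", "David", "David", "Henry"),
  stringsAsFactors = FALSE
)

# Count cars per dealer, largest first.
dealer_counts <- cars %>%
  group_by(dealer) %>%
  summarise(n_cars = n()) %>%
  arrange(desc(n_cars))

# As on the slide: row 1 is "Individual", so row 2 is the top real dealer.
top_dealer <- dealer_counts %>% slice(2)

# Alternative that does not depend on "Individual" ranking first.
top_dealer_robust <- dealer_counts %>%
  filter(dealer != "Individual") %>%
  slice(1)

print(top_dealer)
```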
THE THIRD EXERCISE
AT THIS POINT WE DECIDED TO DISPLAY TWO COLUMNS,
THE MODEL OF THE CAR AS WELL AS ITS AVERAGE PRICE,
BY MEANS OF THE FUNCTION AGGREGATE.
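With base R's `aggregate`, the two-column result can be produced from a formula. The column names `model` and `selling_price` and the rows are assumptions for this sketch.

```r
# Hypothetical prices for two models.
cars <- data.frame(
  model         = c("Maruti Swift VDI", "Maruti Swift VDI", "Hyundai i20"),
  selling_price = c(400000, 500000, 600000),
  stringsAsFactors = FALSE
)

# Mean selling price per model: one row per model, two columns.
avg_price <- aggregate(selling_price ~ model, data = cars, FUN = mean)

print(avg_price)
```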
THE FOURTH EXERCISE
WE FIND THE MINIMUM YEAR AND THEN USE THE FILTER FUNCTION TO FIND THE ROWS
WHERE THE YEAR EQUALS THE MINIMUM YEAR, SETTING THE SLICE TO 1 TO GET ONLY
ONE ROW WITH ONE MODEL. THIS HELPS WHEN MANY CARS SHARE THE SAME YEAR.
WE THEN USE EXACTLY THE SAME CODE TO LOOK FOR THE MAXIMUM YEAR:
BECAUSE THERE ARE A LOT OF 2020 CARS AND WE ONLY WANT TO GET 1 RESULT,
WE AGAIN MAKE THE SLICE EQUAL TO ONE
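The oldest and newest lookups might be written like this. Column names and rows are illustrative; two 2020 cars are included to show why the slice is needed.

```r
library(dplyr)

# Hypothetical listing with a tie on the newest year.
cars <- data.frame(
  model = c("Maruti 800", "Hyundai i20", "Maruti Swift VDI", "Honda City"),
  year  = c(2005, 2020, 2020, 2012),
  stringsAsFactors = FALSE
)

# Oldest car: rows matching the minimum year, cut to a single row.
oldest <- cars %>%
  filter(year == min(year)) %>%
  slice(1)

# Newest car: same code with max(); slice(1) resolves the 2020 tie
# by keeping the first matching row.
newest <- cars %>%
  filter(year == max(year)) %>%
  slice(1)

print(oldest)
print(newest)
```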
THE FIFTH EXERCISE
HERE WE DECIDED TO GIVE TWO POSSIBLE SOLUTIONS TO THIS PROBLEM: ONE WHERE
WE TAKE THE MINIMUM PRICE AND REPORT THE CORRESPONDING MILEAGE, AND ONE
VICE VERSA. WE FOUND THE MINIMUM VALUES WITH THE FUNCTION MIN AND THEN
SUBSTITUTED THEM INTO THE FILTER TO FIND THE NECESSARY ROWS.
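Read as "cheapest car and its mileage" versus "least-driven car and its price", the two solutions could be sketched as below. That reading, the use of `km_driven` for mileage, and the data are all assumptions.

```r
library(dplyr)

# Hypothetical listing.
cars <- data.frame(
  model         = c("Maruti Swift VDI", "Hyundai i20", "Maruti Ertiga VDI"),
  selling_price = c(450000, 300000, 700000),
  km_driven     = c(70000, 120000, 15000),
  stringsAsFactors = FALSE
)

# Solution 1: the row(s) with the minimum price, showing their mileage.
cheapest <- cars %>%
  filter(selling_price == min(selling_price))

# Solution 2: the row(s) with the minimum mileage, showing their price.
least_driven <- cars %>%
  filter(km_driven == min(km_driven))

print(cheapest)
print(least_driven)
```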
THE SIXTH EXERCISE
HERE WE FIND THE AMOUNT EARNED BY THE DEALER NAMED
INDIVIDUAL USING THE AGGREGATE FUNCTION, SPECIFYING
THE PRICE AS THE COLUMN TO WHICH THE SUM FUNCTION IS
APPLIED AND THE DEALER AS THE GROUPING COLUMN, SO WE
CAN SEE WHO EARNED WHAT AMOUNT.
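A small sketch of the per-dealer total with `aggregate`, again on invented rows and assumed column names:

```r
# Hypothetical sales with a dealer label per row.
cars <- data.frame(
  dealer        = c("Individual", "Individual", "Anny", "David"),
  selling_price = c(300000, 450000, 600000, 500000),
  stringsAsFactors = FALSE
)

# Sum of selling prices per dealer.
earned <- aggregate(selling_price ~ dealer, data = cars, FUN = sum)

# Pull out the row for the dealer named "Individual".
individual_total <- earned[earned$dealer == "Individual", ]

print(earned)
print(individual_total)
```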
THE SEVENTH EXERCISE
WE TAKE THE SUBSET FOR EACH OF THE DEALERS EXCEPT INDIVIDUAL AND
COUNT THE NUMBER OF CARS OF EACH TRANSMISSION TYPE TO SEE WHICH TYPE OF CAR
IS MOST POPULAR WITH EACH DEALER
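Instead of a separate subset per dealer, the same question can be answered in one dplyr pipeline. This is a sketch with assumed column names `dealer` and `transmission`, not the code from the slide.

```r
library(dplyr)

# Hypothetical listing.
cars <- data.frame(
  dealer       = c("Anny", "Anny", "Anny", "David", "David", "Individual"),
  transmission = c("Manual", "Manual", "Automatic",
                   "Automatic", "Automatic", "Manual"),
  stringsAsFactors = FALSE
)

popular_transmission <- cars %>%
  filter(dealer != "Individual") %>%  # drop private sales
  count(dealer, transmission) %>%     # cars per dealer and transmission
  arrange(dealer, desc(n)) %>%        # most common type first per dealer
  group_by(dealer) %>%
  slice(1) %>%                        # keep the top type for each dealer
  ungroup()

print(popular_transmission)
```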
LET’S MOVE TO THE NEXT TASK
THE MAIN GOAL IS TO PREDICT THE PRICE OF ANY CAR
BASED ON THE ODOMETER, AND THEN TO PREDICT THE
PRICE BASED ON THE ODOMETER AND THE
YEAR OF MANUFACTURE
FIRST WE CREATE A SUBSET (TABLE) CONTAINING
ONLY THE ROWS WITH THE NAME OF A
PARTICULAR CAR (MARUTI SWIFT VDI)
THEN WE GET A SUMMARY OF A FORMULA
A + B * N
AND THEN WE GET THE RELATIONSHIP
RESULT
THEN WE USE THE FUNCTION LM TO DETERMINE THE
RELATIONSHIP BETWEEN THE PRICE AND THE
ODOMETER. WE OUTPUT THE RESULT AND READ THE
"ESTIMATE" VALUES: THE INTERCEPT IS A AND THE
ODOMETER COEFFICIENT IS B. WE THEN USE THE
FORMULA A + B * N TO COMPUTE THE PREDICTED PRICE.
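The subset, fit, and A + B * N steps could be sketched as follows. The prices, odometer readings, and column names are invented, and `n` is an arbitrary odometer value chosen for the example.

```r
# Hypothetical listing; five Swift rows plus one other model.
cars <- data.frame(
  name          = c(rep("Maruti Swift VDI", 5), "Hyundai i20"),
  km_driven     = c(20000, 40000, 60000, 80000, 100000, 30000),
  selling_price = c(600000, 540000, 500000, 430000, 380000, 650000),
  stringsAsFactors = FALSE
)

# Keep only the rows for one particular car.
swift <- subset(cars, name == "Maruti Swift VDI")

# Price as a linear function of the odometer reading.
fit <- lm(selling_price ~ km_driven, data = swift)
print(summary(fit))  # the "Estimate" column holds A and B

a <- coef(fit)[["(Intercept)"]]  # A: price at an odometer reading of 0
b <- coef(fit)[["km_driven"]]    # B: price change per km driven

# Predicted price for an odometer reading n, using A + B * N.
n <- 50000
predicted_price <- a + b * n

# predict() gives the same value without handling coefficients by hand.
predicted_check <- predict(fit, newdata = data.frame(km_driven = n))

print(predicted_price)
```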
THIS RESULT IS AN
APPROXIMATE
CALCULATION OF HOW
MUCH THE CAR WILL
COST FOR ANY ODOMETER
VALUE N WE SET.
IN THE NEXT PROBLEM
WE DO THE SAME BUT
ADD A SECOND TERM TO
THE FORMULA:
A + B*N + C*N1, WHERE
N1 IS THE YEAR OF
MANUFACTURE.
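Extending the model to two predictors only changes the formula passed to `lm`. The sketch below uses invented Swift rows; `n` and `n1` are arbitrary example inputs for odometer and year.

```r
# Hypothetical rows for a single car model.
swift <- data.frame(
  km_driven     = c(20000, 45000, 60000, 85000, 100000, 70000),
  year          = c(2019, 2017, 2016, 2014, 2012, 2016),
  selling_price = c(600000, 540000, 500000, 430000, 380000, 470000)
)

# Price as a linear function of odometer reading and year of manufacture.
fit2 <- lm(selling_price ~ km_driven + year, data = swift)
print(summary(fit2))

intercept  <- coef(fit2)[["(Intercept)"]]  # A
slope_km   <- coef(fit2)[["km_driven"]]    # B
slope_year <- coef(fit2)[["year"]]         # C

# Predicted price from A + B*N + C*N1.
n  <- 50000  # odometer reading
n1 <- 2015   # year of manufacture
predicted_price <- intercept + slope_km * n + slope_year * n1

# Cross-check against predict().
predicted_check <- predict(fit2, newdata = data.frame(km_driven = n, year = n1))

print(predicted_price)
```

Odometer and year tend to be strongly correlated in used-car data, so the individual coefficients can be unstable even when the overall prediction is reasonable.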
The summary
THE FINAL
RESULT
THANK YOU FOR YOUR
ATTENTION!

BigData_HW3_Boris_Menshikov_15613416.pptx
