This document summarizes a student's research project on using big data technologies and linear regression to analyze second-hand vehicle sales data from Craigslist. The student cleaned and preprocessed the data, loaded it into databases, and performed analysis using Pig, Hive, and MapReduce. Regression analysis was then used to build a model to predict used car prices based on various attributes in the data.
Big Data Analysis of Second hand Car Sales
1. Statistical Analysis on Second hand vehicle
sales using Big Data Technologies and Linear
Regression
Programming for Data Analytics
MSc Data Analytics
Yash Balaji Iyengar
Student ID: x18124739
School of Computing
National College of Ireland
Supervisor: Dr. Muhammad Iqbal
2. National College of Ireland
Project Submission Sheet
School of Computing
Student Name: Yash Balaji Iyengar
Student ID: x18124739
Programme: MSc Data Analytics
Year: 2018
Module: Programming in Data Analytics
Supervisor: Dr. Muhammad Iqbal
Submission Due Date: 16th August 2019
Project Title: Statistical Analysis on Second hand vehicle sales using Big
Data Technologies and Linear Regression
Word Count: 3141
Page Count: 11
I hereby certify that the information contained in this (my submission) is information
pertaining to research I conducted for this project. All information other than my own
contribution will be fully referenced and listed in the relevant bibliography section at the
rear of the project.
ALL internet material must be referenced in the bibliography section. Students are
required to use the Referencing Standard specified in the report template. To use other
author’s written or electronic work is illegal (plagiarism) and may result in disciplinary
action.
Signature:
Date: 16th August 2019
3. Statistical Analysis on Second hand vehicle sales using
Big Data Technologies and Linear Regression
Yash Balaji Iyengar
x18124739
Abstract
In the modern age, an automobile has become more of a necessity than
a luxury. Due to rising living costs and the demands of the modern world,
a wide variety of vehicles is launched every year around the world. But not everyone
can afford to buy a brand new car. So in developing countries, people buy vehicles on lease
or contract. This gives rise to a market for used vehicles. Unlike the mainstream
automobile market, the prices of the vehicles fluctuate in the second-hand sales
market. In order to create a stable price range for different vehicles in the market,
a methodology has been proposed. This paper uses the Craigslist dataset on used
cars and performs analysis on it in a Big Data environment. Regression analysis is
performed on the dataset and a model is proposed which can predict the
price of a vehicle in the used-car market.
1 Introduction
The automobile industry is one of the major contributors to a country's economy, and
used car sales prediction has gained importance in recent years. Vehicle production
has increased exponentially over the years (Gegic et al., 2019). With the growing
standard of living all over the world, people buy a new car or a second-hand one
depending on their budget. In economically emerging nations, people lease cars on
loans or on a contract basis, and at the end of the contract the car is returned
to the dealer. Thus it is common to see used car sales surpass new automobile sales.
The price of a car changes when it is sold in the secondary market. In the new car
market, the price is decided by the manufacturer and remains standard. In the
second-hand market, however, the price is relative and varies with many factors, such
as the brand or manufacturer of the vehicle, the make of the car, its age, its cylinder
capacity and the type of fuel used (Pai and Liu, 2018). As a result, there is no
standard for assigning the selling price of a second-hand car.
Sometimes car dealers take advantage of the customer's ignorance and inflate
the prices of vehicles. Understanding the needs of the market and devising a way
to maintain a standard price range for different cars in the second-hand market is
therefore an interesting challenge. In other scenarios, the customer is not willing to
spend much on the vehicle, or the customer's choice is influenced by the features of
the car and how comfortable it feels. In these scenarios both parties, the customer
and the car dealer, try to figure out what the other one needs and how both can get
the best deal. Car sellers advertise their best deals on social media platforms to
increase their trade.
In this paper, we analyse a used car sales dataset which was made public on
Craigslist. The dataset was collected from Kaggle¹ and consists of 1.7 million records
of car sales data from the years 1900 to 2019. The dataset is pre-processed and cleaned
in R and then loaded into MySQL. Using Sqoop, the data is moved from MySQL to
Hive and the Hadoop Distributed File System (HDFS). The data is also moved to Pig.
Data analysis is performed in the Pig, Hive and MapReduce environments. The data
stored in HDFS is then processed and regression analysis is performed on it to predict
the price of the used cars.
2 Related Work
The number of car sales has increased recently. Residents of developing nations
often lease cars instead of buying them because it is more economical, which has
amplified the sales of used cars. With demand this high, car sellers set unrealistic
prices. Hence a model is required that sets the price of a car based on its key
attributes. Pal et al. (2019) propose a supervised learning approach, Random Forest,
to predict the price of used cars. A Random Forest with 500 decision trees was
trained on the data. Random Forest is mainly utilized as a classification model but,
in that research, it was employed as a regression model. A grid search was used to
determine the appropriate number of trees, and 500 decision trees gave the best
performance. The number of attributes selected is equal to the number of attributes
in the input data. The model achieved a training accuracy of 95.82% and a testing
accuracy of 83.63%. Random Forest provided better results than the models used in
prior research on the same data.
With the expansion of social media, stating views on commodities has become
effortless. Data available on social media can be used as a potential input for
predicting vehicle sales, and stock market values are also believed to have an impact
on sales. Pai and Liu (2018) propose a methodology that uses multivariate regression
with social media data and stock market values, together with time-series models, to
forecast monthly total vehicle sales. The least-squares support vector regression
(LSSVR) model was employed to handle the multivariate regression data. The data used
in that research are sentiment scores of tweets, stock market values and hybrid data
combining both, used to predict car sales in the USA. The results indicate that
predicting vehicle sales using hybrid multivariate regression data combined with
deseasonalizing methods yields more accurate predictions than the other models. Using
hybrid data comprising sentiment analysis of social media and stock market values can
improve prediction accuracy. Deseasonalizing the condition variables and decision
variables helps to improve forecasting performance.
Presently, there is an astonishing number of alternatives available to car buyers.
Manufacturers and vendors examine various approaches and use remarkable advertising
strategies to persuade potential consumers to buy their cars. Khashman and Sadikoglu
(2019) examine and create an intelligent system for forecasting the marketing
usefulness of cars using selected car attributes and a supervised neural network
prediction model. The prediction model, which is based on a propagation neural
network, was employed and refined, achieving a rapid training time of 0.017s and a
combined prediction rate (CPR) of 85.07%, thus providing a practical solution to this
feasibility forecasting task.
¹ https://www.kaggle.com/austinreese/craigslist-carstrucks-data#craigslistVehicles.csv
Gegic et al. (2019) study car price prediction, which is one of the hot topics in
the research field. They state that building a highly accurate and reliable prediction
model is a complex task in which a number of distinct attributes need to be taken into
consideration. Their model is based on car prices in Bosnia and Herzegovina, which
according to the authors is a potentially growing market that automobile makers
should target more strategically. Three models were applied to the data, namely
Random Forest, Artificial Neural Network (ANN) and Support Vector Machine (SVM).
The data was scraped from the website autopijaca.ba using PHP. The models were
given 90% of the data for training and 10% for testing, and were evaluated using
10-fold cross-validation. The authors compare the models against each other to find
the best performing one based on accuracy; the SVM outperformed the ANN. Finally,
the integration of the prediction model was done in Java.
Zhang et al. (2017) examine how big data can improve car sales, using data mining
techniques and a Java program to analyse the application of big data. Data mining is
used to help the manufacturer increase production, reduce inventory and cut the
wastage of resources. The data was collected from the Chinese market and ranges from
2005 to 2015. The authors considered three aspects of big data analysis: 1. analysing
a large volume of data rather than a small sample, 2. accepting complexity in the data
rather than insisting on accuracy, 3. exploring any unwanted anomalies in the data.
The approach followed in that research is the KDD methodology, since the data is very
large, and it follows unsupervised learning. The research answered several questions,
such as which model is most popular with which category of customer, how many cars
should be produced in the coming months and which models give the most profit.
The use of big data in the automobile industry is ever increasing. More and more car
manufacturers are making use of big data analytics to improve aspects like production,
sales and brand value. One such example is AUDI, whose digital transformation is
described by Dremel et al. (2017). The researchers suggest that big data gives
insights into new business models, digital services and innovation. Three models of
the use of big data are proposed in that paper, along with how automobile makers are
connecting these digital business opportunities with data services. It also states
that there should be a joint effort between the I.T. and business departments. The
understanding gained from this research is that cross-departmental and
cross-discipline collaboration is needed for the business to thrive.
3 Methodology
The flow diagram in Figure 1 gives a pictorial representation of the entire workflow
of the project. For a detailed explanation, this section is divided into the following
subsections.
Figure 1: Second-hand Car Sales Analysis and Price Prediction Work Flow
3.1 Environment Setup and Data Pre-Processing
First, an instance was created on the NCI OpenStack Cloud Environment. The Big Data
environment was set up from scratch by installing the required software in the following
order. Hadoop was installed first, then HBase was installed and run in distributed mode.
MySQL was installed next, and a test database with a dummy table was created for
testing purposes. Sqoop was then installed and its functionality was tested by
transferring the dummy table from MySQL to HDFS and HBase. Finally, Pig and Hive were
installed in the virtual environment. The dataset was obtained from Kaggle². It
consists of 26 features and 1.7 million rows. The data was downloaded and loaded into
the RStudio environment. The dataset was then checked for null values using the
missmap function of the Amelia library. This function visualises the data and shows
the distribution of missing values in the dataset. Unwanted columns were dropped
and rows with null values were omitted using the na.omit function available in base R.
Factor levels of the categorical variables were checked and altered to avoid data
anomalies and duplication of data. The final clean dataset consisted of 331473 rows
and 20 columns.
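The cleaning step above was carried out in R. Purely as an illustration, the same idea (drop unwanted columns, then omit every row that still has a missing value) can be sketched in plain Python. The column names, the sample rows and the helper clean_rows are hypothetical and are not taken from the project.

```python
# Minimal sketch of the cleaning step: drop unwanted columns, then omit
# every row that still has a missing value (the role na.omit plays in R).
# Column names and sample rows below are hypothetical.

MISSING = {"", "NA", None}

def clean_rows(rows, unwanted_columns):
    """Return rows without the unwanted columns and without missing values."""
    cleaned = []
    for row in rows:
        kept = {k: v for k, v in row.items() if k not in unwanted_columns}
        if any(v in MISSING for v in kept.values()):
            continue  # omit incomplete rows
        cleaned.append(kept)
    return cleaned

rows = [
    {"url": "a", "price": "4500", "year": "2008", "odometer": "120000"},
    {"url": "b", "price": "", "year": "2012", "odometer": "80000"},
    {"url": "c", "price": "9900", "year": "2015", "odometer": "NA"},
    {"url": "d", "price": "7200", "year": "2011", "odometer": "95000"},
]

clean = clean_rows(rows, unwanted_columns={"url"})
print(len(clean), "of", len(rows), "rows kept")
```

Dropping columns before checking for missing values matters: a row is only discarded if a value is missing in a column that is actually kept.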
3.2 Data processing in Big Data Environment
In order to improve processing and loading performance, we first load the data into
a MySQL database. The dfs and yarn services are switched on. First, the dataset is
loaded into the local memory in the house profile. A new database named cars is
created, along with a schema describing the data type of each column in the data.
Then the data is loaded into MySQL. This data is then loaded into the Hadoop
Distributed File System (HDFS). Since the data is large, we use Sqoop for the transfer.
Sqoop is software developed by Apache to facilitate the transfer of large volumes of
data between relational databases and HDFS³. Sqoop uses Java Database Connectivity
(JDBC) to connect to relational databases such as MySQL, so the data is transferred
from MySQL to HDFS. Similarly, the data is transferred to Hive using Sqoop. The data
stored on Hadoop is then processed using multiple frameworks for exploring and
analysing it. MapReduce is used for two of our BI queries, Hive for three and Pig for
two. We will discuss them in detail in the next section.
² https://www.kaggle.com/austinreese/craigslist-carstrucks-data#craigslistVehicles.csv
3.3 Software Used
3.3.1 Hadoop MapReduce:
It is a processing framework which works on HDFS⁴. We have used it to calculate the
used cars price distribution over the years and also to find out which vehicle size
is preferred by customers. MapReduce processing can be divided into two phases, map
and reduce, and the code can be explained in three parts. The mapper emits a key-value
pair for each element in a column and passes it to the reducer. The reducer applies
an aggregate function, chosen according to the application, to the key-value pairs
received from the mapper.
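The mapper and reducer roles described above can be sketched in plain Python. This is a single-process illustration of the map, shuffle and reduce flow and not the project's Hadoop code; the record layout and the function names mapper, reducer and run_job are assumptions made for the example.

```python
# Pure-Python sketch of the map -> shuffle -> reduce flow described above.
# It runs in a single process and only mimics what Hadoop does at scale.
from collections import defaultdict

def mapper(record):
    """Emit one (key, value) pair per record: (listing year, 1)."""
    yield record["year"], 1

def reducer(key, values):
    """Aggregate all values that share a key; here the aggregate is a sum."""
    return key, sum(values)

def run_job(records):
    # Shuffle: group every mapped value under its key.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce: one reducer call per distinct key.
    return dict(reducer(key, values) for key, values in sorted(groups.items()))

# Hypothetical sample records, not rows from the Craigslist dataset.
records = [{"year": 2009}, {"year": 2012}, {"year": 2012}, {"year": 2009}, {"year": 2012}]
listings_per_year = run_job(records)
print(listings_per_year)
```

Swapping sum for another aggregate (for example a mean over prices) changes the query without touching the shuffle step, which is the point of separating the two roles.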
3.3.2 Hive:
Hive is a data warehouse system built on Hadoop which offers an SQL-like query
language for storing and querying large datasets⁵. We use Hive to answer three BI
queries: which state has the most listings of used cars, which manufacturer is most
trusted in the second-hand vehicle market, and which cylinder capacity is sold most.
The select, group by and count constructs have been used to compute these queries.
The output of the BI queries has been stored in HDFS.
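A select, group by and count query of the kind described above might look like the following sketch. Here sqlite3 from the Python standard library stands in for Hive so that the example runs on its own; the table name cars, its columns and the sample rows are hypothetical.

```python
# Sketch of a select / group by / count query. sqlite3 (Python standard
# library) stands in for Hive here; the table and rows are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (state TEXT, manufacturer TEXT)")
conn.executemany(
    "INSERT INTO cars VALUES (?, ?)",
    [
        ("fl", "ford"), ("ca", "toyota"), ("fl", "chevrolet"),
        ("tx", "ford"), ("fl", "honda"), ("ca", "ford"),
    ],
)

# Count listings per state, most listings first.
listings_by_state = conn.execute(
    "SELECT state, COUNT(*) AS listings "
    "FROM cars GROUP BY state ORDER BY listings DESC, state"
).fetchall()
conn.close()

print(listings_by_state)
```

Grouping by manufacturer instead of state gives the shape of the second query with the same three constructs.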
3.3.3 Pig:
It is a framework that is used for exploring patterns in datasets with a scripting syntax.
We have used Pig to find out the most popular vehicle type sold in the used car market
and which vehicle colour customers tend to buy. The group by, for each and count
operations have been used to answer the BI queries. The output of the queries has
been stored in the HBase database.
3.3.4 HBase:
HBase is a NoSQL database which is used to store semi-structured as well as structured
data. It is integrated with Hadoop and is used to store vast and widespread data. It
has column families and is a schemaless database. In this project, we have used HBase
to store the output of the BI queries obtained from the Pig framework.
³ https://sqoop.apache.org/
⁴ https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html
⁵ https://hive.apache.org/
3.3.5 R - Programming Language:
R is a language used for statistical analysis. We have used it for data pre-processing
and, in the final stage, for visualization of the BI queries.
3.3.6 Python Programming Language:
It is a programming language used extensively in the data science domain. In this project,
it is used for performing regression analysis and building a model that predicts the price
of used cars. The results of the model will be discussed in the results section.
3.4 Tableau and Power BI:
Both are data visualization tools used to plot data and extract meaningful
insights from it. We have visualised our data with the help of these tools.
4 Results
The results of the analyses performed on the dataset will be explained in this section.
4.1 BI Query 1: Trend of Used Car Sales over the years
Figure 2: Results for BI Query 1
In Figure 2, the count of car sales is plotted against the year in a line graph. This
output was obtained from Hadoop MapReduce. From the trend, we can see a gradual
increase in the second-hand car sales market from the year 1985, and a steep decline
in sales in the year 2009. This might have happened because of the recession and the
economic loss that year.
4.2 BI Query 2: Can an inference be made based on the vehicle
size distribution?
Figure 3: Results for BI Query 2
In Figure 3, the vehicle size variable is plotted in a doughnut chart. The output
file of this query was obtained by performing the analysis in Hadoop MapReduce. The
chart shows that the majority of customers tend to buy full-size vehicles. From this,
we infer that the main customers buy heavy vehicles and may be truck drivers, lorry
drivers or delivery contractors. Based on this data, new marketing strategies can be
designed to increase sales.
4.3 BI Query 3: Statewise distribution of car sales
Figure 4: Results for BI Query 3
Figure 4 shows the state-wise distribution of used car sales. It can be observed
that Florida and California are the leading second-hand vehicle markets, followed by
Texas, Michigan and Wisconsin. This gives customers a better idea of where to find
reliable car dealers and helps vehicle sales companies identify a new customer base
to target.
4.4 BI Query 4: Which Manufacturer is the most trusted in
the second hand vehicle sales market?
Figure 5: Results for BI Query 4
Figure 5 shows the most trusted brands in the used car market. This query was processed
in Hive. Ford and Chevrolet lead the market by a high margin, followed by Toyota,
Honda, Nissan and others. Vehicle dealers can take this statistic into account when
setting the price of a vehicle, and customers can benefit by narrowing down the range
of vehicles to search through before purchasing.
4.5 BI Query 5: Vehicle Sales distribution based on number of
cylinders in a vehicle
Figure 6: Results for BI Query 5
The data for this query was processed in Hive, and we can observe that customers
prefer vehicles with a 10 cylinder engine. This visualization was plotted in the R
programming language.
4.6 BI Query 6: Which types of vehicle are most sold in the
market?
Figure 7: Results for BI Query 6
This query was processed in the Pig framework and the output was stored in HBase,
as discussed in subsection 3.3.3. We can see that sedans, trucks and SUVs are
purchased most. If we observe closely, we can relate this result to that of Section
4.2. Together, these queries lend credibility to our assumption that the main
customers in the second-hand vehicle sales industry are truck drivers and delivery
contractors.
4.7 Linear Regression in Python
The processed data in HDFS is then used for regression analysis to build a model for
predicting the prices of used cars. First, the clean data is loaded into the Jupyter
environment. Then exploratory data analysis is performed and outlier values are checked.
The price and odometer columns are filtered to rid the data of anomalies and outlier
values. Only numeric features are selected: year and odometer are the independent
variables and price is the dependent variable. The data is then partitioned into
training and testing sets with a split ratio of 75 to 25. A linear regression model
is fitted on the training data, giving a root mean square error (RMSE) of 7549.25.
The model is then evaluated on the test data, giving an RMSE of 7488.68. RMSE is
expressed in the same units as the price, so this value indicates the typical size of
the error between predicted and actual prices; it should not be read as a percentage
accuracy.
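The steps above (a 75 to 25 split, a linear fit on year and odometer, and RMSE on both partitions) can be sketched as follows. This is an illustration under stated assumptions and not the project's notebook: the data is synthetic, the helper names fit_two_feature_ols and rmse are hypothetical, and the fit is written out with the standard library instead of a regression package. RMSE is the square root of the mean of the squared differences between predicted and actual prices.

```python
# Sketch: 75/25 split, ordinary least squares on two numeric features
# (year, odometer) and RMSE. The data is synthetic and hypothetical.
import math
import random

def fit_two_feature_ols(rows):
    """Fit price = b0 + b1*year + b2*odometer by least squares.

    Features are centred first, which reduces the normal equations
    to a 2x2 system that is solved with Cramer's rule.
    """
    n = len(rows)
    my = sum(r[0] for r in rows) / n   # mean year
    mo = sum(r[1] for r in rows) / n   # mean odometer
    mp = sum(r[2] for r in rows) / n   # mean price
    syy = sum((r[0] - my) ** 2 for r in rows)
    soo = sum((r[1] - mo) ** 2 for r in rows)
    syo = sum((r[0] - my) * (r[1] - mo) for r in rows)
    syp = sum((r[0] - my) * (r[2] - mp) for r in rows)
    sop = sum((r[1] - mo) * (r[2] - mp) for r in rows)
    det = syy * soo - syo ** 2
    b1 = (syp * soo - sop * syo) / det
    b2 = (sop * syy - syp * syo) / det
    b0 = mp - b1 * my - b2 * mo
    return b0, b1, b2

def rmse(rows, coef):
    """Root mean square error of the fitted model on the given rows."""
    b0, b1, b2 = coef
    errors = [(b0 + b1 * y + b2 * o) - p for y, o, p in rows]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Synthetic (year, odometer, price) rows with Gaussian noise.
random.seed(0)
rows = []
for _ in range(400):
    year = random.randint(2000, 2019)
    odometer = random.uniform(10_000, 200_000)
    price = 9_000 + 500.0 * (year - 2000) - 0.03 * odometer + random.gauss(0, 1_000)
    rows.append((year, odometer, price))

random.shuffle(rows)
cut = int(0.75 * len(rows))            # 75 to 25 split
train, test = rows[:cut], rows[cut:]

coef = fit_two_feature_ols(train)
train_rmse = rmse(train, coef)
test_rmse = rmse(test, coef)
print(f"train RMSE: {train_rmse:.2f}  test RMSE: {test_rmse:.2f}")
```

In this synthetic setup both RMSE values come out close to the noise level of 1,000 price units that was injected, which illustrates why RMSE is read as an error in price units.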
5 Conclusion
In this study, Hadoop and its family of Big Data tools have been used to demonstrate
the ease of working on large datasets in a virtual environment. An analysis has been
performed on the used cars dataset which helps us understand the various factors
and their importance for the dependent variable, price. A workflow was devised as
shown in Figure 1, and all the tasks were performed according to it.
References
Dremel, C., Herterich, M., Wulf, J., Waizmann, J.-C. and Brenner, W. (2017). How AUDI AG
Established Big Data Analytics in Its Digital Transformation, MIS Quarterly Executive
16(81): 81–100.
URL: http://www.misqe.org/ojs2/index.php/misqe/article/viewFile/765/459
Gegic, E., Isakovic, B., Keco, D., Masetic, Z. and Kevric, J. (2019). Car price prediction
using machine learning techniques, TEM Journal 8(1): 113–118.
Khashman, A. and Sadikoglu, G. (2019). Data Coding and Neural Network Arbitration
for Feasibility Prediction of Car Marketing, Vol. 2, Springer International Publishing.
URL: http://dx.doi.org/10.1007/978-3-030-04164-9_34
Pai, P. F. and Liu, C. H. (2018). Predicting vehicle sales by sentiment analysis of twitter
data and stock market values, IEEE Access 6: 57655–57662.
Pal, N., Arora, P. and Kohli, P. (2019). How Much Is My Car Worth? A Methodology
for Predicting Used Cars' Prices Using Random Forest, Vol. 1, Springer International
Publishing, pp. 413–422.
URL: http://dx.doi.org/10.1007/978-3-030-03402-3_28
Zhang, Q., Zhan, H. and Yu, J. (2017). Car Sales Analysis Based on the Application of
Big Data, Procedia Computer Science 107(Icict): 436–441.
URL: http://dx.doi.org/10.1016/j.procs.2017.03.137