Published in:
Technology


- 1. Leveraging Machine Learning Techniques for the Vehicle Auction Industry Raji Balasubramaniyan, PhD Senior Data Scientist Manheim, Inc., Manheim | Proprietary and Confidential 1
- 2. Overview • Automobile auction – Manheim • Introducing the ML use cases – Churn rate – Recommendations – Forecasting • How to approach a problem? – Tools and algorithms used • Q&A
- 3. Manheim, Inc., Automobile auction Providing auction services for the physical sale of automobiles, as well as online tools to connect wholesale vehicle buyers and sellers. Leader in the wholesale vehicle auction industry. 85% of the vehicle auction business happens at Manheim. We have over 100 locations across the US and Canada. About 15 million cars go through auction every year.
- 4. ML use case 1: Predicting Churn rate • What is churn? – Churn rate refers to the proportion of members who leave during a given time period • Motto: Make the customer happy – If the customer is happy, he/she won't churn. • Why is it important? – It helps us predict and analyze the parameters that drive customers away, and helps the sales force team focus on those parameters and coach the customer
- 5. Predicting Churn rate: The approach • Step 1 – Create profiles for current and cancelled members by collecting their behavior data for the last 6 months • Activity, transactions, messages, response time, etc. • Step 2 – Segment the customers according to their behavior • Unsupervised clustering • Step 3 – For every segment, perform supervised learning to select the parameters that distinguish current members from cancelled members • Logistic regression, neural net • Step 4 – Include sentiment analysis to add another score
- 6. Algorithms: Unsupervised K-means clustering • Given a set of observations (x1, x2, …, xn), where each observation is a d-dimensional real vector of one member's parameters, k-means clustering aims to partition the n observations into k (≤ n) sets S = {Successful Seller, Successful Buyer, Buyer at risk, Seller at risk, Undecided} so as to minimize the within-cluster sum of squares (WCSS). In other words, its objective is to find $\arg\min_{S} \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2$, where μi is the mean of points in Si.
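The k-means objective on slide 6 can be illustrated with a minimal pure-Python sketch of Lloyd's algorithm. This is not the deck's implementation: the two-feature dealer vectors, the function names, and the choice to pass the initial centroids explicitly (rather than picking them at random) are all assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, init_centroids, max_iters=100):
    """Lloyd's algorithm. Returns (centroids, labels)."""
    centroids = [tuple(c) for c in init_centroids]
    k = len(centroids)
    labels = [0] * len(points)
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels


def wcss(points, centroids, labels):
    """Within-cluster sum of squares, the quantity k-means minimizes."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


# Hypothetical dealer vectors: (activity score, transactions per week).
dealers = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
           (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(dealers, init_centroids=[dealers[0], dealers[3]])
total_wcss = wcss(dealers, centroids, labels)
```

In practice the initial centroids are usually chosen at random or with a seeding heuristic, and the algorithm is restarted several times because Lloyd's iteration only finds a local minimum of the WCSS.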
- 7. Algorithms: Logistic regression If P is viewed as a linear function of an explanatory variable, or a linear combination of explanatory variables, then it can be written as a logistic regression function of α1…αn, the parameters influencing churn.
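Since the formula on slide 7 did not survive in the transcript, the following is only a sketch of the standard logistic model, P = 1 / (1 + e^-(b0 + b1·a1 + … + bn·an)), fitted by plain batch gradient descent. The coefficient names, the single "activity" feature, the toy labels, and the learning-rate settings are assumptions, not values from the deck.

```python
import math


def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, features):
    """weights = [bias, b1, ..., bn]; returns P(churn) for one member."""
    z = weights[0] + sum(b * a for b, a in zip(weights[1:], features))
    return sigmoid(z)


def fit_logistic(X, y, lr=0.5, epochs=3000):
    """Batch gradient descent on the average log-loss."""
    weights = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for features, label in zip(X, y):
            err = predict_proba(weights, features) - label
            grad[0] += err
            for j, a in enumerate(features):
                grad[j + 1] += err * a
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad)]
    return weights


# Hypothetical data: one feature (activity scaled to 0..1); label 1 = cancelled.
X = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
y = [1, 1, 1, 0, 0, 0]
weights = fit_logistic(X, y)
p_low_activity = predict_proba(weights, [0.1])
p_high_activity = predict_proba(weights, [0.9])
```

The sign and size of each fitted coefficient indicate how strongly the corresponding parameter pushes a member toward or away from churn, which is what the sales force would look at.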
- 8. Algorithms: Neural net Given the task of assigning a user to one of 5 groups, learning means using a set of factors to find the f* ∈ F that solves the task in some optimal sense. Our training data consists of N dealers from each of the 5 groups. Inputs: x1: Activity; x2: Number of messages; x3: Response time; …; xn: etc., with weights w1, w2, w3, …, wn; the output is Σ wn xn. Our cost function is the mean-squared error; training tries to minimize the average squared error between the network's output and the target value.
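The weighted sum and mean-squared-error cost on slide 8 can be sketched with a single linear unit trained by the delta rule (stochastic gradient descent on squared error). A real network for five groups would have more units and a non-linear activation; the toy inputs, targets, and learning rate here are assumptions for illustration only.

```python
def neuron_output(weights, inputs):
    """Weighted sum of the inputs: sum of w_i * x_i."""
    return sum(w * x for w, x in zip(weights, inputs))


def mean_squared_error(outputs, targets):
    """Average squared difference between outputs and target values."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)


def train(samples, targets, lr=0.05, epochs=500):
    """Delta rule: nudge each weight against the error gradient, one sample at a time."""
    weights = [0.0] * len(samples[0])
    for _ in range(epochs):
        for inputs, target in zip(samples, targets):
            error = neuron_output(weights, inputs) - target
            weights = [w - lr * error * x for w, x in zip(weights, inputs)]
    return weights


# Hypothetical inputs (x1 = activity, x2 = number of messages) and targets
# generated by the rule target = 1*x1 + 2*x2, so the ideal weights are [1, 2].
samples = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
targets = [1.0, 2.0, 3.0]
weights = train(samples, targets)
final_mse = mean_squared_error(
    [neuron_output(weights, s) for s in samples], targets)
```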
- 9. Algorithms: Sentiment analysis Sentiment analysis refers to the use of natural language processing, text analysis and computational linguistics to identify and extract subjective information in source materials. We used a Naïve Bayes model. We have two training groups, G = {'Cancel', 'Member'}, and D = messages. Example terms: tk = {"like", "love", "hate", "bad", "worst", "interesting-to-me", "not-interesting-to-me", …, k terms}. The goal is to find the best group for a message D using the maximum a posteriori (MAP) group Gmap. tk is a term; Dm is the set of messages from 'Member'; Dmk is the subset of Dm that contains tk; Dc is the set from 'Cancel'; Dck is the subset of Dc that contains tk.
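A minimal sketch of the MAP decision on slide 9, assuming term probabilities are estimated from document frequencies, P(tk | group) = (|D_group,k| + 1) / (|D_group| + 2), with add-one smoothing. The smoothing, the whitespace tokenizer, the function names, and the four training messages are assumptions, not details from the deck.

```python
import math
from collections import Counter


def train(labeled_messages):
    """Count messages per group and, per group, messages containing each term."""
    doc_counts = Counter()   # |Dm| and |Dc|
    doc_freq = {}            # per group: term -> |Dmk| or |Dck|
    for group, text in labeled_messages:
        doc_counts[group] += 1
        doc_freq.setdefault(group, Counter()).update(set(text.lower().split()))
    return doc_counts, doc_freq


def classify(message, doc_counts, doc_freq):
    """Return the MAP group: argmax of log P(G) + sum of log P(tk | G)."""
    total_docs = sum(doc_counts.values())
    vocab = set().union(*doc_freq.values())
    # Terms never seen in training carry no evidence and are skipped.
    terms = set(message.lower().split()) & vocab
    scores = {}
    for group, n_docs in doc_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        for term in terms:
            score += math.log((doc_freq[group][term] + 1) / (n_docs + 2))
        scores[group] = score
    return max(scores, key=scores.get)


# Hypothetical training messages for the two groups on the slide.
training = [
    ("Member", "love the auction"),
    ("Member", "like this service"),
    ("Cancel", "hate the fees"),
    ("Cancel", "worst service bad"),
]
doc_counts, doc_freq = train(training)
positive = classify("I love this", doc_counts, doc_freq)
negative = classify("bad fees", doc_counts, doc_freq)
```

Working in log space avoids numerical underflow when a message contains many terms.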
- 10. The Result • Every dealer will be assigned to a group • He/she will have 3 different health scores (1 - churn rate) – 0-30 days health score (calculated using the last 30 days of data) – 30-60 days health score (calculated using data from the last 30-60 days) – 60+ days health score (calculated using data from the last 60-120 days) • The sales force will be alerted when a successful user falls into a risk category. They will look into the parameter that put the user in the risk category – Example: less activity in the last 30 days • The marketing team will take the risk-category users and aim promotion schemes at them
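The health score on slide 10 is simply 1 minus the churn rate for each time window. The sketch below shows that arithmetic; the churn probabilities and the 0.5 alert threshold are hypothetical, since the deck does not state how the risk cut-off is chosen.

```python
def health_score(churn_probability):
    """Health score = 1 - churn rate, rounded to avoid float noise."""
    return round(1.0 - churn_probability, 4)


def at_risk_windows(churn_by_window, threshold=0.5):
    """Windows whose health score falls below the (assumed) alert threshold."""
    return [window for window, p in churn_by_window.items()
            if health_score(p) < threshold]


# Hypothetical churn probabilities for one dealer in the three windows.
dealer_churn = {"0-30 days": 0.7, "30-60 days": 0.4, "60+ days": 0.2}
scores = {window: health_score(p) for window, p in dealer_churn.items()}
flagged = at_risk_windows(dealer_churn)
```

A dealer whose most recent window is flagged while the older windows are healthy matches the slide's case of a successful user sliding into a risk category.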
- 11. ML use case 2: Recommendation What is a recommendation system? Recommender systems are a subclass of information filtering systems that seek to predict the 'rating' or 'preference' that a user would give to an item. Goal: Suggest relevant content to the users
- 12. Recommendation: The Approach • Step 1 – Segment customers according to their transaction patterns • Step 2 – For every segment, create a user profile per customer • Step 3 – Match the user profile with the vehicle profile and arrive at a matching score • Step 4 – Rank the relevant content • Step 5 – Combine profile matching and ranking to provide recommendations
- 13. The approach: Segment the customers Segment the customers according to their behavior • Franchise dealer, Independent, Wholesaler K-means or any other clustering technique could be used for this purpose. Our objective is to find the best group every dealer belongs to: $\arg\min_{S} \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2$, where μi is the mean of points in Si and S = {different customer segments}.
- 14. The approach: Creating user profile and Matching • Create user profiles by collecting each dealer's transaction pattern over a period of time • For every user profile, perform vehicle filtering using content-based collaborative filtering – User-Item collaborative filtering: relevant content recommendation • Customers who bought car X also bought car Y – 2010 Honda Accord vs. 2010 Toyota Camry – User-User collaborative filtering: "You may also like these" • How closely do the profiles of Dealer A and Dealer B match? A similarity or co-rating matrix is used to arrive at the relevant content-matching correlations
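The two matching ideas on slide 14 can be sketched as follows: cosine similarity between dealer purchase profiles for the user-user case, and pairwise co-purchase counts for "customers who bought X also bought Y". Cosine similarity is one common choice and is an assumption here, as are the dealer names and purchase counts.

```python
import math
from collections import defaultdict
from itertools import combinations


def cosine_similarity(profile_a, profile_b):
    """Cosine of the angle between two sparse {vehicle: count} profiles."""
    shared = set(profile_a) & set(profile_b)
    dot = sum(profile_a[v] * profile_b[v] for v in shared)
    norm_a = math.sqrt(sum(c * c for c in profile_a.values()))
    norm_b = math.sqrt(sum(c * c for c in profile_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def co_purchase_counts(profiles):
    """For each vehicle pair, the number of dealers who bought both."""
    counts = defaultdict(int)
    for purchases in profiles.values():
        for pair in combinations(sorted(purchases), 2):
            counts[pair] += 1
    return dict(counts)


# Hypothetical dealer profiles: vehicles bought over some period.
profiles = {
    "dealer_a": {"2010 Honda Accord": 3, "2010 Toyota Camry": 2},
    "dealer_b": {"2010 Honda Accord": 1, "2010 Toyota Camry": 1},
    "dealer_c": {"2012 Ford F-150": 4},
}
sim_ab = cosine_similarity(profiles["dealer_a"], profiles["dealer_b"])
sim_ac = cosine_similarity(profiles["dealer_a"], profiles["dealer_c"])
pairs = co_purchase_counts(profiles)
```

Dealers with high mutual similarity can be shown each other's purchases, while high co-purchase counts link one vehicle to another.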
- 15. The approach: Ranking scores using regression Customer need score: once we have filtered the profiles that are relevant to the users, rank/sort the vehicles according to some goal so that the most relevant content appears on top • Example: Suggest items that make more profit for the customers in the retail market; in this case the regression goal is profit. Here α1…αn can be the buying price at auction, the retail selling price, detailing work done on the cars, etc. Result: Suggest relevant cars to the dealers when they log in to the site
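The ranking step on slide 15 can be sketched as scoring each filtered vehicle with a linear regression on the slide's example parameters and sorting by predicted profit. The coefficient values and vehicle figures below are invented for illustration; in practice the coefficients would come from fitting the regression on historical sales.

```python
# Hypothetical fitted coefficients for the slide's example parameters.
COEFFICIENTS = {
    "intercept": 0.0,
    "auction_price": -1.0,
    "retail_price": 1.0,
    "detailing_cost": -1.0,
}


def predicted_profit(vehicle, coefficients=COEFFICIENTS):
    """Linear regression score: intercept + sum of coefficient * parameter."""
    return coefficients["intercept"] + sum(
        coefficients[name] * value
        for name, value in vehicle.items() if name in coefficients)


def rank_vehicles(vehicles):
    """Vehicle ids sorted so the highest predicted profit comes first."""
    return sorted(vehicles,
                  key=lambda vid: predicted_profit(vehicles[vid]),
                  reverse=True)


# Hypothetical vehicles that already passed the profile-matching filter.
vehicles = {
    "car_a": {"auction_price": 10000, "retail_price": 12000, "detailing_cost": 200},
    "car_b": {"auction_price": 8000, "retail_price": 11000, "detailing_cost": 500},
    "car_c": {"auction_price": 15000, "retail_price": 16000, "detailing_cost": 100},
}
ranking = rank_vehicles(vehicles)
```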
- 16. ML use case 3: Forecasting • How many transactions is a buyer going to make in the next few weeks? – Given the past year's transaction history for a buyer, how many cars will the dealer buy in the next few auctions or online? – Which year, make and model will the dealer buy? – In which auction and region will he buy? • How many users are going to churn in the next few months? – How many will move from the risk category to the successful category? – How many will move to the risk category? – How many will move from the non-active to the active category?
- 17. Synopsis: Time series and ARIMA A time series can be viewed as a combination of signal and noise. The signal could have different patterns, such as: • Mean reversion: the trend will tend to move to the mean over time • Sinusoidal oscillation • Etc. It could also have a seasonal component. An ARIMA model can be viewed as a "filter" that tries to separate the signal from the noise; the signal is then extrapolated into the future to obtain forecasts. ARIMA models are the most general class of models for forecasting a time series.
- 18. The Approach: ARIMA An autoregressive integrated moving average (ARIMA) model is used for calculating the forecast. A non-seasonal ARIMA model is classified as an ARIMA(p,d,q) model, where: p is the number of autoregressive terms, d is the number of non-seasonal differences needed for stationarity, and q is the number of moving average terms. A seasonal ARIMA model is classified as an ARIMA(p,d,q)x(P,D,Q) model, where: P = number of seasonal autoregressive (SAR) terms, D = number of seasonal differences, Q = number of seasonal moving average (SMA) terms. According to the signal type, we developed an automatic forecast-parameter selection algorithm that tries different p, P, d, D and q, Q values and selects the combination with the lowest RMSE using the 80-20 rule.
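The selection loop on slide 18 (fit on the first 80% of the series, score each candidate on the last 20% by RMSE, keep the best) can be sketched without a statistics library. As a stand-in for a full ARIMA fit, the candidates below are three special cases whose forecasts have closed forms: ARIMA(0,0,0) with a constant (the training mean), ARIMA(0,1,0) (the last value), and the seasonal random walk ARIMA(0,0,0)x(0,1,0) (the value one season earlier). The weekly series and the season length of 4 are hypothetical; a real search would fit each (p,d,q)x(P,D,Q) candidate with a library such as statsmodels.

```python
import math


def rmse(actual, predicted):
    """Root-mean-squared error between two equal-length sequences."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mean_forecast(train, horizon, season):
    """ARIMA(0,0,0) with constant: forecast the training mean."""
    return [sum(train) / len(train)] * horizon


def random_walk_forecast(train, horizon, season):
    """ARIMA(0,1,0) without drift: forecast the last observed value."""
    return [train[-1]] * horizon


def seasonal_naive_forecast(train, horizon, season):
    """ARIMA(0,0,0)x(0,1,0): repeat the last observed season."""
    return [train[len(train) - season + (h % season)] for h in range(horizon)]


CANDIDATES = {
    "ARIMA(0,0,0)": mean_forecast,
    "ARIMA(0,1,0)": random_walk_forecast,
    "ARIMA(0,0,0)x(0,1,0)": seasonal_naive_forecast,
}


def select_model(series, season):
    """80/20 hold-out: return (best model name, RMSE of every candidate)."""
    split = int(len(series) * 0.8)
    train, test = series[:split], series[split:]
    scores = {name: rmse(test, forecast(train, len(test), season))
              for name, forecast in CANDIDATES.items()}
    return min(scores, key=scores.get), scores


# Hypothetical weekly counts with a strict 4-week pattern.
weekly_counts = [10, 20, 30, 20] * 5
best_model, scores = select_model(weekly_counts, season=4)
```

Scoring on data the model never saw during fitting is what keeps the search from simply rewarding the most flexible parameter combination.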
- 19. One Example [Figure: two panels of weekly counts (x-axis: Weeks; y-axis: count, 40,000 to 90,000). Panel 1: "period - Example4 - c(0, 0, 0), S(1,0,0)"; Panel 2: "80/20".]
- 20. Summary • We used various ML techniques and implemented them for vehicle auction industry use cases. • The choice of algorithm determines the success of the results; depending on the use case, various algorithms can be used. • Extracting, cleaning and normalizing the data forms the crucial layer in determining the success of a use case.
- 21. Acknowledgement • Dr. Stephane Pinel • Sonar Team • Manheim
- 22. Q&A
