SlideShare a Scribd company logo
1 of 18
Download to read offline
Corporate
Relocation Prediction
A 10 Year, 2 million company use
case
Rahul Shetty, Ana Maldonado, Mauricio Rodriguez Lara
Who We Are
Who We Are
Relocation Prediction use case
Problem:
businesses, schools, hospitals, etc. move locations over time
(growth, bankruptcy, new markets, etc)
Can we predict if they will relocate?
- To where?
- When?
- Why?
=> For now, we focus only on
relocation probability
For businesses we have historical Corporate Data:
- Company size, credit rating, relocation, etc...
=> Can company characteristics predict relocation?
- Useful information for service providers, realtors, city councils,
investors and developers
=> Investigatory POC: 6 week study
- Limit the scope to determine if relocation can be predicted, and
if so, which properties can be a signal
Relocation Prediction use case
(Big) Data
We encountered some challenges:
- Monthly data from branches of 2 million companies, going back
10 years… ~ 300 million rows
- Disperse Data: where/how should it be gathered?
- Monthly data too granular: how to aggregate?
- Client did not have a suitable platform for data handling and
analysis...
Data & Modeling Considerations
- High dimensional time series data
- Preserve the temporal granularity to maximize information
- Neural Networks?
- LSTM or CNNs?
- NN design/exploration time > available time
- Simplify data and modeling due to time constraints
Preparing the Data
- Step 1: Collect the data on an appropriate platform:
- Set up Google Cloud platform in one week
- Step 2: Aggregate the Data
- From monthly to yearly: predicted relocation based on
yearly data
- Choose how to deal with categorical variables
- Subsequent Steps : Spawn virtual machine(s) on GCP for
modeling
Summary Statistics
- Final dataset: 75 features from 1 year and ‘has_relocated’
target from following year
- 2 million entries per year
- ~5% relocation (imbalanced dataset)
- Goal: Build a model that can predict ‘has_relocated’
better than randomly (better than 95% accurate)
Modeling step 1: Exploring Models
- Apply binary classification algorithms: SVM, logistic regression,
decision trees (DT), random forests (RF)
- Choose models with best performance: AUC, kappa
- DTs and RF did best
- Apply Sampling Techniques to improve models
- Tune model parameters
- Validate
Modeling step 2: ResultsTPR
FPR
AUC: 0.66
Best DT model produced
by undersampling data,
5-fold CV, and DT
parameters explored via
grid search
Modeling Results: Features
The most important
features having an
influence on a
‘has_relocated’ index
were related to:
- Company financial
assessments and
health
- Company age
Validation
How well can yearly models
predict the next year’s
relocation?
AUC
Validation
How well can yearly models
predict the next year’s
relocation?
… in general, rather well
AUC
Validation
How well can yearly models
predict the next year’s
relocation?
… in general, rather well
… except for 2016 (?)
AUC
- Company properties can be indicative of whether they relocate
- Yearly aggregated data is sufficient for high level indications of
relocation.
- More granular modeling (e.g. with NN) may provide additional
information
Take aways
- Possible to perform successful
POC on big data within 6 weeks
on GCP
Having had more time we would have:
- Full time series modeling
- NN, hierarchical modeling, etc...
- Automate prediction, given company characteristics
- Investigate anomalous year
- Make use of modeling results:
Future work
Any questions?
rshetty@qualogy.com
Thank you!

More Related Content

Similar to Rahul Shetty - Corporate relocation prediction - Codemotion Amsterdam 2019

Risk dk
Risk dkRisk dk
Risk dkdkocis
 
Resume___Ajay_Prabhakar.
Resume___Ajay_Prabhakar.Resume___Ajay_Prabhakar.
Resume___Ajay_Prabhakar.Ajay Prabhakar
 
Resume Christian Cisneros Sanchez
Resume Christian Cisneros SanchezResume Christian Cisneros Sanchez
Resume Christian Cisneros SanchezChristian Cisneros
 
Maximize Efficiency with Minitab Workspace and Minitab Statistical Software -...
Maximize Efficiency with Minitab Workspace and Minitab Statistical Software -...Maximize Efficiency with Minitab Workspace and Minitab Statistical Software -...
Maximize Efficiency with Minitab Workspace and Minitab Statistical Software -...Minitab, LLC
 
Building a better alll in 2015
Building a better alll in 2015Building a better alll in 2015
Building a better alll in 2015Libby Bierman
 
Analytics and MBA is a great career choice
Analytics and MBA is a great career choiceAnalytics and MBA is a great career choice
Analytics and MBA is a great career choiceHimanshu Arora
 
HWZ-Darden Konferenz: Building a Sustainable Analytics Orientation
HWZ-Darden Konferenz: Building a Sustainable Analytics OrientationHWZ-Darden Konferenz: Building a Sustainable Analytics Orientation
HWZ-Darden Konferenz: Building a Sustainable Analytics OrientationHWZ Hochschule für Wirtschaft
 
Adept Change Management_Panna Visani 2015_1
Adept Change Management_Panna Visani 2015_1Adept Change Management_Panna Visani 2015_1
Adept Change Management_Panna Visani 2015_1Panna Visani MBCS ACCA
 
WorldAtWorkConfernce_USBank_OS FINAL (no notes)
WorldAtWorkConfernce_USBank_OS FINAL (no notes)WorldAtWorkConfernce_USBank_OS FINAL (no notes)
WorldAtWorkConfernce_USBank_OS FINAL (no notes)Laura Roach
 
Retail Stores Network: Optimal Size & Geographical Distribution
Retail Stores Network: Optimal Size & Geographical DistributionRetail Stores Network: Optimal Size & Geographical Distribution
Retail Stores Network: Optimal Size & Geographical DistributionSotiris Athanassopoulos
 
Resume, references and degrees
Resume, references and degreesResume, references and degrees
Resume, references and degreesJuan Antonio Beis
 
Operations analytics assignment
Operations analytics assignmentOperations analytics assignment
Operations analytics assignmentssuser58cd6d
 
Strategic Solutions
Strategic SolutionsStrategic Solutions
Strategic Solutionsrnance
 
Finance Transformation for JABIL with IBM Cognos TM1
Finance Transformation for JABIL with IBM Cognos TM1Finance Transformation for JABIL with IBM Cognos TM1
Finance Transformation for JABIL with IBM Cognos TM1Tridant
 
Resume Rohan Godbole
Resume Rohan Godbole Resume Rohan Godbole
Resume Rohan Godbole Rohan Godbole
 

Similar to Rahul Shetty - Corporate relocation prediction - Codemotion Amsterdam 2019 (20)

Analytics
AnalyticsAnalytics
Analytics
 
Risk dk
Risk dkRisk dk
Risk dk
 
Body Media Poster
Body Media PosterBody Media Poster
Body Media Poster
 
Resume___Ajay_Prabhakar.
Resume___Ajay_Prabhakar.Resume___Ajay_Prabhakar.
Resume___Ajay_Prabhakar.
 
Resume Christian Cisneros Sanchez
Resume Christian Cisneros SanchezResume Christian Cisneros Sanchez
Resume Christian Cisneros Sanchez
 
Maximize Efficiency with Minitab Workspace and Minitab Statistical Software -...
Maximize Efficiency with Minitab Workspace and Minitab Statistical Software -...Maximize Efficiency with Minitab Workspace and Minitab Statistical Software -...
Maximize Efficiency with Minitab Workspace and Minitab Statistical Software -...
 
Building a better alll in 2015
Building a better alll in 2015Building a better alll in 2015
Building a better alll in 2015
 
sheethal_kamath
sheethal_kamathsheethal_kamath
sheethal_kamath
 
Analytics and MBA is a great career choice
Analytics and MBA is a great career choiceAnalytics and MBA is a great career choice
Analytics and MBA is a great career choice
 
HWZ-Darden Konferenz: Building a Sustainable Analytics Orientation
HWZ-Darden Konferenz: Building a Sustainable Analytics OrientationHWZ-Darden Konferenz: Building a Sustainable Analytics Orientation
HWZ-Darden Konferenz: Building a Sustainable Analytics Orientation
 
Adept Change Management_Panna Visani 2015_1
Adept Change Management_Panna Visani 2015_1Adept Change Management_Panna Visani 2015_1
Adept Change Management_Panna Visani 2015_1
 
Itm 2b
Itm 2bItm 2b
Itm 2b
 
WorldAtWorkConfernce_USBank_OS FINAL (no notes)
WorldAtWorkConfernce_USBank_OS FINAL (no notes)WorldAtWorkConfernce_USBank_OS FINAL (no notes)
WorldAtWorkConfernce_USBank_OS FINAL (no notes)
 
Retail Stores Network: Optimal Size & Geographical Distribution
Retail Stores Network: Optimal Size & Geographical DistributionRetail Stores Network: Optimal Size & Geographical Distribution
Retail Stores Network: Optimal Size & Geographical Distribution
 
Resume, references and degrees
Resume, references and degreesResume, references and degrees
Resume, references and degrees
 
What is analytics
What is analyticsWhat is analytics
What is analytics
 
Operations analytics assignment
Operations analytics assignmentOperations analytics assignment
Operations analytics assignment
 
Strategic Solutions
Strategic SolutionsStrategic Solutions
Strategic Solutions
 
Finance Transformation for JABIL with IBM Cognos TM1
Finance Transformation for JABIL with IBM Cognos TM1Finance Transformation for JABIL with IBM Cognos TM1
Finance Transformation for JABIL with IBM Cognos TM1
 
Resume Rohan Godbole
Resume Rohan Godbole Resume Rohan Godbole
Resume Rohan Godbole
 

More from Codemotion

Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...Codemotion
 
Pompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending storyPompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending storyCodemotion
 
Pastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storiaPastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storiaCodemotion
 
Pennisi - Essere Richard Altwasser
Pennisi - Essere Richard AltwasserPennisi - Essere Richard Altwasser
Pennisi - Essere Richard AltwasserCodemotion
 
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...Codemotion
 
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019Codemotion
 
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019Codemotion
 
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 - Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 - Codemotion
 
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...Codemotion
 
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...Codemotion
 
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...Codemotion
 
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...Codemotion
 
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019Codemotion
 
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019Codemotion
 
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Codemotion
 
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...Codemotion
 
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...Codemotion
 
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019Codemotion
 
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019Codemotion
 
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019Codemotion
 

More from Codemotion (20)

Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
 
Pompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending storyPompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending story
 
Pastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storiaPastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storia
 
Pennisi - Essere Richard Altwasser
Pennisi - Essere Richard AltwasserPennisi - Essere Richard Altwasser
Pennisi - Essere Richard Altwasser
 
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
 
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
 
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
 
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 - Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 -
 
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
 
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
 
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
 
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
 
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
 
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
 
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
 
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
 
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
 
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
 
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
 
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
 

Recently uploaded

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 

Recently uploaded (20)

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 

Rahul Shetty - Corporate relocation prediction - Codemotion Amsterdam 2019

  • 1. Corporate Relocation Prediction A 10 Year, 2 million company use case Rahul Shetty, Ana Maldonado, Mauricio Rodriguez Lara
  • 4. Relocation Prediction use case Problem: businesses, schools, hospitals, etc. move locations over time (growth, bankruptcy, new markets, etc) Can we predict if they will relocate? - To where? - When? - Why? => For now, we focus only on relocation probability
  • 5. For businesses we have historical Corporate Data: - Company size, credit rating, relocation, etc... => Can company characteristics predict relocation? - Useful information for service providers, realtors, city councils, investors and developers => Investigatory POC: 6 week study - Limit the scope to determine if relocation can be predicted, and if so, which properties can be a signal Relocation Prediction use case
  • 6. (Big) Data We encountered some challenges: - Monthly data from branches of 2 million companies, going back 10 years… ~ 300 million rows - Disperse Data: where/how should it be gathered? - Monthly data too granular: how to aggregate? - Client did not have a suitable platform for data handling and analysis...
  • 7. Data & Modeling Considerations - High dimensional time series data - Preserve the temporal granularity to maximize information - Neural Networks? - LSTM or CNNs? - NN design/exploration time > available time - Simplify data and modeling due to time constraints
  • 8. Preparing the Data - Step 1: Collect the data on an appropriate platform: - Set up Google Cloud platform in one week - Step 2: Aggregate the Data - From monthly to yearly: predicted relocation based on yearly data - Choose how to deal with categorical variables - Subsequent Steps : Spawn virtual machine(s) on GCP for modeling
  • 9. Summary Statistics - Final dataset: 75 features from 1 year and ‘has_relocated’ target from following year - 2 million entries per year - ~5% relocation (imbalanced dataset) - Goal: Build a model that can predict ‘has_relocated’ better than randomly (better than 95% accurate)
  • 10. Modeling step 1: Exploring Models - Apply binary classification algorithms: SVM, logistic regression, decision trees (DT), random forests (RF) - Choose models with best performance: AUC, kappa - DTs and RF did best - Apply Sampling Techniques to improve models - Tune model parameters - Validate
  • 11. Modeling step 2: ResultsTPR FPR AUC: 0.66 Best DT model produced by undersampling data, 5-fold CV, and DT parameters explored via grid search
  • 12. Modeling Results: Features The most important features having an influence on a ‘has_relocated’ index were related to: - Company financial assessments and health - Company age
  • 13. Validation How well can yearly models predict the next year’s relocation? AUC
  • 14. Validation How well can yearly models predict the next year’s relocation? … in general, rather well AUC
  • 15. Validation How well can yearly models predict the next year’s relocation? … in general, rather well … except for 2016 (?) AUC
  • 16. - Company properties can be indicative of whether they relocate - Yearly aggregated data is sufficient for high level indications of relocation. - More granular modeling (e.g. with NN) may provide additional information Take aways - Possible to perform successful POC on big data within 6 weeks on GCP
  • 17. Having had more time we would have: - Full time series modeling - NN, hierarchical modeling, etc... - Automate prediction, given company characteristics - Investigate anomalous year - Make use of modeling results: Future work