Kaggle - Santa Gift Matching Challenge

KnowledgeMavens

Knowledge Mavens
Who we are and what we do
We’re a group of Scientists,
Artists, Engineers, Musicians
…aka Polymaths
We “Show and Tell” every
Saturday at 1pm in Beaverton
Our mission is “Free Knowledge”

https://www.kaggle.com/c/traveling-santa-2018-prime-paths
A version of the Traveling Salesman
problem, but with reindeer, Santa, and a
carrot issue to make it more interesting.
In short: visit every dot in the picture
exactly once along the shortest path, but
every tenth step is 10% longer unless it
starts from a point whose number is a
prime (otherwise the reindeer did not get
the expected carrot reward and thus are a
bit slower).
The reward is $7000 for the best path, with
other prizes.
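The scoring rule above can be sketched in a few lines of Python. This is a minimal sketch, not the contest's official scorer: the names `is_prime`, `path_length`, and the `coords` dictionary (point number mapped to an `(x, y)` pair) are our own illustrative choices, and it assumes the penalty applies to steps 10, 20, 30, … counted from the start of the path.

```python
import math


def is_prime(n):
    """Return True if n is prime (simple trial division)."""
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))


def path_length(path, coords):
    """Length of `path` (a list of point numbers) including the carrot penalty.

    Assumption: every 10th step costs 10% more unless the point the
    step starts from has a prime number.
    """
    total = 0.0
    for step, (a, b) in enumerate(zip(path, path[1:]), start=1):
        dist = math.dist(coords[a], coords[b])
        if step % 10 == 0 and not is_prime(a):
            dist *= 1.1  # no carrot, slower reindeer
        total += dist
    return total
```

Putting a prime-numbered point in the tenth position of the path (the point the tenth step leaves from) avoids the penalty for that step.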

http://www.math.uwaterloo.ca/tsp/concorde.html
“Concorde is a computer code for
the symmetric traveling salesman
problem (TSP) and some related
network optimization problems.
The code is written in the ANSI C
programming language and it is
available for academic research
use; for other uses,
contact William Cook for licensing
options.”
Concorde produces a path of about 1.5 million units for the contest.
About 900 teams have roughly the same score.

https://github.com/alohawild/Raindeer
Our team, Wildteam, started coding in Python 3 just to get the
data in and out (and to remember how to code in Python 3).
Our first run created the first basic dataset. We
managed to get our first rating, about 450 million units. We
shared our results with the reindeer folks and they were less
than excited. That would require the reindeer to travel, in the
four-hour period of our normal allowed delivery window, roughly
1.9 million units a minute (450 million units over 240 minutes).
We then created a list sorted by X,Y and inserted a prime
every tenth step. This scored about 203 million. Run time was
less than a minute.
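One plausible reading of the sorted baseline, as a hedged sketch: sort the points by (X, Y), hold the prime-numbered ones aside, and drop one into every tenth position. The function name `sorted_path_with_primes`, the `coords` dictionary, and the choice to start and end at point 0 are assumptions for illustration; the actual code in the repository may differ.

```python
import math
from collections import deque


def is_prime(n):
    """Return True if n is prime (simple trial division)."""
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))


def sorted_path_with_primes(coords):
    """Build a path sorted by (x, y) with a prime placed at every 10th position.

    coords maps point number -> (x, y); point 0 is assumed to be the
    start and end of the path.
    """
    others = sorted((c for c in coords if c != 0), key=lambda c: coords[c])
    primes = deque(c for c in others if is_prime(c))
    rest = deque(c for c in others if not is_prime(c))
    path = [0]
    while primes or rest:
        # Positions 9, 19, 29, ... are where the 10th, 20th, ... steps start.
        wants_prime = len(path) % 10 == 9
        if primes and (wants_prime or not rest):
            path.append(primes.popleft())
        else:
            path.append(rest.popleft())
    path.append(0)
    return path
```

Because the primes are pulled out of sorted order, each inserted prime can sit far from its neighbors, which is one reason a path built this way stays long.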
We then wrote a Monte Carlo program with simple greedy
selection: 73 million. This took about an hour on my Apple
with more than 100 epochs. The code is improvedeer.py.
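A Monte Carlo pass with greedy selection could look roughly like the sketch below: swap two random stops and keep the swap only if the path gets shorter. This is an illustration under stated assumptions, not the team's script. The names `tour_length` and `monte_carlo_improve` and the parameters `tries_per_epoch` and `seed` are hypothetical, and the prime penalty is left out to keep the example short.

```python
import math
import random


def tour_length(path, coords):
    """Plain Euclidean length of the path (prime penalty ignored here)."""
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(path, path[1:]))


def monte_carlo_improve(path, coords, epochs=100, tries_per_epoch=1000, seed=0):
    """Randomly swap two interior stops; keep a swap only if it shortens the path.

    Needs at least two interior stops; the first and last stops never move.
    """
    rng = random.Random(seed)
    best = list(path)
    best_len = tour_length(best, coords)
    for _ in range(epochs):
        for _ in range(tries_per_epoch):
            i, j = rng.sample(range(1, len(best) - 1), 2)
            best[i], best[j] = best[j], best[i]
            new_len = tour_length(best, coords)
            if new_len < best_len:
                best_len = new_len  # greedy: accept only improvements
            else:
                best[i], best[j] = best[j], best[i]  # undo the swap
    return best, best_len
```

Recomputing the whole length after each swap is simple but slow on a large dataset; only the few steps touching the swapped stops actually change.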

https://github.com/alohawild/Raindeer
Multiple runs showed no worthwhile improvement beyond 100
epochs.
Path tracing was next.
We created a 10x10 matrix over the dataset (the “CityID” list in
contest terms), then calculated the centroid for each focus
area defined by the matrix (0-99). We then created a path by
starting at the North Pole (CityID of zero) and adding in each focus
area in order of distance from the centroid. We looped through all the
areas, starting with the one containing the North Pole, and
connected each to the new path. This included trying to find a
prime and assigning it to the tenth step.
The result: 35 million, and then with a “snake” loop 28 million! Run time
about an hour.
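The grid idea can be sketched as follows. This is our simplified interpretation, not the code in the repository: points are bucketed into a `cells` x `cells` grid, the grid is walked in a back-and-forth “snake” order, and the points inside each cell are ordered by distance from that cell's centroid. The name `snake_grid_path` is hypothetical. Unlike the description above, the sketch starts at the first grid row rather than at the cell containing the North Pole, and it skips the prime placement.

```python
import math


def snake_grid_path(coords, cells=10):
    """Order points by walking a cells x cells grid in snake order.

    coords maps point number -> (x, y); point 0 is the start and end.
    """
    xs = [x for x, _ in coords.values()]
    ys = [y for _, y in coords.values()]
    min_x, min_y = min(xs), min(ys)
    width = (max(xs) - min_x) or 1.0   # avoid dividing by zero
    height = (max(ys) - min_y) or 1.0

    def cell_of(point):
        col = min(int((point[0] - min_x) / width * cells), cells - 1)
        row = min(int((point[1] - min_y) / height * cells), cells - 1)
        return row, col

    buckets = {}
    for city, point in coords.items():
        if city != 0:
            buckets.setdefault(cell_of(point), []).append(city)

    path = [0]
    for row in range(cells):
        # Even rows go left to right, odd rows right to left: the "snake".
        cols = range(cells) if row % 2 == 0 else reversed(range(cells))
        for col in cols:
            members = buckets.get((row, col), [])
            if not members:
                continue
            cx = sum(coords[c][0] for c in members) / len(members)
            cy = sum(coords[c][1] for c in members) / len(members)
            members.sort(key=lambda c: math.dist(coords[c], (cx, cy)))
            path.extend(members)
    path.append(0)
    return path
```

The snake order keeps consecutive cells adjacent, so the path never has to jump back across the whole map at the end of a row.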

https://github.com/alohawild/Raindeer
Again, no real improvement could be made, including by running
the path through improvedeer.py.
We expanded the selection and allowed a return to greedy testing.
The logic selects the unused CityIDs that are closest to the
centroid, which makes it possible to find the next best one. The code
is arranged to take a parameter for this. A new “testing” mode
was added to the routines to allow debugging; this gets a bit
obscure.
The result was a stunning 2.8 million with a 500-wide scan
(about 1/8 of the size of the list).
After a comment from a data scientist we dropped the prime
logic (likely making the reindeer unhappy) and scored 1.9 million
on a one-hour run on my Apple!
The code is deerpath.py and includes some commented-out
prime code.
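The scan-width idea might be sketched like this, as an illustration rather than a copy of deerpath.py: keep the unused points ordered by distance from the centroid, look only at the first `scan_width` of them, and greedily take the one nearest to the current position. The function name `windowed_greedy` and its parameters are hypothetical.

```python
import math


def windowed_greedy(cities, coords, start, scan_width=500):
    """Order `cities` greedily, scanning only a limited window of candidates.

    Candidates are the unused points closest to the centroid of `cities`;
    from that window we pick the point nearest to the current position.
    """
    if not cities:
        return []
    cx = sum(coords[c][0] for c in cities) / len(cities)
    cy = sum(coords[c][1] for c in cities) / len(cities)
    remaining = sorted(cities, key=lambda c: math.dist(coords[c], (cx, cy)))

    path = []
    current = start
    while remaining:
        window = remaining[:scan_width]
        nxt = min(window, key=lambda c: math.dist(coords[current], coords[c]))
        remaining.remove(nxt)
        path.append(nxt)
        current = nxt
    return path
```

A wider window behaves more like a full nearest-neighbor search and costs more time per step; a window of 1 simply follows the centroid ordering.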

The Future…
Concorde is still beating us… we could use it… Runs
of 7+ hours get 1.5 million.
We know from an article on Kaggle that a pure greedy process,
which computes the distance to every point, gets 1.8 million. It
runs for a long time.
We are starting to check sub-paths and copy the
best sub-path into the path. We are calling this “quilting.”
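Since quilting is still being built, the following is only a guess at its shape: slide a short window along the path, try every ordering of the points inside it with the neighbors on either side held fixed, and copy the shortest ordering back in. The names `quilt` and `tour_length` and the `window` parameter are hypothetical, and the prime penalty is ignored.

```python
import math
from itertools import permutations


def tour_length(path, coords):
    """Plain Euclidean length of the path (prime penalty ignored here)."""
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(path, path[1:]))


def quilt(path, coords, window=6):
    """Replace each short sub-path with its best reordering.

    The first and last stops of the path are never moved. Each window
    tries window! orderings, so keep `window` small.
    """
    best = list(path)
    for start in range(1, len(best) - window):
        before = best[start - 1]
        after = best[start + window]
        segment = best[start:start + window]
        best_order = min(
            permutations(segment),
            key=lambda order: tour_length([before, *order, after], coords),
        )
        best[start:start + window] = best_order
    return best
```

With `window=6` each position checks 720 orderings, which stays cheap per window but only repairs local tangles in the path.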
We are looking at
https://en.wikipedia.org/wiki/Branch_and_bound as this may
be of some help. Again, we will code it ourselves. Resisting using
a package… resisting… resisting…

Questions? Thoughts? Concerns?
