At this hackathon, ideas with an artificial-intelligence angle are to be implemented within 24 hours. Form teams of 2-5 people and build your prototype around the clock. At the end, the finished projects are presented to a jury and compete for prize money worth €10,000.
https://devpost.com/software/laserchallenge
2. Data preprocessing & visualisation
● Clean up the data (drop NaNs, normalization, …)
● Grouping the data by ID
○ ID 00* - Basic
○ ID 01* - Sheet
○ ID 02* - Part
○ ID 03* - Pin
● Used t-SNE to visualize data clusters
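The t-SNE visualization step above can be sketched with scikit-learn; random data stands in for the cleaned feature matrix, and the perplexity value is an assumed default:

```python
# Sketch: t-SNE embedding of a feature matrix, assuming numeric features
# (here random data stands in for the cleaned hackathon dataset).
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))  # stand-in for the cleaned feature matrix

# Project to 2-D; each row of `emb` is one sample, ready to scatter-plot
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(emb.shape)  # one 2-D point per sample
```

Coloring the scatter plot by the ID groups (00*, 01*, …) then makes cluster structure visible.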
4. First approach - 5 classifiers
● 2 classes
○ 68600 good examples
○ 52400 bad examples
○ Balanced?
● Trained 5 classifiers
● Default hyperparameters
● → Overfitting
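The deck does not name the five classifiers, so the sketch below assumes five common scikit-learn models with default hyperparameters, as described above:

```python
# Sketch: train 5 classifiers with default hyperparameters on a binary
# problem. The five model choices are assumptions; synthetic data stands
# in for the real examples.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = [LogisticRegression(max_iter=1000), DecisionTreeClassifier(),
          RandomForestClassifier(), KNeighborsClassifier(), GaussianNB()]
# Held-out accuracy per model; a large train/test gap signals overfitting
scores = {m.__class__.__name__: m.fit(X_tr, y_tr).score(X_te, y_te)
          for m in models}
print(scores)
```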
5. Second approach - Feature selection
● Feature selection
○ Feature cross-correlation & manually via histograms
○ 89 → 16 features
● 0.85 accuracy & still overfitting
(Figure: class histograms overlap; separable?)
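The cross-correlation part of the selection can be sketched as dropping one feature from each highly correlated pair; the 0.95 threshold and column names are assumptions:

```python
# Sketch: drop one of each pair of highly cross-correlated features.
# The 0.95 cutoff is an assumed threshold; column "e" is built as a
# near-duplicate of "a" to demonstrate the mechanism.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 4)), columns=list("abcd"))
df["e"] = df["a"] * 2 + 0.01 * rng.normal(size=100)  # nearly duplicates "a"

corr = df.corr().abs()
# Keep only the strict upper triangle so each pair is checked once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.95).any()]
reduced = df.drop(columns=to_drop)
print(to_drop)  # the redundant near-duplicate column
```

The manual histogram pass would then inspect the remaining features' per-class distributions by eye.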
6. Possible explanation for overfitting
● Many columns describe one .LST file
● 607 unique .LST files
7. Third approach - Gradient boosting
● CatBoost (gradient boosting on decision trees)
● BUT the same .LST file must not appear in both the train and test sets at the same time
● Accuracy only 61%, but no overfitting to .LST files!
(Figure: most relevant features)
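The .LST-file constraint above is a group-aware split; a minimal sketch with scikit-learn's `GroupShuffleSplit` (synthetic group ids stand in for the 607 .LST files, and the split ratio is an assumption):

```python
# Sketch: split so that all rows of one .LST file land on the same side,
# preventing the model from memorizing per-file patterns. Group ids and
# data here are synthetic stand-ins.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
n = 1000
groups = rng.integers(0, 100, size=n)  # stand-in for .LST file ids
X = rng.normal(size=(n, 16))
y = rng.integers(0, 2, size=n)

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups))

# No group (i.e. no .LST file) appears on both sides of the split
assert set(groups[train_idx]).isdisjoint(groups[test_idx])
print(len(train_idx), len(test_idx))
```

CatBoost itself is then trained on `X[train_idx]` and evaluated on `X[test_idx]` as usual.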
8. Results
● Specificity > 98% (Minimized False Positive Rate)
● Reduced feature set from 100 to 16
● Tunable decisions by adjusting features in our selected feature set
● Simple model interpretations through reduced feature set
In a nutshell:
If the model predicts 1 (success), one can be almost certain it is true
→ Potential usage: choose the best parameter set from a list of possible
process parameters, with an almost 98% success chance
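The specificity figure quoted above (true-negative rate, i.e. a minimized false-positive rate) is computed from a confusion matrix; a minimal sketch with toy labels:

```python
# Sketch: specificity = TN / (TN + FP), the true-negative rate.
# Toy labels stand in for real model predictions.
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 1, 0, 1]

# For binary labels, ravel() yields (tn, fp, fn, tp)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
specificity = tn / (tn + fp)  # few false positives -> high specificity
print(specificity)  # 0.8
```

High specificity is exactly what makes a "predicts 1 → almost surely true" decision rule usable.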
9. Experimental approach - Genetic algorithm
● Use a GA for feature selection
● Evaluated with logistic regression
● Own implementation: recombine the top 1/2/3 specimens of each
generation semi-randomly into each new feature vector
● Progress from 50% to 58% accuracy within less than two hours
● Training still in progress
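The GA described above can be sketched as evolving boolean feature masks, scored with logistic regression; population size, generation count, and mutation rate are assumptions, and synthetic data stands in for the real set:

```python
# Sketch: GA feature selection. Each specimen is a boolean mask over the
# features; the top-3 specimens of each generation are recombined
# semi-randomly into new masks. Sizes and rates are assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=5, random_state=0)

def fitness(mask):
    """Cross-validated accuracy of logistic regression on the masked features."""
    if not mask.any():
        return 0.0
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

pop = rng.random((10, 20)) < 0.5          # 10 random feature masks
for _ in range(5):                        # a few generations
    top = sorted(pop, key=fitness, reverse=True)[:3]  # top-3 specimens
    children = []
    for _ in range(len(pop)):
        a, b = rng.choice(3, size=2, replace=False)
        cross = rng.random(20) < 0.5      # semi-random recombination
        child = np.where(cross, top[a], top[b])
        child ^= rng.random(20) < 0.05    # small mutation
        children.append(child)
    pop = np.array(children)

best = max(pop, key=fitness)
print(best.sum(), round(fitness(best), 3))  # features kept, CV accuracy
```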
12. Data insights - Feature selection
● Threshold on feature cross-correlation & manually via histograms
● 89 → 15 features
(Figure: class histograms overlap; separable?)
13. The challenge
● Goal: avoid problems with part removal
○ Binary decision problem
● No geometry of the parts available
○ 89 characteristics derived from parts, pins, the laser cutter ...