Titanic prediction

Info from rtpanalysts.org lunch discussion.
I am not affiliated with Kaggle. See kaggle.com for info about their competitions.

Published in: Technology

  1. Research Triangle Analysts (rtpanalysts.org)
     • Intro to Kaggle.com
     • Titanic Getting Started Competition
     • Prediction problem with two outcome levels
     • Opportunity for an extended Data Shootout, with Kaggle.com providing data, scoring, tutorials, and forums.
     • Public domain data allows detailed discussion of modeling issues and solutions without client data confidentiality concerns.
     • A common ground for in-depth learning and debates on analytics topics.
     • Participants of all levels of expertise are welcome.
     • You influence the direction of this effort by your participation. Post questions and thoughts on rtpanalysts.org.
     • Welcome!
     Slides by Linda Schumacher. Contact via Research Triangle Analysts LinkedIn group member list.
  2. Classification Problems
     • Two levels or outcomes
     • Data → Model → Predictions
     • Examples
       – Find customers who are likely to buy a product
       – Identify patients likely to be admitted to hospital
       – Categorize cells as cancerous or benign
       – Who survives the Titanic disaster?
  3. Classifier - Trees
     • Decision Trees
     [Decision tree diagram. Node labels: All Passengers; Female, Male; First/Second Class, Third Class; Age < 16, Age >= 16]
  4. Classifier - Logistic Regression
     • Equation – Logistic Regression
     • F(x) = sigmoid(age + class - embarked + gender)
  5. Titanic Data
     • Passenger List
       – Name, class, fare, embarked, family members, age, cabin, etc.
       – Survival
     • Training Set of 891 Passengers
     • Test Set of 418
  6. Kaggle.com
     • Data
     • Tutorials
       – Tools – Excel, Python
       – Models – Trees, Random Forests
     • Submission
     • Leaderboard
  7. Where to Start
     • Create a Kaggle account: http://www.kaggle.com/account/register
     • Read and agree to the rules if you choose to continue.
     • Enter the Kaggle Titanic Competition: http://www.kaggle.com/c/titanic-gettingStarted
     • Download train.csv and test.csv.
     • If you choose to use R, download it from http://www.r-project.org/ (you will have to choose a 'mirror' site, usually a university or research site).
     • If you share code or data outside of your Kaggle team, be sure to post a copy on the Kaggle Titanic Forum; see http://www.kaggle.com/c/titanic-gettingStarted/details/rules
  8. Benefits
     • Extended Data Shoot-Out
     • Tailor participation
     • Opportunities
       – New classifiers
       – New tools, languages
       – Training vs test error
       – Round Table Discussion of Solutions
       – Compare model results
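
The decision tree on slide 3 can be read as a chain of if/else tests. The Python sketch below is only an illustration. It assumes the tree splits on gender first, then on class for female passengers and on age for male passengers; that layout is inferred from the node labels and is not confirmed by the slides. The function name `leaf_for`, the numeric class codes, and the leaf labels are hypothetical, and no survival prediction is attached to any leaf.

```python
def leaf_for(sex, pclass, age):
    """Return the name of the leaf a passenger falls into.

    Assumed layout (not confirmed by the slides):
    gender first, then class for females and age for males.
    """
    if sex == "female":
        # Assumed split: first or second class versus third class.
        if pclass in (1, 2):
            return "female, first/second class"
        return "female, third class"
    # Assumed split for male passengers: under 16 versus 16 and over.
    if age < 16:
        return "male, age < 16"
    return "male, age >= 16"


print(leaf_for("female", 3, 30))
print(leaf_for("male", 1, 10))
```

A fitted tree would also store a predicted outcome at each leaf, learned from the training set.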

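
The formula on slide 4, F(x) = sigmoid(age + class - embarked + gender), can be sketched in Python as the sigmoid of a weighted sum of numeric features. The weights, the numeric encodings of class, embarked, and gender, and the function name `survival_score` are all hypothetical, chosen only to show the shape of the calculation; a real model would estimate the weights from the training set.

```python
import math


def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def survival_score(features, weights, intercept=0.0):
    """Sigmoid of a weighted sum of numeric features."""
    z = intercept + sum(weights[name] * value for name, value in features.items())
    return sigmoid(z)


# Hypothetical weights and encodings, chosen only for illustration.
weights = {"age": -0.03, "class": -1.0, "embarked": -0.2, "gender": 2.5}
passenger = {"age": 29.0, "class": 1, "embarked": 0, "gender": 1}

print(round(survival_score(passenger, weights), 3))
```

The output is a score between 0 and 1; comparing it with a threshold such as 0.5 turns it into one of the two outcome levels.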
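
The "Training vs test error" item on slide 8 comes down to scoring the same model on two sets of passengers. The Python sketch below computes an error rate as the fraction of wrong predictions. The labels are made up for illustration, and the held-out set is assumed to be a slice of train.csv set aside by the participant, since true outcomes are needed to score predictions.

```python
def error_rate(predicted, actual):
    """Fraction of predictions that differ from the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one label")
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return wrong / len(actual)


# Made-up labels for illustration only (1 = survived, 0 = did not survive).
train_actual = [1, 0, 0, 1, 1, 0, 0, 1]
train_predicted = [1, 0, 0, 1, 1, 0, 1, 1]
holdout_actual = [0, 1, 1, 0]
holdout_predicted = [0, 0, 1, 1]

print("training error:", error_rate(train_predicted, train_actual))
print("held-out error:", error_rate(holdout_predicted, holdout_actual))
```

A held-out error well above the training error suggests the model has fit quirks of the training passengers rather than a general pattern.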