These slides briefly describe how beginners can crack a data challenge. They point to recommended resources so that beginners can start their data science journey with little pain.
2. Data Science Venn Diagram
• Data Science Challenge
– Hacking skills and math and statistics knowledge used to be more important
– Substantive expertise
• Used to be less important, because common sense and experience worked very well and algorithms were what mattered
• Is becoming more and more important, because algorithms are now off-the-shelf
• Feature engineering is very important
• Data Science -> all 3 skills are equally important
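To make "feature engineering" concrete, here is a minimal sketch in plain Python. It assumes a passenger record with `Name`, `SibSp` and `Parch` fields, in the style of the Titanic challenge mentioned later in these slides. The helper names (`extract_title`, `engineer_features`) and the derived features are illustrative choices, not something the slides prescribe.

```python
import re


def extract_title(name):
    """Pull the honorific (e.g. 'Mr') out of a 'Surname, Title. Given names' string."""
    match = re.search(r",\s*([^.]+)\.", name)
    return match.group(1).strip() if match else "Unknown"


def engineer_features(passenger):
    """Return a copy of the raw record with a few derived features added."""
    features = dict(passenger)
    features["Title"] = extract_title(passenger["Name"])
    # Family size = siblings/spouses + parents/children + the passenger.
    features["FamilySize"] = passenger["SibSp"] + passenger["Parch"] + 1
    features["IsAlone"] = int(features["FamilySize"] == 1)
    return features


# Illustrative record; the values are made up for the example.
raw = {"Name": "Braund, Mr. Owen Harris", "SibSp": 1, "Parch": 0}
engineered = engineer_features(raw)
print(engineered["Title"], engineered["FamilySize"], engineered["IsAlone"])
```

The algorithm downstream can be the same off-the-shelf one everybody uses; derived columns like these are where domain knowledge enters.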
3. 0 to 1
• Hacking skills
– Learn Python or R
• Math and Statistics Knowledge
– Learn Machine Learning algorithms
• Substantive Expertise
– Practice data challenges on platforms like Kaggle and DEXTRA, and learn from other data scientists
– Attend data science meetups like DataScience SG and R user group – Singapore, and learn from speakers and attendees
4. Hacking skills (Python or R)
• Overall suggestion:
– First, pick one and master it
– Second, learn the other
– Finally, use them together to complement each other
• Learn Python:
– Google's Python Class, and the video series Introduction to Python on YouTube
– Learn the Python packages numpy, sklearn, and pandas while practicing data challenges
• Learn R:
– Coursera “Data Science Specialization by Johns Hopkins University”
5. Math and Statistics Knowledge
• To start:
– Coursera course “Machine Learning by Andrew Ng”
– YouTube videos of other algorithms not covered above
• To advance:
– Stanford Machine Learning course by Andrew Ng
– Practice data challenges on Kaggle or DEXTRA
6. Substantive Expertise
• To start:
– Not very important
– Usually common sense and existing experience will work
– Do some research on the data challenges
• To advance:
– Practice more data challenges and learn from other data
scientists
7. Get your hands dirty
-- First blood
• Start with the simplest data challenges on Kaggle and DEXTRA
• Kaggle
– Titanic: Machine Learning from Disaster
– Very detailed procedures on how to crack the challenge
– Follow those procedures, gain experience
• DEXTRA
– Knowledge & Practice: Titanic Survival Prediction Challenge
– Similar to Kaggle’s, but uses a different evaluation metric
– By comparing the two, you’ll understand the importance of evaluation metrics
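Why the evaluation metric matters can be seen in a small sketch. The slides do not say which metrics the two platforms use, so accuracy and F1 are assumed here purely for illustration, and the labels and predictions are made-up toy data. The same two submissions are ranked in opposite order depending on the metric.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data: 7 negatives, 3 positives (1 = survived).
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
pred_a = [0] * 10                        # always predicts the majority class
pred_b = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]  # finds every positive, with false alarms

for name, pred in (("A", pred_a), ("B", pred_b)):
    print(name, round(accuracy(y_true, pred), 3), round(f1_score(y_true, pred), 3))
```

Submission A wins on accuracy while submission B wins on F1, so the "best" model depends on how the challenge is scored.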
8. Try harder challenges
• Try different types of challenges
– Classification
– Regression
– Clustering
• Understand different types of evaluation metrics
– Classification
• Precision, Recall, F-Score, Accuracy, Log Loss
– Regression
• Root Mean Squared Error, Root Mean Squared Logarithmic Error
– Clustering
• Complicated: the right metric depends on whether ground-truth labels are available
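Several of the metrics listed above can be written out in a few lines of plain Python. This is a sketch of the standard textbook definitions, not the exact scoring code of any platform; the function names and the clipping constant `eps` are choices made for this example. In practice a library such as sklearn provides equivalents.

```python
import math


def precision_recall(y_true, y_pred):
    """Precision and recall for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-likelihood of binary labels given predicted P(label = 1)."""
    total = 0.0
    for t, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += t * math.log(p) + (1 - t) * math.log(1 - p)
    return -total / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error; inputs must be non-negative."""
    squared = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(y_true))


print(precision_recall([1, 1, 0, 0], [1, 0, 1, 0]))
print(log_loss([1, 0], [0.9, 0.2]))
print(rmse([3.0, 5.0], [1.0, 5.0]))
print(rmsle([100.0, 1000.0], [110.0, 900.0]))
```

Log loss punishes confident wrong probabilities, while RMSLE compares values on a log scale, so it penalizes relative rather than absolute errors.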
9. Find a data analyst/scientist job
• Practice data analytics with real, unmasked data
• Gain substantive expertise in your domain
• Work with colleagues of different specializations
• Understand the whole pipeline of data processing, analytics, visualization, and delivery of results
10. Thank You
• Final conclusion and suggestion
Practice, practice, and practice
• Slides available here:
http://www.slideshare.net/zhaoqf123
http://www.slideshare.net/zhaoqf123/crack-data-science-challenges-0-to-1
Editor's Notes
Because algorithms are off-the-shelf, people are all using the same algorithms.