* Link to the talk: https://www.youtube.com/watch?v=47ZTn8saHTo
Data Science is a multidisciplinary field consisting of Statistics, Mathematics, Computer Science and even some specific domain knowledge based on what problem you are solving. It’s often the case that a single individual cannot be equipped with all the requirements when entering the field.
This talk is mainly about how I, a non-CS major with a pure Mathematics and Statistics background, worked at two startups as a Data Scientist/Machine Learning Engineer and gradually picked up the missing skill sets along the way. I will also share what it is like to be a Data Scientist in a startup and some skills I found essential if you are seeking similar opportunities.
Stepping into the AI Wave - Words from an Industry Newbie
1. Stepping into the AI Wave:
Words from an Industry Newbie
Sept. 29, 2017 @ AI Web Talk Series
Joanne Tseng
joanne@appdiff.com
2. About me
- B.S. Degree in Mathematics and Statistics (DM), NCKU, Taiwan
- 2.5 years of working experience as a Machine Learning Engineer/Data Scientist
- SVIP (Silicon Valley Internship Programme) 2016-2017
- Data Scientist @ Appdiff Inc.
3. SVIP (Silicon Valley Internship Programme)
- Non-profit organization based in the UK
- Gives new graduates a one-year, full-time internship at a Silicon Valley startup
- Partners with Girls In Tech (GIT) to open opportunities to women around the world
4. This talk is about…
- My self-directed learning process
- The project I’m doing right now @ Appdiff
- Q&A
5. Three years ago...
- I was in my senior year at university
- With a mathematics and statistics background
- Heard the term “machine learning” for the first time
- No coding background
- Interested in data analysis
self-directed learning process My project @ Appdiff Q&A
FYI: Machine Learning (from Wiki) is a field of computer science that gives computers the ability to learn without being explicitly programmed.
6. I was wondering about two questions...
- What kind of job can I get in the future if I’m interested in data analysis?
- If I want to start a master's degree, which area/field should I go for?
7. I was wondering about two questions...
Ask seniors who have already graduated!!
8. Suggestions I got
- Learn the Python language
- Take some basic CS courses, including Algorithms and Data Structures
- Take a machine learning course
10. The starting point of my self-directed learning
- Learn the Python language
- Take some basic CS courses, including Algorithms and Data Structures
- Take a machine learning course
11. Self-directed learning: online courses
- The upside is also the downside: TOO FLEXIBLE
- Hard to stay persistent
12. Starting is easy, persistence is an art
- Join a study group!!!
- Plan long-term and short-term schedules
- Meet regularly
- Can’t find study group members? I recommend meetup.com
13. My study group (three years ago...)
- The long term path we followed:
https://www.springboard.com/learning-paths/data-analysis/learn/
14. Suggested order to learn
- Learn Python on Codecademy
- Use an IPython notebook and one basic Kaggle dataset to practice the data analysis flow
- Take an Algorithms course (I recommend the MIT course) and do the exercises in Python
- Try using the terminal to set up your Python environment
- Try using GitHub to manage your code
16. After I started to work...
- Engineer’s Mindset
17. Engineer’s Mindset
- Learn how to solve problems independently - a.k.a. Google everything!
- Never stop learning
- Be a patient problem solver!! - Don’t be afraid of new bugs :)
18. After I started to work...
- Engineer’s Mindset
- A new study group - keep up the self-directed learning
19. Our study group right now
- Deep Learning
- Follow online courses
> Deep Learning A-Z™: Hands-On Artificial Neural Networks on Udemy
> (Next Course) Andrew Ng Deep Learning course (https://www.deeplearning.ai/)
- Meet online regularly (once every two weeks)
- If you are interested: find more info on https://dosudo.com/
20. About my current company - Appdiff Inc.
- Building an AI system for software testing
21. What’s Software Testing
Software testing (from Wiki) is an investigation conducted to provide stakeholders with information about the quality of the software product or service under test. Test techniques include the process of executing a program or application with the intent of finding software bugs (errors or other defects), and verifying that the software product is fit for use.
22. Machine Learning Related Topics
- Build page classifiers and button classifiers
Page Level: login page
Button Level: login button, Facebook sign-in button, password button, etc.
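To make the button-level idea concrete, here is a minimal sketch of a text-based button classifier. It is not Appdiff's method: the talk does not say which features or models were used. The sketch assumes a button can be described by its visible text, and it uses a tiny hand-written Naive Bayes with add-one smoothing in plain Python. The training examples and label names (`login_button`, `facebook_signin_button`, `password_button`) are hypothetical, chosen only to mirror the examples on the slide.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training data: (visible button text, label). Not from the talk.
TRAIN = [
    ("log in", "login_button"),
    ("sign in", "login_button"),
    ("login to your account", "login_button"),
    ("continue with facebook", "facebook_signin_button"),
    ("sign in with facebook", "facebook_signin_button"),
    ("log in using facebook", "facebook_signin_button"),
    ("password", "password_button"),
    ("enter your password", "password_button"),
    ("confirm password", "password_button"),
]


def tokenize(text):
    """Split button text into lowercase word tokens."""
    return text.lower().split()


def train(examples):
    """Count examples per label and tokens per label."""
    label_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for tok in tokenize(text):
            token_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, token_counts, vocab


def predict(model, text):
    """Return the label with the highest Naive Bayes log-probability."""
    label_counts, token_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(token_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # ignore words never seen in training
                score += math.log((token_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
print(predict(model, "sign in with facebook"))
print(predict(model, "forgot password"))
```

A page-level classifier could follow the same shape, with the text of the whole page as input instead of a single button. In practice a library such as scikit-learn or Keras would replace the hand-written model.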
23. Data Scientist @ startup company
- What you really do is closer to a machine learning engineer's job
- Training classifiers accounts for only 30% of your job :)
24. Data Scientist @ startup company
What about the other 70%?
25. Data Scientist @ startup company
- Building the model training pipeline (40%)
Building the system that runs the full cycle: getting labeled data → feature extraction → model training → model evaluation → storage → label correction → getting new labels.
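The cycle above can be sketched in a few small functions, one per stage. This is only an illustration under stated assumptions, not the pipeline built at Appdiff: the data, the toy word-count model, and every function name (`extract_features`, `train`, `evaluate`, `store`, `flag_suspect_labels`) are hypothetical, and storage is reduced to a JSON string where a real system would write to a database.

```python
import json
from collections import Counter, defaultdict


def extract_features(text):
    """Feature extraction: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def train(labeled):
    """Model training: accumulate word counts per label (a toy model)."""
    model = defaultdict(Counter)
    for text, label in labeled:
        model[label].update(extract_features(text))
    return model


def predict(model, text):
    """Score each label by the share of its word mass matching the input."""
    feats = extract_features(text)

    def score(label):
        counts = model[label]
        return sum(n * counts[w] for w, n in feats.items()) / sum(counts.values())

    return max(sorted(model), key=score)


def evaluate(model, labeled):
    """Model evaluation: accuracy (on the training data, for brevity)."""
    return sum(predict(model, t) == y for t, y in labeled) / len(labeled)


def store(model):
    """Storage: serialize the model as JSON."""
    return json.dumps({label: dict(c) for label, c in model.items()}, sort_keys=True)


def flag_suspect_labels(model, labeled):
    """Label correction: list examples where the model disagrees with the label."""
    flagged = []
    for text, label in labeled:
        guess = predict(model, text)
        if guess != label:
            flagged.append((text, label, guess))
    return flagged


# Hypothetical labeled data; the last label is wrong on purpose.
labeled = [
    ("log in", "login_button"),
    ("log in now", "login_button"),
    ("sign in", "login_button"),
    ("continue with facebook", "facebook_signin_button"),
    ("sign in with facebook", "facebook_signin_button"),
    ("log in", "facebook_signin_button"),
]

# First pass through the cycle.
model = train(labeled)
accuracy = evaluate(model, labeled)
stored = store(model)
suspects = flag_suspect_labels(model, labeled)
print(accuracy, suspects)

# A reviewer corrects the flagged label, a new label arrives, and the cycle repeats.
labeled[-1] = ("log in", "login_button")
labeled.append(("connect with facebook", "facebook_signin_button"))
new_model = train(labeled)
new_accuracy = evaluate(new_model, labeled)
print(new_accuracy)
```

Keeping each stage as its own function makes it easy to swap in a real feature extractor, model, or storage backend without touching the rest of the loop.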
26. Data Scientist @ startup company
- Collecting labels by designing experiments or implementing a simple labeling interface (15%)
28. Skills I’m using
- Building data pipelines
> Language: Python
> Database: BigQuery, GCP API
> System Design
- Building ML classifiers
> Python data libraries: pandas, NumPy, Matplotlib (plotting library), NLTK (Natural Language Toolkit), Keras (for training neural network models), scikit-learn (tools for data mining and data analysis)
29. Reference
- Data Analysis Learning Path (https://www.springboard.com/learning-paths/data-analysis/learn/)
- Kaggle (https://www.kaggle.com/)
- Meetup (https://www.meetup.com/)
- Learn Python, Codecademy
- Introduction to Algorithms, MIT OpenCourseWare
- Machine Learning, Andrew Ng, Coursera
- Deep Learning A-Z™: Hands-On Artificial Neural Networks, Udemy
- Deep Learning Specialization, deeplearning.ai