4. Where does your data come from?
Stand up and get out: talk with people, read the documentation
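library(ggplot2)      # provides the diamonds dataset
library(tabplot)      # provides tableplot()
tableplot(diamonds)   # visual overview of every column at a glance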
Do not underestimate
naïve tests…
Mistake 9
Do not check data quality
R package: Datacheck
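A few naïve quality checks can be sketched in base R alone. This is a minimal illustration, not the Datacheck package's interface (which is not shown on the slide); the toy data frame, its column names and the 0-120 age bounds are all assumptions made for the example.

```r
# Toy data frame with deliberate problems (hypothetical values)
df <- data.frame(
  id    = c(1, 2, 2, 4, 5),
  age   = c(34, NA, 27, -3, 51),
  price = c(10.5, 20.0, 20.0, 15.2, NA)
)

# Count missing values in each column
missing_per_column <- colSums(is.na(df))

# Identifiers that appear more than once
duplicated_ids <- df$id[duplicated(df$id)]

# Rows whose age falls outside a plausible range
# (the 0-120 bounds are an assumption, adapt them to your data)
bad_age_rows <- which(!is.na(df$age) & (df$age < 0 | df$age > 120))

print(missing_per_column)
print(duplicated_ids)
print(bad_age_rows)
```

Checks this simple are cheap to run on every new data delivery, before any modelling starts.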
9. Mistake 5
Keep it complex
Do not jump straight to the fashionable, complicated method
Keep your method as simple as possible (focus on the question)
Know the limits of this method
Compare the methods (caret, ROC)
Boosted decision tree coupled to a neural network
Linear regression
Complexity comes at a price
(speed, error proneness, expertise, amount of data)
Can you afford it?
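Comparing a simple and a more complex method on held-out data could look like the sketch below. It uses base R only, so that it runs without extra packages; in practice caret or a ROC package would do the bookkeeping. The simulated data, the two model formulas and the helper auc() are assumptions made for the illustration.

```r
# Area under the ROC curve via the rank (Mann-Whitney) formula.
# auc() is a hypothetical helper written for this sketch.
auc <- function(score, label) {
  r  <- rank(score)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Simulated binary outcome driven by two predictors (assumed data)
set.seed(42)
n  <- 400
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(1.5 * x1 - x2))
d  <- data.frame(y = y, x1 = x1, x2 = x2)

# Hold out the last 100 rows for testing
train <- d[1:300, ]
test  <- d[301:400, ]

# Simple method: plain logistic regression
simple_fit <- glm(y ~ x1 + x2, data = train, family = binomial)

# More complex method: cubic terms plus an interaction
complex_fit <- glm(y ~ poly(x1, 3) + poly(x2, 3) + x1:x2,
                   data = train, family = binomial)

# Score both models on the same held-out rows
auc_simple  <- auc(predict(simple_fit,  newdata = test, type = "response"), test$y)
auc_complex <- auc(predict(complex_fit, newdata = test, type = "response"), test$y)

print(c(simple = auc_simple, complex = auc_complex))
```

If the complex model does not clearly beat the simple one on held-out data, its extra cost is hard to justify.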
Montréal Big Data Meetup 22nd March 2017 – Xavier Prudent – www.xavierprudent.com
14. Data Science Association: code of conduct
http://www.datascienceassn.org/code-of-conduct.html
Mistake 1
Ethics is a useless luxury
What are you doing? For whom?
What is the impact of your work?
- Company, society, yourself
- Short – long term
What type of data are you analyzing?
- Law & regulation
- Privacy
Do you have any conflict of interest?
Tendency to focus on the technique, on the challenge
“Yes, but” answers?
15. CAST
Xavier Prudent XAVIER PRUDENT
Organizer MICHAEL ALBO
The Audience ALL OF YOU
Technical Support OVH
Design-Photography CHRISTINE NAULLEAU
Special Thanks to George Lucas and to the
audience for their attention
Question? Comment?
Feel free to contact me:
Xavier Prudent, prudentxavier@gmail.com