The document outlines a data science roadmap that covers fundamental concepts, statistics, programming, machine learning, text mining, data visualization, big data, data ingestion, data munging, and tools. It provides the percentage of time that should be spent on each topic, and lists specific techniques in each area, such as linear regression, decision trees, and MapReduce in big data.
Data Science
Google Trends
GoogleNGRAM
Data Science: interest by country
Data Science
Data Scientist
Data Science Roadmap
Data Science Programming Language
Descriptive statistics
Probability theory
Random variables
Bayes theorem
Histograms
Continuous distributions
ANOVA
Monte Carlo method
Kernel density
Regressions
Correlations
Euclidean distance
Least squares fit
10%
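A few of the statistics items above can be sketched with the Python standard library alone. This is a minimal illustration, not the roadmap's own material: the function names and the sample data (`hours`, `scores`) are made up for the example, and the line fit is assumed to mean an ordinary least-squares fit.

```python
import math
import statistics


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def least_squares_fit(xs, ys):
    """Slope and intercept of the ordinary least-squares line y = slope*x + intercept."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def euclidean_distance(p, q):
    """Straight-line distance between two points of the same dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Hypothetical sample data, invented for this sketch.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 6, 8, 10]

print("mean:", statistics.mean(scores))
print("sample std dev:", statistics.stdev(scores))
print("correlation:", pearson_correlation(hours, scores))
print("fit (slope, intercept):", least_squares_fit(hours, scores))
print("distance:", euclidean_distance((0, 0), (3, 4)))
```

In practice these calculations are usually delegated to a library such as NumPy or to R; the hand-written versions only show what each item computes.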
Text analysis
Named entity recognition
Corpus
Market basket analysis
Feature extraction
Using Mahout/UIMA
Using Weka
Using NLTK
Classify text
40%
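Feature extraction and text classification can be illustrated without any of the listed toolkits. The sketch below is a standard-library stand-in, not NLTK, Weka, or Mahout code: it builds bag-of-words counts and trains a small multinomial Naive Bayes classifier with add-one smoothing. The class name, the labels, and the training sentences are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def extract_features(text):
    """Bag-of-words feature extraction: maps each token to its count."""
    return Counter(tokenize(text))


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def __init__(self):
        self.class_doc_counts = Counter()
        self.class_word_counts = defaultdict(Counter)
        self.vocabulary = set()

    def train(self, labeled_docs):
        for text, label in labeled_docs:
            features = extract_features(text)
            self.class_doc_counts[label] += 1
            self.class_word_counts[label].update(features)
            self.vocabulary.update(features)

    def classify(self, text):
        features = extract_features(text)
        total_docs = sum(self.class_doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_doc_counts.items():
            word_counts = self.class_word_counts[label]
            total_words = sum(word_counts.values())
            # Start from the log prior of the class.
            score = math.log(doc_count / total_docs)
            for word, count in features.items():
                if word not in self.vocabulary:
                    continue  # ignore words never seen in training
                likelihood = (word_counts[word] + 1) / (total_words + vocab_size)
                score += count * math.log(likelihood)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy corpus, invented for this sketch.
training_corpus = [
    ("cheap pills buy now", "spam"),
    ("win money now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report attached", "ham"),
]

classifier = NaiveBayesTextClassifier()
classifier.train(training_corpus)
print(classifier.classify("buy cheap money"))
print(classifier.classify("monday project meeting"))
```

A real corpus would need far more documents, plus steps such as stop-word removal or stemming, which the listed toolkits provide.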
Data Exploration in R/Python
Uni, Bi & Multivariate Viz
ggplot2
Histogram & Pie (Uni)
Tree & Tree map
Scatter plot (Bi)
Line Charts (Bi)
50%
Special Charts
Time line
Decision Tree
D3.js
Tableau
Matplotlib, vispy, bokeh, seaborn, pygal, folium, and networkx
50%
MapReduce Fundamentals
Hadoop Components
HDFS
Data replication principles
NameNode and DataNodes
Using Mahout
Cassandra
MongoDB
70%
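The MapReduce fundamentals can be sketched as a word count running in a single Python process. This is only an illustration of the map, shuffle, and reduce phases, not Hadoop code: the function names are made up for the example, and a real cluster would distribute each phase across nodes and read its input from HDFS.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(mapped_pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in order over a list of documents."""
    mapped = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input, invented for this sketch.
print(word_count(["big data", "big cluster"]))
```

Because each map call sees only one document and each reduce call sees only one key, both phases can run in parallel on separate machines, which is the property Hadoop builds on.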