Personal Information
Organization / Workplace
United States
Occupation
Data Scientist
Industry
Technology / Software / Internet
About
I lead the development and deployment of scalable models, with expertise in both real-time and big data architecture.
= Apache: Spark, Hadoop, Pig, Hive, and Oozie.
= Python: scikit-learn, pandas, NumPy, and Luigi.
= R: PivotalR, MADlib, Time Series Analysis with X12-ARIMA.
= Modeling: MLlib, H2O, yhat, Sense.
= Machine Learning: Random Forests, Clustering, Association Rules, and Logistic Regression.
= Software Development: Streaming, Distributed Systems, REST APIs.
= Visualization: Matplotlib, ggplot2, Seaborn, and D3.
= Database: Hive, Postgres, SQL.
I build data science pipelines and frameworks (see my presentations below).
Tags
model
classification
machine learning
kaggle
predictive analytics
analytics
data science
software
scikit-learn
logistic regression
xgboost
tensorflow
pipeline
pandas
python
gradient boosting
random forest
framework
stock market
regression
market analysis
change point
nfl
fantasy
sports
Presentations (3)
Likes (4)
AlphaPy
Robert Scott • 7 years ago
kaggle_meet_up
Marios Michailidis • 7 years ago
Kaggle Winning Solution Xgboost algorithm -- Let us learn from its author
Vivian S. Zhang • 8 years ago
General Tips for participating Kaggle Competitions
Mark Peng • 8 years ago