Ian Ouellette has over 7 years of experience in data science roles developing predictive models, performing data analysis, and deploying solutions across various industries. He has experience with programming languages like Python and R, as well as tools like SQL, Hadoop, and machine learning algorithms. His background also includes biostatistics research analyzing health outcomes.
20170110_IOuellette_CV
IAN OUELLETTE
Norwalk CT - (401) 741-7834 - imouellette81@gmail.com
Experience
Senior Data Scientist – Advanced Analytics - Frontier Communications - May 2016 - Present
Data Scientist – May 2015 – April 2016
● Development of customer-specific models in Customer Lifetime Value, actionable/non-pay risk & cross/upsell propensity domains,
enabling strategy efforts in new & upsell/cross acquisition, entry-risk mitigation, retention & save offers to be proactive and targeted
● Deployment of models into automated production stacks integrating Python, SQL, R workflows along with error handling, defensive
programming & logging system ensuring low overhead, proper handling of anticipated errors, easy debugging of unanticipated errors
● Pioneered analysis of internet usage data leveraging Hadoop, producing the Bandwidth Abuse model, which enabled an actionable remedy
for high-value customers whose user experience was compromised by low-value customer abuse
● Development & deployment of a call center wav/xml archive crawler to identify, extract and translate calls via the Google Speech API. Analysis of
post-connection holds, transfers and conferences, providing insight into rep-customer interaction beyond simple call counts
● Performed market competition/performance clustering of logical operating geographic polygons which enhanced accuracy of risk
models given an individual market’s profile by eliminating misclassification in “no competition” markets
● Linked churn behavior to the Net Promoter Score of surveys, leading to an internal NPS variant metric for customer experience
improvement that tracks more closely to existing customer behavior than to multi-industry averages
● Co-authored and designed Customer Analytics database schema to format data from many product oriented, transactional & disparate
data sources which resulted in customer centric data records formatted for rapid data analytics rather than legacy IT formats. Sources
included customer attributes, interactions (billing, payment, purchases, and terminations), internet usage, and behavior events
● Exploratory & ad-hoc analysis including A/B testing, trend normalization/pro-forma estimation, common trouble ticket cause analysis &
geographic proximity mapping
● Relentless investigation of source data (in current, legacy and uber-legacy systems) to gain understanding of real-life scenarios
producing observed data points enabling knowledge based exploitation instead of flawed "big data" blanket approaches
● Development and publication of data definitions ensuring consistent data sourcing techniques and analysis approaches across groups and
between analysts producing apples to apples comparisons
● Puzzle solving & joining data from many repositories that don't always want to play nicely together
● Internal networking among data domain players to discover new data repositories and integrate these sources into the advanced analytics
workflow
Statistician – Northeast Program Evaluation Center - Department of Veteran Affairs – January 2014 – April 2015
● Performed outcomes & cost analysis on cost intensive VA mental health programs to assess effectiveness and the resulting cost
reduction providing senior management with data to continue, modify or improve the program
● Deployed an automated quarterly reporting system leveraging SQL, R and LaTeX to custom generate audience specific text, results and
figures for 180+ sites delivering zero lag time statistical and accounting feedback to the field on a rapidly expanding data source and
saving 250+ man hours every quarter by eliminating the existing manual process
● Developed and deployed a Python program that parses MS Outlook COM objects to extract, parse, concatenate, and archive (as CSV) over
30,000 XML attachments, saving hundreds of man hours and eliminating the human error of 30,000 manual repetitions
Research Associate – Immunobiology – International AIDS Vaccine Initiative – March 2010 – December 2013
● Utilized a Systems Biology approach, leveraging machine learning techniques, to identify variables of importance and potential
immunological mechanisms, moving vaccine design from an inefficient trial-and-error process to a more data-driven process
● Performed statistical analysis of studies for publication, internal hypotheses, and improved assay reliability/experiment design
Consulting
● Trisonic - Constructed a Multiple Multivariate Poisson regression model to predict monthly sales of 2956 items to assist with inventory
stocking allowing for better utilization of personnel resources in more mission critical tasks
● Avatar Biotechnologies, LLC - Conducted nonparametric survival analysis of mice studies for publication
● BioAssets Development Corporation – Increased lagging drug trial patient enrollment via modeling inclusion/exclusion eligibility on
various patient population centers, allowing the company to reach key milestones in a $42.5 million sale
Education
Columbia University - Master of Science, Biostatistics
University of Connecticut – Bachelor of Science, Molecular and Cell Biology
Coursework - Generalized Linear Models & Regression, Linear Regression, Survival Analysis, The Randomized Clinical Trial,
Design of Medical Experiments, Categorical Analysis, Statistical Inference, Probability, Introduction to Biostatistics, Calculus I, II,
III, Physics with Calculus
Technical Skills
Programming – R (Procedural), Python (Procedural/OOP), SQL (T-SQL/bcp, PostgreSQL), Hive, Impala, MongoDB, bash
Data Analysis (R/Python/SQL)
Data retrieval and export – flat-files, RDBMS (ODBC, OLEDB), HTTP/FTP, HTTP web-scraping (request & HTML/XML parsing)
Big Data – in-memory and out-of-memory solutions including Hadoop (MapReduce, PySpark). Millions & Billions.
Working with data - data merging, sub-setting, reshaping, cleaning, transformation, aggregation, sampling, over-under sampling
Data presentation - simple, complex, or conditional visual plots with continuous and/or categorical data
Simple Statistics - descriptive statistics, parametric and non-parametric hypothesis tests, continuous & categorical analysis
Regression – OLS, GLS, WLS, polynomial, variable selection methods, shrinkage methods (Ridge, LASSO), derived input methods
(PCR, PLS), robust (LAD, MAD, Huber's), resistant (LQS, LMS, LTS), Poisson, Cox, GEE, Mixed Effects
Classification - Logit, probit, cloglog, multinomial, proportional odds, adjacent category, continuation ratio, Survival, rare events
Machine Learning – GAM, Decision Trees, Bagging/Boosting, Random Forests, PRIM, MARS
Clustering – traditional and robust hierarchical & partitioning clustering
Model Assessment - residual diagnostics, dispersion adjustment, hypothesis testing, model comparison, goodness-of-fit
Model Validation - training, validation and test set method, K-fold/leave-one-out Cross-Validation, bootstrap
Computing – Linux/Windows, Hadoop, Alteryx, vim, git, LaTeX, SSH, job scheduling (crontab/task scheduler), port forwarding, R
server, Shiny Server, MS SQL Server, Postgres server, MySQL server, MongoDB, FTP server, Spotfire, I love Regex
Publications
Ross W. Lindsay, Ian Ouellette, Heather E. Arendt, Jennifer Martinez, Joanne DeStefano, Mary Lopez, George N. Pavlakis, Maria J.
Chiuchiolo, Christopher L. Parks, C. Richter King. "SIV antigen-specific effects on immune responses induced by vaccination with DNA
electroporation and plasmid IL-12." Vaccine 2013 Oct 1;31(42):4749-58. Print.
R.L.R. Powell, I. Ouellette, R. Lindsay, C. Parks, C.R. King, A. McDermott, G. Morrow. “A multiplex microsphere-based immunoassay
increases the sensitivity of SIV-specific antibody detection in serum samples and mucosal specimens collected from rhesus macaques
infected with SIVmac239.” BioResearch Open Access 2013 Jun;2(3):171-8. Print.
N. Winstone, A. J. Wilson, G. Morrow, C. Boggiano, M. J. Chiuchiolo, M. Lopez, M. Kemelman, A. A. Ginsberg, K. Mullen, J. W.
Coleman, C.-D. Wu, S. Narpala, I. Ouellette, H. J. Dean, F. Lin, N. Y. Sardesai, H. Cassamasa, D. McBride, B. K. Felber, G. N. Pavlakis, A.
Schultz, M. G. Hudgens, C. R. King, T. J. Zamb, C. L. Parks, and A. B. McDermott. "Enhanced Control of Pathogenic Simian
Immunodeficiency Virus SIVmac239 Replication in Macaques Immunized with an Interleukin-12 Plasmid and a DNA Prime-Viral Vector
Boost Vaccine Regimen." Journal of Virology 2011 Sep;85(18):9578-87. Print.