Semantically Enriched Knowledge Extraction With Data Mining
Mengling Hettinger
6501 Silver Ln
Plano, TX 75023
menglingzhang@gmail.com
Summary
In applying for this position, I will draw on the knowledge and problem-solving skills I acquired through my
career in the AT&T Big Data group and my PhD in physics. I developed in-depth computer programming and
statistical analysis skills through my work and studies. The variety of supervised and unsupervised learning
projects I have worked on has given me hands-on experience in applying machine learning techniques to both
standard and large data sets. My extensive experience analyzing real data with tools like R, Python,
Pig, Hive and H2O, my academic background, and the communication skills I developed teaching
physics courses to non-physics majors will enable me to bring a full range of skills to the position.
Employment History
August 2014 – Present: Professional Data Scientist at AT&T Big Data
May 2014 – August 2014: Data Science Intern at AT&T Big Data
August 2009 – April 2014: Research assistant/Teaching assistant at Michigan State University
Education
September 2009 – December 2014: PhD in physics at Michigan State University
Thesis: “Fluctuations in superconductors above paramagnetic limit”
Summer School
July 8th – 10th, 2013 VSCSE Data Intensive Summer School
Core Competencies
Strategic Thinking: Able to use rich data sets to create and implement a strategic direction for the
company that drives growth, including revenue and profits.
Modeling: Design and implement statistical/predictive models using cutting-edge algorithms to predict demand,
risk and price elasticity, find association rules and perform cluster analysis.
Analytics: Use analytical tools such as R, Python or data mining packages to identify trends and
relationships between different pieces of data, draw appropriate conclusions and translate analytical findings into
risk management and marketing strategies that drive value.
Data munging: Experience with collecting, cleaning, augmenting and transforming data using scripting
languages such as Python. Use semantic technologies to discover structure in unstructured data.
Communications and Project Management: Capable of turning dry analysis into plots that are informative
and easy to read. Collaborate with teams to develop and support data platforms and analyses.
Professional Skills
Programming languages: Python, R, Matlab, Java, SQL
Large Data: NoSQL, Hadoop, Pig, Hive, H2O
Platform: Unix
Statistics: statistical modeling, data analysis, Bayesian statistical methods
Machine learning and data mining: predictive modeling, cluster analysis, association analysis, anomaly
detection
Data Mining Software: H2O, Weka, SVMlight, LibSVM, Cluster (R package), igraph (R package)
Mathematical physics: linear algebra, differential equations, Fourier transforms, calculus
Data Mining Experience
A few examples of projects I have worked on (https://github.com/MenglingHettinger):
KDD 2012 Weibo data: Using user profile and item keyword information, I calculated the distance between each
user's keywords and each item's keywords and extracted users' personal information as features to make
predictions for a given item recommended to a user. The training set has 73 million records and the test set has 1
million records. The correct prediction rate is 64%.
Amazon co-purchasing network data from the Stanford Large Network Dataset Collection were used to
reproduce the group of each product. 548,552 products in the metadata and 403,394 nodes and 33,873,398 edges
in the co-purchase data were analyzed. Because of the size of the dataset, the K-means and CLARA algorithms were
applied both to pure link analysis and to additional features extracted from the product metadata.
AT&T Fleet Preventative Maintenance: AT&T has one of the largest fleets in the nation, with 75,000 vehicles.
We collect demand repair data (>2 million records/year), weather data and refueling data daily, and sensor data
every 2 minutes (10 Gb/day). Using these data sources, I built a battery failure prediction model that
combines random forest and time series analysis and reaches 82% accuracy. I also built an integrated
model to predict all the subsystem failures for a given vehicle, and helped my coworkers improve the
brake system model, fuel systems and other parts of the analysis.
Courses Related to Data Science
CSE 881 Data Mining: Predictive Modeling (Classification)/Association Analysis/Cluster Analysis/Anomaly
Detection/Network Mining
CSE 891 Computational Techniques for Large Scale Data Analysis: Programming skills and tools for
collecting, storing, querying and analyzing large scale data/General concepts and methods for large-scale
computational data analysis
Qualification/Certification
August 2013: Machine Learning
Organization: PROVIDED BY STANFORD UNIVERSITY THROUGH COURSERA INC.
May 2014: The Data Scientist's Toolbox
Organization: PROVIDED BY JOHNS HOPKINS UNIVERSITY THROUGH COURSERA INC.
May 2014: R Programming
Organization: PROVIDED BY JOHNS HOPKINS UNIVERSITY THROUGH COURSERA INC.
May 2014: Getting and Cleaning Data
Organization: PROVIDED BY JOHNS HOPKINS UNIVERSITY THROUGH COURSERA INC.
Immigration / Work Status
Permanent Resident - United States