Imtiaz Khan Data Engineer Resume

Imtiaz Khan
Hyderabad Email: imtiaz.a.khan@accenture.com
Mobile: +919493377607
A passionate engineer with a background in R, Azure Machine Learning, machine learning and
algorithms, and a strong research interest in working with data of large volume, velocity
and variety. Other experience includes web and Windows applications in C#,
Visual Studio and SQL.
Professional Summary:
• Intensive, hands-on experience in data analytics. Technical skills span statistics
and programming, including data engineering, data visualization, machine
learning and programming in R and SQL.
• Trained on the data analysis problems that arise most often across business verticals:
Classification, Regression, Recommender Systems, Clustering, Association
Analysis, Frequent Pattern Mining and Outlier Detection.
• Strong foundation in business analysis.
• Exposure to Microsoft technologies such as the .NET Framework and the
Visual Studio IDE.
Key Skills and Technologies:
• Data Analytics: predictive analytics problems (supervised and unsupervised
learning)
• R language (RStudio)
• Azure Machine Learning Studio
• Language Understanding Intelligent Service (LUIS)
• R packages for data science
• Other IDEs: Visual Studio 2010, 2012
• Microsoft .NET Framework 4.0
• SQL Server Management Studio 2008, 2010, 2012, 2013
• Microsoft Excel 2010, 2013
• Windows 7, Windows 8
Technical Skills:
Data Collection Techniques (Excel/csv/tsv, Databases, Scraping):
  • Collecting data from Excel/csv/tsv files
  • Collecting data from databases
  • Collecting data via scraping
Data Preparation Techniques:
  Structured Data Preparation:
  • Data Type Conversion
  • Category to Numeric Conversion
  • Numeric to Category Conversion
  • Data Normalization: 0-1, Z-Score
  • Handling Skew Data: Box-Cox Idea
  • Handling Missing Data
  Text Data Preparation:
  • Normalizing Text
  • Stop Word Removal
  • Whitespace Removal
  • Stemming
  • Building Document Term Matrix
  Image Data Preparation:
  • Converting to Grayscale
  • Pixel Value Normalization
  • Building Pixel Intensity Matrix
Data Analytics:
  Predictive Analytics:
  • Classification
  • Regression
  • Recommenders
Machine Learning Algorithms:
  Classification and Regression:
  • KNN Model
  • Decision Tree Model
  • Naive Bayes Model
  • Logistic Regression
  • SVM Model
  Recommenders:
  • Content-based Recommendation
  • User-User KNN Model
  • Item-Item KNN Model
  • Latent Factor Model
  Clustering:
  • Iterative Models
  • Hierarchical Models
  • Density Models
  Outlier Detection:
  • Probabilistic Model
  • Density Model
  • KNN Model
  Association Analysis:
  • Apriori Model
Mathematical Skills (Linear Algebra, Vector Algebra, Probability, Calculus and Statistics):
  Matrix Algebra:
  • Understanding of factorization: spectral factorization, eigen factorization,
    SVD factorization
  • Applications of matrices: image processing, solving systems of equations,
    modelling discrete systems
  Probability:
  • Bayes Rule/Reasoning
  • MAP vs. MLE Reasoning
  • Properties of random variables: expectation, variance, entropy and
    cross-entropy, covariance and correlation
  • Understanding standard random processes
  • Probability distributions: Normal, Gamma
  • Parameter estimation in distributions: MAP and MLE approaches
  Statistics:
  • Descriptive stats for a single variable
    • mean, median, mode, quantiles, percentiles
    • standard deviation, variance
    • MAD, IQR
  • Descriptive stats for two variables
    • covariance
    • correlation
    • chi-squared analysis
  • Hypothesis Testing
Job Experience:
Accenture
Senior Software Engineer
Machine Learning Projects:
Project: Ticket Classification in Application Management using Natural Language
Processing (NLP) and Recommendation Systems
Problem Domain: Tickets come in many forms from end users; how can we
reduce the time to classify and resolve these tickets?
This project aims to substantially reduce cost-to-serve in Application
Management by reducing the number of tickets and the time needed to resolve
them through cognitive automation. Cognitive automation can help
developers, testers and project managers make better decisions during the
defect logging, defect resolution and test execution phases. It reduces
the effort of analysing and classifying known issues, provides
recommendations of similar issues, and automates creating a Team
Foundation Server (TFS) work item and assigning the right team member to fix the
issue.
Use Case 1: Issue to Issue
• Identification of similarities between the observed defect and historic issues.
• Removal of duplicate issues when similar ones are already being worked on.
• Consolidated view of similar defects, which can be resolved together
with the same fix.
Use Case 2: Issue to Resolver/Tester/Developer (Assignee Recommendation)
• Enabling Build/Test Leads to identify the best candidate to fix or test a defect
fastest, based on the analytics the tool provides.
• History of testers and developers who worked on similar fixes, to help decide
whom to assign the defect to for faster turnaround.
Use Case 3: Root Cause Analysis
• Assisting users in identifying and analysing the root causes underlying a
ticket, using available log data and other information sources pertaining to the
incident.
Cognitive Solution: High Level Approach
• When a new ticket arrives, the tool parses its details, maps the ticket to the previously
built knowledge model, and flags semantically highly similar tickets as duplicates.
• Based on user acceptance or rejection of each recommendation, the tool incrementally
learns and improves its performance on duplicate detection.
• If no duplicate tickets exist, the tool lists semantically related tickets in ranked
order together with their degree of semantic association.
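
The duplicate-then-related flow above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: it is written in Python, it substitutes plain bag-of-words cosine similarity for the LUIS/Azure ML knowledge model, and the function names, the sample tickets and the 0.8 duplicate threshold are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a ticket description and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar_tickets(new_ticket, history, duplicate_threshold=0.8):
    """Split historic tickets into likely duplicates and ranked related tickets.

    history maps ticket id -> description. The threshold is a hypothetical
    cut-off chosen for illustration only.
    """
    new_vec = Counter(tokenize(new_ticket))
    scored = [
        (ticket_id, cosine_similarity(new_vec, Counter(tokenize(text))))
        for ticket_id, text in history.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    duplicates = [(t, s) for t, s in scored if s >= duplicate_threshold]
    related = [(t, s) for t, s in scored if 0 < s < duplicate_threshold]
    return duplicates, related


# Hypothetical sample tickets, invented for this example.
history = {
    "T1": "Login page throws timeout error",
    "T2": "Payment gateway returns invalid card",
    "T3": "Timeout error on login page",
}
duplicates, related = rank_similar_tickets("login page timeout error", history)
print("duplicates:", duplicates)
print("related:", related)
```

A production version would replace the word-count vectors with semantic features and would feed the user's accept/reject decisions back into the model, as the approach describes.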
Technology Support
• Language Understanding Intelligent Service (LUIS) – for natural language
processing of the email content.
• Azure Machine Learning (ML) – for classifying the content and finding
recommendations.
Project: Churn Modelling in Telecom using R
This project is in the telecommunications domain (prepaid segment). It involves
voluntary churn modelling that helps business personnel understand the business
problem: voluntary customer churn (e.g. drop in usage, movement to another
network, no revenue generation) is a significant concern in many service industries. One
way to decrease churn is to identify in advance the customers who are at risk of churning and
target them with an incentive to stay. However, this requires accurate predictions
about which customers are at risk. Churn is the term for customers quitting and joining another
service provider. Most telecom companies suffer from voluntary churn. Churn rate has a strong
impact on the lifetime value of the customer because it affects the length of service and the
future revenue of the company.
The problem was framed as classification, since the goal was to predict one of two outcomes:
“1” (churn) or “0” (not churn). The two machine learning models of interest were
logistic regression and decision tree. Logistic regression had the edge in accuracy,
at around 78.4%. The proactive business action was to treat customers according to the
revenue they generate for the company. For example, high-revenue customers were given
regular calls and follow-ups, and low-revenue customers were emailed regularly.
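
The classify-then-treat idea can be illustrated with a small sketch. It is written in Python rather than R, fits logistic regression by plain gradient descent instead of using R's built-in model fitting, and runs on a toy data set invented for the example; the two feature columns, the revenue cut-off and all function names are hypothetical and are not taken from the project.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return 1 (churn) or 0 (not churn) for one customer."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return 1 if p >= threshold else 0


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def retention_action(churn_label, revenue, high_revenue_cutoff):
    """Calls for high-revenue customers at risk, emails for low-revenue ones."""
    if churn_label == 0:
        return "none"
    return "call" if revenue >= high_revenue_cutoff else "email"


# Toy data (invented): [drop in usage, days since last recharge], both scaled to 0-1.
X = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.6], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3]]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_logistic(X, y)
predictions = [predict(w, b, x) for x in X]
print("training accuracy:", accuracy(y, predictions))
print("action:", retention_action(predictions[0], revenue=500, high_revenue_cutoff=300))
```

Accuracy measured on the training rows, as here, overstates real performance; a figure like the one reported for the project would normally come from held-out data.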
Packages used in R
outliers: used for detecting outliers in the data set.
VIM: used for visualizing missing and/or imputed values, which helps in
exploring the data and the structure of the missing and/or imputed values.
rattle: a graphical user interface for data mining using R.
car: provides the vif function (variance inflation factor) for checking multicollinearity
between predictors.
rpart: a decision tree package used for building the tree model.
ROCR: used for computing the AUC metric to compare models.
Projects: Learning Projects
• A knowledge-driven supervised learning approach to identify the image of a
handwritten single digit and determine what that digit is. (Kaggle.com)
This competition is aimed at identifying a handwritten image of a single digit and
determining what the digit is. K-Nearest Neighbours and Naive Bayes were used
separately for prediction.
KNN performed better, with an accuracy of 97%. The dataset's features are first
pre-processed to remove near-zero-variance features. The most
effective features are then filtered and used for prediction. Cross-validation (10-fold) is
used to create training and test sets.
• A supervised learning approach to identify an insulting comment
The challenge is to detect when a comment from a conversation would be considered
insulting to another participant in the conversation. Naive Bayes was used for
prediction. The dataset's features (terms) are first pre-processed to
normalize the text, remove punctuation, stop words and numbers, and stem the
words. The most effective of the resulting bigram terms are then filtered and
used for prediction. Cross-validation (10-fold) is used to create training and test sets.
• Predict survival on the Titanic (Kaggle.com)
This competition is aimed at analysing what sorts of people were likely to survive and
applying machine learning tools to predict which passengers survived the
tragedy. Logistic regression and random forest were used separately for prediction
and gave the same leaderboard score (0.77990). The dataset's features are first
pre-processed to impute missing values. The most effective features are then filtered and
used for prediction. Cross-validation (10-fold) is used to create training and test sets.
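
All three learning projects use the same evaluation pattern: split the data into 10 folds, train on nine and test on the tenth. A minimal sketch of that pattern with a K-Nearest Neighbours classifier follows. It is written in Python rather than R, the two-cluster data set is synthetic and stands in for real pixel features, and the function names and the choice of k=3 are assumptions for illustration.

```python
import random
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(train_X, train_y, x, k=3):
    """Majority label among the k training points nearest to x."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: squared_distance(pair[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def k_fold_indices(n, folds=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into `folds` disjoint test sets."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::folds] for i in range(folds)]


def cross_validate(X, y, k=3, folds=10, seed=0):
    """Mean accuracy over the folds; assumes len(X) >= folds."""
    scores = []
    for test_idx in k_fold_indices(len(X), folds, seed):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i]
                      for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Synthetic, well-separated two-class data (invented for this example).
X = [[i * 0.1, j * 0.1] for i in range(5) for j in range(4)]
X += [[5 + i * 0.1, 5 + j * 0.1] for i in range(5) for j in range(4)]
y = [0] * 20 + [1] * 20
print("10-fold CV accuracy:", cross_validate(X, y))
```

Swapping `knn_predict` for another classifier leaves the cross-validation loop unchanged, which is what makes the scores of different models comparable.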
Responsibilities:
1. Collected data from Excel/csv/tsv files, databases, services and web scraping.
2. Performed data normalization using Z-score and min-max normalization, reduced
skew using the Box-Cox transformation, and handled missing data.
3. Conducted exploratory and descriptive data analysis for large data sets.
4. Explored feature relationships using univariate (mean, median and mode), bivariate
(covariance and correlation) and multivariate (using the R package ggplot2)
statistics.
5. Applied dimensionality reduction and image compression using Principal Component
Analysis (PCA).
6. Used a logistic regression model on the Titanic survivor dataset to measure the
relationship between the categorical dependent variable and the independent variables by
estimating probabilities.
7. Implemented K-Nearest Neighbors (KNN) to identify digits in a handwritten image
dataset (image processing).
8. Applied text and sentiment analysis to comments posted on social media networks to
classify whether a comment is insulting, using a Naive Bayes approach.
9. Improved model building through k-fold cross-validation.
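
The normalization and skew-handling steps in item 2 follow standard formulas, sketched below in Python for illustration (the work itself was done in R). The function names are hypothetical, and the Box-Cox parameter is passed in by hand here, whereas in practice it is usually estimated from the data.

```python
import math
import statistics


def min_max_normalize(values):
    """Rescale values linearly to the 0-1 range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score_normalize(values):
    """Centre values on mean 0 and scale to population standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


def box_cox(values, lam):
    """Box-Cox transform for strictly positive values with a given lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


sample = [2.0, 4.0, 6.0]
print(min_max_normalize(sample))
print(z_score_normalize(sample))
print(box_cox([1.0, 4.0], 0.5))
```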
Accenture
Senior Software Engineer
Projects
The projects below involved developing a Fare Management Solution using
multiple technologies. We implemented this solution using BizTalk, SharePoint, BI and
Dynamics AX. AX provides transactional, finance and back-office support for the solution.
We integrated Dynamics AX with payment providers, BizTalk and the online website using
AIF and WCF services.
Project -3: Accenture Fare Management Solution
Senior Software Engineer (May 2013 – May 2015)
1. Worked with the Coded UI tool integrated with Visual Studio; wrote scripts and
developed a framework that enabled faster overnight execution of test cases.
2. Integrated our test suite with several technologies, including BizTalk, SQL Server,
Microsoft Dynamics AX and the web application.
3. Quickly learnt new technologies such as Microsoft Dynamics AX and created XPO
files that enabled faster creation of the data required for the test cases.
4. Configured the results produced after execution to be emailed
to all team members using SMTP settings.
5. Developed and implemented automated test practices for both web and Windows
applications, primarily using Visual Studio’s Coded UI module.
6. Designed and created test scripts in C# to address areas such as database impacts,
software scenarios, regression testing, negative testing, error or bug retests, and
usability in preparation for implementation.
Project - 2: Accenture Software’s
Software Engineer (Nov 2012 - May 2013)
Responsibilities:
1. Contributed to testing and validating the AX solution against requirements.
2. Delivered testing results to the customer in a professional manner.
3. Delivered testing results within the required timeline and to the required quality.
4. Provided inputs for continuous improvement of the testing group.
5. Cooperated with centralized technical and functional departments.
6. Documented and evaluated test results to log defects.
Project - 1: Presto E-ticketing
Associate Software Engineer (May 2012 - Nov 2012)
Responsibilities:
1. Major technologies involved in deploying the solution included .NET, SQL,
BizTalk and Microsoft Dynamics AX.
2. Monitored server connectivity and reported issues to the developers to
ensure stability of the environment.
3. Mentored freshers in performing their tasks, guiding them through the
necessary knowledge transfer.
4. Provided timely reports to the supervisor about fluctuations in the environment and
took the necessary actions to ensure stability.
Certifications:
BCS, The Chartered Institute for IT Foundation Certificate in Business Analysis
Educational Background:
Course of Study                             Board/University                      Year of Passing   Percentage
B.E (Electronics and Communication)         Nagarjuna University                  2012              78
Intermediate (Higher Secondary Education)   Board of Intermediate Education, AP   2008              96.4
10th standard (Schooling)                   ICSE                                  2006              87.5
Personal Details:
Name: Imtiaz Khan
Date of Birth: 02 Nov, 1990
Father’s Name: Mohammad Khan
Sex: Male
Nationality: Indian
Declaration:
I hereby declare that the information furnished above is true to the best of my knowledge.
Date:
Place: (Imtiaz Khan)
