Resume(kaushik shakkari)
KAUSHIK SHAKKARI
#318, 2700 Ellendale Place, Los Angeles, CA 90007 | shakkari@usc.edu | +1 (213) 477-3601 | linkedin.com/in/kaushik-shakkari/
public.tableau.com/profile/kaushik3654#!/ | github.com/kaushikData | datacamp.com/kaushikshakkari | kaushikshakkari.wixsite.com
EDUCATION
University of Southern California, Los Angeles, CA Aug 2018 - May 2020
Master of Science in Computer Science (Data Science)
Amrita University, Coimbatore, TN, India Aug 2014 - May 2018
B. Tech. in Computer Science and Engineering, CGPA: 9.18 / 10
SKILLS
Programming Languages: Python, Java, C++, C, JavaScript
Storage Systems and Query Languages: SQL, MySQL, Oracle DB, Cassandra, MongoDB, PostgreSQL
Visualization Tools: Tableau, Plotly
Big Data Framework and Technologies: Hadoop, Hive, Sqoop, Pig, Oozie
Libraries (Python): Scikit-learn, NLTK, Beautiful Soup, Urllib, Bokeh, Keras, TensorFlow
Cloud Computing: Google Cloud Platform
RESEARCH
UG Research Assistant, Amrita Multidimensional Data Analysis Lab, Amrita University, India Jan 2016 - Jul 2018
• Created ‘Lakshya’, a web browser extension adopted by 50 active users, which analyses user behaviour while browsing the
Internet, detects the level of diversion and nudges the user back on task in real time.
• The application’s usage history showed that ‘Lakshya’ detected diversion and alerted users appropriately.
• Improved model accuracy to 95% through continuous feedback from users.
• Visualized insights on browsing behaviour with Plotly and Bokeh to help users understand their internet usage.
WORK EXPERIENCE
Grader for Statistics (GSBA 537) and Database Management (CSCI 585), University of Southern California Sept 2018
• Created visualization and analytics assignments and graded student assignments under strict deadlines.
Team Leader, University Cisco collaboration real-time project Nov 2016 - Dec 2017
• Enabled cloud-level scaling for a generic big data Rating and Billing Scheduling application using the Hadoop framework.
• Automated deployment and orchestration of the application using Oozie, reducing human interaction.
• Managed and transferred data from various sources using Hive and Sqoop.
• Implemented code that met the production quality standards of Cisco Systems and was deployed to production.
Database Intern, APTOnline Dec 2016 - Jan 2017
• Designed, modelled and optimized a set of DML operations and cursors in collaboration with the Watershed Development Team for
the Watershed Management Project (an Andhra Pradesh State Government project).
PROJECTS
Data Clustering and Anomaly Detection of Malware Affected Systems Nov 2018
• Applied PCA to reduce the dataset to more than 60% of its original size while retaining 95% of the information.
• Ran several clustering algorithms, including K-Means, Mean-Shift, DBSCAN and Agglomerative Hierarchical Clustering.
• Showed mathematically that K-Means with K = 2 is the best fit for the dataset and detected anomalies (malware-affected systems).
• Used XGBoost, an implementation of gradient-boosted decision trees, to identify the most crucial features by F-score.
Product Analysis and Pricing Strategies Sept 2018 - Oct 2018
• Pre-processed and analysed sales data for tablets sold by companies such as Apple, Samsung, Kindle and others.
• Created a dashboard to show how Apple’s and Samsung’s products compete on sales rank, rating and discount.
• Addressed outliers and computed trend lines in Tableau for various features of each brand over a period of 24 weeks.
Customer Churn Prediction and Analysis Jan 2018 - Apr 2018
• Pre-processed the data by encoding it, handling null values and reducing the dimensionality of the dataset.
• Applied k-fold cross-validation with ten splits to guard against overfitting.
• Computed accuracy for ten classification algorithms, including a multilayer perceptron (a feedforward artificial neural network)
and adaptive boosting, and visualized the data using the seaborn and matplotlib libraries to find insights.
• Created an interactive visualization application using Bokeh for analysing relationships between features in the dataset.
Tidying and Cleaning the Gapminder Original Dataset Dec 2017 - Feb 2018
• Applied data cleaning techniques such as melting and pivoting to prepare the data for analysis.
• Performed preliminary quality diagnosis using assert statements and created a five-dimensional plot in Tableau.
OOLECA (Optimization of Live Space Using Computer Algorithms) - predicting the best plant to grow in a given area,
considering climate, groundwater, humidity and the purpose for growing the plant. Jul 2016 - Dec 2016
• Cleaned data from different data sources and designed the database schema for the application.
• Built a hybrid (content-based and collaborative) recommendation system for the application’s users.
• Collaborated with the environmental science department on designing the survey and modelling the application.
ACHIEVEMENTS AND AWARDS
Outstanding Student Award, Amrita University Apr 2018
Outstanding Contribution and Successful Project Delivery, Cisco, India Nov 2017