Sai Teja K
+1-774-253-9819
ksaiteja1003@gmail.com
https://www.linkedin.com/in/ksaiteja1003/
https://github.com/ksaiteja1003
PROFESSIONAL SUMMARY
• 2+ years of experience mentoring 2 team members, performing code reviews in Python, and conducting
technical query-solving sessions.
• Designed and developed a mastering solution for the pharma industry covering Trials, Sites, and Investigators,
increasing data accuracy by 85% using the TAMR MDM tool.
• Collaborated with 2+ client stakeholders to gather requirements and ensured accurate implementation of client
business needs for specific KPIs.
• Developed ETL data pipelines for 5 sources of oncology clinical trial study data using AWS, PySpark, and
Apache Airflow, reducing processing time by 35% and consolidating the data into a single scalable repository
for further research and analysis.
• Experience in managing data-related requests from clients, analyzing issues, and providing efficient resolution
within 24 hours.
• Mined and identified trends in Healthcare data sets to help clients make informed decisions, resulting in a
Client Delight Award.
• Created data testing scenarios in the application to ensure data accuracy and completeness.
• Provided inputs and support to senior team members for the project implementation plans.
SKILLS
• Languages: Python, SQL (MySQL)
• Tools: Microsoft Office (Excel, Word, PowerPoint), Tableau, TAMR MDM, Zeppelin, Jupyter
• Libraries: NumPy, Pandas, Matplotlib, Seaborn, scikit-learn
• Machine Learning: Supervised (Regression, Classification) & Unsupervised
• Big Data Technologies: PySpark, Hadoop, MapReduce, Apache HBase, Sqoop, Flume, Hive
• Cloud Computing: AWS (S3, Athena, EC2, EMR, Redshift, Airflow)
• Version Control and Project Tracking: Bitbucket, Jira
EDUCATION
Master in Data Analytics
Clark University
Graduation: August 2023
PG Diploma in Data Science Engineering
Great Lakes Executive Learning
Bachelor of Technology in ECE
SRM Institute of Science and Technology
PROFESSIONAL EXPERIENCE
Business Technology Solutions Associate
ZS Associates (Feb’21 – Aug’22)
Business Technology Analyst Intern
ZS Associates (Aug’20 – Jan’21)
ACHIEVEMENTS
• Client Delight Award by ZS Associates.
• Trekked in the Himalayas (Brahmatal).
PROJECTS
Aspiring Minds (AMCAT) Employee Salary Prediction
AMCAT is a test that assesses graduates' domain skills and various personal traits. The dataset contains around
200K records of employees who took the test, along with their education and location details. The goal is to
predict each employee's annual salary.
Key skills: Data Preprocessing, Exploratory Data Analysis, Machine Learning (Regression) using Python, Model
Evaluation, Hyperparameter Tuning.
Contribution:
• Performed extensive data preprocessing of 200K data points in Python, improving prediction accuracy
by 15%.
• Implemented a machine learning pipeline that predicts the salary of employed professionals with 87%
accuracy, achieved through hyperparameter tuning of a regression model.
Feasibility Intelligence and Design Optimizer (FIDO)
FIDO de-duplicates various data sources and provides analytics for decision-making throughout the clinical
trial planning process, including site and investigator feasibility, to break a long chain of clinical trial
inefficiencies that delay progress, burden participants, limit trial diversity, and drain funding for research
and discovery.
Key skills: Healthcare Analytics
Contribution:
• Worked extensively on data wrangling and proposed data-cleaning solutions using PySpark, MySQL, and
Python.
• Performed counterfactual analysis in Python to fill in missing values, improving the Enrollment Duration
KPI by 5%.
• Prepared Tableau visualizations and reports of clinical trial timelines.
• Designed a method for standardizing 300K variations of 27K diseases across multiple data sources,
leveraging different ontologies with TERMite and CENTREE, which increased disease data accuracy
by 60%.
• Analyzed key performance indicators to understand patient enrollment, including enrollment duration
and number of patients enrolled.
• Corrected the mapping of Therapeutic Area to disease for 5 data sources, resulting in 100% mapping
accuracy.
IMDB Top 250 Movies Data Pipeline
IMDB is an online database of information about movies, television shows, video games, and streaming services.
In this project, Apache Airflow was used to build a data pipeline that extracts the top 250 IMDB movies by
rating through web scraping.
Key skills: Apache Airflow, Web scraping using Python.
Contribution:
• Developed a web scraper in Python that extracts information on the top 250 movies from the IMDB website
and derived conclusions about which movie was most profitable.
