Parmanand Sahu has over 8 years of experience in data science and machine learning. He has worked on projects including mortgage prepayment rate prediction, an AI assistant, and knowledge graphs. He is currently a Senior Associate Data Scientist at Capital One, where he developed a neural network model to predict mortgage prepayment rates.
PARMANAND SAHU
Arlington, VA | anandrocks.meta@gmail.com | 972-730-3967 | linkedin.com/in/parmanandsahu/ | parmanandsahu.com
WORK EXPERIENCE
Capital One Aug 2021 - Present
Senior Associate, Data Scientist McLean, VA
Deep Neural Network Model to predict Mortgage Prepayment Rate for liquidity and interest rate risk management
– Conducted extensive exploratory data analysis and feature engineering to improve the predictive power of pool-level mortgage data.
– Used Optuna's sampling and pruning techniques to reduce time spent on hyperparameter optimization by 80%.
– Developed an internal package to manage the workflow of 1,000+ experiments across 4 model subcategories for MBS products.
– Used the model to project cash flows for risk measurement and valuation of Capital One's MBS portfolio (over $70 billion) and new purchases; replacing the vendor model saved over $100,000.
– Collaborated with the team to optimize and debug the implementation, enabling the model to value MBS both inside and outside the portfolio and supporting rebalancing decisions.
Ocrolus Jun 2021 - Aug 2021
Data Scientist Intern New York, NY
Data Engineering pipeline for document parsing
– Debugged and verified a data engineering pipeline and an ML model that classifies businesses into SIC (Standard Industrial Classification) categories for loan processing.
Lantern Pharma Jan 2021 - May 2021
Data Scientist and Platform Development Intern Dallas, TX
Data Pipeline for Cancer Drug exploration platform
– Developed a Nextflow module to ingest TCGA genomics data (50k+ samples, 3 billion+ data points) into S3.
– Implemented a Docker microservice for ingesting genomics data and deployed it on AWS ECS.
Capital One Jun 2020 - Aug 2020
Data Science Intern McLean, VA
Deep Neural Network Model to predict Mortgage Prepayment Rate for liquidity and interest rate risk management
– Parallelized hyperparameter tuning for the prepayment rate model and optimized the workflow, shortening each experimentation cycle by 30 minutes.
– Built and analyzed an automated 70-page performance report (PDP/ICE plots and S-curves) to compare models and finalize hyperparameters.
VHSS Lab, UTD Sep 2019 - May 2020
Machine Learning Specialist Richardson, TX
NSF funded Conversational Emotive Virtual Reality patient project
– Researched and trained a transformer-based model in PyTorch for a virtual patient used by medical students, achieving 91+% accuracy.
Huddl.ai Apr 2018 - Jun 2019
Artificial Intelligence Engineer Hyderabad, India
Named Entity Recognition for Voice Assistant
– Supervised a data annotation team and trained a named entity recognition (NER) model with 85+% accuracy.
– Built a module using Levenshtein distance and phonetic similarity to correct mistranscribed recognized entities.
– Developed a microservice with Node.js and DynamoDB that serves as a gazetteer for NER, improving accuracy by 2%.
Reverse image search for information retrieval
– Developed a parser for OCR responses and clustered the content with k-means, reducing false positives by 10%.
– Built keyword extraction using RAKE and graph algorithms to rank meetings in Elasticsearch search results.
Action Item Detection in Meeting Transcript
– Trained an LSTM-RNN model to classify action items in meeting transcripts with 95% accuracy.
– Deployed MLflow for internal use to track experiments across different hyperparameter settings.
CoArtha Technosolution Sep 2017 - Apr 2018
Associate Data Science Engineer Hyderabad, India
Semantic Understanding of Job Description for ranking resumes
– Trained a Naive Bayes model to classify job description sentences with 90%+ accuracy for resume matching.
– Assisted in developing scoring logic to match and rank resumes against job descriptions.
Candidate screening from audio interviews
– Used a random forest to classify candidates from interview response audio with 78+% accuracy.
CoArtha Technosolution Sep 2016 - Aug 2017
Associate Software Engineer Hyderabad, India
Knowledge Graph from job descriptions for ranking resumes
– Built part of a pipeline to scrape and preprocess 1 million+ records from US job boards using Selenium/BeautifulSoup and ingest them into MongoDB.
– Built part of a Neo4j knowledge graph of skills, job titles, and education entities from 100k+ job descriptions.
Semantic parsing of resumes
– Trained and deployed an ensemble model on the AWS cloud stack to identify resume sections with 87+% accuracy.
– Implemented regex-based entity extraction and table parsing to correlate the extracted entities.
DigiFledged Jul 2015 - Aug 2016
Founder Bhilai, India
– Managed daily operations and gathered technical and functional requirements for projects from new clients.
– Led a team to deliver 5+ web development and 17+ freelance projects, and established a blog with 130K+ page views.
JSW Steel Ltd Feb 2014 - Apr 2015
Junior Manager Bellary, India
– Analyzed production reports with exploratory data analysis in MS Excel and R, saving 4 hours of manual work per week.
EDUCATION
THE UNIVERSITY OF TEXAS AT DALLAS, Richardson, TX Aug 2019 - Aug 2021
Master of Science in Computer Science GPA 3.77
NATIONAL INSTITUTE OF TECHNOLOGY, Raipur, INDIA Jul 2009 - Jul 2013
Bachelor of Technology (Hons.) in Metallurgical Engineering CPI 8.35
TECHNICAL SKILLS
Languages Python, R, C, C++, Java, Scala, MATLAB, React, Node.js
Databases MongoDB, MySQL, DynamoDB, ElasticSearch, Neo4j
Libraries PyTorch, SciPy, TensorFlow, Optuna, NumPy, Pandas, Matplotlib, Plotly, NLTK, Gensim, scikit-learn, spaCy
Algorithms Regression, SVM, Clustering, Tree Algorithms, Ensemble Methods, LSTM, CNN, Neural Network
NLP Sequence-to-Sequence, Sequence tagging and Classification, Question Answering
Technologies Linux, Django, REST API, Docker, Kubernetes, MLflow, AWS, Terraform, EKR, ECR, Hadoop, Hive, Spark
CERTIFICATIONS AND ACTIVITIES
– Natural Language Processing, Machine Learning and Deep Learning by Coursera
– AWS Services by The University of Texas at Dallas and LinkedIn Learning
– Neural Networks and Deep Learning, and Machine Learning, offered through Coursera
– Linked Data Engineering by Hasso-Plattner Institute: Building Knowledge Graph, 2016
– M101: MongoDB for Developers by MongoDB University
– Volunteered as a member of AnalyticsVidhya.com (Data Science Community, Hyderabad Chapter).