Shiv Shankar Dutta
Phone – +91-9136819634
Email – shivdutta@protonmail.com
GIT: https://github.com/Shivdutta
Medium: https://medium.com/@shivdutta
LinkedIn: https://www.linkedin.com/in/75ssd/
Experience Summary
 Experienced Data Scientist/Data Solution Architect designing and developing machine learning and deep learning solutions in line with quality and regulatory standards.
 Hands-on experience in classification, regression, decision trees, neural networks, NLP, chatbots, and image and video analytics.
 Data science professional with experience in all stages of data processing and insights delivery.
 Experience working in start-up, mid-sized and large organizations on project/product development, services and delivery.
 Worked with various clients such as HDFC, Shell, the Indian Government, AIG and Allstate.
 Experience in conducting analytics assessment workshops and client requirement gathering.
 Gather, evaluate and document business requirements related to analytics, translate them into an analytics solution definition, and implement using Python.
 Deep involvement in handling critical deliverables, benchmarking solutions, driving key metrics and best practices, maintaining productivity and ensuring projects are profitable.
 Successful liaison between business users and technical developers, working in an onsite/offshore model for multiple deliveries; planning and prioritizing work products to meet timelines.
 Defining the analytics product roadmap in line with the Enterprise Architecture Framework.
 Design and development of machine learning models, deep learning models, data/text mining, image analysis and classification, and NLP, end to end, for POCs and analytics solutions in any domain or business use case.
 Hands-on experience in handling unstructured data such as image, video and text.
 Hands-on experience in data collection, feature selection and feature engineering from multiple data sources such as NoSQL DBs (Hive, HBase, MongoDB), flat files and SQL DBs.
 Hands-on experience in building predictive models in real time/near real time/batch using machine learning.
 Exposure to end-to-end data pipeline creation in enterprise systems using different tools: Flume, Kafka, Spark, Flask, Docker.
 Model validation with testing and client stakeholders; aligning with business and technology for deployment, validation and acceptance.
 Manage and mentor teams.
 TOGAF 9.1 Certified. Experience in presales activities such as technical solutioning and estimation, POCs and customer demos.
 Well versed with Agile/Waterfall methodologies, CMMI Level 5 processes, estimation techniques, requirement gathering and elicitation, and design using UML techniques.
 Hands-on expertise in data governance, data lineage, data processes (DML), data architecture control execution, and Master Data Management (MDM).
 Exposure to deploying ML models on Kubernetes and Docker clusters in on-premise or hybrid environments.
 Exposure to interfacing with IoT/connected devices/sensors for data analysis and edge analytics.
 Model quantization for handheld devices using TensorFlow Lite.
Primary Skills
Machine Learning
Regression, K-Means Clustering, Decision Tree, SVM, Bayes
Theorem, Naive Bayes, Random Forest, LightGBM, XGBoost, Apriori,
Time Series (ARIMA)
Neural Network
CNN, RNN, Autoencoders, Keras, PyTorch, TensorFlow, YOLO,
OpenCV, SSD, BERT, ResNet, Inception, RCNN, SSD-MobileNet
NLP
NLTK, spaCy, POS Tagging, Tokenization, Stemming and Lemmatization,
BERT, Word2vec, GloVe, Embeddings
RDBMS SQL Server, MySQL, PostgreSQL
Statistical Techniques Hypothesis Testing, ANOVA, Chi-Square
NoSQL DB MongoDB
Languages Python, R, C#
Enterprise Architecture TOGAF Certified
Chatbots Lex, LUIS, RASA, Dialogflow
Messaging Kafka
Tools PyCaret, AutoML, Jupyter Notebook, Spyder, Anaconda
Methodology Agile & Waterfall
Secondary Skills
Infrastructure Management: Hortonworks Ambari, Cloudera Hue, SageMaker, Cloud ML, Docker
and Kubernetes
Transformation: Sqoop, Apache Spark, PySpark, Flume
Big Data Hadoop, HDFS, YARN, Zookeeper
Others Hive, Pig, HBase, Scala
Cognitive Services: Cloud/Open source Cognitive Services
Employment History
Tenure Company Position
Apr 17 to Jun 19 Sequretek Pvt. Ltd. Technology Architect/Data Scientist
Nov 2015 to Mar 17 CGI India Ltd. Solution/Technical Architect, ML (Associate Consultant)
Oct 2008 to Nov 2015 Rolta India Ltd. Solution/Technical Architect/Project Manager/Presales Architect, BI/Machine Learning (Senior Manager)
July 2004 to Oct 2008 Syntel Project Lead
June 2003 to June 2004 L&T Infotech Ltd Software Engineer
Nov 2002 to June 2003 Amtech Communication Senior Application Developer
Sep 2001 to Oct 2002 Nazara.com Senior Programmer
Sep 1999 to Sep 2001 Ideaz Netechnologies Programmer
Recognitions
Manager Award in L&T Infotech
CMMI Level 5 participation in Rolta
Qualifications
Bachelor of Engineering in Electronics and Telecommunication, Marathwada University. First Class.
Certifications
Coursera: 2019-03 Neural Networks and Deep Learning
Open Group: 2018-02 TOGAF 9.1 Certified
Udemy: 2019-07 Deep Learning with TensorFlow 2.0
Udemy: 2017-02 Big Data with Spark Streaming and PySpark
Project: Hand Detection System for Shredder Machine
Technology/Deep Learning: RCNN, Python, Google Coral
Overview: The project was taken up to prevent industrial accidents to workers operating near a shredding machine. A shredding machine, or document shredder, is a mechanical device that cuts paper and other information-bearing media into fragments so small that the information can no longer be retrieved. Feeding paper into the machine is a manual activity, during which accidents sometimes occur. To stop the machine automatically, the system detects a hand crossing a limit line and alerts the user. The alert takes the form of a buzzer, a red light and stopping of the machine; a GPIO pin of the Coral board is connected to the machine to stop it. Images of hands were labeled and used to train an RCNN model.
Role & Responsibilities:
 Part of Product Architecture Team.
 Model Development and Validation using RCNN.
 Model Deployment, Validation and Acceptance.
 Development of code for hand detection.
Outcome and Contributions:
o High accuracy of more than 95%.
o With automation, such accidents were completely avoided.
Team Size: 2
Client: Freelance
Project: Managed Detection and Response
Machine Learning: LightGBM, SVM, KNN, Clustering, Neural Networks (RNN)
Technology: Python, Apache Spark, PyTorch, Apache Kafka, Flume, MongoDB, Jenkins, Git, Hive, HBase, Oozie
Overview: Managed Detection and Response is a combination of technology and skills that delivers advanced threat detection, deep threat analytics, global threat intelligence, faster incident mitigation and collaborative breach response on a 24x7 basis. The endpoints (IoT devices and enterprise servers) are scanned by an EMS system or scan component. Apache Flume agents capture the logs and send them to a topic in Apache Kafka. Files received include server logs, client profiles, schema profiles and network settings. Master Data Management was set up for the data and schema profiles as part of the data-sharing agreement with the client. In data ingestion, the queue is consumed using Apache Spark batch and stream components. Data Quality services were set up for data validation, and multiple layers were set up in the Spark component for data validation, cleansing, standardization and transformation. The reduced data is stored in HBase for machine learning; Hive was set up for data archiving, and Oozie was used for scheduling.
Machine learning techniques such as ensemble learning and boosting (SVM, LightGBM, Random Forest, K-Means clustering, KNN, RNN) are applied and the best-performing result is selected.
The data is processed using machine learning for threat detection. The output is stored in MongoDB and displayed in a dashboard. Exception handling, audit trails and service accounts were created as part of the framework.
Role & Responsibilities:
 Part of Product Architecture Team.
 Feature Selection and Engineering for Web Attacks, Network Attacks and Malware Attacks.
 Model Development and Validation using LightGBM, SVM and Neural Networks (RNN).
 Model Deployment, Validation and Acceptance.
 Collaborating in an agile manner with cross-functional teams.
 Development of data ingestion and log processing components using Apache Spark/Flume, Kafka, HDFS and MongoDB.
Outcome and Contributions:
o Complexity of manual feature engineering was avoided.
o Accuracy of more than 95%, compared to around 60% for manual classification.
o Through automation, 4 person-months of manual effort per customer were saved.
o The cost of approximately 5 to 10 HP ArcSight licenses per customer was saved.
Team Size: 15
Client: HDFC, DMart, Product Development
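The pipeline above applies several models and keeps the best result. As a minimal, stdlib-only sketch of one common way to combine per-model verdicts (the model names, labels and function shape are illustrative, not the product's actual interface):

```python
from collections import Counter

def ensemble_verdict(verdicts):
    """Majority vote over per-model labels ('malicious'/'benign').

    `verdicts` maps a model name to its predicted label; ties fall back
    to 'malicious' so borderline events are escalated for review.
    """
    counts = Counter(verdicts.values())
    top = counts.most_common()
    if len(top) > 1 and top[0][1] == top[1][1]:
        return "malicious"  # conservative tie-break
    return top[0][0]

print(ensemble_verdict({"lgbm": "malicious", "svm": "benign", "rnn": "malicious"}))
# → malicious
```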
Project: EDPR (Endpoint Detection and Protection Response)
Machine Learning: LightGBM, SVM, KNN, Clustering, Random Forest
Technology: Python, Apache Spark, PyTorch, Apache Kafka, Flume, SQLite, Jenkins, Git, Hive, HBase, Oozie
Overview: Endpoint Detection and Protection Response detects, protects against and responds to cyberattacks, which add to the complexity of securing the enterprise. Each point product adds an agent to the endpoint and is often managed independently of the other security technologies present on that endpoint.
Files received include server logs, client profiles, schema profiles and network settings. Master Data Management was set up for the data and schema profiles as part of the data-sharing agreement with the client. In data ingestion, the queue is consumed using Apache Spark batch and stream components. Data Quality services were set up for data validation, and multiple layers were set up in the Spark component for data validation, cleansing, standardization and transformation. This involves feature extraction and feature engineering for malware based on static analysis of PE and PDF file types; the metadata is extracted from malware samples. The reduced data is stored in HBase for machine learning; Hive was set up for data archiving, and Oozie was used for scheduling.
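As a hedged illustration of the static-analysis step described above, the sketch below computes generic byte-level features (size, Shannon entropy, printable ratio) from a file's raw bytes. The actual pipeline parsed PE/PDF structure and extracted richer metadata; these simplified features are a stand-in:

```python
import math
from collections import Counter

def byte_features(data: bytes):
    """Generic static features for a file's raw bytes.

    High entropy (near 8.0) often indicates packed or encrypted content,
    a common malware signal in static analysis.
    """
    counts = Counter(data)
    n = len(data)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values()) if n else 0.0
    printable = sum(c for b, c in counts.items() if 32 <= b < 127)
    return {
        "size": n,
        "entropy": round(entropy, 3),
        "printable_ratio": round(printable / n, 3) if n else 0.0,
    }

print(byte_features(b"MZ" + bytes(range(256))))
```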
Machine Learning: Thereafter, data pre-processing and data cleaning are done. Based on exploratory analysis, the model is regularly created/updated and validated. Techniques such as ensemble learning and boosting (SVM, LightGBM, Random Forest) and TensorFlow Lite models are applied and the best-performing result is selected.
Role & Responsibilities:
 Part of Product Architecture Team.
 Model Creation, Data Pre-Processing, Data Cleaning.
 Feature Selection and Engineering.
 Model Development and Validation using LightGBM and TensorFlow Lite.
 Model Deployment, Validation and Acceptance.
Outcome and Contributions:
o Complexity of manual feature engineering was avoided.
o Accuracy improved by 4% over the earlier traditional model.
o Through automation, 10 person-months of manual effort were saved.
Team Size: 10
Client: Bharat Co-operative Bank, Product Development
Project: Website Security Checking
Machine Learning: LightGBM, SVM, KNN, Clustering, Random Forest
Technology: Python, Apache Spark, PyTorch, Apache Kafka, Flume, MongoDB, Jenkins, Git, HBase, Oozie
Overview: The product was developed to check website sanity and to plug into other products. It is used to validate URLs and build a Threat Intelligence database. Features such as URL length, domain registration, port, HTTPS, DNS record age and Google PageRank are used in detection. A separate environment is created for data creation, and features are extracted from the captured data.
In data ingestion, the queue is consumed using the Apache Spark batch component. Data Quality services were set up for data validation, and multiple layers were set up in the Spark component for data validation, cleansing, standardization and transformation. The reduced data is stored in MongoDB for machine learning; Hive was set up for data archiving, and Oozie was used for scheduling.
Some samples were collected from third parties and some from server logs of managed services. This is a continuous process. EDA is performed for data standardization. The model is trained incrementally every week, thereby creating a rich threat intelligence repository.
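The lexical URL features mentioned above (URL length, port, HTTPS) can be sketched with the standard library. External signals such as domain registration age, DNS record age and page rank are omitted here, and the function shape is illustrative:

```python
from urllib.parse import urlparse

def url_features(url):
    """Extract simple lexical features from a URL for phishing detection."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "uses_https": parsed.scheme == "https",
        "has_port": parsed.port is not None,
        "num_subdomains": max(host.count(".") - 1, 0),
        # A bare IP address instead of a domain name is a classic phishing signal.
        "has_ip_host": host.replace(".", "").isdigit(),
        "num_query_params": len([p for p in parsed.query.split("&") if p]),
    }

print(url_features("http://198.51.100.7:8080/login?user=a&next=b"))
```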
Role & Responsibilities:
 Part of Product Architecture Team.
 Model Creation, Data Pre-Processing, Data Cleaning.
 Feature Selection and Engineering.
 Model Development and Validation using Clustering, Boosting and Random Forest.
 Model Deployment, Validation and Acceptance.
Client: Product Development (Internal)
Team Size: 15
Project: SOC Incident Log Ticket Allocation
Machine Learning: LightGBM, SVM, KNN, Clustering, Random Forest
Technology: Python, Apache Spark, PyTorch, Apache Kafka, Flume, MongoDB, Jenkins, Git, Hive, HBase, Oozie
Overview: Managed services handle many security-related measures for global customers, and the SOC (Security Operations Centre) is established to achieve this objective. Ticket resolution time is important for customers and for capacity planning. Initially, all historical data available in Excel sheets is used as input for training; EDA, feature selection, feature engineering and PCA are done prior to training.
The queue is consumed using the Apache Spark batch component. Data Quality services were set up for data validation, and multiple layers were set up in the Spark component for data validation, cleansing, standardization and transformation. The reduced data is stored in MongoDB for machine learning; Hive was set up for data archiving, and Oozie was used for scheduling.
After training, whenever a new ticket arrives its ETA is calculated, and on closing the variance is also captured. Based on this analysis, the model is retrained.
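The ETA-and-variance loop above can be illustrated with a toy baseline (the real model was a trained regressor; the class and field names here are hypothetical):

```python
from collections import defaultdict

class TicketEtaEstimator:
    """Baseline ETA model: running mean resolution time per ticket category.

    Also computes the actual-minus-predicted variance captured on ticket
    close, the signal used to decide when to retrain.
    """
    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    def observe(self, category, hours):
        self.totals[category] += hours
        self.counts[category] += 1

    def eta(self, category, default=24.0):
        n = self.counts[category]
        return self.totals[category] / n if n else default

    def variance_on_close(self, category, actual_hours):
        return actual_hours - self.eta(category)

est = TicketEtaEstimator()
est.observe("malware", 4.0)
est.observe("malware", 6.0)
print(est.eta("malware"))                       # mean resolution time so far
print(est.variance_on_close("malware", 8.0))    # gap between actual and ETA
```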
Role & Responsibilities:
 Part of Product Architecture Team.
 Model Creation, Data Pre-Processing, Data Cleaning.
 Feature Selection and Engineering.
 Model Development and Validation using Clustering, Boosting and Random Forest.
 Model Deployment, Validation and Acceptance.
Client: HDFC, AEGON, Reliance, Product Development (Internal)
Team Size: 10
Project: Malicious PDF File Detection
Machine Learning: LightGBM, SVM, KNN, Clustering, Random Forest
Technology: Python, Apache Spark, PyTorch, Apache Kafka, Flume, SQLite, Jenkins, Git, Hive, HBase, Oozie
Overview: A PDF file can be a source of malware or of hyperlinks to phishing sites. The product was developed to check document sanity and to plug into other products. PDF files have many attributes, semi-structured regions and snippets where an exploit can easily be injected; features such as JavaScript, rich media, JBIG2Decode and OpenAction are vulnerable to attack. Samples were collected from third parties.
The queue is consumed using the Apache Spark batch component. Data Quality services were set up for data validation, and multiple layers were set up in the Spark component for data validation, cleansing, standardization and transformation. The reduced data is stored in MongoDB for machine learning; Hive was set up for data archiving, and Oozie was used for scheduling.
Some samples were collected from third parties and from the internal threat repository. This is a continuous process. EDA is performed for data standardization. The model is trained incrementally every week, thereby creating a rich threat intelligence repository.
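A minimal sketch of the static PDF features named above: counting risky keywords in the raw bytes, in the spirit of tools like pdfid. The keyword list and sample input are illustrative, not the product's actual feature set:

```python
# Keywords associated with exploitable PDF behavior (JavaScript execution,
# automatic actions, a vulnerable image codec, embedded payloads).
RISKY_KEYWORDS = [b"/JavaScript", b"/JS", b"/OpenAction", b"/JBIG2Decode",
                  b"/RichMedia", b"/Launch", b"/EmbeddedFile"]

def pdf_static_features(data: bytes):
    """Return keyword counts usable as model features for a PDF's raw bytes."""
    return {kw.decode(): data.count(kw) for kw in RISKY_KEYWORDS}

sample = b"%PDF-1.4 /OpenAction << /S /JavaScript /JS (app.alert(1)) >>"
print(pdf_static_features(sample))
```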
Role & Responsibilities:
 Part of Product Architecture Team.
 Model Creation, Data Pre-Processing, Data Cleaning.
 Feature Selection and Engineering.
 Model Development and Validation using Clustering, Boosting and Random Forest.
 Model Deployment, Validation and Acceptance.
Client: HDFC, AEGON, Reliance, Product Development (Internal)
Team Size: 10
Project: IGA
Machine Learning: LightGBM, SVM, KNN, Clustering, Random Forest
Technology: Python, Apache Spark, PyTorch, Apache Kafka, Flume, MongoDB, Jenkins, Git, HBase, Oozie
Overview: IGA is an integrated access management and governance product that takes care of the entire lifecycle of employee engagement (onboarding and exit). During onboarding, an employee ID is created and access to different systems is granted after approval; during exit, all access and IDs are revoked. Data is collected from multiple systems such as the attendance system, leave portal, access management, training system, appraisal system and other client systems. Some of the attributes are employee salary, employee satisfaction, promotion last year, hourly rate, etc.
The queue is consumed using the Apache Spark batch component. Data Quality services were set up for data validation, and multiple layers were set up in the Spark component for data validation, cleansing, standardization and transformation. The reduced data is stored in MongoDB for machine learning; Hive was set up for data archiving, and Oozie was used for scheduling.
The collected data is cleansed, parsed and validated; thereafter, feature selection and engineering and exploratory data analysis are applied to derive multiple metrics. A dashboard is created using Tableau. Techniques such as ensemble learning and boosting (Support Vector Machine, XGBoost) and PCA are applied and the best-performing result is selected.
Role & Responsibilities:
 Part of Product Architecture Team.
 Model Creation, Data Pre-Processing, Data Cleaning.
 Feature Selection and Engineering.
 Model Development and Validation using Clustering, SVM, Boosting and Random Forest.
 Model Deployment, Validation and Acceptance.
Team Size: 15
Client: HDFC, AEGON, Axis Bank, Product Development (Internal)
Project: Bank Customer Churn
Machine Learning: LightGBM, SVM, KNN, Clustering, Random Forest
Technology: Python, Apache Spark, PyTorch, Apache Kafka, Flume, MongoDB, Jenkins, Git, Hive, HBase, Oozie
Overview: The model was developed for the agricultural segment, where multiple co-operative banks and financial agencies support farmers. The churn-based product was developed to find the shift of customers. The data is sourced from FTP and processed batch-wise; attributes such as gender, geography, loan, products, subsidy and income are considered for model building. FTP was used as the data source and later replaced with Kafka-Spark streams. This is a continuous process. EDA is performed for data standardization. The model is trained incrementally every week.
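As a toy baseline over churn attributes like those named above (the field names and rows are illustrative; the production model used boosted trees over many more attributes):

```python
from collections import defaultdict

def churn_rate_by(rows, key):
    """Observed churn rate per value of `key` - a sanity baseline to compare
    the trained model against."""
    churned = defaultdict(int)
    total = defaultdict(int)
    for r in rows:
        total[r[key]] += 1
        churned[r[key]] += r["churned"]
    return {k: churned[k] / total[k] for k in total}

rows = [
    {"geography": "Pune", "churned": 1},
    {"geography": "Pune", "churned": 0},
    {"geography": "Nashik", "churned": 0},
]
print(churn_rate_by(rows, "geography"))
```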
Role & Responsibilities:
 Part of Product Architecture Team.
 Model Creation, Data Pre-Processing, Data Cleaning.
 Feature Selection and Engineering.
 Model Development and Validation using Clustering, SVM, Boosting and Random Forest.
 Model Deployment, Validation and Acceptance.
Client: Bharat Co-operative Bank
Team Size: 15
Project: Insurance Claim Prediction
Machine Learning: LightGBM, SVM, KNN, Clustering, Random Forest
Technology: Python, Apache Spark, PyTorch, Apache Kafka, Flume, MongoDB, Jenkins, Git, Hive, HBase, Oozie
Overview: The model was developed to flag claims for special manual-checking intervention, based on historical claim-processing data as well as customer-specific attributes such as policy deductibles, exclusions, umbrella limit, collision details and vehicle details. Initially FTP was used as the data source, later replaced with Kafka-Spark streams. This is a continuous process. EDA is performed for data standardization, and the model is trained incrementally every week.
Role & Responsibilities:
 Part of Product Architecture Team.
 Model Creation, Data Pre-Processing, Data Cleaning.
 Feature Selection and Engineering.
 Model Development and Validation using Clustering, SVM, Boosting and Random Forest.
 Model Deployment, Validation and Acceptance.
Client: HDFC ERGO
Team Size: 15
Project: Oil Card Fraud Detection (POC)
Overview: The model was developed as part of a POC for an oil and gas company. In Europe, an oil card allows drivers of large transport companies to buy fuel and pay tolls, and in some cases to buy limited food during a tour. The model detects fraud in transactions based on time zone, transaction instances, and latitude and longitude, using an anomaly detection technique. It was developed on the Azure ML platform with CSV as the data source. EDA is performed for data standardization, and the model is trained incrementally every week.
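A hedged sketch of the anomaly-detection idea: flag a transaction attribute that is a statistical outlier against the card's history. The z-score rule and sample values are illustrative, not the production Azure ML model:

```python
import math

def zscore(value, history):
    """Standard score of `value` against the card's historical values."""
    mean = sum(history) / len(history)
    var = sum((x - mean) ** 2 for x in history) / len(history)
    return 0.0 if var == 0 else (value - mean) / math.sqrt(var)

def is_anomalous(value, history, threshold=3.0):
    """Flag a transaction attribute more than `threshold` std-devs from history."""
    return abs(zscore(value, history)) > threshold

# Distances (km) of past fuel stops from the depot; a 900 km stop stands out.
history = [12.0, 15.0, 11.0, 14.0, 13.0]
print(is_anomalous(900.0, history))  # True
print(is_anomalous(14.5, history))   # False
```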
Role & Responsibilities:
 Part of Product Architecture Team.
 Model Creation, Data Pre-Processing, Data Cleaning.
 Feature Selection and Engineering.
 Model Development and Validation using Clustering, SVM, Boosting and Random Forest.
 Model Deployment, Validation and Acceptance.
Client: Oil and Gas
Team Size: 5
BI Reports
Overview: This project involved migration from a legacy platform to SSRS. There were nearly 200 reports; a utility was developed to migrate sample reports to the SSRS platform.
Client: Intel Bangalore
Team Size: 30
BI Reports
Overview: This project involved migration from a legacy platform to SSRS. There were nearly 40 reports.
Client: ACE Surety, US
Team Size: 20
R & D Projects
Document Classification using NLP
Asset Management using YOLO Object Detection
Chatbot for coronavirus using Lex and Dialogflow, integrated with Facebook and Telegram
Mask-wearing detection for social distancing using RCNN
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 

Resume

 Hands-on experience in building predictive models in real-time, near-real-time and batch modes using machine learning.
 Exposure to end-to-end data pipeline creation in enterprise systems using tools such as Flume, Kafka, Spark, Flask and Docker.
 Model validation with testing and client stakeholders; aligning business and technology for deployment, validation and acceptance.
 Managing and mentoring teams.
 TOGAF 9.1 certified. Experience in presales activities such as technical solutioning and estimation, POCs and customer demos.
 Well versed with Agile/Waterfall methodologies, CMMI Level 5 processes, estimation techniques, requirement gathering and elicitation, and design using UML techniques.
 Hands-on expertise in data governance, data lineage, data processes, DML and data architecture control execution, Master Data Management (MDM) and Data Governance (DG).
 Exposure to deploying ML models on Kubernetes and Docker clusters in on-premise or hybrid environments.
 Exposure to interfacing IoT/connected devices/sensors for data analysis and edge analytics.
 Model quantization for handheld devices using TensorFlow Lite.
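The model quantization mentioned above reduces float32 weights to int8 for handheld devices. As a minimal illustration of the underlying affine quantization (a NumPy sketch only, not the actual TensorFlow Lite converter API):

```python
import numpy as np

np.random.seed(0)

def quantize_int8(w):
    # Affine (asymmetric) quantization: map [w.min(), w.max()] onto [-128, 127].
    scale = (w.max() - w.min()) / 255.0
    zero_point = np.round(-w.min() / scale) - 128
    q = np.clip(np.round(w / scale + zero_point), -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    # Recover approximate float values from the int8 representation.
    return (q.astype(np.float32) - zero_point) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale, zp = quantize_int8(w)
err = np.abs(dequantize(q, scale, zp) - w).max()
```

The round-trip error stays within roughly one quantization step, which is why int8 models remain usable on edge hardware such as the Google Coral boards mentioned below.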
Primary Skills

Machine Learning: Regression, K-Means Clustering, Decision Tree, SVM, Bayes Theorem, Naive Bayes, Random Forest, LightGBM, XGBoost, Apriori, Time Series (ARIMA)
Neural Networks: CNN, RNN, AutoEncoders, Keras, PyTorch, TensorFlow, YOLO, OpenCV, SSD, BERT, ResNet, Inception, RCNN, SSD-MobileNet
NLP: NLTK, spaCy, POS tagging, Tokenization, Stemming and Lemmatization, BERT, Word2vec, GloVe, Embeddings
RDBMS: SQL Server, MySQL, PostgreSQL
Statistical Techniques: Hypothesis testing, ANOVA, Chi-Square
NoSQL DB: MongoDB
Languages: Python, R, C#
Enterprise Architecture: TOGAF certified
Chatbots: Lex, LUIS, RASA, Dialogflow
Messaging: Kafka
Tools: PyCaret, AutoML, Jupyter Notebook, Spyder, Anaconda
Methodology: Agile and Waterfall

Secondary Skills

Infrastructure Management: Hortonworks Ambari, Cloudera Hue, SageMaker, Cloud ML, Docker and Kubernetes
Transformation: Sqoop, Apache Spark, PySpark, Flume
Big Data: Hadoop, HDFS, YARN, Zookeeper
Others: Hive, Pig, HBase, Scala
Cognitive Services: Cloud/open-source cognitive services

Employment History

Apr 2017 to Jun 2019 - Sequretek Pvt. Ltd. - Technology Architect/Data Scientist
Nov 2015 to Mar 2017 - CGI India Ltd. - Solution/Technical Architect, ML (Associate Consultant)
Oct 2008 to Nov 2015 - Rolta India Ltd. - Solution/Technical Architect, Project Manager, Presales Architect, BI/Machine Learning (Senior Manager)
Jul 2004 to Oct 2008 - Syntel - Project Lead
Jun 2003 to Jun 2004 - L&T Infotech Ltd - Software Engineer
Nov 2002 to Jun 2003 - Amtech Communication - Senior Application Developer
Sep 2001 to Oct 2002 - Nazara.com - Senior Programmer
Sep 1999 to Sep 2001 - Ideaz Netechnologies - Programmer

Recognitions

Manager Award at L&T Infotech
CMMI Level 5 participation at Rolta

Qualifications

Bachelor of Engineering in Electronics and Telecommunication, Marathwada University. First Class.

Certifications

Coursera (2019-03): Neural Networks and Deep Learning
Open Group (2018-02): TOGAF 9.1 Certified
Udemy (2019-07): Deep Learning with TensorFlow 2.0
Udemy (2017-02): Big Data with Spark Streaming and PySpark

Project: Hand Detection System for Shredder Machine

Technology/Deep Learning: RCNN, Python, Google Coral

Overview: The project was undertaken to prevent industrial accidents while working near a shredding machine. A shredding machine (document shredder) is a mechanical device that cuts paper and other information-bearing media into fragments so small that the information can no longer be retrieved. Feeding paper into the machine is a manual activity, and accidents sometimes occur during it. To stop the machine automatically, the system detects a hand crossing the limit line and alerts the user. The alert takes the form of a buzzer, a red light and stopping of the machine; a GPIO pin of the Coral board is connected to the machine to stop it. Images of hands were labeled and the detector trained using RCNN.

Role & Responsibilities:
 Part of product architecture team.
 Model development and validation using RCNN.
 Model deployment, validation and acceptance.
 Development of code for hand detection.

Outcome and Contributions:
o High accuracy, above 95%.
o Through automation, accidents were completely avoided.

Team Size: 2
Client: Freelance

Project: Managed Detection and Response

Machine Learning: LightGBM, SVM, KNN, Clustering, Neural Networks (RNN)
Technology: Python, Apache Spark, PyTorch, Apache Kafka, Flume, MongoDB, Jenkins, Git, Hive, HBase, Oozie

Overview: Managed Detection and Response combines technology and skills to deliver advanced threat detection, deep threat analytics, global threat intelligence, faster incident mitigation and collaborative breach response on a 24x7 basis. The endpoints (IoT devices and enterprise servers) are scanned by the EMS system or a scan component. Apache Flume agents capture the logs and send them to a topic in Apache Kafka. The files received include server logs, client profiles, schema profiles and network settings. Master Data Management was set up for the data and schema profiles as part of the data-sharing agreement with the client. During data ingestion, the queue is consumed using Apache Spark batch and stream components. Data quality services were set up for data validation. Multiple layers were set up in the Spark component for data validation, cleansing, standardization and transformation. The data is reduced and stored in HBase for machine learning; Hive was set up for data archiving and Oozie was used for scheduling. Machine learning techniques including ensemble learning and boosting (SVM, LightGBM, Random Forest, K-Means clustering, KNN, RNN) are applied to derive the best possible result.
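The validation, cleansing, standardization and reduction layering described in the MDR pipeline can be sketched with pandas. This is an illustrative sketch only; the column names and rules are hypothetical, not the product's actual schema:

```python
import pandas as pd

# Hypothetical endpoint-log records; column names are illustrative.
raw = pd.DataFrame({
    "host": ["srv-01", "srv-02", None, "srv-01"],
    "bytes_sent": ["1024", "2048", "512", "not_a_number"],
    "severity": ["HIGH", "low", "Medium", "HIGH"],
})

# Layer 1: validation - drop records missing mandatory fields.
valid = raw.dropna(subset=["host"]).copy()

# Layer 2: cleansing - coerce types, discard unparseable rows.
valid["bytes_sent"] = pd.to_numeric(valid["bytes_sent"], errors="coerce")
clean = valid.dropna(subset=["bytes_sent"])

# Layer 3: standardization - normalize categorical encodings.
clean["severity"] = clean["severity"].str.upper()

# Reduction: aggregate per host before persisting to the feature store.
reduced = clean.groupby("host", as_index=False)["bytes_sent"].sum()
```

In the actual product these layers ran as Apache Spark stages over Kafka-fed data rather than in-memory pandas, but the layering logic is the same.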
The data is processed using machine learning for threat detection. The output is stored in MongoDB and displayed in a dashboard. Exception handling, audit trails and service accounts were created as part of the framework.

Role & Responsibilities:
 Part of product architecture team.
 Feature selection and engineering for web attacks, network attacks and malware attacks.
 Model development and validation using LightGBM, SVM and neural networks (RNN).
 Model deployment, validation and acceptance.
 Collaborating in agile mode with cross-functional teams.
 Development of data ingestion and log-processing components using Apache Spark/Flume, Kafka, HDFS and MongoDB.

Outcome and Contributions:
o Complexity in performing manual feature engineering is avoided.
o High accuracy, above 95%, compared to around 60% for manual classification.
o Through automation, four person-months of manual effort per customer are saved.
o The cost of approximately 5 to 10 HP ArcSight licenses per customer is saved.

Team Size: 15
Client: HDFC, DMart, Product Development

Project: EDPR Machine Learning

Machine Learning: LightGBM, SVM, KNN, Clustering, Random Forest
Technology: Python, Apache Spark, PyTorch, Apache Kafka, Flume, SQLite, Jenkins, Git, Hive, HBase, Oozie

Overview: Endpoint Detection and Protection Response detects, protects against and responds to cyberattacks, which adds to the complexity of securing the enterprise. Each point product adds an agent to the endpoint and is often managed independently of the other security technologies present on that endpoint. The files received include server logs, client profiles, schema profiles and network settings. Master Data Management was set up for the data and schema profiles as part of the data-sharing agreement with the client. During data ingestion, the queue is consumed using Apache Spark batch and stream components. Data quality services were set up for data validation.
Multiple layers were set up in the Spark component for data validation, cleansing, standardization and transformation. This involves feature extraction and feature engineering for malware based on static analysis of PE and PDF file types; the metadata is extracted from malware samples. The data is reduced and stored in HBase for machine learning; Hive was set up for data archiving and Oozie was used for scheduling.

Machine Learning: Data pre-processing and data cleaning are performed. Based on exploratory analysis, the model is regularly created/updated and validated. Machine learning techniques including ensemble learning and boosting (SVM, LightGBM, Random Forest) and TensorFlow Lite are applied to derive the best possible result.

Role & Responsibilities:
 Part of product architecture team.
 Model creation, data pre-processing, data cleaning.
 Feature selection and engineering.
 Model development and validation using LightGBM and TensorFlow Lite.
 Model deployment, validation and acceptance.

Outcome and Contributions:
o Complexity in performing manual feature engineering is avoided.
o Accuracy improved by 4% over the earlier traditional model.
o Through automation, ten person-months of manual effort are saved.

Team Size: 10
Client: Bharat Co-operative Bank, Product Development

Project: Website Security Checking
Machine Learning: LightGBM, SVM, KNN, Clustering, Random Forest
Technology: Python, Apache Spark, PyTorch, Apache Kafka, Flume, MongoDB, Jenkins, Git, HBase, Oozie

Overview: The product was developed to check website sanity and to be plugged into other products. It validates URLs and builds a Threat Intelligence database. Features such as URL length, domain registration, port, HTTPS, DNS record age and Google PageRank are used in detection. A separate environment is created for data creation, and features are extracted from the captured data. During data ingestion, the queue is consumed using the Apache Spark batch component. Data quality services were set up for data validation. Multiple layers were set up in the Spark component for data validation, cleansing, standardization and transformation. The data is reduced and stored in MongoDB for machine learning; Hive was set up for data archiving and Oozie was used for scheduling. Some samples were collected from third parties and some from the server logs of managed services. This is a continuous process. EDA is performed for data standardization. The model is trained incrementally every week, thereby creating a rich threat-intelligence repository.

Role & Responsibilities:
 Part of product architecture team.
 Model creation, data pre-processing, data cleaning.
 Feature selection and engineering.
 Model development and validation using clustering, boosting and Random Forest.
 Model deployment, validation and acceptance.

Client: Product Development (Internal)
Team Size: 15

Project: SOC Incident Log Ticket Allocation

Machine Learning: LightGBM, SVM, KNN, Clustering, Random Forest
Technology: Python, Apache Spark, PyTorch, Apache Kafka, Flume, MongoDB, Jenkins, Git, Hive, HBase, Oozie

Overview: Managed services handle many of the security-related measures for global customers; the SOC (Security Operations Centre) is established to achieve this objective.
Ticket resolution time matters both to the customer and for capacity planning. Initially, all the historical data available in Excel sheets is used as input for training. EDA, feature selection, feature engineering and PCA are performed prior to training. The Apache Spark batch component is used, and data quality services were set up for data validation. Multiple layers were set up in the Spark component for data validation, cleansing, standardization and transformation. The data is reduced and stored in MongoDB for machine learning; Hive was set up for data archiving and Oozie was used for scheduling. After training, whenever a new ticket arrives an ETA is calculated, and on closure the variance is also captured. Based on this analysis, the model is retrained.

Role & Responsibilities:
 Part of product architecture team.
 Model creation, data pre-processing, data cleaning.
 Feature selection and engineering.
 Model development and validation using clustering, boosting and Random Forest.
 Model deployment, validation and acceptance.

Client: HDFC, AEGON, Reliance, Product Development (Internal)
Team Size: 10

Project: Malicious PDF File Detection

Machine Learning: LightGBM, SVM, KNN, Clustering, Random Forest
Technology: Python, Apache Spark, PyTorch, Apache Kafka, Flume, SQLite, Jenkins, Git, Hive, HBase, Oozie
Overview: PDF files are a source of malware and of hyperlinks to phishing sites. The product was developed to check document sanity and to be plugged into other products. A PDF file has many attributes, semi-structured sections and snippets where an exploit can easily be injected; features such as JavaScript, rich media, JBIG2Decode and OpenAction are vulnerable to attack. The Apache Spark batch component is used, and data quality services were set up for data validation. Multiple layers were set up in the Spark component for data validation, cleansing, standardization and transformation. The data is reduced and stored in MongoDB for machine learning; Hive was set up for data archiving and Oozie was used for scheduling. Samples were collected from third parties and from the internal threat repository. This is a continuous process. EDA is performed for data standardization. The model is trained incrementally every week, thereby creating a rich threat-intelligence repository.

Role & Responsibilities:
 Part of product architecture team.
 Model creation, data pre-processing, data cleaning.
 Feature selection and engineering.
 Model development and validation using clustering, boosting and Random Forest.
 Model deployment, validation and acceptance.

Client: HDFC, AEGON, Reliance, Product Development (Internal)
Team Size: 10

Project: IGA

Machine Learning: LightGBM, SVM, KNN, Clustering, Random Forest
Technology: Python, Apache Spark, PyTorch, Apache Kafka, Flume, MongoDB, Jenkins, Git, HBase, Oozie

Overview: IGA is an integrated access management and governance product that takes care of the entire lifecycle of employee engagement (onboarding and exit). During onboarding, an employee ID is created and access to different systems is granted after approval. During exit, all access and IDs are revoked.
Machine Learning: Data is collected from multiple systems such as the attendance system, leave portal, access management, training system, appraisal system and other client systems. Some of the attributes are employee salary, employee satisfaction, promotion in the last year, hourly rate, etc. The Apache Spark batch component is used, and data quality services were set up for data validation. Multiple layers were set up in the Spark component for data validation, cleansing, standardization and transformation. The data is reduced and stored in MongoDB for machine learning; Hive was set up for data archiving and Oozie was used for scheduling. The collected data is cleansed, parsed and validated; feature selection and engineering and exploratory data analysis are then applied to derive multiple metrics. A powerful dashboard is created using Tableau. Machine learning techniques including ensemble learning and boosting (Support Vector Machine, XGBoost) and PCA are applied to derive the best possible result.

Role & Responsibilities:
 Part of product architecture team.
 Model creation, data pre-processing, data cleaning.
 Feature selection and engineering.
 Model development and validation using clustering, SVM, boosting and Random Forest.
 Model deployment, validation and acceptance.

Team Size: 15
Client: HDFC, AEGON, Axis Bank, Product Development (Internal)

Project: Bank Customer Churn

Machine Learning: LightGBM, SVM, KNN, Clustering, Random Forest
Technology: Python, Apache Spark, PyTorch, Apache Kafka, Flume, MongoDB, Jenkins, Git, Hive, HBase, Oozie

Overview: The model is developed for the agricultural segment. There are multiple co-operative banks and financial agencies that support farmers. The churn-based product is developed to detect the shift of customers. The data is sourced from FTP and processed batch-wise. Attributes such as gender, geography, loan, products, subsidy and income are considered for model building.
FTP was used as the data source and was later replaced with Kafka-Spark streams. This is a continuous process. EDA is performed for data standardization. The model is trained incrementally every week.

Role & Responsibilities:
 Part of product architecture team.
 Model creation, data pre-processing, data cleaning.
 Feature selection and engineering.
 Model development and validation using clustering, SVM, boosting and Random Forest.
 Model deployment, validation and acceptance.

Client: Bharat Cooperative Bank
Team Size: 15

Project: Insurance Claim Prediction

Machine Learning: LightGBM, SVM, KNN, Clustering, Random Forest
Technology: Python, Apache Spark, PyTorch, Apache Kafka, Flume, MongoDB, Jenkins, Git, Hive, HBase, Oozie

Overview: The model is developed for special intervention in the manual checking of claims, based on historical claim-processing data as well as customer-specific attributes such as policy deductibles, exclusions, umbrella limit, collision details and vehicle details. Initially FTP was used as the data source; it was later replaced with Kafka-Spark streams. This is a continuous process. EDA is performed for data standardization. The model is trained incrementally every week.

Role & Responsibilities:
 Part of product architecture team.
 Model creation, data pre-processing, data cleaning.
 Feature selection and engineering.
 Model development and validation using clustering, SVM, boosting and Random Forest.
 Model deployment, validation and acceptance.

Client: HDFC ERGO
Team Size: 15

Project: Oil Card Fraud Detection (POC)

Overview: The model was developed as part of a POC for an oil and gas company. In Europe, an oil card allows drivers of large transport companies to buy fuel, pay tolls and, in some cases, buy a limited amount of food while on tour. The model detects fraud in transactions based on time zone, transaction instances, and latitude and longitude; an anomaly-detection technique is used. This was developed on the Azure ML platform. The data source is CSV.
EDA is performed for data standardization, and the model is trained incrementally every week.

Role & Responsibilities:
 Part of product architecture team.
 Model creation, data pre-processing, data cleaning.
 Feature selection and engineering.
 Model development and validation using clustering, SVM, boosting and Random Forest.
 Model deployment, validation and acceptance.

Client: Oil and Gas
Team Size: 5

Project: BI Reports

Overview: This project involved migration from a legacy platform to SSRS. There were nearly 200 reports; a utility was developed to migrate sample reports to the SSRS platform.

Client: Intel Bangalore
Team Size: 30

Project: BI Reports

Overview: This project involved migration from a legacy platform to SSRS. There were nearly 40 reports.

Client: ACE Surety, US
Team Size: 20

R & D Projects

 Document classification using NLP
 Asset management using YOLO object detection
 Chatbot for coronavirus using Lex and Dialogflow, integrated with Facebook and Telegram
 Mask-wearing prediction for social distancing using RCNN
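The URL-based detection features used in the Website Security Checking project above (URL length, HTTPS, explicit port, etc.) can be derived with the Python standard library. A hedged sketch; the feature names are hypothetical, not the product's actual schema:

```python
from urllib.parse import urlparse

def url_features(url):
    """Extract simple lexical features of the kind used for URL threat scoring.

    Feature names are illustrative only."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),                          # long URLs are a weak phishing signal
        "uses_https": parsed.scheme == "https",
        "explicit_port": parsed.port is not None,        # non-default ports are suspicious
        "num_subdomains": max(host.count(".") - 1, 0),
        "has_ip_host": host.replace(".", "").isdigit(),  # raw-IP hosts often indicate phishing
    }

feats = url_features("http://192.168.0.1:8080/login?acct=update")
```

Such feature dictionaries would then be vectorized and fed to the clustering/boosting models named in that project; signals like domain-registration age and PageRank require external lookups and are omitted here.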