Shiv Shankar Dutta
Phone – +91-9136819634
Email – shivdutta@protonmail.com
GIT: https://github.com/Shivdutta
Medium: https://medium.com/@shivdutta
LinkedIn: https://www.linkedin.com/in/75ssd/
Experience Summary
 Experienced Data Scientist/Data Solution Architect designing and developing machine learning and deep learning solutions in line with quality and regulatory standards.
 Hands-on experience in classification, regression, decision trees, neural networks, NLP, chatbots, and image and video analytics.
 Data science professional with experience in all stages of data processing and insights delivery.
 Experience working in start-up, mid-sized and large organizations on project/product development, services and delivery.
 Worked with various clients such as HDFC, Shell, the Indian Government, AIG and Allstate.
 Experience in conducting analytics assessment workshops and client requirement gathering.
 Gather, evaluate and document business requirements related to analytics, translate them into an analytics solution definition, and implement using Python.
 Deep involvement in handling critical deliverables, benchmarking solutions, driving key metrics and best practices, maintaining productivity and ensuring projects are profitable.
 Successful liaison between business users and technical developers, working in an onsite/offshore model for multiple deliveries; planning and prioritizing work products to meet timelines.
 Defining the analytics product roadmap in line with the Enterprise Architecture Framework.
 Design and development of machine learning models, deep learning models, data/text mining, image analysis and classification, and NLP, end to end, for POCs and analytics solutions in any domain or business use case.
 Hands-on experience in handling unstructured data such as image, video and text.
 Hands-on experience in data collection, feature selection and feature engineering from multiple data sources such as NoSQL DBs (Hive, HBase, MongoDB), flat files and SQL DBs.
 Hands-on experience in building predictive models in real time/near real time/batch using machine learning.
 Exposure to end-to-end data pipeline creation in enterprise systems using different tools: Flume, Kafka, Spark, Flask, Docker.
 Model validation with testing and client stakeholders; aligning with business and technology for deployment, validation and acceptance.
 Manage and mentor teams.
 TOGAF 9.1 Certified. Experience in presales activities such as technical solutioning and estimation, POCs and customer demos.
 Well versed with Agile/Waterfall methodologies, CMMI Level 5 processes, estimation techniques, requirement gathering and elicitation, and design using UML techniques.
 Hands-on expertise in data governance, data lineage, data processes (DML), data architecture control execution, and Master Data Management (MDM).
 Exposure to deploying ML models on Kubernetes and Docker clusters in on-premise or hybrid environments.
 Exposure to interfacing with IoT/connected devices/sensors for data analysis and edge analytics.
 Model quantization for handheld devices using TensorFlow Lite.
Primary Skills
Machine Learning
Regression, K-Means Clustering, Decision Tree, SVM, Bayes
Theorem, Naive Bayes, Random Forest, LightGBM, XGBoost, Apriori,
Time Series (ARIMA)
Neural Network
CNN, RNN, Autoencoders, Keras, PyTorch, TensorFlow, YOLO,
OpenCV, SSD, BERT, ResNet, Inception, RCNN, SSD-MobileNet
NLP
NLTK, spaCy, POS Tagging, Tokenization, Stemming and Lemmatization,
BERT, Word2vec, GloVe, Embeddings
RDBMS SQL Server, MySQL, PostgreSQL
Statistical Techniques Hypothesis Testing, ANOVA, Chi-Square
NoSQL DB MongoDB
Languages Python, R, C#
Enterprise Architecture TOGAF Certified
Chatbots Lex, LUIS, RASA, Dialogflow
Messaging Kafka
Tools PyCaret, AutoML, Jupyter Notebook, Spyder, Anaconda
Methodology Agile & Waterfall
Secondary Skills
Infrastructure Management: Hortonworks Ambari, Cloudera Hue, SageMaker, Cloud ML, Docker
and Kubernetes
Transformation: Sqoop, Apache Spark, PySpark, Flume
Big Data Hadoop, HDFS, YARN, Zookeeper
Others Hive, Pig, HBase, Scala
Cognitive Services: Cloud/Open source Cognitive Services
Employment History
Tenure Company Position
Apr 17 to Jun 19 Sequretek Pvt. Ltd. Technology Architect/Data Scientist
Nov 2015 to Mar 17 CGI India Ltd. Solution/Technical Architect, ML (Associate Consultant)
Oct 2008 to Nov 2015 Rolta India Ltd. Solution/Technical Architect/Project Manager/Presales Architect, BI/Machine Learning (Senior Manager)
July 2004 to Oct 2008 Syntel Project Lead
June 2003 to June 2004 L&T Infotech Ltd Software Engineer
Nov 2002 to June 2003 Amtech Communication Senior Application Developer
Sep 2001 to Oct 2002 Nazara.com Senior Programmer
Sep 1999 to Sep 2001 Ideaz Netechnologies Programmer
Recognitions
Manager Award in L&T Infotech
CMMI Level 5 participation in Rolta
Qualifications
Bachelor of Engineering in Electronics and Telecommunication, Marathwada University. First Class.
Certifications
Coursera: 2019-03 Neural Networks and Deep Learning
Open Group: 2018-02 TOGAF 9.1 Certified
Udemy: 2019-07 Deep Learning with TensorFlow 2.0
Udemy: 2017-02 Big Data with Spark Streaming and PySpark
Project: Hand Detection System for Shredder Machine
Technology/Deep Learning: RCNN, Python, Google Coral
Overview: The project was taken up to prevent industrial accidents to workers operating near a shredding machine. A shredding machine, or document shredder, is a mechanical device that cuts paper and other information-bearing media into fragments so small that the information can no longer be retrieved. Feeding paper into the machine is a manual activity, during which accidents sometimes occur. To stop the machine automatically, the system detects a hand crossing a limit line and alerts the user. The alert takes the form of a buzzer, a red light and stopping of the machine; a GPIO pin of the Coral board is connected to the machine to stop it. Images of hands were labeled and used to train an RCNN model.
Role & Responsibilities:
 Part of Product Architecture Team.
 Model Development and Validation using RCNN.
 Model Deployment, Validation and Acceptance.
 Development of code for hand detection.
Outcome and Contributions:
o High accuracy of more than 95%.
o With automation, such accidents were completely avoided.
Team Size: 2
Client: Freelance
Project: Managed Detection and Response
Machine Learning: LightGBM, SVM, KNN, Clustering, Neural Networks (RNN)
Technology: Python, Apache Spark, PyTorch, Apache Kafka, Flume, MongoDB, Jenkins, Git, Hive, HBase, Oozie
Overview: Managed Detection and Response is a combination of technology and skills that delivers advanced threat detection, deep threat analytics, global threat intelligence, faster incident mitigation and collaborative breach response on a 24x7 basis. The endpoints (IoT devices and enterprise servers) are scanned by an EMS system or scan component. Apache Flume agents capture the logs and send them to a topic in Apache Kafka. Files received include server logs, client profiles, schema profiles and network settings. Master Data Management was set up for the data and schema profiles as part of the data-sharing agreement with the client. In data ingestion, the queue is consumed using Apache Spark batch and stream components. Data Quality services were set up for data validation, and multiple layers were set up in the Spark component for data validation, cleansing, standardization and transformation. The reduced data is stored in HBase for machine learning; Hive was set up for data archiving, and Oozie was used for scheduling.
Machine learning techniques such as ensemble learning and boosting (SVM, LightGBM, Random Forest, K-Means clustering, KNN, RNN) are applied and the best-performing result is selected.
The data is processed using machine learning for threat detection. The output is stored in MongoDB and displayed in a dashboard. Exception handling, audit trails and service accounts were created as part of the framework.
Role & Responsibilities:
 Part of Product Architecture Team.
 Feature Selection and Engineering for Web Attacks, Network Attacks and Malware Attacks.
 Model Development and Validation using LightGBM, SVM and Neural Networks (RNN).
 Model Deployment, Validation and Acceptance.
 Collaborating in an agile manner with cross-functional teams.
 Development of data ingestion and log processing components using Apache Spark/Flume, Kafka, HDFS and MongoDB.
Outcome and Contributions:
o Complexity of manual feature engineering was avoided.
o Accuracy of more than 95%, compared to around 60% for manual classification.
o Through automation, 4 person-months of manual effort per customer were saved.
o The cost of approximately 5 to 10 HP ArcSight licenses per customer was saved.
Team Size: 15
Client: HDFC, DMart, Product Development
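The pipeline above applies several models and keeps the best result. As a minimal, stdlib-only sketch of one common way to combine per-model verdicts (the model names, labels and function shape are illustrative, not the product's actual interface):

```python
from collections import Counter

def ensemble_verdict(verdicts):
    """Majority vote over per-model labels ('malicious'/'benign').

    `verdicts` maps a model name to its predicted label; ties fall back
    to 'malicious' so borderline events are escalated for review.
    """
    counts = Counter(verdicts.values())
    top = counts.most_common()
    if len(top) > 1 and top[0][1] == top[1][1]:
        return "malicious"  # conservative tie-break
    return top[0][0]

print(ensemble_verdict({"lgbm": "malicious", "svm": "benign", "rnn": "malicious"}))
# → malicious
```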
Project: EDPR (Endpoint Detection and Protection Response)
Machine Learning: LightGBM, SVM, KNN, Clustering, Random Forest
Technology: Python, Apache Spark, PyTorch, Apache Kafka, Flume, SQLite, Jenkins, Git, Hive, HBase, Oozie
Overview: Endpoint Detection and Protection Response detects, protects against and responds to cyberattacks, which add to the complexity of securing the enterprise. Each point product adds an agent to the endpoint and is often managed independently of the other security technologies present on that endpoint.
Files received include server logs, client profiles, schema profiles and network settings. Master Data Management was set up for the data and schema profiles as part of the data-sharing agreement with the client. In data ingestion, the queue is consumed using Apache Spark batch and stream components. Data Quality services were set up for data validation, and multiple layers were set up in the Spark component for data validation, cleansing, standardization and transformation. This involves feature extraction and feature engineering for malware based on static analysis of PE and PDF file types; the metadata is extracted from malware samples. The reduced data is stored in HBase for machine learning; Hive was set up for data archiving, and Oozie was used for scheduling.
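As a hedged illustration of the static-analysis step described above, the sketch below computes generic byte-level features (size, Shannon entropy, printable ratio) from a file's raw bytes. The actual pipeline parsed PE/PDF structure and extracted richer metadata; these simplified features are a stand-in:

```python
import math
from collections import Counter

def byte_features(data: bytes):
    """Generic static features for a file's raw bytes.

    High entropy (near 8.0) often indicates packed or encrypted content,
    a common malware signal in static analysis.
    """
    counts = Counter(data)
    n = len(data)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values()) if n else 0.0
    printable = sum(c for b, c in counts.items() if 32 <= b < 127)
    return {
        "size": n,
        "entropy": round(entropy, 3),
        "printable_ratio": round(printable / n, 3) if n else 0.0,
    }

print(byte_features(b"MZ" + bytes(range(256))))
```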
Machine Learning: Thereafter, data pre-processing and data cleaning are done. Based on exploratory analysis, the model is regularly created/updated and validated. Techniques such as ensemble learning and boosting (SVM, LightGBM, Random Forest) and TensorFlow Lite models are applied and the best-performing result is selected.
Role & Responsibilities:
 Part of Product Architecture Team.
 Model Creation, Data Pre-Processing, Data Cleaning.
 Feature Selection and Engineering.
 Model Development and Validation using LightGBM and TensorFlow Lite.
 Model Deployment, Validation and Acceptance.
Outcome and Contributions:
o Complexity of manual feature engineering was avoided.
o Accuracy improved by 4% over the earlier traditional model.
o Through automation, 10 person-months of manual effort were saved.
Team Size: 10
Client: Bharat Co-operative Bank, Product Development
Project: Website Security Checking
Machine Learning: LightGBM, SVM, KNN, Clustering, Random Forest
Technology: Python, Apache Spark, PyTorch, Apache Kafka, Flume, MongoDB, Jenkins, Git, HBase, Oozie
Overview: The product was developed to check website sanity and to plug into other products. It is used to validate URLs and build a Threat Intelligence database. Features such as URL length, domain registration, port, HTTPS, DNS record age and Google PageRank are used in detection. A separate environment is created for data creation, and features are extracted from the captured data.
In data ingestion, the queue is consumed using the Apache Spark batch component. Data Quality services were set up for data validation, and multiple layers were set up in the Spark component for data validation, cleansing, standardization and transformation. The reduced data is stored in MongoDB for machine learning; Hive was set up for data archiving, and Oozie was used for scheduling.
Some samples were collected from third parties and some from server logs of managed services. This is a continuous process. EDA is performed for data standardization. The model is trained incrementally every week, thereby creating a rich threat intelligence repository.
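The lexical URL features mentioned above (URL length, port, HTTPS) can be sketched with the standard library. External signals such as domain registration age, DNS record age and page rank are omitted here, and the function shape is illustrative:

```python
from urllib.parse import urlparse

def url_features(url):
    """Extract simple lexical features from a URL for phishing detection."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "uses_https": parsed.scheme == "https",
        "has_port": parsed.port is not None,
        "num_subdomains": max(host.count(".") - 1, 0),
        # A bare IP address instead of a domain name is a classic phishing signal.
        "has_ip_host": host.replace(".", "").isdigit(),
        "num_query_params": len([p for p in parsed.query.split("&") if p]),
    }

print(url_features("http://198.51.100.7:8080/login?user=a&next=b"))
```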
Role & Responsibilities:
 Part of Product Architecture Team.
 Model Creation, Data Pre-Processing, Data Cleaning.
 Feature Selection and Engineering.
 Model Development and Validation using Clustering, Boosting and Random Forest.
 Model Deployment, Validation and Acceptance.
Client: Product Development (Internal)
Team Size: 15
Project: SOC Incident Log Ticket Allocation
Machine Learning: LightGBM, SVM, KNN, Clustering, Random Forest
Technology: Python, Apache Spark, PyTorch, Apache Kafka, Flume, MongoDB, Jenkins, Git, Hive, HBase, Oozie
Overview: Managed services handle many security-related measures for global customers, and the SOC (Security Operations Centre) is established to achieve this objective. Ticket resolution time is important for customers and for capacity planning. Initially, all historical data available in Excel sheets is used as input for training; EDA, feature selection, feature engineering and PCA are done prior to training.
The queue is consumed using the Apache Spark batch component. Data Quality services were set up for data validation, and multiple layers were set up in the Spark component for data validation, cleansing, standardization and transformation. The reduced data is stored in MongoDB for machine learning; Hive was set up for data archiving, and Oozie was used for scheduling.
After training, whenever a new ticket arrives its ETA is calculated, and on closing the variance is also captured. Based on this analysis, the model is retrained.
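The ETA-and-variance loop above can be illustrated with a toy baseline (the real model was a trained regressor; the class and field names here are hypothetical):

```python
from collections import defaultdict

class TicketEtaEstimator:
    """Baseline ETA model: running mean resolution time per ticket category.

    Also computes the actual-minus-predicted variance captured on ticket
    close, the signal used to decide when to retrain.
    """
    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    def observe(self, category, hours):
        self.totals[category] += hours
        self.counts[category] += 1

    def eta(self, category, default=24.0):
        n = self.counts[category]
        return self.totals[category] / n if n else default

    def variance_on_close(self, category, actual_hours):
        return actual_hours - self.eta(category)

est = TicketEtaEstimator()
est.observe("malware", 4.0)
est.observe("malware", 6.0)
print(est.eta("malware"))                       # mean resolution time so far
print(est.variance_on_close("malware", 8.0))    # gap between actual and ETA
```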
Role & Responsibilities:
 Part of Product Architecture Team.
 Model Creation, Data Pre-Processing, Data Cleaning.
 Feature Selection and Engineering.
 Model Development and Validation using Clustering, Boosting and Random Forest.
 Model Deployment, Validation and Acceptance.
Client: HDFC, AEGON, Reliance, Product Development (Internal)
Team Size: 10
Project: Malicious PDF File Detection
Machine Learning: LightGBM, SVM, KNN, Clustering, Random Forest
Technology: Python, Apache Spark, PyTorch, Apache Kafka, Flume, SQLite, Jenkins, Git, Hive, HBase, Oozie
Overview: A PDF file can be a source of malware or of hyperlinks to phishing sites. The product was developed to check document sanity and to plug into other products. PDF files have many attributes, semi-structured regions and snippets where an exploit can easily be injected; features such as JavaScript, rich media, JBIG2Decode and OpenAction are vulnerable to attack. Samples were collected from third parties.
The queue is consumed using the Apache Spark batch component. Data Quality services were set up for data validation, and multiple layers were set up in the Spark component for data validation, cleansing, standardization and transformation. The reduced data is stored in MongoDB for machine learning; Hive was set up for data archiving, and Oozie was used for scheduling.
Some samples were collected from third parties and from the internal threat repository. This is a continuous process. EDA is performed for data standardization. The model is trained incrementally every week, thereby creating a rich threat intelligence repository.
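A minimal sketch of the static PDF features named above: counting risky keywords in the raw bytes, in the spirit of tools like pdfid. The keyword list and sample input are illustrative, not the product's actual feature set:

```python
# Keywords associated with exploitable PDF behavior (JavaScript execution,
# automatic actions, a vulnerable image codec, embedded payloads).
RISKY_KEYWORDS = [b"/JavaScript", b"/JS", b"/OpenAction", b"/JBIG2Decode",
                  b"/RichMedia", b"/Launch", b"/EmbeddedFile"]

def pdf_static_features(data: bytes):
    """Return keyword counts usable as model features for a PDF's raw bytes."""
    return {kw.decode(): data.count(kw) for kw in RISKY_KEYWORDS}

sample = b"%PDF-1.4 /OpenAction << /S /JavaScript /JS (app.alert(1)) >>"
print(pdf_static_features(sample))
```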
Role & Responsibilities:
 Part of Product Architecture Team.
 Model Creation, Data Pre-Processing, Data Cleaning.
 Feature Selection and Engineering.
 Model Development and Validation using Clustering, Boosting and Random Forest.
 Model Deployment, Validation and Acceptance.
Client: HDFC, AEGON, Reliance, Product Development (Internal)
Team Size: 10
Project: IGA
Machine Learning: LightGBM, SVM, KNN, Clustering, Random Forest
Technology: Python, Apache Spark, PyTorch, Apache Kafka, Flume, MongoDB, Jenkins, Git, HBase, Oozie
Overview: IGA is an integrated access management and governance product that takes care of the entire lifecycle of employee engagement (onboarding and exit). During onboarding, an employee ID is created and access to different systems is granted after approval; during exit, all access and IDs are revoked. Data is collected from multiple systems such as the attendance system, leave portal, access management, training system, appraisal system and other client systems. Some of the attributes are employee salary, employee satisfaction, promotion last year, hourly rate, etc.
The queue is consumed using the Apache Spark batch component. Data Quality services were set up for data validation, and multiple layers were set up in the Spark component for data validation, cleansing, standardization and transformation. The reduced data is stored in MongoDB for machine learning; Hive was set up for data archiving, and Oozie was used for scheduling.
The collected data is cleansed, parsed and validated; thereafter, feature selection and engineering and exploratory data analysis are applied to derive multiple metrics. A dashboard is created using Tableau. Techniques such as ensemble learning and boosting (Support Vector Machine, XGBoost) and PCA are applied and the best-performing result is selected.
Role & Responsibilities:
 Part of Product Architecture Team.
 Model Creation, Data Pre-Processing, Data Cleaning.
 Feature Selection and Engineering.
 Model Development and Validation using Clustering, SVM, Boosting and Random Forest.
 Model Deployment, Validation and Acceptance.
Team Size: 15
Client: HDFC, AEGON, Axis Bank, Product Development (Internal)
Project: Bank Customer Churn
Machine Learning: LightGBM, SVM, KNN, Clustering, Random Forest
Technology: Python, Apache Spark, PyTorch, Apache Kafka, Flume, MongoDB, Jenkins, Git, Hive, HBase, Oozie
Overview: The model was developed for the agricultural segment, where multiple co-operative banks and financial agencies support farmers. The churn-based product was developed to find the shift of customers. The data is sourced from FTP and processed batch-wise; attributes such as gender, geography, loan, products, subsidy and income are considered for model building. FTP was used as the data source and later replaced with Kafka-Spark streams. This is a continuous process. EDA is performed for data standardization. The model is trained incrementally every week.
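As a toy baseline over churn attributes like those named above (the field names and rows are illustrative; the production model used boosted trees over many more attributes):

```python
from collections import defaultdict

def churn_rate_by(rows, key):
    """Observed churn rate per value of `key` - a sanity baseline to compare
    the trained model against."""
    churned = defaultdict(int)
    total = defaultdict(int)
    for r in rows:
        total[r[key]] += 1
        churned[r[key]] += r["churned"]
    return {k: churned[k] / total[k] for k in total}

rows = [
    {"geography": "Pune", "churned": 1},
    {"geography": "Pune", "churned": 0},
    {"geography": "Nashik", "churned": 0},
]
print(churn_rate_by(rows, "geography"))
```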
Role & Responsibilities:
 Part of Product Architecture Team.
 Model Creation, Data Pre-Processing, Data Cleaning.
 Feature Selection and Engineering.
 Model Development and Validation using Clustering, SVM, Boosting and Random Forest.
 Model Deployment, Validation and Acceptance.
Client: Bharat Co-operative Bank
Team Size: 15
Project: Insurance Claim Prediction
Machine Learning: LightGBM, SVM, KNN, Clustering, Random Forest
Technology: Python, Apache Spark, PyTorch, Apache Kafka, Flume, MongoDB, Jenkins, Git, Hive, HBase, Oozie
Overview: The model was developed to flag claims for special manual-checking intervention, based on historical claim-processing data as well as customer-specific attributes such as policy deductibles, exclusions, umbrella limit, collision details and vehicle details. Initially FTP was used as the data source, later replaced with Kafka-Spark streams. This is a continuous process. EDA is performed for data standardization, and the model is trained incrementally every week.
Role & Responsibilities:
 Part of Product Architecture Team.
 Model Creation, Data Pre-Processing, Data Cleaning.
 Feature Selection and Engineering.
 Model Development and Validation using Clustering, SVM, Boosting and Random Forest.
 Model Deployment, Validation and Acceptance.
Client: HDFC ERGO
Team Size: 15
Project: Oil Card Fraud Detection (POC)
Overview: The model was developed as part of a POC for an oil and gas company. In Europe, an oil card allows drivers of large transport companies to buy fuel and pay tolls, and in some cases to buy limited food during a tour. The model detects fraud in transactions based on time zone, transaction instances, and latitude and longitude, using an anomaly detection technique. It was developed on the Azure ML platform with CSV as the data source. EDA is performed for data standardization, and the model is trained incrementally every week.
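A hedged sketch of the anomaly-detection idea: flag a transaction attribute that is a statistical outlier against the card's history. The z-score rule and sample values are illustrative, not the production Azure ML model:

```python
import math

def zscore(value, history):
    """Standard score of `value` against the card's historical values."""
    mean = sum(history) / len(history)
    var = sum((x - mean) ** 2 for x in history) / len(history)
    return 0.0 if var == 0 else (value - mean) / math.sqrt(var)

def is_anomalous(value, history, threshold=3.0):
    """Flag a transaction attribute more than `threshold` std-devs from history."""
    return abs(zscore(value, history)) > threshold

# Distances (km) of past fuel stops from the depot; a 900 km stop stands out.
history = [12.0, 15.0, 11.0, 14.0, 13.0]
print(is_anomalous(900.0, history))  # True
print(is_anomalous(14.5, history))   # False
```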
Role & Responsibilities:
 Part of Product Architecture Team.
 Model Creation, Data Pre-Processing, Data Cleaning.
 Feature Selection and Engineering.
 Model Development and Validation using Clustering, SVM, Boosting and Random Forest.
 Model Deployment, Validation and Acceptance.
Client: Oil and Gas
Team Size: 5
BI Reports
Overview: This project involved migration from a legacy platform to SSRS. There were nearly 200 reports; a utility was developed to migrate sample reports to the SSRS platform.
Client: Intel Bangalore
Team Size: 30
BI Reports
Overview: This project involved migration from a legacy platform to SSRS. There were nearly 40 reports.
Client: ACE Surety, US
Team Size: 20
R & D Projects
Document Classification using NLP
Asset Management using YOLO Object Detection
Chatbot for coronavirus using Lex and Dialogflow, integrated with Facebook and Telegram
Mask-wearing detection for social distancing using RCNN
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 

Resume

 Hands-on experience in building predictive models in real-time, near-real-time and batch modes using machine learning.
 Exposure to end-to-end data pipeline creation in enterprise systems using tools such as Flume, Kafka, Spark, Flask and Docker.
 Model validation with testing and client stakeholders; aligning business and technology for deployment, validation and acceptance.
 Managing and mentoring teams.
 TOGAF 9.1 certified. Experience in presales activities such as technical solutioning and estimation, POCs and customer demos.
 Well versed with Agile/Waterfall methodologies, CMMI Level 5 processes, estimation techniques, requirement gathering and elicitation, and design using UML techniques.
 Hands-on expertise in data governance, data lineage, data processes, DML and data architecture control execution, Master Data Management (MDM) and Data Governance (DG).
 Exposure to deploying ML models on Kubernetes and Docker clusters in on-premise or hybrid environments.
 Exposure to interfacing IoT/connected devices/sensors for data analysis and edge analytics.
 Model quantization for handheld devices using TensorFlow Lite.
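The model quantization mentioned above reduces float32 weights to int8 for handheld devices. As a minimal illustration of the underlying affine quantization (a NumPy sketch only, not the actual TensorFlow Lite converter API):

```python
import numpy as np

np.random.seed(0)

def quantize_int8(w):
    # Affine (asymmetric) quantization: map [w.min(), w.max()] onto [-128, 127].
    scale = (w.max() - w.min()) / 255.0
    zero_point = np.round(-w.min() / scale) - 128
    q = np.clip(np.round(w / scale + zero_point), -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    # Recover approximate float values from the int8 representation.
    return (q.astype(np.float32) - zero_point) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale, zp = quantize_int8(w)
err = np.abs(dequantize(q, scale, zp) - w).max()
```

The round-trip error stays within roughly one quantization step, which is why int8 models remain usable on edge hardware such as the Google Coral boards mentioned below.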
Primary Skills

Machine Learning: Regression, K-Means Clustering, Decision Tree, SVM, Bayes Theorem, Naive Bayes, Random Forest, LightGBM, XGBoost, Apriori, Time Series (ARIMA)
Neural Networks: CNN, RNN, AutoEncoders, Keras, PyTorch, TensorFlow, YOLO, OpenCV, SSD, BERT, ResNet, Inception, RCNN, SSD-MobileNet
NLP: NLTK, spaCy, POS tagging, Tokenization, Stemming and Lemmatization, BERT, Word2vec, GloVe, Embeddings
RDBMS: SQL Server, MySQL, PostgreSQL
Statistical Techniques: Hypothesis testing, ANOVA, Chi-Square
NoSQL DB: MongoDB
Languages: Python, R, C#
Enterprise Architecture: TOGAF certified
Chatbots: Lex, LUIS, RASA, Dialogflow
Messaging: Kafka
Tools: PyCaret, AutoML, Jupyter Notebook, Spyder, Anaconda
Methodology: Agile and Waterfall

Secondary Skills

Infrastructure Management: Hortonworks Ambari, Cloudera Hue, SageMaker, Cloud ML, Docker and Kubernetes
Transformation: Sqoop, Apache Spark, PySpark, Flume
Big Data: Hadoop, HDFS, YARN, Zookeeper
Others: Hive, Pig, HBase, Scala
Cognitive Services: Cloud/open-source cognitive services

Employment History

Apr 2017 to Jun 2019 - Sequretek Pvt. Ltd. - Technology Architect/Data Scientist
Nov 2015 to Mar 2017 - CGI India Ltd. - Solution/Technical Architect, ML (Associate Consultant)
Oct 2008 to Nov 2015 - Rolta India Ltd. - Solution/Technical Architect, Project Manager, Presales Architect, BI/Machine Learning (Senior Manager)
Jul 2004 to Oct 2008 - Syntel - Project Lead
Jun 2003 to Jun 2004 - L&T Infotech Ltd - Software Engineer
Nov 2002 to Jun 2003 - Amtech Communication - Senior Application Developer
Sep 2001 to Oct 2002 - Nazara.com - Senior Programmer
Sep 1999 to Sep 2001 - Ideaz Netechnologies - Programmer

Recognitions

Manager Award at L&T Infotech
CMMI Level 5 participation at Rolta

Qualifications

Bachelor of Engineering in Electronics and Telecommunication, Marathwada University. First Class.

Certifications

Coursera (2019-03): Neural Networks and Deep Learning
Open Group (2018-02): TOGAF 9.1 Certified
Udemy (2019-07): Deep Learning with TensorFlow 2.0
Udemy (2017-02): Big Data with Spark Streaming and PySpark

Project: Hand Detection System for Shredder Machine

Technology/Deep Learning: RCNN, Python, Google Coral

Overview: The project was undertaken to prevent industrial accidents while working near a shredding machine. A shredding machine (document shredder) is a mechanical device that cuts paper and other information-bearing media into fragments so small that the information can no longer be retrieved. Feeding paper into the machine is a manual activity, and accidents sometimes occur during it. To stop the machine automatically, the system detects a hand crossing the limit line and alerts the user. The alert takes the form of a buzzer, a red light and stopping of the machine; a GPIO pin of the Coral board is connected to the machine to stop it. Images of hands were labeled and the detector trained using RCNN.

Role & Responsibilities:
 Part of product architecture team.
 Model development and validation using RCNN.
 Model deployment, validation and acceptance.
 Development of code for hand detection.

Outcome and Contributions:
o High accuracy, above 95%.
o Through automation, accidents were completely avoided.

Team Size: 2
Client: Freelance

Project: Managed Detection and Response

Machine Learning: LightGBM, SVM, KNN, Clustering, Neural Networks (RNN)
Technology: Python, Apache Spark, PyTorch, Apache Kafka, Flume, MongoDB, Jenkins, Git, Hive, HBase, Oozie

Overview: Managed Detection and Response combines technology and skills to deliver advanced threat detection, deep threat analytics, global threat intelligence, faster incident mitigation and collaborative breach response on a 24x7 basis. The endpoints (IoT devices and enterprise servers) are scanned by the EMS system or a scan component. Apache Flume agents capture the logs and send them to a topic in Apache Kafka. The files received include server logs, client profiles, schema profiles and network settings. Master Data Management was set up for the data and schema profiles as part of the data-sharing agreement with the client. During data ingestion, the queue is consumed using Apache Spark batch and stream components. Data quality services were set up for data validation. Multiple layers were set up in the Spark component for data validation, cleansing, standardization and transformation. The data is reduced and stored in HBase for machine learning; Hive was set up for data archiving and Oozie was used for scheduling. Machine learning techniques including ensemble learning and boosting (SVM, LightGBM, Random Forest, K-Means clustering, KNN, RNN) are applied to derive the best possible result.
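The validation, cleansing, standardization and reduction layering described in the MDR pipeline can be sketched with pandas. This is an illustrative sketch only; the column names and rules are hypothetical, not the product's actual schema:

```python
import pandas as pd

# Hypothetical endpoint-log records; column names are illustrative.
raw = pd.DataFrame({
    "host": ["srv-01", "srv-02", None, "srv-01"],
    "bytes_sent": ["1024", "2048", "512", "not_a_number"],
    "severity": ["HIGH", "low", "Medium", "HIGH"],
})

# Layer 1: validation - drop records missing mandatory fields.
valid = raw.dropna(subset=["host"]).copy()

# Layer 2: cleansing - coerce types, discard unparseable rows.
valid["bytes_sent"] = pd.to_numeric(valid["bytes_sent"], errors="coerce")
clean = valid.dropna(subset=["bytes_sent"])

# Layer 3: standardization - normalize categorical encodings.
clean["severity"] = clean["severity"].str.upper()

# Reduction: aggregate per host before persisting to the feature store.
reduced = clean.groupby("host", as_index=False)["bytes_sent"].sum()
```

In the actual product these layers ran as Apache Spark stages over Kafka-fed data rather than in-memory pandas, but the layering logic is the same.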
The data is processed using machine learning for threat detection. The output is stored in MongoDB and displayed in a dashboard. Exception handling, audit trails and service accounts were created as part of the framework.

Role & Responsibilities:
 Part of product architecture team.
 Feature selection and engineering for web attacks, network attacks and malware attacks.
 Model development and validation using LightGBM, SVM and neural networks (RNN).
 Model deployment, validation and acceptance.
 Collaborating in agile mode with cross-functional teams.
 Development of data ingestion and log-processing components using Apache Spark/Flume, Kafka, HDFS and MongoDB.

Outcome and Contributions:
o Complexity in performing manual feature engineering is avoided.
o High accuracy, above 95%, compared to around 60% for manual classification.
o Through automation, four person-months of manual effort per customer are saved.
o The cost of approximately 5 to 10 HP ArcSight licenses per customer is saved.

Team Size: 15
Client: HDFC, DMart, Product Development

Project: EDPR Machine Learning

Machine Learning: LightGBM, SVM, KNN, Clustering, Random Forest
Technology: Python, Apache Spark, PyTorch, Apache Kafka, Flume, SQLite, Jenkins, Git, Hive, HBase, Oozie

Overview: Endpoint Detection and Protection Response detects, protects against and responds to cyberattacks, which adds to the complexity of securing the enterprise. Each point product adds an agent to the endpoint and is often managed independently of the other security technologies present on that endpoint. The files received include server logs, client profiles, schema profiles and network settings. Master Data Management was set up for the data and schema profiles as part of the data-sharing agreement with the client. During data ingestion, the queue is consumed using Apache Spark batch and stream components. Data quality services were set up for data validation.
Multiple layers were set up in the Spark component for data validation, cleansing, standardization and transformation. This involves feature extraction and feature engineering for malware based on static analysis of PE and PDF file types; the metadata is extracted from malware samples. The data is reduced and stored in HBase for machine learning; Hive was set up for data archiving and Oozie was used for scheduling.

Machine Learning: Data pre-processing and data cleaning are performed. Based on exploratory analysis, the model is regularly created/updated and validated. Machine learning techniques including ensemble learning and boosting (SVM, LightGBM, Random Forest) and TensorFlow Lite are applied to derive the best possible result.

Role & Responsibilities:
 Part of product architecture team.
 Model creation, data pre-processing, data cleaning.
 Feature selection and engineering.
 Model development and validation using LightGBM and TensorFlow Lite.
 Model deployment, validation and acceptance.

Outcome and Contributions:
o Complexity in performing manual feature engineering is avoided.
o Accuracy improved by 4% over the earlier traditional model.
o Through automation, ten person-months of manual effort are saved.

Team Size: 10
Client: Bharat Co-operative Bank, Product Development

Project: Website Security Checking
Machine Learning: LightGBM, SVM, KNN, Clustering, Random Forest
Technology: Python, Apache Spark, PyTorch, Apache Kafka, Flume, MongoDB, Jenkins, Git, HBase, Oozie

Overview: The product was developed to check website sanity and to be plugged into other products. It validates URLs and builds a Threat Intelligence database. Features such as URL length, domain registration, port, HTTPS, DNS record age and Google PageRank are used in detection. A separate environment is created for data creation, and features are extracted from the captured data. During data ingestion, the queue is consumed using the Apache Spark batch component. Data quality services were set up for data validation. Multiple layers were set up in the Spark component for data validation, cleansing, standardization and transformation. The data is reduced and stored in MongoDB for machine learning; Hive was set up for data archiving and Oozie was used for scheduling. Some samples were collected from third parties and some from the server logs of managed services. This is a continuous process. EDA is performed for data standardization. The model is trained incrementally every week, thereby creating a rich threat-intelligence repository.

Role & Responsibilities:
 Part of product architecture team.
 Model creation, data pre-processing, data cleaning.
 Feature selection and engineering.
 Model development and validation using clustering, boosting and Random Forest.
 Model deployment, validation and acceptance.

Client: Product Development (Internal)
Team Size: 15

Project: SOC Incident Log Ticket Allocation

Machine Learning: LightGBM, SVM, KNN, Clustering, Random Forest
Technology: Python, Apache Spark, PyTorch, Apache Kafka, Flume, MongoDB, Jenkins, Git, Hive, HBase, Oozie

Overview: Managed services handle many of the security-related measures for global customers; the SOC (Security Operations Centre) is established to achieve this objective.
Ticket resolution time matters both to the customer and for capacity planning. Initially, all the historical data available in Excel sheets is used as input for training. EDA, feature selection, feature engineering and PCA are performed prior to training. The Apache Spark batch component is used, and data quality services were set up for data validation. Multiple layers were set up in the Spark component for data validation, cleansing, standardization and transformation. The data is reduced and stored in MongoDB for machine learning; Hive was set up for data archiving and Oozie was used for scheduling. After training, whenever a new ticket arrives an ETA is calculated, and on closure the variance is also captured. Based on this analysis, the model is retrained.

Role & Responsibilities:
 Part of product architecture team.
 Model creation, data pre-processing, data cleaning.
 Feature selection and engineering.
 Model development and validation using clustering, boosting and Random Forest.
 Model deployment, validation and acceptance.

Client: HDFC, AEGON, Reliance, Product Development (Internal)
Team Size: 10

Project: Malicious PDF File Detection

Machine Learning: LightGBM, SVM, KNN, Clustering, Random Forest
Technology: Python, Apache Spark, PyTorch, Apache Kafka, Flume, SQLite, Jenkins, Git, Hive, HBase, Oozie
Overview: PDF files are a source of malware and of hyperlinks to phishing sites. The product was developed to check document sanity and to be plugged into other products. A PDF file has many attributes, semi-structured sections and snippets where an exploit can easily be injected; features such as JavaScript, rich media, JBIG2Decode and OpenAction are vulnerable to attack. The Apache Spark batch component is used, and data quality services were set up for data validation. Multiple layers were set up in the Spark component for data validation, cleansing, standardization and transformation. The data is reduced and stored in MongoDB for machine learning; Hive was set up for data archiving and Oozie was used for scheduling. Samples were collected from third parties and from the internal threat repository. This is a continuous process. EDA is performed for data standardization. The model is trained incrementally every week, thereby creating a rich threat-intelligence repository.

Role & Responsibilities:
 Part of product architecture team.
 Model creation, data pre-processing, data cleaning.
 Feature selection and engineering.
 Model development and validation using clustering, boosting and Random Forest.
 Model deployment, validation and acceptance.

Client: HDFC, AEGON, Reliance, Product Development (Internal)
Team Size: 10

Project: IGA

Machine Learning: LightGBM, SVM, KNN, Clustering, Random Forest
Technology: Python, Apache Spark, PyTorch, Apache Kafka, Flume, MongoDB, Jenkins, Git, HBase, Oozie

Overview: IGA is an integrated access management and governance product that takes care of the entire lifecycle of employee engagement (onboarding and exit). During onboarding, an employee ID is created and access to different systems is granted after approval. During exit, all access and IDs are revoked.
Machine Learning: Data is collected from multiple systems such as the attendance system, leave portal, access management, training system, appraisal system and other client systems. Some of the attributes are employee salary, employee satisfaction, promotion in the last year, hourly rate, etc. The Apache Spark batch component is used, and data quality services were set up for data validation. Multiple layers were set up in the Spark component for data validation, cleansing, standardization and transformation. The data is reduced and stored in MongoDB for machine learning; Hive was set up for data archiving and Oozie was used for scheduling. The collected data is cleansed, parsed and validated; feature selection and engineering and exploratory data analysis are then applied to derive multiple metrics. A powerful dashboard is created using Tableau. Machine learning techniques including ensemble learning and boosting (Support Vector Machine, XGBoost) and PCA are applied to derive the best possible result.

Role & Responsibilities:
 Part of product architecture team.
 Model creation, data pre-processing, data cleaning.
 Feature selection and engineering.
 Model development and validation using clustering, SVM, boosting and Random Forest.
 Model deployment, validation and acceptance.

Team Size: 15
Client: HDFC, AEGON, Axis Bank, Product Development (Internal)

Project: Bank Customer Churn

Machine Learning: LightGBM, SVM, KNN, Clustering, Random Forest
Technology: Python, Apache Spark, PyTorch, Apache Kafka, Flume, MongoDB, Jenkins, Git, Hive, HBase, Oozie

Overview: The model is developed for the agricultural segment. There are multiple co-operative banks and financial agencies that support farmers. The churn-based product is developed to detect the shift of customers. The data is sourced from FTP and processed batch-wise. Attributes such as gender, geography, loan, products, subsidy and income are considered for model building.
FTP was used as the data source and was later replaced with Kafka-Spark streams. This is a continuous process. EDA is performed for data standardization. The model is trained incrementally every week.

Role & Responsibilities:
 Part of product architecture team.
 Model creation, data pre-processing, data cleaning.
 Feature selection and engineering.
 Model development and validation using clustering, SVM, boosting and Random Forest.
 Model deployment, validation and acceptance.

Client: Bharat Cooperative Bank
Team Size: 15

Project: Insurance Claim Prediction

Machine Learning: LightGBM, SVM, KNN, Clustering, Random Forest
Technology: Python, Apache Spark, PyTorch, Apache Kafka, Flume, MongoDB, Jenkins, Git, Hive, HBase, Oozie

Overview: The model is developed for special intervention in the manual checking of claims, based on historical claim-processing data as well as customer-specific attributes such as policy deductibles, exclusions, umbrella limit, collision details and vehicle details. Initially FTP was used as the data source; it was later replaced with Kafka-Spark streams. This is a continuous process. EDA is performed for data standardization. The model is trained incrementally every week.

Role & Responsibilities:
 Part of product architecture team.
 Model creation, data pre-processing, data cleaning.
 Feature selection and engineering.
 Model development and validation using clustering, SVM, boosting and Random Forest.
 Model deployment, validation and acceptance.

Client: HDFC ERGO
Team Size: 15

Project: Oil Card Fraud Detection (POC)

Overview: The model was developed as part of a POC for an oil and gas company. In Europe, an oil card allows drivers of large transport companies to buy fuel, pay tolls and, in some cases, buy a limited amount of food while on tour. The model detects fraud in transactions based on time zone, transaction instances, and latitude and longitude; an anomaly-detection technique is used. This was developed on the Azure ML platform. The data source is CSV.
EDA is performed for data standardization, and the model is trained incrementally every week.

Role & Responsibilities:
 Part of product architecture team.
 Model creation, data pre-processing, data cleaning.
 Feature selection and engineering.
 Model development and validation using clustering, SVM, boosting and Random Forest.
 Model deployment, validation and acceptance.

Client: Oil and Gas
Team Size: 5

Project: BI Reports

Overview: This project involved migration from a legacy platform to SSRS. There were nearly 200 reports; a utility was developed to migrate sample reports to the SSRS platform.

Client: Intel Bangalore
Team Size: 30

Project: BI Reports

Overview: This project involved migration from a legacy platform to SSRS. There were nearly 40 reports.

Client: ACE Surety, US
Team Size: 20

R & D Projects

 Document classification using NLP
 Asset management using YOLO object detection
 Chatbot for coronavirus using Lex and Dialogflow, integrated with Facebook and Telegram
 Mask-wearing prediction for social distancing using RCNN
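The URL-based detection features used in the Website Security Checking project above (URL length, HTTPS, explicit port, etc.) can be derived with the Python standard library. A hedged sketch; the feature names are hypothetical, not the product's actual schema:

```python
from urllib.parse import urlparse

def url_features(url):
    """Extract simple lexical features of the kind used for URL threat scoring.

    Feature names are illustrative only."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),                          # long URLs are a weak phishing signal
        "uses_https": parsed.scheme == "https",
        "explicit_port": parsed.port is not None,        # non-default ports are suspicious
        "num_subdomains": max(host.count(".") - 1, 0),
        "has_ip_host": host.replace(".", "").isdigit(),  # raw-IP hosts often indicate phishing
    }

feats = url_features("http://192.168.0.1:8080/login?acct=update")
```

Such feature dictionaries would then be vectorized and fed to the clustering/boosting models named in that project; signals like domain-registration age and PageRank require external lookups and are omitted here.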