Prasad Gaikwad
8087002445
prasadgaikwad09@gmail.com
Big Data Hadoop developer with over three years of experience in using cutting-edge
technologies such as Big Data on Cloud, along with machine learning and data visualization
& discovery, to help businesses identify new opportunities and create disruptive business
models. Leading and mentoring a team of solution developers in integrating new-age
technologies with enterprise ETL and DW appliances. Won the Tata Technologies Spot award
for proactive involvement in performance tuning and automating manual deliverables.
Professional Experience
Big Data Hadoop Developer | Lead
Aug'15 to present
Tata Technologies
Currently working as Big Data lead in the digital team for Tata Motors Ltd, a major Indian
automotive manufacturer. Helping the business identify value-added insights by designing and
implementing cost-effective Cloud-based solutions, integrating open source technologies
viz., Spark, Hive, Sqoop, Kafka, Oozie with Enterprise applications such as SAP, CRM and
Cordys. Implementing multiple fast-paced PoCs to validate concepts and mature them into
projects if successful.
Hands-on experience in using Spark, Hive, Sqoop, Kafka, Oozie, Hue, Ambari and Zeppelin
for ingesting, cleaning and integrating enterprise-wide application data.
Excellent understanding of Hadoop and Spark internals such as HDFS,
MapReduce, YARN, RDDs, DataFrames and Datasets.
Designing solution architectures using excellent understanding of various cloud offerings
and hands-on experience in provisioning and managing resources such as Amazon EMR,
EC2, S3, Redshift, RDS, Kinesis, Lambda, Google Compute Engine, GCS, BigQuery and HDInsight
(Azure).
Setting up and using a multi-node EMR cluster, a Cloudera CDH cluster on AWS using Cloudera
Director and an on-premise HDP Hadoop cluster using Ambari.
Active participation in summits, sessions and hands-on workshops on large-scale data
processing by Solution Architects and SMEs from industry leaders such as AWS, Cloudera,
Teradata, Microsoft and Google to evaluate and understand the Big Data appliances and data
lake solutions offered, on-premise vs. on-cloud architecture, and how they fit into the current
enterprise IT landscape.
Projects
ProfixDataMart – Tata Motors (AWS) - AWS EMR, Spark, Hive, Sqoop, Oozie, SAS
Designed and deployed a data mart on top of Amazon S3, establishing ODBC connectivity for the
SAS modelling team via Hive on Amazon EMR. Automated the daily refresh of data from the
Teradata EDW box using a combination of Sqoop, Spark and Oozie to create daily ETL jobs
running on an on-demand EMR cluster.
Project Wave – Tata Motors (AWS) - AWS EMR, Spark, Hive, S3, Tableau
Designed and deployed trend analysis dashboards in Tableau to predict the impact of
fluctuations in the commodity market on VC cost using Hive, Spark, S3, Hue and Zeppelin.
Automated provisioning of EMR clusters with spot instances for executing batch ETL
workloads.
Vehicle stoppage analysis – Tata Motors (GCP) - Google BigQuery, Python, MS-SQL, Tableau
Designed and deployed a pure-cloud Big Data solution for telemetry data to convert live
tracking of vehicles into stoppage heat maps. Used clustering algorithms to identify points of
interest to the business based on the most frequent stoppages across India.
PoCs
SAP BOM data explosion using Hive, Pig, Spark and Tableau (on-premise).
Deployed an on-premise HDP cluster to develop interactive Tableau dashboards displaying
component/vehicle/plant-wise cost variations. Used Pig, Hive, Spark and Ambari for
ingesting and integrating BOM data from SAP BW with CRM.
Designing and developing Data Lake strategy
Developing a data lake strategy for ingesting structured, semi-structured and unstructured
data generated by various applications in the current enterprise landscape and exposing only
relevant data to end users.
Solution Developer | ETL Lead
January 2014 to Jul'15
Tata Technologies
Worked as ETL lead in the Business Intelligence team for Tata Motors Ltd, a major Indian
automotive manufacturer. Integrated Siebel CRM, SAP and Cordys Portal data using
Informatica in Teradata EDW (10 TB+). Delivered end-to-end business solutions,
implemented PoCs, tuned performance and provided production support for one of the
largest CRM deployments. Technical lead for ETL developers, responsible for deploying CRs
in production, and for monitoring and troubleshooting execution of the nightly ETL.
Hands-on work experience in
Providing L2 and L3 support in the production environment, serving 5000+ application end
users on the reporting engine (OBIEE).
Performance Tuning – nearly 40% improvement in ETL execution time (from 4 hours down to 2.5
hours) by implementing Informatica, DAC and Teradata best practices.
Understanding business requirements for designing end-to-end module deployment.
Building and maintaining complex ETLs for data integration across business applications
with 1000+ mappings, 600+ tables and a 10+ TB relational database on Teradata using
Informatica PowerCenter and DAC.
Building custom data models to integrate Enterprise data with CSV uploads to create a
complete picture of the business landscape.
Debugging and resolving ETL failures and data discrepancies of existing models.
Shell scripting and writing cron jobs for executing Teradata scripts and FTP/SFTP transfers
between application servers.
Technical Proficiency
Big Data Technologies: Hadoop (Cloudera CDH, Hortonworks HDP, AWS EMR), HDFS, Hive, Kafka, MapReduce, Oozie, Pig, Spark, Sqoop, Zookeeper, Zeppelin
Cloud Vendors: Amazon AWS, Google Cloud, Microsoft Azure
Database: Teradata, MongoDB, Google BigQuery, Oracle, MySQL
Tools: Informatica, DAC, Teradata Utilities
Monitoring and Reporting: Apache Ambari, Hue, Cloudera Manager, TD Viewpoint, OBIEE, Tableau
Programming/Scripting Languages: SQL, Python, Java, Scala
Operating Systems: RHEL, CentOS, Fedora, Windows
Academic Qualifications
B.E. in Information Technology from Walchand Institute of Technology, Solapur with 72%
Diploma in Computer Engineering from Govt. Polytechnic Mumbai with 81.3%
S.S.C. with 88.3%