Prashanth Shankar Kumar
Certified Teradata Developer
Certified Hadoop Developer
Profile
 8 years of experience in the IT industry, with strong experience in application development, data
analytics, the Hadoop platform, Teradata and IBM Mainframes in the insurance and financial sectors
 Around 2.5 years of expertise in core Hadoop and the Hadoop technology stack, which includes HDFS,
Sqoop, Hive, HBase, Impala, Spark (ongoing) and MapReduce programming
 Familiar with data architecture including data ingestion, pipeline design, Hadoop information
architecture, data modelling, data mining, machine learning, advanced data processing and
optimizing ETL workflows
 Experience in Continuous Integration tools such as Jenkins.
 Experience in developing and deploying Web Services (SOAP)
 Knowledge of RESTful interfaces
 Able to assess business rules, collaborate with stakeholders and perform source-to-target data
mapping, design and review.
 Conducted induction and orientation sessions for newly joined peers and acted as a mentor
for Hadoop topology and cluster configuration in State Farm (SF) life and auto insurance.
 Full exposure to development using Agile methodology and good exposure to Agile processes
such as TDD (Test-Driven Development) and Scrum iterations.
 Strong knowledge of web-based architecture, strong hands-on technical debugging and
troubleshooting experience with distributed enterprise applications, and knowledge of the
full software development life cycle (SDLC)
 Received appreciation certificate from Client Director for data reusability and effectiveness.
 Worked in several areas of data warehousing, including business analysis, requirement gathering,
design, development, testing, and implementation.
 Fully conversant with all aspects of systems analysis, design, testing and entire SDLC.
 Optimization of Queries in a Teradata database environment.
 Conversant with Teradata Utilities like BTEQ, FASTLOAD, MULTILOAD, FASTEXPORT, and
TPUMP.
 Developed a UNIX shell script with BTEQ that dynamically generates and runs SQL to DROP the
oldest range on Partitioned Primary Index tables, using derived tables and parsing the data
dictionary table dbc.IndexConstraints (see the sketch after this list).
 Completed upgrading entire Guardian environment from Teradata Version 6.2 to Teradata 12 in
the spring of 2009.
 Developed a UNIX shell script with BTEQ that dynamically generates and runs SQL to COLLECT
STATISTICS USING SAMPLE, by parsing the data dictionary tables dbc.columnstatistics and
dbc.indexstatistics.
 Worked with Data Modelers, ETL staff, BI developers, and Business System Analysts in
Business/Functional requirements review and solution design.
 Worked on data modeling with Erwin and Visio 2010.
 Worked on SQL query performance tuning as part of performance improvement efforts.
 Process oriented, focused on standardization, streamlining, and implementation of best
practices.
 Design, implementation and administration of a robust backup plan and recovery techniques.
 Implemented Dual Active systems for mission-critical applications and ensured system
availability for them.
 Excellent Documentation and Process Management skills with an ability to effectively
understand the business requirements to develop a quality product.
 Worked as a Lead/Project Manager, with the expertise to monitor and drive work across various
delivery approaches.
 Skilled communicator, thorough in explaining complex IT concepts to subordinates, the
management team and the functional team.
Technical Skills
 RDBMS: Teradata V2R5/V12, SQL Server, Oracle
 Hadoop Ecosystem: HDFS, Hadoop MapReduce, HBase, Hive, Pig, Sqoop, Flume,
Zookeeper, Cloudera CDH-4, Kerberos, JSON and YARN
 ETL Tools: Data Stage
 RDBMS Utilities: BTEQ, FastLoad, MultiLoad, TPump, FastExport and Query Manager
 Prog. Languages: Core Java, SQL, Teradata, Mainframe, JCL, BTEQ, MLOAD, FASTLOAD, FASTEXPORT, TPUMP, TPT
 Operating Systems: Linux, Unix, Windows Family
 Specialized Tools: Amazon AWS, PuTTY, SoapUI (SOAP & RESTful services), Tortoise SVN,
WAT, Puppet, Micro Focus Rumba, IBM Data Studio, Abend-AID, File-AID, pgAdmin III,
WinSCP, TRAC, Quality Center, CA7
 Protocol Knowledge: TCP/IP
Technical Training & Certification:
 Teradata Certified Developer
 Certified Hadoop Developer
 Successfully completed, and was certified in, the following training programs:
o HBASE, YARN, SQOOP, HIVE, PIG
o PostgreSQL (2013, TCS)
o Data Stage (2012, TCS)
o Mainframes/COBOL (2010, TCS)
o Banking Concepts (2010, TCS)
o Teradata (2012, TCS)
o Java, JCL (2009, TCS)
Professional Experience
Bank of America, Charlotte, NC Sep’ 2015 – Present
Hadoop Developer/ Tech Lead
Quantitative Risk Technology
The project is intended to convert the existing Oracle code to Hadoop. All processing in HDFS
is done through Impala, Hive, Sqoop, HBase, MapReduce, Autosys and Spark programs; the role
also covers performance tuning and regular backups.
• Implemented Hive tables and HQL Queries for the reports.
• Developed Impala queries to analyze reducer output data.
• Developed MapReduce programs to parse the raw data, filter records by ID for faster
processing and store the refined data in partitioned tables.
• Involved in troubleshooting issues and errors reported by the cluster monitoring software
provided by Cloudera Manager.
• Created INSERT OVERWRITE queries with dynamic partitioning to store the data (see the Hive sketch after this list).
• Set task status to display debug information and the status of the MapReduce job on the
JobTracker web page.
• Used Oozie to automate data loading into the Hadoop Distributed File System and Hive to
pre-process the data on a daily catch-up run basis.
• The pre-processed data in Avro was used as input to the MapReduce program; Avro was
chosen because it supports the multi-output process and works well with MapReduce.
• Configured the cluster to periodically archive log files for debugging, reduce the
processing load on the cluster and tune it for better performance.
• Involved in extracting and loading data from RDBMS to Hive using Sqoop (see the Sqoop sketch after this list).
• Involved in writing the Oozie workflow that runs the data-load and MapReduce code; used
fork and join where parallel processing was possible.
• Coded a shell wrapper that helps trigger jobs from the UI (see the wrapper sketch after this list).
• Involved in design decisions for the Hadoop transformation.
• Worked on writing HQL for Sqoop loads from Oracle to Hadoop; wrote an Oozie workflow to
move data to stage and then to live.
• Worked on Sqoop import and export of data.
• Tested raw data, executed performance scripts and also shared responsibility for
administration of Hadoop, Hive and Pig.
• Involved in designing the unstructured JSON data format and building the required SerDes
for the web services.
• Increased Hadoop cluster performance by using hashing and salting methodologies for load
balancing.
• Optimized HBase data retrieval calls to be region-local and improved range-based scans.
• Highly involved in designing the next-generation data architecture for unstructured data.
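As referenced in the INSERT OVERWRITE bullet above, this is a minimal sketch of the dynamic-partition load pattern driven from a shell step. The risk_db database, trades_raw/trades_part tables, column names and the RUN_DATE parameter are hypothetical.

```bash
#!/bin/bash
# Hedged sketch: rebuild the affected date partitions of a refined table from
# the raw table using Hive dynamic partitioning. All names are placeholders.
RUN_DATE=${1:-2016-01-01}   # hypothetical run-date argument

hive -e "
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

CREATE TABLE IF NOT EXISTS risk_db.trades_part (
  trade_id  STRING,
  book      STRING,
  notional  DOUBLE
)
PARTITIONED BY (trade_dt STRING);

-- The partition column goes last in the SELECT list; Hive routes each row to
-- the partition matching its trade_dt value.
INSERT OVERWRITE TABLE risk_db.trades_part PARTITION (trade_dt)
SELECT trade_id, book, notional, trade_dt
FROM   risk_db.trades_raw
WHERE  trade_dt >= '${RUN_DATE}';
"
```

Because only the partitions produced by the SELECT are overwritten, the same statement serves both the initial load and the daily catch-up run.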
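The Sqoop bullets above boil down to a parameterised import command along the lines of the hypothetical sketch below; the JDBC URL, schema, table, columns and password-file location are all assumptions.

```bash
#!/bin/bash
# Hedged sketch: import a slice of an Oracle table into a Hive staging table
# with Sqoop. Connection details and object names are placeholders.
RUN_DATE=${1:-2016-01-01}   # hypothetical run-date argument

sqoop import \
  --connect jdbc:oracle:thin:@//oradb.example.com:1521/RISKP \
  --username etl_user \
  --password-file /user/etl/.ora_pass \
  --table RISK.POSITIONS \
  --columns "TRADE_ID,BOOK,NOTIONAL,TRADE_DT" \
  --where "TRADE_DT >= TO_DATE('${RUN_DATE}','YYYY-MM-DD')" \
  --split-by TRADE_ID \
  --num-mappers 4 \
  --hive-import \
  --hive-table risk_db.positions_stg \
  --hive-overwrite
```

Exports back to the RDBMS follow the mirror-image sqoop export command, with --export-dir pointing at the table's HDFS location.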
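The shell wrapper mentioned above can be as simple as the sketch below: submit the Oozie workflow, capture the job id and poll until it finishes, so a UI button or an Autosys job gets a clean exit code. The Oozie URL and the properties file path are assumptions.

```bash
#!/bin/bash
# Hedged sketch of a wrapper that triggers an Oozie workflow and waits for it.
set -euo pipefail

OOZIE_URL="http://oozie-host.example.com:11000/oozie"   # assumed endpoint
PROPS=/apps/risk/conf/ingest_wf.properties              # assumed job config

# Submit the workflow; the CLI prints a line like "job: 0000123-...-W".
JOB_ID=$(oozie job -oozie "$OOZIE_URL" -config "$PROPS" -run | awk '/^job:/ {print $2}')
echo "Submitted Oozie workflow: $JOB_ID"

# Poll the job-level status until the workflow reaches a terminal state.
while true; do
  STATUS=$(oozie job -oozie "$OOZIE_URL" -info "$JOB_ID" | awk '/^Status/ {print $3}')
  case "$STATUS" in
    SUCCEEDED)      echo "Workflow $JOB_ID succeeded"; exit 0 ;;
    KILLED|FAILED)  echo "Workflow $JOB_ID ended with status $STATUS" >&2; exit 1 ;;
    *)              sleep 60 ;;
  esac
done
```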
Languages and DB: Hive, Impala, Spark, Oracle and HBase
Software and Tools: HDFS, Hadoop MapReduce, Hive, Pig, Sqoop, Oozie, Cloudera CDH-4, HUE,
Flume, Impala, Micro Focus Rumba, SOAP Service, Mule ESB, Jenkins, SVN, pgAdmin III, IBM Data
Studio, Teradata and Oracle
State Farm Insurance, Bloomington, IL Oct’ 2013 – Sep’ 2015
Hadoop Developer/ Development Lead
ICP - CDE – DC8 – MFI Base and Enhancements (Oct 2013 – Sep 2015)
This project is intended to transform the existing Billing and Payments application to its future
state by storing and processing the data entirely in HDFS. All processing in HDFS is done through
Pig, Hive, Sqoop, HBase and MapReduce programs; the role also covers performance tuning and
regular backups. MFI also involves migrating State Farm payment plan information to the ICP
platform to improve user interface response times.
• Worked as a Project Manager/Project Lead, mentored peers on the system and shared
knowledge of it.
• Understood business needs, analyzed functional specifications and mapped them to the Mule
flows and web services of the existing applications to insert/update/retrieve data from the NoSQL HBase store.
• Installed and managed a 4-node, 4.8 TB Hadoop cluster for the SOW and eventually
configured a 12-node, 36 TB cluster for the production and implementation environments.
• Implemented Hive tables and HQL Queries for the reports.
• Created web services that would interact with HBase Client API to use get/put methods for
different applications.
• Worked with JSON data in Hive and developed Hive queries to analyze reducer output data.
• Developed MapReduce programs to parse the raw data, populate staging tables and store
the refined data in partitioned tables.
• Involved in troubleshooting issues and errors reported by the cluster monitoring software
provided by Cloudera Manager.
• Created simple rule-based optimizations such as pruning non-referenced columns from table
scans.
• Set task status to display debug information and the status of the MapReduce job on the
JobTracker web page.
• Used Oozie to automate data loading into the Hadoop Distributed File System and Pig to
pre-process the data on a daily catch-up run basis.
• Configured the cluster to periodically archive log files for debugging, reduce the
processing load on the cluster and tune it for better performance.
• Involved in extracting and loading data from RDBMS to Hive using Sqoop.
• Tested raw data, executed performance scripts and also shared responsibility for
administration of Hadoop, Hive and Pig.
• Involved in designing the unstructured JSON data format and building the required SerDes
for the web services (see the SerDe sketch after this list).
• Increased Hadoop cluster performance by using hashing and salting methodologies for load
balancing (see the salted-rowkey sketch after this list).
• Optimized HBase data retrieval calls to be region-local and improved range-based scans.
• Highly involved in designing the next-generation data architecture for unstructured data.
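A minimal sketch of the SerDe approach referenced above: expose the raw JSON documents landed in HDFS as a Hive external table so named fields (including nested arrays) can be queried directly. The table name, columns, HDFS path and hive-hcatalog-core jar location are assumptions.

```bash
#!/bin/bash
# Hedged sketch: map landed JSON files to a Hive external table via a JSON
# SerDe. All names and paths below are placeholders.
hive -e "
ADD JAR /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;

CREATE EXTERNAL TABLE IF NOT EXISTS icp_db.payments_json (
  policy_id  STRING,
  plan_code  STRING,
  amount     DOUBLE,
  events     ARRAY<STRUCT<ts:STRING, type:STRING>>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/data/icp/payments/json/';

-- Example read: flatten the nested event list per payment document.
SELECT policy_id, e.type, e.ts
FROM   icp_db.payments_json LATERAL VIEW EXPLODE(events) ev AS e
LIMIT  10;
"
```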
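The hashing-and-salting bullet above refers to prefixing HBase row keys with a short, deterministic salt so writes spread evenly across regions instead of hot-spotting the newest one. The sketch below illustrates the idea from a shell step; the payments table, column family and key layout are hypothetical.

```bash
#!/bin/bash
# Hedged sketch: build a salted HBase row key and write one cell with it.
# Table, column family and key format are placeholders.
NATURAL_KEY="${1:?usage: $0 <payment-id>}"
BUCKETS=16                                # assumed number of salt buckets

# Stable salt: hash the natural key and fold it into BUCKETS buckets.
HASH=$(printf '%s' "$NATURAL_KEY" | md5sum | cut -c1-8)
SALT=$(printf '%02d' $(( 0x$HASH % BUCKETS )))
ROWKEY="${SALT}|${NATURAL_KEY}"

echo "put 'payments', '${ROWKEY}', 'cf:status', 'PAID'" | hbase shell

# Note: range scans must then be issued once per salt bucket (00..15) and the
# results merged, which is the trade-off for the balanced write load.
```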
Languages and DB: DB2, PostgreSQL, MySQL, Expression and HBase
Software and Tools: HDFS, Hadoop MapReduce, Hive, Pig, Sqoop, Oozie, Cloudera CDH-4, HUE,
Flume, Impala, Micro Focus Rumba, SOAP Service, Mule ESB, Jenkins, SVN, pgAdmin III, IBM Data
Studio and Teradata
ICP - CDE – DC8 – Checkout (Oct 2012 – Oct 2013)
This project is focused on enhancing the customer experience across all products for the billing,
payment and disbursement processes.
• Worked on analyzing Hadoop cluster and different big data analytic tools including Pig,
HBase and Sqoop.
• Responsible for building scalable distributed data solutions using Hadoop.
• Involved in loading data from the Linux file system into HDFS (see the sketch after this list).
• Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode
recovery, capacity planning, and slot configuration.
• Created HBase tables to store variable data coming from different portfolios.
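A small sketch of the Linux-to-HDFS load step mentioned above; the local drop directory and the HDFS landing path are placeholders.

```bash
#!/bin/bash
# Hedged sketch: stage local files into a dated HDFS landing directory and
# sanity-check the file counts. Paths are placeholders.
set -euo pipefail

RUN_DATE=$(date +%Y-%m-%d)
SRC_DIR=/data/incoming/checkout                       # assumed local drop dir
DEST_DIR=/user/icp/landing/checkout/dt=${RUN_DATE}    # assumed HDFS landing dir

hadoop fs -mkdir -p "$DEST_DIR"
hadoop fs -put "$SRC_DIR"/*.csv "$DEST_DIR"/

LOCAL_COUNT=$(ls "$SRC_DIR"/*.csv | wc -l)
HDFS_COUNT=$(hadoop fs -ls "$DEST_DIR" | grep -c '\.csv$' || true)
echo "local=$LOCAL_COUNT hdfs=$HDFS_COUNT"
```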
Bank of America, Bangalore, India Nov’ 2008 – Sep’ 2012
Teradata Developer
Technical Environment: Teradata V2R12, UNIX Shell Scripting, Teradata SQL Assistant, TDWM,
BTEQ, COBOL, JCL
Description: The main objective of the system is to extract data from different legacy systems
and load it into the data mart. Developed the business intelligence system to quickly identify
customer needs and develop better targeted services using DataStage. The database is Teradata;
a large amount of customer-related data from diverse Mainframe sources was consolidated,
including customer billing, ordering, support and service usage. The goal is to build a Decision
Support System for executives.
Key Responsibilities & Achievements:
 Played a major role in understanding the business requirements and in designing and loading
data into the data warehouse (ETL).
 Worked with utilities like BTEQ, MLOAD, FLOAD, etc.
 Collection of data source information from all the legacy systems and existing data stores.
 Imported various application sources and created targets and transformations using DataStage
Designer (Source Analyzer, Warehouse Developer, Transformation Developer, and Mapping
Designer).
 Worked on data modeling using Visio and Erwin.
 Involved in Data Extraction, Transformation and Loading from source systems to ODS.
 Developed complex mappings using multiple sources and targets in different databases.
 Actively participated in performance tuning of DataStage mappings.
 Worked with the production support team to resolve production issues.
 Knowledge of DataStage and Informatica.
 Security administration including creating and maintaining user accounts, passwords, profiles,
roles and access rights
 Tuned various queries by COLLECTING STATISTICS on columns in the WHERE and JOIN
expressions
 Knowledge of Teradata architecture.
 Knowledge of Star Schema and Snowflake Schema.
 Developed BTEQ scripts to load data from the staging tables to the base tables (see the sketch after this list).
 Created BTEQ scripts to extract data from the warehouse for downstream systems.
 Performed unit-level testing as part of development and assisted the testing team in
running SIT/UAT/Pre-Production testing.
IGCAR (Indira Gandhi Centre for Atomic Research), Kalpakkam, Tamil Nadu June 2007 – Nov 2008 (Internship)
Modelling using Rhapsody:
 Worked as an intern and helped in creating a working model for a system.
 Worked on creating a logical diagram for a complex system for the plant.
 Analysis and design of integration for an instrumentation project using IBM Rational Rhapsody.
 Whereas Spectra CX is built on IBM Rational Software Architect (RSA), which provides a rich UML
modelling capability, this project integrates Rhapsody as the front end for UML modelling in the
Spectra CX tool.
 This entails updates to IGCAR’s instrumentation product using the Rhapsody API to generate
Rhapsody profiles in addition to RSA profiles to extend UML for a particular embedded software
domain, especially based on the Software Communications Architecture (SCA) for software-
defined radios.
 Also, plug-in extensions in the Rhapsody tooling push the user’s model for validation and code
generation.