Prashanth Shankar Kumar has over 8 years of experience in data analytics, Hadoop, Teradata, and mainframes. He currently works as a Hadoop Developer/Tech Lead at Bank of America, where he develops Hive queries, Impala queries, MapReduce programs, and Oozie workflows. Previously he worked as a Hadoop Developer at State Farm Insurance, where he installed and managed Hadoop clusters and developed solutions using Hive, Pig, Sqoop, and HBase. He has expertise in Teradata, SQL, Java, Linux, and agile methodologies.
• Capable of processing large sets of structured, semi-structured, and unstructured data and supporting system architecture.
• Implemented proofs of concept on the Hadoop stack and different big data analytic tools, including migration from different databases to Hadoop.
• Developed multiple MapReduce jobs in Java for data cleaning and pre-processing according to the business requirements; imported and exported data into HDFS and Hive using Sqoop.
• Experienced in writing Hive queries and Pig scripts.
Prashanth Shankar Kumar
Certified Teradata Developer
Certified Hadoop Developer
Profile
• 8 years of experience in the IT industry, with strong experience in application development, data analytics, the Hadoop platform, Teradata, and IBM Mainframes in the insurance and financial sectors.
• Around 2.5 years of expertise in core Hadoop and the Hadoop technology stack, including HDFS, Sqoop, Hive, HBase, Impala, Spark (ongoing), and MapReduce programming.
• Familiar with data architecture, including data ingestion, pipeline design, Hadoop information architecture, data modelling, data mining, machine learning, advanced data processing, and optimizing ETL workflows.
• Experience with Continuous Integration tools such as Jenkins.
• Experience in developing and deploying web services (SOAP).
• Knowledge of RESTful interfaces.
• Able to assess business rules, collaborate with stakeholders, and perform source-to-target data mapping, design, and review.
• Conducted induction and orientation sessions for newly joined peers and acted as a mentor for Hadoop topology and cluster configuration in State Farm life and auto insurance.
• Full exposure to development using Agile methodology and good exposure to Agile processes such as TDD (Test-Driven Development) and Scrum iterations.
• Strong knowledge of web-based architecture; strong hands-on technical debugging and troubleshooting experience with distributed enterprise applications; knowledge of the full software development life cycle (SDLC).
• Received an appreciation certificate from the Client Director for data reusability and effectiveness.
• Worked in several areas of data warehousing, including business analysis, requirement gathering, design, development, testing, and implementation.
• Fully conversant with all aspects of systems analysis, design, testing, and the entire SDLC.
• Optimization of queries in a Teradata database environment.
• Conversant with Teradata utilities like BTEQ, FastLoad, MultiLoad, FastExport, and TPump.
• Developed a UNIX shell script with BTEQ to dynamically generate a SQL script and run it in order to DROP the oldest range on Partitioned Primary Index tables, using derived tables and parsing the data dictionary view DBC.IndexConstraints.
• Completed upgrading the entire Guardian environment from Teradata V6.2 to Teradata 12 in the spring of 2009.
• Developed a UNIX shell script with BTEQ to dynamically generate a SQL script and run it in order to COLLECT STATISTICS USING SAMPLE, parsing the data dictionary views DBC.ColumnStatistics and DBC.IndexStatistics (a sketch of this generate-then-run pattern follows this list).
• Worked with data modelers, ETL staff, BI developers, and Business System Analysts in business/functional requirements review and solution design.
• Worked on data modeling with Erwin and with Visio 2010.
• Worked on performance tuning of SQL queries as part of performance improvement.
• Process-oriented, focused on standardization, streamlining, and implementation of best practices.
• Design, implementation, and administration of a robust backup plan and recovery techniques.
• Implemented Dual Active systems for mission-critical applications and ensured availability of the system for the same.
• Excellent documentation and process management skills, with an ability to effectively understand the business requirements to develop a quality product.
• Worked as a Lead/Project Manager, with the expertise to monitor and work across the various delivery approaches.
• Skilled communicator, thorough in explaining complex IT knowledge to subordinates, the management team, and the functional team.
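The two BTEQ items above follow a generate-then-run pattern; the shell sketch below shows that pattern for the COLLECT STATISTICS USING SAMPLE case. This is a minimal illustration, not the original script: the $TDLOGON variable, the target database name, and the dictionary view's column names are assumptions.

    #!/bin/sh
    # Sketch: have BTEQ emit COLLECT STATISTICS statements generated from the
    # data dictionary, then run the generated file in the same session.
    bteq <<EOF
    .LOGON ${TDLOGON}
    .SET WIDTH 500
    .EXPORT REPORT FILE = gen_stats.sql
    SELECT 'COLLECT STATISTICS USING SAMPLE ON ' ||
           TRIM(DatabaseName) || '.' || TRIM(TableName) ||
           ' COLUMN (' || TRIM(ColumnName) || ');' (TITLE '')
    FROM DBC.ColumnStatistics            /* view named in the bullet above */
    WHERE DatabaseName = 'GUARDIAN_DB';  /* hypothetical database */
    .EXPORT RESET
    .RUN FILE = gen_stats.sql
    .LOGOFF
    EOF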
Technical Skills
RDBMS: Teradata V2R5/V12, SQL Server, Oracle
Hadoop Ecosystem: HDFS, Hadoop MapReduce, HBase, Hive, Pig, Sqoop, Flume, ZooKeeper, Cloudera CDH-4, Kerberos, JSON, and YARN
ETL Tools: DataStage
RDBMS Utilities: BTEQ, FastLoad, MultiLoad, TPump, FastExport, and Query Manager
Prog. Languages: Core Java, SQL, Teradata (BTEQ, MLOAD, FASTLOAD, FAST EXPORT, TPUMP, TPT), Mainframe JCL
Operating Systems: Linux, Unix, Windows family
Specialized Tools: Amazon AWS, PuTTY, SoapUI (SOAP & RESTful services), Tortoise SVN, WAT, Puppet, Micro Focus Rumba, IBM Data Studio, Abend-Aid, File-Aid, pgAdmin III, WinSCP, Trac, Quality Center, CA7
Protocol Knowledge: TCP/IP
Technical Training & Certification:
Teradata Certified Developer
Certified Hadoop Developer
Successfully completed the following training programs and certifications:
o HBase, YARN, Sqoop, Hive, Pig
o PostgreSQL (2013, TCS)
o DataStage (2012, TCS)
o Mainframes/COBOL (2010, TCS)
o Banking Concepts (2010, TCS)
o Teradata (2012, TCS)
o Java, JCL (2009, TCS)
Professional Experience
Bank of America, Charlotte, NC Sep 2015 – Present
Hadoop Developer/ Tech Lead
Quantitative Risk Technology
The project converts the existing Oracle code to Hadoop. All processing in HDFS is done through Impala, Hive, Sqoop, HBase, MapReduce, Autosys, and Spark programs; the role also covers performance tuning and regular backups.
• Implemented Hive tables and HQL queries for the reports.
• Developed Impala queries to analyze reducer output data.
• Developed MapReduce programs to parse the raw data, filter records by ID for faster processing, and store the refined data in partitioned tables.
• Involved in troubleshooting issues and errors reported by the cluster monitoring software provided by Cloudera Manager.
• Created INSERT OVERWRITE queries with dynamic partitioning to store the data (see the sketch after this list).
• Set task status to display debug information and show the status of the MapReduce job on the JobTracker web page.
• Used Oozie to automate data loading into the Hadoop Distributed File System and Hive to pre-process the data on a daily catch-up run basis.
• The pre-processed data in Avro was used as input to the MapReduce program; Avro was used for the multi-output process, and MapReduce works well with Avro.
• Configured the cluster to periodically archive the log files for debugging, reduce the processing load on the cluster, and tuned the cluster for better performance.
• Involved in extracting and loading data from RDBMS to Hive using Sqoop.
• Involved in writing the Oozie workflow for the data and the MapReduce code to run; used fork and join where parallel processing was possible.
• Coded a shell wrapper that helps in triggering jobs from the UI.
• Involved in design decisions for performing the Hadoop transformation.
• Worked on writing HQL for Sqooping data from Oracle to Hadoop; wrote an Oozie workflow for moving data to stage and then to live.
• Worked on Sqoop import and export of data.
• Tested raw data, executed performance scripts, and shared responsibility for administration of Hadoop, Hive, and Pig.
• Involved in the design of the unstructured JSON data format and in building the required SerDes for the web services.
• Increased the performance of the Hadoop cluster by using hashing and salting methodologies for load balancing.
• Optimized HBase service data retrieval calls native to the region and improved range-based scans.
• Highly involved in designing the next-generation data architecture for unstructured data.
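The Sqoop import and dynamic-partition bullets above combine into a single flow; below is a minimal shell sketch of it. The Oracle connection string, credentials, and the table, column, and database names are hypothetical placeholders, not the project's actual objects.

    #!/bin/sh
    # Sketch: import an Oracle table into a Hive staging table with Sqoop,
    # then rewrite it into a partitioned live table with dynamic partitioning.
    set -e

    sqoop import \
      --connect jdbc:oracle:thin:@//oradb.example.com:1521/RISKDB \
      --username "$ORA_USER" --password-file /user/etl/.ora_pwd \
      --table TRADES \
      --hive-import --hive-table stage_db.trades \
      --num-mappers 4

    hive -e "
      SET hive.exec.dynamic.partition=true;
      SET hive.exec.dynamic.partition.mode=nonstrict;
      INSERT OVERWRITE TABLE live_db.trades PARTITION (load_dt)
      SELECT trade_id, book, notional, load_dt
      FROM stage_db.trades;
    "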
Languages and DB: Hive, Impala, Spark, Oracle, and HBase
Software and Tools: HDFS, Hadoop MapReduce, Hive, Pig, Sqoop, Oozie, Cloudera CDH-4, HUE, Flume, Impala, Micro Focus Rumba, SOAP services, Mule ESB, Jenkins, SVN, pgAdmin III, IBM Data Studio, Teradata, and Oracle
State Farm Insurance, Bloomington, IL Oct 2013 – Sep 2015
Hadoop Developer/ Development Lead
ICP – CDE – DC8 – MFI Base and Enhancements (Oct 2013 – Sep 2015)
This project transforms the existing Billing and Payments application to its future state by storing and processing the data entirely in HDFS. All processing in HDFS is done through Pig, Hive, Sqoop, HBase, and MapReduce programs, along with performance tuning and regular backups. MFI also involves migrating the State Farm payment plan information to the ICP platform to improve user interface response.
• Worked as a Project Manager/Project Lead, mentored all peers on the system, and imparted knowledge on the same.
• Understood business needs, analyzed functional specifications, and mapped them to Mule flows and web services of the existing applications to insert/update/retrieve data from NoSQL HBase.
• Installed and managed a 4-node, 4.8 TB Hadoop cluster for the SOW and eventually configured a 12-node, 36 TB cluster for the production and implementation environments.
• Implemented Hive tables and HQL queries for the reports.
• Created web services that interact with the HBase client API, using get/put methods for different applications (the sketch after this list shows get/put calls against a salted key).
• Used the JSON data type in Hive; developed Hive queries to analyze reducer output data.
• Developed MapReduce programs to parse the raw data, populate staging tables, and store the refined data in partitioned tables.
• Involved in troubleshooting issues and errors reported by the cluster monitoring software provided by Cloudera Manager.
• Created simple rule-based optimizations, such as pruning non-referenced columns from table scans.
• Set task status to display debug information and show the status of the MapReduce job on the JobTracker web page.
• Used Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data on a daily catch-up run basis.
• Configured the cluster to periodically archive the log files for debugging, reduce the processing load on the cluster, and tuned the cluster for better performance.
• Involved in extracting and loading data from RDBMS to Hive using Sqoop.
• Tested raw data, executed performance scripts, and shared responsibility for administration of Hadoop, Hive, and Pig.
• Involved in the design of the unstructured JSON data format and in building the required SerDes for the web services.
• Increased the performance of the Hadoop cluster by using hashing and salting methodologies for load balancing.
• Optimized HBase service data retrieval calls native to the region and improved range-based scans.
• Highly involved in designing the next-generation data architecture for unstructured data.
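The salting and get/put bullets above can be illustrated with a small shell sketch driven through the HBase shell. The table name, column family, natural key, and the 16-bucket salt are assumptions for illustration, not the production design.

    #!/bin/sh
    # Sketch: derive a salt bucket from the natural key and prefix it to the
    # HBase row key so that sequential keys spread across regions.
    ACCOUNT_ID="000123456"   # hypothetical natural key
    SALT=$(printf '%s' "$ACCOUNT_ID" | cksum | awk '{print $1 % 16}')

    hbase shell <<EOF
    put 'payments', '${SALT}-${ACCOUNT_ID}', 'info:status', 'BILLED'
    get 'payments', '${SALT}-${ACCOUNT_ID}'
    EOF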
Languages and DB: DB2, PostgreSQL, MySQL, Expression, and HBase
Software and Tools: HDFS, Hadoop MapReduce, Hive, Pig, Sqoop, Oozie, Cloudera CDH-4, HUE, Flume, Impala, Micro Focus Rumba, SOAP services, Mule ESB, Jenkins, SVN, pgAdmin III, IBM Data Studio, and Teradata
ICP – CDE – DC8 – Checkout (Oct 2012 – Oct 2013)
This project is focused on enhancing the customer experience across all products for the billing,
payment and disbursement processes.
• Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, HBase, and Sqoop.
• Responsible for building scalable distributed data solutions using Hadoop.
• Involved in loading data from the Linux file system to HDFS.
• Worked on installing the cluster, commissioning and decommissioning of data nodes, name node recovery, capacity planning, and slots configuration.
• Created HBase tables to store variable data coming from different portfolios (see the sketch below).
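As a rough illustration of the HDFS-loading and HBase-table bullets above, here is a minimal shell sketch; all paths, table names, and column families are hypothetical.

    #!/bin/sh
    # Sketch: stage raw feed files from the local Linux file system into HDFS,
    # then create an HBase table with one column family per data grouping.
    hdfs dfs -mkdir -p /data/checkout/incoming
    hdfs dfs -put /var/feeds/checkout/*.dat /data/checkout/incoming/

    hbase shell <<EOF
    create 'portfolio_data', 'billing', 'payment'
    EOF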
Bank of America, Bangalore, India Nov 2008 – Sep 2012
Teradata Developer
Technical Environment: Teradata V2R12, UNIX Shell Scripting, Teradata SQL Assistant, TDWM, BTEQ, COBOL, JCL
Description: The main objective of the system is to extract data from different legacy systems and load it into a mart. A business intelligence system was developed using DataStage to quickly identify customer needs and develop better-targeted services. The database is Teradata, with Mainframes; a large amount of customer-related data from diverse sources was consolidated, including customer billing, ordering, support, and service usage. The idea is to build a Decision Support System for executives.
Key Responsibilities & Achievements:
• Performed a major role in understanding the business requirements and in designing and loading data into the data warehouse (ETL).
• Worked with utilities like BTEQ, MLOAD, FLOAD, etc.
• Collected data source information from all the legacy systems and existing data stores.
• Imported various application sources and created targets and transformations using DataStage Designer (Source Analyzer, Warehouse Developer, Transformation Developer, and Mapping Designer).
• Worked on data modeling using Visio and Erwin.
• Involved in data extraction, transformation, and loading from source systems to the ODS.
• Developed complex mappings using multiple sources and targets in different databases.
• Actively participated in the performance tuning of DataStage mappings.
• Worked with the production support team on solving production issues.
• Knowledge of DataStage and Informatica.
• Security administration, including creating and maintaining user accounts, passwords, profiles, roles, and access rights.
• Tuned various queries by COLLECTING STATISTICS on columns in the WHERE and JOIN expressions.
• Knowledge of Teradata architecture.
• Knowledge of Star Schema and Snowflake Schema.
• Developed BTEQ scripts to load the data from the staging tables to the base tables (a sketch follows this list).
• Created BTEQ scripts to extract data from the warehouse for downstream systems.
• Performed unit-level testing as part of development; also assisted the testing team in running SIT/UAT/pre-production testing.
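A minimal BTEQ sketch of the staging-to-base load pattern named above; the database, table, and column names are invented for illustration.

    #!/bin/sh
    # Sketch: load a base table from its staging table inside one BTEQ session,
    # abort with a non-zero return code on error, then refresh statistics.
    bteq <<EOF
    .LOGON ${TDLOGON}
    INSERT INTO MART_DB.CUSTOMER_BILLING
    SELECT cust_id, bill_cycle, amount, load_dt
    FROM STG_DB.CUSTOMER_BILLING_STG;
    .IF ERRORCODE <> 0 THEN .QUIT 8
    COLLECT STATISTICS ON MART_DB.CUSTOMER_BILLING COLUMN (cust_id);
    .LOGOFF
    EOF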
IGCAR (Indira Gandhi Centre for Atomic Research), Kalpakkam, Tamil Nadu June 2007 – Nov 2008 (Internship)
Modelling using Rhapsody:
Worked as an intern and helped in creating a working model for a system.
Worked on creating a logical diagram for a complex system for the plant.
Analysis and design of integration for an instrumentation project using IBM Rational Rhapsody.
Whereas Spectra CX is built on IBM Rational Software Architect (RSA), which provides a rich UML modelling capability, this project integrates Rhapsody as the front end for UML modelling in the Spectra CX tool.
This entails updates to IGCAR's instrumentation product, using the Rhapsody API to generate Rhapsody profiles in addition to RSA profiles, to extend UML for a particular embedded software domain, especially one based on the Software Communications Architecture (SCA) for software-defined radios.
Also, plug-in extensions in the Rhapsody tooling push the user's model for validation and code generation.