Madhusudhn Reddy.Gujja
Mail: madhuspark66@gmail.com
Mobile No: 9550833742
Professional Summary:
3 years of Big Data ecosystem experience in the storage, querying, processing and analysis of Big Data.
Hands-on experience with the Hadoop ecosystem; experienced in using Java MapReduce (MR1/MR2), Hive, HBase,
Pig, Sqoop, Flume, Impala and Spark.
Very good understanding of Big Data tools like Storm and Kafka.
Extensive experience in developing Pig Latin Scripts and Hive Query Language for data analytics.
Written Sqoop queries to import and export data between HDFS, Hive, HBase and relational database
management systems.
Experience and knowledge of NoSQL databases like MongoDB, HBase and Cassandra.
Experience and knowledge of MySQL and Oracle.
Extensively used Flume to collect log data from various sources and integrate it into HDFS.
Expertise in loading and transforming large sets of semi-structured and unstructured data using Pig Latin
operations.
Managing and scheduling batch jobs on a Hadoop cluster using Oozie.
Hadoop cluster setup, installation, capacity planning and administration experience with multi-node clusters
using Cloudera (CDH4.x & CDH5) and Hortonworks (HDP1.3 & HDP2) for Apache Hadoop.
Good experience with cloud-based Hadoop platforms like AWS EC2 and EMR, deploying, managing and
terminating clusters with various services included.
Developed multiple Kafka producers and consumers from scratch, implementing the organization's requirements.
Collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further
analysis.
Hands-on experience in writing MapReduce code in Java.
Hands-on experience in application development using Java, RDBMS and Linux shell scripting.
Hands-on experience in developing strategies for Extraction, Transformation and Loading (ETL)
using Informatica PowerCenter 8.
Worked extensively in Teradata and Teradata Load Utilities.
Worked on Agile Methodology (Scrum).
Flexible and ready to take on new challenges.
Wide range of IT industry experience across Healthcare, Banking and Media & Entertainment, with strong
analytical, problem-solving, organizational, communication, learning and team skills.
Technical Skills:
Big Data Ecosystems: HDFS, MapReduce, Hive, Pig, Sqoop, HBase, Flume, Kafka, Spark, Storm, Impala and Oozie
Languages: Core Java, Python, Scala, C#.Net, VB.Net
ETL Tool: Informatica PowerCenter 8.x
NoSQL Databases: MongoDB, Cassandra
Databases: Oracle, Teradata, MySQL
Educational Qualifications:
Master of Computer Application (MCA)
Bachelor of Science (B.Sc.)
Professional Experience:
Worked as a Tech Lead for Wipro Ltd from July 2015 to March 2016.
Worked as a Senior Software Engineer for Mahindra Satyam (Tech Mahindra) from November 2006 to
June 2015.
Project Details
Project Name : Royalty Processing
Environment : HDP, Hadoop, Java, Linux, Hive, Pig
Description:
The Royalty project works on the royalty data that the client receives from all its vendors as different
files in different formats; using business-case validation of records, the files need to be converted into
JSON format. These files are purely transactional data, which helps business users understand revenue
trends and create the market strategy for the future.
Responsibilities:
Analyzed the scope of the project.
Responsible for end-to-end development for the client.
Developed multiple MapReduce jobs to convert fixed-width, CSV and tab-delimited records into JSON.
Developed Pig scripts to implement ETL transformations including cleaning, load and extract.
Customized the Hadoop input format.
Written Hive queries for data analysis to meet the business requirements.
Experienced with optimization techniques to get better performance from Hive queries.
Developed Hive UDFs to incorporate external business logic into Pig scripts.
Used the Oozie workflow engine to run multiple MR jobs.
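The fixed-width-to-JSON conversion above can be sketched as a plain Java method. The field names and column widths here are hypothetical examples, not the client's actual record layout; in the real job this logic would sit inside a MapReduce mapper reading records from HDFS:

```java
import java.util.Locale;

// Minimal sketch: convert one fixed-width royalty record into a JSON string.
// Hypothetical layout: vendorId (6 chars), title (20 chars), amount (10 chars).
public class FixedWidthToJson {

    public static String toJson(String record) {
        String vendorId = record.substring(0, 6).trim();
        String title = record.substring(6, 26).trim();
        double amount = Double.parseDouble(record.substring(26, 36).trim());
        return String.format(Locale.US,
                "{\"vendorId\":\"%s\",\"title\":\"%s\",\"amount\":%.2f}",
                vendorId, title, amount);
    }

    public static void main(String[] args) {
        System.out.println(toJson("V00042Greatest Hits       0000123.45"));
        // prints {"vendorId":"V00042","title":"Greatest Hits","amount":123.45}
    }
}
```

The same method body, unchanged, would become the core of a mapper's `map()` for the CSV and tab-delimited variants, with only the field-splitting step swapped out.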
Project Name : Incites 2.0
Client : Thomson Reuters
Environment : Hadoop, Spark, Scala, Kafka
Description: Incites 2.0 is a web-based research evaluation application that can be used to analyze institutional
productivity and to compare and benchmark researchers' work against their peers worldwide. It is a complete
re-architecture of the Incites 1.0 analytics platform, geared towards a new and better portal with customizable
dashboards, ad hoc report generation, high-relevancy query results, filtering and much more. The Incites dataset
is built using bibliography and citation information from WoS.
Responsibilities:
Responsible for end-to-end development for the client.
Reporting based on filtered attributes.
Reporting based on indicators.
Custom reports.
Search and filter (People, Journal, Organization, Profile).
Ranking, sorting, filtering on attributes.
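The ranking, sorting and filtering on attributes was built on Spark, but the same operations can be illustrated with plain Java collections so the example stays self-contained. The `Researcher` type and its fields are hypothetical:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch of attribute-based filtering and ranking.
public class AttributeRanking {

    public static class Researcher {
        final String name;
        final String organization;
        final int citations;

        public Researcher(String name, String organization, int citations) {
            this.name = name;
            this.organization = organization;
            this.citations = citations;
        }
    }

    // Filter researchers by organization, then rank by citation count (descending).
    public static List<String> rankByCitations(List<Researcher> all, String org) {
        List<String> ranked = new ArrayList<>();
        all.stream()
           .filter(r -> r.organization.equals(org))
           .sorted(Comparator.comparingInt((Researcher r) -> r.citations).reversed())
           .forEach(r -> ranked.add(r.name));
        return ranked;
    }

    public static void main(String[] args) {
        List<Researcher> data = List.of(
                new Researcher("A. Rao", "MIT", 120),
                new Researcher("B. Chen", "MIT", 340),
                new Researcher("C. Diaz", "ETH", 210));
        System.out.println(rankByCitations(data, "MIT")); // prints [B. Chen, A. Rao]
    }
}
```

In the Spark version the same `filter`/`sortBy` pipeline would run over an RDD or Dataset of the Incites records instead of an in-memory list.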
Project Name : Big Data & Analytics
Client : Warner Music Group
Environment : HDP, Hadoop, Hive, Sqoop, Cassandra, Java
Description:
Warner Music Group (WMG) is an American major global record company headquartered in New York City. The
largest American-owned music conglomerate worldwide, it is one of the 'big three' recording companies (the three
largest in the global music industry). The project includes analytics around social media sentiment, customer
profitability, customer retention, artist popularity index, brand effectiveness and consumer acceptance of the
"Record Labels" launch.
Responsibilities:
Responsible for analyzing and understanding data sources like iTunes, Spotify and census data.
Responsible for end-to-end development for the client.
Extensively worked on creating combiners, partitioning and distributed cache to improve the performance of
MapReduce jobs.
Loaded and transformed large sets of structured and semi-structured data using Hive.
Responsible for managing data coming from different sources.
Created Hive tables, dynamic partitions and buckets for sampling, and worked on them using HiveQL.
Experienced with optimization techniques to get better performance from Hive queries.
Imported and exported data into HDFS and Hive tables using Sqoop.
Created customized Hive UDFs.
Used Hive tables and Hive SerDes to store data in tabular format.
Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for
further analysis.
Developed shell scripts to automate all processes.
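A customized Hive UDF of the kind mentioned above boils down to an `evaluate` method. In the real job that method lives in a class extending `org.apache.hadoop.hive.ql.exec.UDF`; it is shown as plain Java here so the example is self-contained, and the normalization rules are a hypothetical example:

```java
import java.util.Locale;

// Sketch of the core of a customized Hive UDF for normalizing artist names.
public class NormalizeArtistUdf {

    public static String evaluate(String raw) {
        if (raw == null) {
            return null;  // Hive UDFs must tolerate NULL input
        }
        // Trim, collapse internal whitespace runs, and lower-case the name.
        return raw.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(evaluate("  Led   Zeppelin "));  // prints led zeppelin
    }
}
```

Once packaged in a jar and registered with `CREATE TEMPORARY FUNCTION`, such a UDF can be called from HiveQL like any built-in function.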
Project Name : Artist Dashboard
Client : Warner Music Group
Environment : Sqoop, Hive, MR, Cassandra
Description:
Warner Music Group (WMG) is an American major global record company headquartered in New York City. The
largest American-owned music conglomerate worldwide, it is one of the 'big three' recording companies (the three
largest in the global music industry).
This project builds a dashboard intended to do analytics on top of the time-series data of various artists in
the music industry and represent it in graphs and charts. The required time-series data is generated from the
Parcel DB, stored in JSON format and finally loaded into Cassandra; an API built on the Cassandra data is used
by the dashboard to fetch the data.
Responsibilities:
Involved in orchestration of delta generation for time series data.
Troubleshot failures in data loads at multiple stages.
Monitored and maintained the Cassandra cluster and data load jobs.
Performed thorough validations after the data loads with the Artist Dashboard.
Worked with multiple teams to meet the deliverables in terms of data ingestion.
Involved in space management so as not to hamper delta generation.
Involved in pre-check and post-data-load validations.
Developed standard operating procedures for data ingestion/loads.
Responsible for loading the time-series data from Hadoop to Cassandra into the respective column families.
Responsible for validation of data in Cassandra as well as the front end.
Project Name : Swiss Re Business Applications
Client : Swiss Reinsurance Company
Environment : Informatica 8.x, Oracle, Unix
Description:
Swiss Re is a diverse reinsurance and commercial insurance business entity. The company's world headquarters
is located in Overland Park, Kansas, USA. The Business Application Support group provides support to the
following applications:
STAT
TRAC
4. Madhusudhn Reddy.Gujja Mail:madhuspark66@gmail.com
Mobile No:9550833742
FINANCE & HR
STAT is the data store for external and internal reporting. It contains detailed information relating to the
technical balance from insurance transactions. TRAC is an operational system which supports underwriting,
claims, accounting and retrocession. FINANCE & HR application support deals with three subsystems: ABC
(Accounting, Budget and Cost), CDS (Check Disbursement System) and an Oracle-based HR system.
Responsibilities:
Created workflows in Informatica as per requests from the customer.
Tested the workflows and provided valid test data to users as per requirements.
After getting sign-off, responsible for moving them to the production environment.
Provided support for the existing workflows in production by debugging them whenever they failed
and resolving issues at the earliest.
Project Name : PRDCOPY
Client : GlaxoSmithKline
Environment : Informatica, Teradata, Unix
Description:
This data pull uses Informatica with Teradata FastExport to pull data from Prod and Teradata FastLoad to push
data to Val. The dimension and fact data is moved from Prod to Val. The data is pulled from source to stage
with transformations, and the stage data is moved to the target area using Teradata utilities.
Responsibilities:
Developed mappings, sessions and worklets as per the mapping specs.
Used different transformations like Source Qualifier, Expression, Lookup, Joiner, Sorter, Update Strategy,
Filter, Router etc.
Created workflows with worklets as per the technical specs.
Extensively used various performance tuning techniques to improve mapping and session performance.
Created and manipulated BTEQ and FASTLOAD scripts according to the documentation.
Implemented performance tuning at various levels.
Involved in scheduling the ETL jobs using IPM.
Created unit test cases and documented unit test results.
Project Name : Barclays Managed Services
Client : Barclays Bank
Environment : Teradata, UNIX
Description:
As part of the UK Retail Business Banking development team, I took care of more than 250 applications.
This is development-cum-support work where we need to ensure that the transactions done during banking
hours in the UK are loaded into the appropriate tables and the data is available to users before the next
banking hour starts.
Responsibilities:
Created and manipulated BTEQ and FASTLOAD scripts according to the documentation.
Executed the scripts in the development environment.
Performed code review as per Barclays' code standards.
Project Name : Faster Compensation Pay-out
Client : Medi Bank
Environment : Java, JavaScript, HTML, CSS, JDK 1.5.1, JDBC, Oracle 10g, XML, and UML
Description: The objective of this project is to design a system to keep track of employee data such as personal
information, title, working hours, salary, department info, etc. The system allows management to keep track of
the employees and optimize the usage of their skills.
Responsibilities:
Involved in the Design, Development and Support phases of the Software Development Life Cycle (SDLC).
Reviewed the functional, design, source code and test specifications.
Involved in the complete front-end development using JavaScript and CSS.
Authored the Functional, Design and Test Specifications.
Implemented the Backend, Configuration DAO and XML generation modules of DIS.
Analyzed, designed and developed the component.
Used JDBC for database access.