This document contains a professional summary and experience for Nagarjuna Damarla. He has 3+ years of experience in application development using Big Data technologies such as Hadoop, along with Java and BI tools. His skills and responsibilities include writing Pig and Hive scripts; developing Sqoop scripts to interface between Hadoop and MySQL; and gathering requirements, designing, developing, and testing multiple projects involving Hadoop, Pig, Hive, and other technologies. His experience includes roles as a Hadoop developer and an Endeca developer on projects dealing with log analysis, data processing, and business intelligence reporting.
IBM InfoSphere BigInsights for Hadoop: 10 Reasons to Love It (IBM Analytics)
Originally Published on Oct 15, 2014
IBM InfoSphere BigInsights is an industry-standard Hadoop offering that combines the best of open-source software with enterprise-grade features.
- #1 InfoSphere BigInsights is 100% standard, open-source Hadoop
- #2 Big SQL - Lightning fast, ANSI-compliant, native Hadoop formats
- #3 BigSheets - Spreadsheet-like data access for business users
- #4 Big Text - Simplify text analytics and natural language processing
- #5 Adaptive MapReduce - Fully compatible, four times faster
- #6 In-Hadoop Analytics - Deploy the analytics to the data
- #7 HDFS and POSIX - a more capable enterprise file system
- #8 Big R - Deep R Language integration in Hadoop
- #9 IBM Watson Explorer - Search, explore and visualize all your data
- #10 Accelerators - Get to market faster leveraging pre-written code
To learn more about IBM InfoSphere BigInsights, download the free InfoSphere BigInsights QuickStart Edition from http://ibm.com/hadoop.
InfoSphere BigInsights - Analytics power for Hadoop - field experience (Wilfried Hoge)
How to analyze binary data as a technical business user. Use InfoSphere BigInsights to bring analytics on Hadoop closer to the user.
Presented at the OOP conference in Munich on January 27, 2015.
To transform your organization and unlock the value of your data, you need a way to ingest, store and analyze every type of data in your organization.
This presentation covers the Data Access Layer of the Hadoop ecosystem, which enables you to achieve this.
We will use the HDP (Hortonworks Data Platform) reference architecture to walk through the Hadoop core and its ecosystem with focus on the data access layer.
We will cover some of the prominent tools of the ecosystem such as Pig, Hive, Sqoop, Flume and Oozie and how they are used for ingesting data into Hadoop from structured, unstructured and streaming sources.
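As a quick, hedged illustration of the ingest step for structured sources (the host, database, and table names below are placeholders, not anything from the presentation), a minimal Sqoop import from MySQL into HDFS looks like:

    # pull a relational table into HDFS with four parallel map tasks
    sqoop import \
      --connect jdbc:mysql://db-host:3306/sales \
      --username etl_user -P \
      --table orders \
      --target-dir /data/raw/orders \
      --num-mappers 4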
Talk to us at +91 80 6567 9700 or send an email to training@springpeople.com for more information.
Learn about IBM's Hadoop offering called BigInsights. We will look at the new features in version 4 (including a discussion on the Open Data Platform), review a couple of customer examples, talk about the overall offering and differentiators, and then provide a brief demonstration on how to get started quickly by creating a new cloud instance, uploading data, and generating a visualization using the built-in spreadsheet tooling called BigSheets.
Hadoop Distributed File System (HDFS) presentation, 27-5-2015 (Abdul Nasir)
Hadoop is a rapidly growing ecosystem of components based on Google’s MapReduce algorithm and file system work, for implementing MapReduce [3] algorithms in a scalable fashion on distributed commodity hardware. Hadoop enables users to store and process large volumes of data and analyze it in ways not previously possible with SQL-based approaches or less scalable solutions. Remarkable improvements in conventional compute and storage resources help make Hadoop clusters feasible for most organizations. This paper begins with a discussion of the evolution of Big Data [1][7][9] and the future of Big Data based on Gartner’s Hype Cycle. We explain how the Hadoop Distributed File System (HDFS) works and describe its architecture with suitable illustrations. Hadoop’s MapReduce paradigm for distributing a task across multiple nodes is discussed with sample data sets, as is how MapReduce and HDFS work when put together. Finally, the paper ends with a discussion of sample Big Data Hadoop use cases, which show how enterprises can gain a competitive advantage by being early adopters of big data analytics.

HDFS is the core component of the Apache Hadoop project. In HDFS, the computation is carried out in the nodes where the relevant data is stored. Hadoop also implements a parallel computational paradigm named MapReduce. In this paper, we measure the performance of read and write operations in HDFS for both small and large files, using a Hadoop cluster with five nodes. The results indicate that HDFS performs well for files larger than the default block size and poorly for files smaller than the default block size.
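As a small, hedged illustration of the block-size behavior the paper measures (the paths and size below are placeholders), the standard HDFS shell lets you override the block size per file and then inspect the resulting blocks:

    # write a file with a 256 MB block size instead of the cluster default
    hdfs dfs -D dfs.blocksize=268435456 -put big_input.log /data/big_input.log

    # report how the file was split into blocks across the cluster
    hdfs fsck /data/big_input.log -files -blocks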
An introduction to Big Data, the problems associated with storing and analyzing big data, and how Hadoop solves them with its HDFS and MapReduce frameworks. A short introduction to HDInsight, Hadoop on Windows Azure.
Real-time analytics is a beautiful thing, especially if you can build it in a quick, scalable and robust way. We built a digital command center for our marketing team, which provided real-time analytics on social media, clickstream, and Google search terms, in the span of a couple of months. This solution was built entirely on open-source technologies, using a combination of Apache NiFi, Elasticsearch and Hadoop. Simple but very effective. In this presentation I would like to share the architecture, learnings, and business benefits of this solution.
NETWORK TRAFFIC ANALYSIS: HADOOP PIG VS TYPICAL MAPREDUCE (csandit)
Big data analysis has become popular, and the manipulation of big data has gained the keen attention of researchers in the field of data analytics. Analysis of big data is currently considered an integral part of many computational and statistical departments, and as a result, novel approaches to data analysis are evolving on a daily basis. Thousands of transaction requests are handled and processed every day by websites associated with e-commerce, e-banking, e-shopping carts, etc. Network traffic and weblog analysis play a crucial role in such situations, and Hadoop can be suggested as an efficient solution for processing the NetFlow data collected from switches, as well as website access logs, during fixed intervals.
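As a hedged sketch of the kind of Pig-based traffic analysis the paper compares against hand-written MapReduce (the paths and field layout are assumptions, not the paper's actual data), the following Pig Latin script counts requests per client IP in a tab-separated access log:

    -- assumed log layout: client IP, timestamp, URL, HTTP status
    logs    = LOAD '/data/access_logs' USING PigStorage('\t')
              AS (ip:chararray, ts:chararray, url:chararray, status:int);
    by_ip   = GROUP logs BY ip;
    hits    = FOREACH by_ip GENERATE group AS ip, COUNT(logs) AS requests;
    ordered = ORDER hits BY requests DESC;
    STORE ordered INTO '/data/traffic_summary';

The same job in raw MapReduce would require a mapper, a reducer, and a driver class, which is the trade-off such a comparison examines.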
Pouring the Foundation: Data Management in the Energy Industry (DataWorks Summit)
At CenterPoint Energy, both structured and unstructured data are continuing to grow at a rapid pace. This growth presents many opportunities to deliver business value and many challenges to control costs. To maximize the value of this data while controlling costs, CenterPoint Energy created a data lake using SAP HANA and Hadoop. During this presentation, CenterPoint will discuss their journey of moving smart meter data to Hadoop, how Hadoop is allowing CenterPoint to derive value from big data and their future use case road map.
This Big Data Hadoop certification program is structured by professionals and experienced course curators to provide you with an in-depth understanding of the Hadoop and Spark Big Data platforms and the frameworks they use. With the help of the integrated laboratory sessions, you will work on and complete real-world, industry-based projects in this course.
Nagarjuna Damarla
Phone: +91-9941618664 e-mail: nagarjuna.bca@gmail.com
Professional Summary:
- 3+ years of overall IT experience in application development using Big Data Hadoop, Java, and BI technologies.
- Proficient working experience in Hadoop components like HDFS, MapReduce, Hive, Pig, HBase, Sqoop, and Flume.
- Good design skills in writing MapReduce programs.
- Involved in writing Pig and Hive scripts to reduce job execution time.
- Able to export and import data to and from other databases through Sqoop.
- Developed an API to interact with MySQL data using Java Swing.
- Good communication, interpersonal, and analytical skills, and a strong ability to perform as part of a team.
- Interested in learning new concepts to keep up with technology trends.
- Smart-working and enthusiastic.
- Knowledge of Flume and NoSQL databases like MongoDB.
- Received appreciation from clients and the Q1 Quarterly Award 2015 for my contribution to the project.
Professional Experience:
Programmer Analyst at Cognizant Technology Solutions, Hyderabad, India, since 2013.
Education:
Master of Computer Applications, Osmania University.
Technical Skills:
Big Data : MapReduce, Pig, Sqoop, Hive, HBase
Java/J2EE Technologies : Swing, JDBC, OJDBC
Frameworks : Hadoop
Java IDEs : Eclipse and NetBeans
Databases : SQL, MySQL, MongoDB
Operating Systems : Windows XP, 2000, 2003, Unix, and Linux
Project Details:
Project #1
Title : Target Re-hosting of WebIntelligence Project
Environment : Hadoop, Apache Pig, Hive, Sqoop, Java, Unix, PHP, MySQL
Role : Hadoop Developer.
Hardware : Virtual Machines, UNIX.
Duration : March 2015 to Present
Description:
The purpose of the project is to store terabytes of log information generated by the e-commerce website and extract meaningful information out of it. The solution is based on the open-source Big Data software Hadoop. The data is stored in the Hadoop file system and processed using MapReduce jobs, which in turn include getting the raw HTML data from the websites, processing the HTML to obtain product and pricing information, extracting various reports out of the product pricing information, and exporting the information for further processing.

This project is mainly a re-platforming of the existing system, which runs on WebHarvest (a third-party JAR) and a MySQL database, to a new cloud solution technology, Hadoop, which can process large data sets (i.e., terabytes and petabytes of data) in order to meet the client's requirements amid increasing competition from rival retailers.
Contributions:
1. Moved all crawl-data flat files generated from various retailers to HDFS for further processing.
2. Wrote Apache Pig scripts to process the HDFS data.
3. Created Hive tables to store the processed results in a tabular format.
4. Developed Sqoop scripts to move data between the Pig output on HDFS and the MySQL database (illustrative sketches follow this list).
5. Involved in gathering requirements, designing, development, and testing.
6. Wrote CLI commands for HDFS.
7. Fully involved in Hadoop, Hive, Pig, and MySQL installation and setup.
8. Analyzed log files to understand user behavior.
9. Unit tested MapReduce and Pig scripts.
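As a hedged illustration of steps 3 and 4 (the table name, columns, host, and database below are placeholders, not the actual project schema), the Hive table over the processed Pig output and the corresponding Sqoop export might look like:

    -- Hive: external table over the tab-separated Pig output
    CREATE EXTERNAL TABLE product_pricing (
      product_id STRING,
      retailer   STRING,
      price      DECIMAL(10,2),
      crawl_date STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/data/processed/product_pricing';

    # Sqoop: push the processed results into MySQL for reporting
    sqoop export \
      --connect jdbc:mysql://db-host:3306/pricing \
      --username etl_user -P \
      --table product_pricing \
      --export-dir /data/processed/product_pricing \
      --input-fields-terminated-by '\t'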
Project #2
Title : Device Fault Prediction
Environment : Hadoop, MapReduce, Hive, Sqoop, Pig
Role : Hadoop Developer.
Hardware : Virtual Machines, UNIX.
Duration : Jan 2014 – Feb 2015
Description:
Cisco’s support team deals on a day-to-day basis with huge volumes of issues related to their network products, such as routers and switches. The support teams have been operating on a reactive model, i.e., acting on the customer tickets/queries being raised. Hence, to improve customer satisfaction, they would like the system to predict network faults based on the logs generated by various network devices, i.e., by loading the logs into a Hadoop cluster and analyzing them using machine learning algorithms, either implemented in Apache Mahout or custom built.
Responsibilities:
1. Moved all log files generated by various network devices into an HDFS location.
2. Wrote MapReduce code that takes log files as input, parses them, and structures the records in a tabular format to facilitate effective querying of the log data (a sketch follows this list).
3. Created an external Hive table on top of the parsed data.
4. Developed Sqoop scripts to move data between the parsed output on HDFS and the MySQL database.
5. Involved in gathering requirements, designing, development, and testing.
6. Analyzed log files to understand user behavior.
7. Unit tested MapReduce and Pig scripts.
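A minimal sketch of the log-parsing mapper described in item 2, assuming a whitespace-delimited raw format; the class name and field layout are illustrative, not the actual project code:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Turns raw device log lines into tab-separated records
    // (device id, timestamp, severity, message) for external Hive tables.
    public class LogParseMapper
        extends Mapper<LongWritable, Text, NullWritable, Text> {

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        // Assumed raw layout: "<deviceId> <timestamp> <severity> <message...>"
        String[] parts = value.toString().split("\\s+", 4);
        if (parts.length == 4) {
          context.write(NullWritable.get(),
                        new Text(String.join("\t", parts)));
        }
        // Malformed lines are simply skipped in this sketch.
      }
    }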
Project #3
Title : Endeca iPlus 3.1
Environment : Endeca 3.1, SQL Developer
Role : Endeca 3.1 Developer
Hardware : Virtual Machines, UNIX.
Duration : Mar 2013 – Dec 2013
Description:
To deliver the best incentive system, meet the needs of customers and dealers, and provide a state-of-the-art system that allows the business to be more efficient and flexible and to capture increased sales, market share, and profit, the existing SIMS R2.2 business processes have been modified, and new functionalities and new reports have been added. Business Intelligence has been introduced to provide better reporting solutions through Endeca. The following are the key changes that have been implemented in ISYS.
Responsibilities:
1. Created Endeca pages with components such as charts, results tables, and crosstabs.
2. Fine-tuned queries for better performance.
3. Configured the instances according to the business rules and filter conditions.
4. Involved in requirement gathering and analysis.
5. Validated Endeca 3.1 reports against the 2.2.1 reports.
6. Prepared unit test cases for the reports.