Big Data Project Experience:
Industry: Manufacturing Project: Panera, LLC
Company: CenturyLink Technology, Noida, IN Duration: April 2016 – Present (7 Months)
Designation: Consultant Role: Big Data Developer
Project Description: Panera, LLC is an American chain of bakery-café fast casual restaurants in
the United States and Canada. CenturyLink has an SOW with Panera, LLC for capacity planning and
production setup. The client required identification of a methodology for tying the online
business workload, at an order level, to the actual utilization of the IT infrastructure, and
the building of sample/representative dashboards depicting measures of IT resource utilization
per order.
My responsibilities are to develop and test ETL jobs in Spark with Scala (previously
Python) to speed up parsing of distributed unstructured data from different sources via Flume
and Kafka, to build regression models such as Random Forest and Gradient-Boosted Trees on
LIBSVM data files, and to fix Spark environment issues.
Responsibilities/Deliverables:
Developed Spark ETL jobs to parse huge amounts of unstructured data.
Developed Spark MLlib jobs to create regression models on structured data.
Worked with IntelliJ IDEA and the SBT build tool.
Developed a UI for visualization of reports in D3.js and Zoomdata.
Software development and automation for applications and system monitoring.
Worked on the Cloudera distribution of Hadoop (CDH).
Exposure to data manipulation with Hive queries.
Exposure to scheduling jobs in Oozie.
Exposure to creating detailed design documents for the project.
Secured data using Apache Sentry authorization.
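The LIBSVM files mentioned above use a simple sparse text format (a label followed by index:value pairs), which Spark MLlib reads directly for Random Forest and Gradient-Boosted Trees regression. A minimal parser sketch in Python (the sample record is illustrative):

```python
def parse_libsvm_line(line):
    """Parse one LIBSVM-format line: '<label> <index>:<value> ...'.

    Returns (label, features), where features maps 1-based indices to
    float values -- the sparse layout Spark MLlib regression trainers
    consume (e.g. via spark.read.format("libsvm")).
    """
    parts = line.strip().split()
    label = float(parts[0])
    features = {}
    for pair in parts[1:]:
        idx, value = pair.split(":")
        features[int(idx)] = float(value)
    return label, features

# Example record: target 24.0 with three non-zero features
label, features = parse_libsvm_line("24.0 1:0.5 3:1.25 7:-2.0")
```

Only the non-zero feature indices appear on each line, which keeps wide, sparse training sets compact on disk.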
Industry: Telecom Project: CTL – Cloudera Big Data as a Service
Company: CenturyLink Technology, Noida, IN Duration: September 2016 to Present (2 Months)
Designation: Consultant Role: Big Data Developer
Project Description:
Press report: j.mp/2cDr5nO
My responsibilities are to develop an automation API framework in Java/Python that sets up
and manages clusters, with all services up and running, automatically.
Responsibilities/Deliverables:
Developed an automation API to deploy clusters via the Cloudera Manager REST API.
Developed structured cluster templates for automation.
Software development and automation for applications and Ganglia system monitoring.
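Such automation drives Cloudera Manager's REST API (e.g. POST /api/v12/clusters, then service and role setup) with structured cluster templates. A minimal sketch of assembling one such payload, assuming an illustrative cluster name, hosts, and service list (real templates carry far more configuration):

```python
import json

def build_cluster_template(name, version, hosts, services):
    """Assemble a minimal cluster-deployment payload of the general
    shape Cloudera Manager's REST API accepts; the host/service field
    layout here is a simplified illustration."""
    return {
        "name": name,
        "fullVersion": version,
        "hosts": [{"hostname": h} for h in hosts],
        "services": [{"serviceType": s} for s in services],
    }

# Hypothetical two-node cluster running core CDH services
template = build_cluster_template(
    "analytics-poc", "5.8.0",
    hosts=["node1.example.com", "node2.example.com"],
    services=["HDFS", "YARN", "SPARK_ON_YARN", "HIVE"],
)
payload = json.dumps(template)  # body for the HTTP POST to Cloudera Manager
```

Keeping the template as data rather than ad hoc API calls is what makes the deployment repeatable across clusters.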
Industry: Telecom Project: CTL Data Lake – PD
Company: CenturyLink Technology, Noida, IN Duration: April 2016 – June 2016 (3 Months)
Designation: Consultant Role: Big Data Developer
Project Description: CTL Data Lake is a CenturyLink internal project to create an application
for comprehensive data access and management, and then apply data analytics on scalable
data.
My responsibilities were to develop and test a REST interface for a data pipeline that takes
data from the customer and dumps it to a Kafka topic, parses it with Spark Streaming, and
stores it in an HBase table and HDFS.
Responsibilities/Deliverables:
Developed a data pipeline with a REST Java API that feeds Kafka, with HBase as the
consumer.
Developed Flume integration with Kafka.
Worked in Eclipse Mars with the Maven build tool.
Developed a Spark Streaming API integrated with Kafka.
Exposure to real-time streaming jobs.
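The front of that pipeline — a REST handler handing customer records to a Kafka topic for Spark Streaming to consume downstream — can be sketched as follows. The producer is any object exposing a send(topic, value) method (e.g. kafka-python's KafkaProducer); the topic name and the "id" field requirement are illustrative:

```python
import json

def ingest_record(producer, record, topic="datalake-ingest"):
    """Validate an incoming REST payload and publish it to a Kafka
    topic; downstream, Spark Streaming parses the topic and writes to
    HBase/HDFS. Returns the serialized message that was sent."""
    if "id" not in record:
        raise ValueError("record must carry an 'id' field")
    message = json.dumps(record).encode("utf-8")
    producer.send(topic, message)
    return message

# A stub with the same send() signature stands in for a real producer here:
class _StubProducer:
    def __init__(self):
        self.sent = []

    def send(self, topic, value):
        self.sent.append((topic, value))

stub = _StubProducer()
msg = ingest_record(stub, {"id": 42, "source": "customer-feed"})
```

Decoupling ingestion from storage via the topic is what lets the same feed land in both the HBase table and HDFS.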
Industry: Telecom Project: AT&T Insights
Company: Amdocs, Gurgaon, IN Duration: November 2014 to March 2016 (1 Year 5 Months)
Designation: Software Engineer Role: Big Data Developer
Project Description:
AT&T is the second-largest provider of mobile telephone services and the
largest provider of fixed telephone services in the United States, and it also
provides broadband subscription television services through DirecTV.
AT&T Insights is a module in the Amdocs CRM application.
My responsibilities on the Insights project: data ingestion into HBase from structured and
unstructured data sources, and development of the Insights Spark API for reading data from
HBase storage with a Kafka producer, providing fast data access to multiple applications
(U-verse, CRM, and test applications) at the same time.
Responsibilities/Deliverables:
Developed a Spark framework in Scala with HBase as storage.
Developed Flume integration with Kafka for loading unstructured data.
Developed HiveQL queries for analysis of huge telecom datasets.
Developed MapReduce jobs and UDFs in core Java.
Built an automatic data ingestion platform for migrating data from Oracle to HBase.
Worked on distributed GigaSpaces grid clusters for the Insights application.
Exposure to real-time streaming jobs and batch jobs.
Software development and automation for applications and system monitoring.
Developed modules for database connections and structured programming.
Experience with both the Hortonworks and Cloudera (CDH) distributions of Hadoop.
Developed log analysis and real-time monitoring tools for production applications.
Exposure to ETL job creation, flow diagrams, and job scheduling as DAGs.
Visualization of reports for clients using Tableau.
Exposure to Ganglia, Kerberos, and Hadoop metrics.