Big Data Project Experience:
Industry: Manufacturing Project: Panera, LLC
Company: CenturyLink Technology, Noida, IN Duration: April 2016 – Present (7 Months)
Designation: Consultant Role: Big Data Developer
Project Description: Panera, LLC is an American chain of bakery-café fast-casual restaurants in
the United States and Canada. CenturyLink has an SOW with Panera, LLC for capacity planning
and production setup. The client required identification of a methodology for tying the online
business workload, at the order level, to the actual utilization of the IT infrastructure, and the
building of sample/representative dashboards depicting measures of IT resource utilization per
order.
My responsibilities are to develop and test ETL jobs in Spark/Scala (previously Python) to
speed up parsing of distributed unstructured data from different sources via Flume and Kafka,
to build regression models such as Random Forest and Gradient-Boosted Trees on LIBSVM
data files, and to fix Spark environment issues.
Responsibilities/Deliverables:
• Developed Spark ETL jobs to parse large volumes of unstructured data.
• Developed Spark MLlib jobs to build regression models on structured data.
• Worked with IntelliJ IDEA and the SBT build tool.
• Developed UI for visualization of reports in D3.js and Zoomdata.
• Software development and automation for applications and system monitoring.
• Worked on the Cloudera Distribution of Hadoop (CDH).
• Exposure to data manipulation with Hive queries.
• Exposure to scheduling jobs in Oozie.
• Exposure to creating detailed design documents for the project.
• Secured data using Apache Sentry authorization.
Industry: Telecom Project: CTL-Cloudera Big Data as a Service
Company: CenturyLink Technology, Noida, IN Duration: September 2016 to Present (2 Months)
Designation: Consultant Role: Big Data Developer
Project Description:
Press report: j.mp/2cDr5nO
My responsibilities are to develop an automation API framework in Java/Python that sets up
and manages clusters with all services up and running automatically.
Responsibilities/Deliverables:
• Developed an automation API to deploy clusters via the Cloudera Manager REST API.
• Developed structured cluster templates for automation.
• Software development and automation for applications and Ganglia system monitoring.
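The cluster-deployment automation above drove the Cloudera Manager REST API from structured templates. A hedged sketch of the template side, assuming a simplified JSON shape (the real Cloudera Manager cluster schema is richer; the field names here are illustrative, not the exact CM API contract):

```python
def cluster_template(name, cdh_version, hosts, service_types):
    """Build an illustrative cluster-definition payload that an
    automation client could POST to a Cloudera Manager endpoint.
    Field names are assumptions, not the exact CM API schema.
    """
    return {
        "name": name,
        "fullVersion": cdh_version,
        "hosts": [{"hostname": h} for h in hosts],
        "services": [{"type": t} for t in service_types],
    }
```

An automation client would serialize this with `json.dumps`, POST it over HTTP, then poll the cluster until every service reports a healthy state.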
Industry: Telecom Project: CTL Data Lake – PD
Company: CenturyLink Technology, Noida, IN Duration: April 2016 – June 2016 (3 Months)
Designation: Consultant Role: Big Data Developer
Project Description: CTL Data Lake is a CenturyLink internal project to create an application
for comprehensive data access and management, and then apply data analytics to data at
scale.
My responsibilities were to develop and test a REST interface for a data pipeline that takes
data from the customer, publishes it to a Kafka topic, parses it with Spark Streaming, and
stores it in an HBase table and HDFS.
Responsibilities/Deliverables:
• Developed a data pipeline with a REST Java API publishing to Kafka, with HBase as the
consumer.
• Developed Flume integration with Kafka.
• Worked in Eclipse Mars with the Maven build tool.
• Developed a Spark Streaming API integrated with Kafka.
• Exposure to real-time streaming jobs.
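One common pattern in a Kafka-to-HBase pipeline like the one above is deriving a salted row key per record so that writes spread across HBase regions instead of hot-spotting one. A minimal sketch, assuming hypothetical JSON field names (`customer_id`, `ts`) and a `d:` column family:

```python
import hashlib
import json

def to_hbase_put(raw_json):
    """Turn one Kafka record (a JSON string) into a (row_key, columns)
    pair ready for an HBase put. A short hash prefix salts the key to
    spread writes across regions; field names are hypothetical.
    """
    rec = json.loads(raw_json)
    base = f"{rec['customer_id']}|{rec['ts']}"
    salt = hashlib.md5(base.encode()).hexdigest()[:2]
    row_key = f"{salt}|{base}"
    columns = {f"d:{k}": str(v) for k, v in rec.items()}
    return row_key, columns
```

In the pipeline described above, a function like this would sit inside the Spark Streaming stage, between the Kafka consumer and the HBase writer.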
Industry: Telecom Project: AT&T Insights
Company: Amdocs, Gurgaon, IN Duration: November 2014 to March 2016 (1 Year 5 Months)
Designation: Software Engineer Role: Big Data Developer
Project Description:
AT&T is the second-largest provider of mobile telephone services and the
largest provider of fixed telephone services in the United States, and also
provides broadband subscription television services through DirecTV.
AT&T Insights is a module in the Amdocs CRM application.
My responsibilities in the Insights project: data ingestion into HBase from structured and
unstructured data sources, and development of the Insights Spark API for reading data from
HBase storage with a Kafka producer, providing fast data access to multiple applications
(U-verse, CRM, test applications) at the same time.
Responsibilities/Deliverables:
• Developed a Spark framework in Scala with HBase as storage.
• Developed Flume integration with Kafka to load unstructured data.
• Developed HiveQL for analysis of huge telecom datasets.
• Developed MapReduce jobs and UDFs in core Java.
• Built an automatic data-ingestion platform for migration of data from Oracle to HBase.
• Worked in distributed GigaSpaces grid clusters for the Insights application.
• Exposure to real-time streaming jobs and batch jobs.
• Software development and automation for applications and system monitoring.
• Developed modules for database connections and structured programming.
• Experience with both the Hortonworks and Cloudera (CDH) distributions of Hadoop.
• Developed log-analysis and real-time monitoring tools for production applications.
• Exposure to ETL job creation, flow diagrams, and job scheduling as DAGs.
• Visualization of reports for the client using Tableau.
• Exposure to Ganglia, Kerberos, and Hadoop metrics.
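The MapReduce jobs above aggregated large telecom datasets. The core pattern can be sketched in plain Python, with a mapper emitting key/value pairs and a reducer summing per key; the record layout here ("subscriber,usage" lines) is a made-up example, not the project's actual schema:

```python
from collections import defaultdict

def map_phase(records):
    # Emit (subscriber, usage) pairs, as a MapReduce mapper would.
    for rec in records:
        sub, usage = rec.split(",")
        yield sub, int(usage)

def reduce_phase(pairs):
    # Sum usage per subscriber, as the reducer would after the shuffle
    # has grouped pairs by key.
    totals = defaultdict(int)
    for sub, usage in pairs:
        totals[sub] += usage
    return dict(totals)
```

In real MapReduce the framework shuffles and groups the mapper output by key between the two phases; the sketch relies on the reducer's dict doing that grouping implicitly.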