This document contains the resume of Vipin KP, who has over 5 years of experience as a Big Data Hadoop developer. He has extensive experience developing Hadoop applications for clients such as EMC, Apple, Dun & Bradstreet, Nielsen, Commonwealth Bank of Australia, and Nokia Siemens Networks. He has expertise in technologies such as Hadoop, Hive, Pig, Sqoop, Oozie, and Spark, and has developed ETL processes, data pipelines, and analytics solutions on Hadoop clusters. He holds a Master's degree in Computer Science and is Cloudera certified in Hadoop development.
The Briefing Room with Dr. Robin Bloor and RedPoint Global
Live Webcast on September 23, 2014
Watch the archive:
https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=6cd94ed2ed7cc7090f7d5db1bf343438
Ask anyone who knows, and they’ll tell you candidly: traditional Master Data Management programs require not just tools, technologies and people, but also a level of cooperation and collaboration in the business that can be very difficult to manage. Many of the consequent hurdles that appear stem from long cycle times and lack of transparency into the creation and management of the rules that govern such programs. But now, the power of Hadoop 2.0 has opened up a very different method of action.
Register for this episode of The Briefing Room to learn from veteran Analyst Dr. Robin Bloor who will explain how the Hadoop ecosystem, powered by YARN, can transform MDM from a segmented, often disjointed set of business processes, into a much tighter platform that can finally deliver on the original promise of the discipline. He’ll be briefed by George Corugedo of RedPoint Global, who will showcase his company’s unified data management platform, which weaves together the best practices of traditional MDM with the power and flexibility of Hadoop.
Visit InsideAnalysis.com for more information.
At ING we needed a way to take data science models from exploration into production. I will give this talk from my experience with the exploration and production Hadoop environments as a senior Ops engineer. For this we use OpenShift to run Docker containers that connect to the big data Hadoop environment.
During this talk I will explain why we need this and how it is done at ING, and how to set up a Docker container running a data science model using Hive, Python, and Spark. I'll explain how to use Dockerfiles to build Docker images, how to add all the needed components inside the Docker image, and how to run different versions of software in different containers.
At the end I will also give a demo of how it runs and is automated: Git with a webhook connects to Jenkins, which starts the Docker service that connects to the big data Hadoop environment.
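A minimal sketch of the build-and-run flow described above, assuming a hypothetical image name and client configuration paths (not taken from the talk):

    # Build an image that bundles the Python, Hive client and Spark dependencies
    # declared in a Dockerfile in the current directory.
    docker build -t datascience-model:1.0 .

    # Run the model container, mounting the Hadoop/Hive client configuration so
    # the container can reach the big data environment.
    docker run -d --name model-v1 \
        -v /etc/hadoop/conf:/etc/hadoop/conf:ro \
        -v /etc/hive/conf:/etc/hive/conf:ro \
        datascience-model:1.0

Pinning the software versions into the image tag is what lets two versions of the same stack run side by side in separate containers.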
This is going to be a great technical talk for engineers and data scientists. LENNARD CORNELIS, Ops Engineer, ING
Introduction To Big Data with Hadoop and Spark – For Batch and Real Time Processing — Agile Testing Alliance
Introduction To Big Data with Hadoop and Spark – For Batch and Real Time Processing, by Sampat Kumar from Harman. The presentation was given at the #doppa17 DevOps++ Global Summit 2017. All copyrights are reserved with the author.
Pig has added some exciting new features in 0.10, including a boolean type, UDFs in JRuby, load and store functions for JSON, bloom filters, and performance improvements. Join Alan Gates, Hortonworks co-founder and long-time contributor to the Apache Pig and HCatalog projects, to discuss these new features, as well as talk about work the project is planning to do in the near future. In particular, we will cover how Pig can take advantage of changes in Hadoop 0.23.
Game Changed – How Hadoop is Reinventing Enterprise Thinking — Inside Analysis
The Briefing Room with Dr. Robin Bloor and RedPoint Global
Live Webcast on April 8, 2014
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=cfa1bffdd62dc6677fa225bdffe4a0b9
The innovation curve often arcs slowly before picking up speed. Companies that harness a major transformation early in the game can make serious headway before challengers enter the picture. The world of Hadoop features several of these upstarts, each of which uses the open-source foundation as an engine to drive vastly greater performance to a wide range of services, and even create new ones.
Register for this episode of The Briefing Room to hear veteran Analyst Dr. Robin Bloor explain how the Hadoop engine is being used to architect a new generation of enterprise applications. He’ll be briefed by George Corugedo, RedPoint Global CTO and Co-founder, who will showcase how enterprises can cost-effectively take advantage of the scalability, processing power and lower costs that Hadoop 2.0/YARN applications offer by eliminating the long-term expense of hiring MapReduce programmers.
Visit InsideAnalysis.com for more information.
Security Updates: More Seamless Access Controls with Apache Spark and Apache ... — DataWorks Summit
Security has always been a fundamental requirement for enterprise adoption. For example, in a company, the billing, data science, and regional marketing teams may all have the access privileges required to view customer data, while sensitive data like credit card numbers should be accessible only to the finance team. Previously, Apache Hive™ with Apache Ranger™ policies was used to manage such scenarios. In this talk, we show that Apache Spark™ SQL is aware of the existing Apache Ranger policies defined for Apache Hive. In other words, for SQL users, access to databases, tables, rows and columns is controlled in a fine-grained manner, irrespective of whether the data is analyzed using Apache Spark SQL or Hive. If a policy is updated, both Apache Spark and Apache Hive users get their results consistently. In addition, all fine-grained access via Apache Spark SQL can be monitored and searched through a centralized interface via Apache Ranger.
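As an illustration only (the database, table and column names are hypothetical, not from the talk), the same query can be issued through both engines and should honor the same Ranger policy:

    # Query a Hive table through HiveServer2...
    beeline -u "jdbc:hive2://hiveserver:10000/sales" \
        -e "SELECT customer_id, region FROM customers LIMIT 10"

    # ...and through Spark SQL. With a Ranger column-level policy on
    # sales.customers, both engines should return consistently filtered results,
    # and both accesses are auditable in the Ranger interface.
    spark-sql -e "SELECT customer_id, region FROM sales.customers LIMIT 10"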
Cloudera's open-source Apache Hadoop distribution, CDH (Cloudera Distribution Including Apache Hadoop), targets enterprise-class deployments of that technology. Cloudera says that more than 50% of its engineering output is donated upstream to the various Apache-licensed open source projects.
WCF is not just for SOAP-based services; it can be used with popular protocols and formats like RSS, REST and JSON. Rob Windsor covers URI templates, the importance of HTTP GET in the programmable web, how to expose service operations via HTTP GET, how to control the format of data exposed by service operations, and finally how to use the WebOperationContext to access the specifics of HTTP.
SriSeshaa's Healthcare offerings address the ever-evolving needs of the healthcare domain. Our consulting and development services are designed to help healthcare organizations transform operations by reducing costs, streamlining processes and enhancing patient care services.
Highlights of the Infor system
Always confident with an Infor solution
• A global solution
• Supports diverse manufacturing models
• Extensive experience and success across many manufacturing industries
A flexible system architecture with many capabilities
• Single or multiple companies, Cloud/On-premises, Global/with Vietnam-specific localizations
A full ERP system with functionality extended by industry:
• Service management, inventory forecasting, repair, operations, quality management, CRM
A product that is very strong and deep in manufacturing
• Integrated advanced production planning, from simple to complex scheduling
All functions built entirely by Infor
• By industry, CRM, KPIs / dashboards, document management, workflows and more...
Full ownership of the solution is possible
• The software operates according to the needs of the business.
• Can be configured individually for each user.
• Access information anywhere
• Includes portals for customers to interact with
Built on the most advanced Microsoft technology
Fully integrated with MS Office
• SQL database
• All major browsers
• Cloud or on-premises deployment
• Optimized cost of ownership
• Minimal IT staffing requirements.
• Minimal hardware requirements
• Virtual server support
• Capable of processing large sets of structured, semi-structured and unstructured data, and of supporting the system architecture
• Implemented proofs of concept on the Hadoop stack and different big data analytic tools, including migration from different databases to Hadoop.
• Developed multiple MapReduce jobs in Java for data cleaning and pre-processing according to business requirements; imported and exported data into HDFS and Hive using Sqoop (a sketch of a typical import follows this list).
• Experienced in writing Hive queries and Pig scripts.
• Excellent analytical and problem-solving skills.
• Excellent communication skills.
• A quick learner and a self-motivated team player.
• Ability to mentor and educate peers whenever needed for the greater good of the team as a whole.
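A minimal sketch of the kind of Sqoop import mentioned above; the connection string, credentials, table and paths are hypothetical:

    # Import a relational table into HDFS and register it in Hive in one pass.
    sqoop import \
        --connect jdbc:mysql://dbhost:3306/ordersdb \
        --username etl_user --password-file /user/etl/.dbpass \
        --table orders \
        --target-dir /data/raw/orders \
        --hive-import --hive-table staging.orders \
        -m 4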
Innovation in the Enterprise Rent-A-Car Data Warehouse — DataWorks Summit
Big Data adoption is a journey. Depending on the business the process can take weeks, months, or even years. With any transformative technology the challenges have less to do with the technology and more to do with how a company adapts itself to a new way of thinking about data. Building a Center of Excellence is one way for IT to help drive success.
This talk will explore Enterprise Holdings Inc. (which operates the Enterprise Rent-A-Car, National Car Rental and Alamo Rent A Car brands) and its experience with Big Data. EHI's journey started in 2013 with Hadoop as a POC, and today the company is working to create the next-generation data warehouse in Microsoft's Azure cloud using a lambda architecture.
We’ll discuss the Center of Excellence, the roles in the new world, share the things which worked well, and rant about those which didn’t.
No deep Hadoop knowledge is necessary; the talk is aimed at the architect or executive level.
VIPIN K P
Email: vipinkprc@gmail.com Phone: +91 773 6531 979
+91 990 2321 979
Professional Summary
5+ years of Big Data experience in leading IT organizations and extensive experience in various technology implementations. An experienced Hadoop developer with a strong background in distributed file systems in a big data arena. Understands the complex processing needs of big data and has experience developing code and modules to address those needs. Brings a Master's degree in Computer Science along with certification as a developer using Apache Hadoop.
Core Qualifications
Cloudera Certified Hadoop Developer.
In-depth knowledge of, and experience with, distributed computing platforms such as Hadoop.
Expertise in object-oriented analysis and design and core Java development.
Strong knowledge of SQL-like querying of big data with the Hive data warehouse.
Able to assess business rules, collaborate with stakeholders and perform source-to-target data mapping, design and review.
Hands-on experience with Hortonworks and Cloudera Hadoop environments.
Expertise in writing Hadoop jobs for analyzing data using Hive QL (queries), Pig Latin (data flow language), and custom MapReduce programs in Java.
Solid understanding of NoSQL tools like HBase, MongoDB and Cassandra.
Involved in the full development lifecycle from requirements gathering through development, using Repository Manager, Designer, Workflow Manager, and Workflow Monitor. Extensively worked with large databases in production environments.
Extended Hive and Pig core functionality by writing custom UDFs.
Good knowledge of Hadoop cluster architecture and of monitoring the cluster.
An excellent team player and self-starter with good communication skills and a proven ability to finish tasks before target deadlines.
Strong analytical and problem-solving skills.
Experience in ETL processes.
Expertise in Spark with Scala.
Areas of Expertise
• Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, Zookeeper, Hive,
Pig, Sqoop, Cassandra, Oozie, Flume, Pentaho, Spark
• Programming Languages: Java, C/C++, Scala
• Scripting Languages: Bash
• Databases: NoSQL, Oracle, MySQL, MongoDB
• Scheduler: Autosys, Oozie
• Tools: Eclipse
• Platform: Windows, Linux, Mac
• Application Servers: Tomcat, JBoss
• Testing Tools: Eclipse, NetBeans
• Methodologies: Agile
• Version Control: Git, SVN
Professional Experience
Hadoop Developer
Isilon BDL product Jan 2017 – present
Client : EMC
Description: Facilitated insightful daily analyses of 60 to 80 GB of phone-home log data collected from Isilon clusters: per-customer cluster usage, cluster health checks, spawning recommendations, and Tableau visualization.
Responsibilities:
• Interacted with the client to gather requirements and suggested relevant technologies to build the solution.
• Provided design recommendations and thought leadership to sponsors/stakeholders, which improved review processes and resolved technical problems.
• Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
• Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
• Exported parsed data from HDFS to PostgreSQL with Sqoop (a sketch follows this list).
• Managed and reviewed Hadoop log files.
• Completed integration testing and tracked and resolved defects.
• Tested raw data and executed performance scripts.
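A minimal sketch of the HDFS-to-PostgreSQL export step; the connection details, table and paths are hypothetical:

    # Export the parsed result set from HDFS into a PostgreSQL reporting table.
    sqoop export \
        --connect jdbc:postgresql://pghost:5432/analytics \
        --username etl_user --password-file /user/etl/.pgpass \
        --table cluster_usage \
        --export-dir /data/refined/cluster_usage \
        --input-fields-terminated-by '\t' \
        -m 4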
Hadoop Developer
PX4-Hadoop May 2015 – Dec 2016
Client : Apple
Description: Apple Music is an application developed by Apple to provide music to users. Users can play, skip, scrub forward, scrub back and pause songs. Apple Music also has Beats 1 Radio to provide broadcast music to its users. Autosys is the scheduler used for scheduling jobs. The project deals with analysis of customer usage, payment calculation and royalty bearing, along with exporting data to Teradata and various other sources. At Apple we collect and analyse large amounts of data on a daily, monthly and quarterly basis.
Responsibilities:
• Involved in requirement gathering, analysis and design document creation
• Introduced a metadata-driven architecture in the RINS aggregate
• Worked on a live 1000+ node Hadoop cluster running HDP 2.2
• Developed Hive scripts for end user / analyst requirements to perform ad hoc analysis
• Applied a thorough understanding of partitioning and bucketing concepts in Hive, and designed both managed and external tables in Hive to optimize performance (a sketch of such a table follows this list)
• Solved performance issues in Hive with an understanding of joins, grouping and aggregation, and how they translate into MapReduce jobs
• Developed UDFs and UDAFs in Java as and when necessary for use in Hive queries
• Developed Oozie workflows for scheduling and orchestrating the ETL process
• Created Autosys jobs to schedule the aggregates
• Introduced shell scripts to export the results to Teradata and the XT interface
• Created a testing document incorporating all the relevant scenarios
• Created CR artifacts and deployed them to production clusters
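A minimal sketch of a managed, partitioned and bucketed Hive table of the kind described above; the table and column names are illustrative, not from the project:

    # Create a managed Hive table partitioned by day and bucketed by user for
    # faster joins and aggregations, then query a single partition.
    hive -e "
    CREATE TABLE plays (
        user_id  BIGINT,
        track_id BIGINT,
        action   STRING
    )
    PARTITIONED BY (play_date STRING)
    CLUSTERED BY (user_id) INTO 32 BUCKETS
    STORED AS ORC;

    SELECT action, COUNT(*)
    FROM plays
    WHERE play_date = '2016-06-01'
    GROUP BY action;
    "

Restricting the query to one partition lets Hive prune all other directories instead of scanning the full table.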
Hadoop Developer
ScoringPoA Oct 2014 – April 2015
Client : Dun & Bradstreet
Description: Built one global solution that compares relative levels of risk irrespective of geographic boundaries. This helps D&B build a single, globally consistent risk score for customers with cross-border needs that can go beyond their priority markets.
Responsibilities:
• Developed Hive scripts for end user / analyst requirements to perform ad hoc analysis
• Created workflows and scheduled jobs using Oozie (a sketch of a typical submission follows this list)
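A minimal sketch of submitting such an Oozie workflow; the server URL and properties file are hypothetical:

    # job.properties points at the workflow definition in HDFS
    # (oozie.wf.application.path) plus the nameNode/jobTracker settings.
    oozie job -oozie http://oozie-host:11000/oozie \
        -config job.properties -run

    # Check the state of a submitted job by the ID printed above.
    oozie job -oozie http://oozie-host:11000/oozie -info <job-id>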
Hadoop Developer
Global Analytics Platform March 2013 – Sept 2014
Client : Nielsen (Walmart)
Description: The Global Analytical Platform of the Nielsen Retail team deals with the buying insights of customers. The GAP team comprises different modules such as ETL, Data Model and Publishing Services. Our project deals with the ETL module for use cases such as Pricing Insights and Pricing Elasticity. The ETL team performs the extraction of data from multiple data sources, does pre-validation checks and, through source-target mappings, performs transformations for stage table creation. Finally we generate the target tables and provide them to the model team to perform analytics. The entire data flow is triggered using Oozie, which calls the Hive scripts and a Java API.
Responsibilities:
• Involved in requirement gathering, analysis and design document creation
• Worked on a live 100+ node Hadoop cluster running CDH 5.1
• Developed UDFs in Java in order to meet specific requirements in Hive
• Developed Hive scripts for end user / analyst requirements to perform ad hoc analysis
• Created workflows and scheduled jobs using Oozie
• Introduced a shell script to extract the raw data in different formats and load the files to a configured location in HDFS (a sketch follows this list)
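A minimal sketch of such a load script; the directory paths are illustrative:

    #!/bin/bash
    # Land raw extract files into a dated HDFS staging directory, failing
    # loudly so the scheduler can detect load errors.
    set -e
    RAW_DIR=/landing/retail
    TARGET=/data/gap/raw/$(date +%Y%m%d)

    hdfs dfs -mkdir -p "$TARGET"
    hdfs dfs -put "$RAW_DIR"/*.csv "$TARGET"/

    # Confirm the files landed before downstream Hive steps run.
    hdfs dfs -ls "$TARGET"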
Hadoop Developer
Big Data Synchronization Sept 2012 – Feb 2013
Client : Commonwealth Bank of Australia
Description: The tool incorporates HDFS, Hive and HBase data backup. Details of the data to be synced are kept in MySQL. Hive metastore replication makes use of MySQL replication. The tool also provides a facility to merge and purge files.
Responsibilities:
• Involved in requirement gathering, design, development and testing document creation
• Implemented HDFS replication using DistCp (a sketch follows this list)
• Created an HBase replication solution using the HBase CopyTable utility
• Implemented Hive replication using MySQL replication
• Created workflows and scheduled jobs using Oozie
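Minimal sketches of the two replication steps; the cluster addresses, paths and table name are hypothetical:

    # HDFS replication: incrementally copy a directory tree from the source
    # cluster to the backup cluster, removing files deleted at the source.
    hadoop distcp -update -delete \
        hdfs://source-nn:8020/data/warehouse \
        hdfs://backup-nn:8020/data/warehouse

    # HBase replication: copy a table to a peer cluster with CopyTable,
    # addressed by its ZooKeeper quorum, client port and root znode.
    hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
        --peer.adr=backup-zk1,backup-zk2,backup-zk3:2181:/hbase \
        customer_events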
Hadoop Developer
M&S Performance Benchmarking Jun 2012 – Aug 2012
Client : Marks & Spencer
Description: The application is used to determine product affinity. Source data resides in DB2. Data is transferred to HDFS through Sqoop and the analysis is triggered. Visualization is done through Pentaho. The manufacturer gets an overview of the buying patterns of the products sold.
Responsibilities:
• Exported data from DB2 to HDFS using Sqoop (a sketch follows this list)
• Implemented a Market Basket Analysis algorithm to analyse product affinity
• Designed and created a Hive table to load the resultant data
• Visualized the Hive table data using Pentaho
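A minimal sketch of the DB2-to-HDFS transfer; the hostname, credentials and table are hypothetical:

    # Pull the transaction table out of DB2 into HDFS via the DB2 JDBC driver.
    sqoop import \
        --connect jdbc:db2://db2host:50000/RETAIL \
        --driver com.ibm.db2.jcc.DB2Driver \
        --username etl_user --password-file /user/etl/.db2pass \
        --table TRANSACTIONS \
        --target-dir /data/mands/transactions \
        -m 4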
Hadoop Developer
Foundation Framework Mar 2012 – May 2012
Client : Internal
Description: Acts as a centralized repository of reusable utilities for big data components. Different Hadoop-based components can be integrated using this framework.
Responsibilities:
• Involved in requirement gathering, design, development and testing document creation
• Integrated the Hive migration solution into the framework.
• Committed all the tested code to Git
Hadoop Developer
NSN Performance Benchmarking Jan 2012 – Mar 2012
Client : Nokia Siemens Networks
Description: Developed a near-real-time platform to analyze customer events on large volumes of data. It mainly incorporates ETL processing using Hive and Pig scripts, along with optimization recommendations.
Responsibilities:
• Designed and created Hive tables
• Performed transformations using Hive
• Performed transformations using Pig
• Monitored and benchmarked the cluster using Ganglia
Hadoop Developer
Spark-POC
Client : Internal
Description: Log analysis is done using Spark Streaming and Spark SQL
Responsibilities:
• Implemented streaming ingestion using the Spark Streaming API (a sketch of the job submission follows this list)
• Used Spark SQL to analyse the data
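A minimal sketch of how such a streaming job might be submitted to YARN; the class and jar names are hypothetical:

    # Launch the log-analysis streaming job on the cluster; the driver reads
    # the stream while Spark SQL queries run over the incoming batches.
    spark-submit \
        --class com.example.LogAnalysis \
        --master yarn --deploy-mode cluster \
        --num-executors 4 --executor-memory 4G \
        log-analysis-assembly.jar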
Career Profile
Company: Accenture
Designation: Senior Analyst
Location: Bangalore
Duration: May 2015 – till date
Company: Cognizant Technologies
Designation: Associate-Project
Location: Cochin
Duration: November 2014 – May 2015
Company: Tata Consultancy Services Ltd
Designation: Systems Engineer
Location: Chennai, Cochin
Duration: September 2011 – October 2014
Personal Details
Name: Vipin KP
Date of Birth: 30-Dec-1987
Nationality: Indian
Sex: Male
Marital Status: Single
Positive Points: Determined, sincere, good listener, leadership qualities
Passport Details: K1297963
Educational Qualification
Degree and Date: Bachelor of Technology, June 2011
University: Kerala University
Specialization: Computer Science