Use dependency injection to get Hadoop *out* of your application code (DataWorks Summit)
Hadoop MapReduce provides transparent parallelization but often results in specialized code bases that interact with low-level data formats. We present a means of using dependency injection to manage data flows in MapReduce which in turn supports reusable, Hadoop-agnostic application code that interacts with high-level business domain objects. An example is provided that applies Dependency Injection to the Hadoop WordCount example and shows how the same code invoked from the WordCount MapReduce job can be reused in a real-time context. We then discuss Opower’s application of this pattern to employ the same core calculations in both batch processing and in servicing real-time requests from end users. This topic will be of interest to those interested in reusing core batch calculations in real-time contexts. It also provides a means forward for organizations moving to Hadoop that have existing code components that they would like to employ in batch MapReduce computations.
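The pattern the talk describes can be sketched as follows (in Python rather than the talk's Java WordCount, and with hypothetical names): the core counting logic depends only on injected source/sink callables, so a batch driver and a real-time request handler can reuse it unchanged.

```python
from collections import Counter
from typing import Callable, Iterable

# Hadoop-agnostic core logic: it depends only on the injected `emit`
# callback, never on MapReduce types. (Illustrative sketch only.)
def count_words(lines: Iterable[str], emit: Callable[[str, int], None]) -> None:
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    for word, n in counts.items():
        emit(word, n)

# Batch context: a MapReduce driver would inject its record reader and
# output collector here; plain Python objects stand in for the sketch.
batch_output = {}
count_words(["to be or not to be"], lambda w, n: batch_output.update({w: n}))

# Real-time context: the *same* core function, injected with a
# request-scoped source and an in-memory sink.
def handle_request(text: str) -> dict:
    result = {}
    count_words([text], lambda w, n: result.update({w: n}))
    return result
```

The point is that neither caller forces the core function to know where its data comes from or goes to; only the thin adapters are Hadoop-specific.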
MongoDB IoT City Tour STUTTGART: Hadoop and future data management, by Cloudera (MongoDB)
Bernard Doering, Senior Sales Director DACH, Cloudera.
Hadoop and the Future of Data Management. As Hadoop takes the data management market by storm, organisations are evolving the role it plays in the modern data centre. Explore how this disruptive technology is quickly transforming an industry and how you can leverage it today, in combination with MongoDB, to drive meaningful change in your business.
Data Warehouse Modernization: Accelerating Time-To-Action (MapR Technologies)
Data warehouses have been the standard tool for analyzing data created by business operations. In recent years, increasing data volumes, new types of data formats, and emerging analytics technologies such as machine learning have given rise to modern data lakes. Connecting application databases, data warehouses, and data lakes using real-time data pipelines can significantly improve the time to action for business decisions. More: http://info.mapr.com/WB_MapR-StreamSets-Data-Warehouse-Modernization_Global_DG_17.08.16_RegistrationPage.html
BDM39: HP Vertica BI: Sub-second big data analytics your users and developers... (Big Data Montreal)
Despite how fantastic pigs look with lipstick on and how magical elephants look with wings attached, there remains a large gap between what popular big data stacks offer and what end users demand in terms of reporting agility and speed. Join us to learn how Montreal-based AdGear, an advertising technology company, faced challenges as its data volume increased. You will hear how AdGear's data stack evolved to meet these challenges, and how HP Vertica's architecture and features changed the game.
(by Mina Naguib, Technical Director of Platform Engineering at AdGear).
https://youtu.be/tzQUUCuVjVc
Sudhir Hadoop and Data Warehousing resume (Sudhir Saxena)
Overall 5.8 years of professional IT experience in Data Warehousing and Business Intelligence.
One year of onsite experience in Mexico and the USA with the client USAA.
Good knowledge of Big Data, Hadoop, Pig, Hive, Sqoop, HBase, Python, Spark, and Scala.
Continuous Data Ingestion Pipeline for the Enterprise (DataWorks Summit)
A continuous data ingestion platform built on NiFi and Spark that integrates a variety of data sources, including real-time events, data from external sources, and structured and unstructured data, with in-flight governance, providing a real-time pipeline that moves data from source to consumption in minutes. The next-gen data pipeline has helped eliminate legacy batch latency and improve data quality and governance through custom NiFi processors and embedded Spark code. To meet stringent regulatory requirements, the pipeline is being augmented with in-flight ETL and DQ checks, enabling a continuous workflow that promotes raw/unclassified data to enriched/classified data available for consumption by users and production processes.
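The in-flight DQ-check idea can be sketched, independent of NiFi or Spark and with hypothetical rules, as a per-record gate that routes data to an enriched or quarantine path instead of failing a whole batch:

```python
# Minimal sketch of an in-flight data-quality gate. The field names and
# rules are assumptions for illustration; in the platform described, this
# logic would live in a custom NiFi processor or embedded Spark code.
def dq_gate(record: dict, required: tuple = ("id", "ts")) -> str:
    # Route records missing required fields to quarantine rather than
    # rejecting the stream -- the key contrast with legacy batch ETL.
    if all(record.get(field) is not None for field in required):
        record["dq_status"] = "enriched"
        return "enriched"
    record["dq_status"] = "quarantine"
    return "quarantine"

records = [
    {"id": 1, "ts": "2017-08-16T00:00:00Z"},
    {"id": 2, "ts": None},  # fails the completeness check
]
routes = [dq_gate(r) for r in records]
```

A streaming framework would apply the same check per event as data flows, so classified data is available for consumption within minutes of arrival.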
An Introduction to the MapR Converged Data Platform (MapR Technologies)
Listen to the webinar on-demand: http://info.mapr.com/WB_Partner_CDP_Intro_EMEA_DG_17.05.31_RegistrationPage.html
In this 90-minute webinar, we discuss:
- The MapR Converged Data Platform and its components
- Use cases for the Converged Data Platform
- MapR Converged Partner Program
- How to get started with MapR
- Becoming a partner
HP Vertica and MapR Webinar: Building a Business Case for SQL-on-Hadoop (MapR Technologies)
Organizations need to derive business insights from an unprecedented volume and variety of data, while maximizing investments in existing SQL and business intelligence (BI) technologies.
How can you explore the onslaught of semi-structured and structured data quickly and easily, and still get the most complete and advanced analytics?
Watch this recorded webinar to learn how you can enjoy the benefits of a SQL-on-Hadoop analytics solution that provides the highest-performing, tightly-integrated platform for operational and exploratory analytics.
Learn:
- The advantages of SQL-on-Hadoop
- The pros and cons of typical SQL-on-Hadoop solutions
- How you can get the fastest, most open SQL-on-Hadoop without the trade-offs
- How you can gain deeper business insights using all of your data, while leveraging existing BI tools and skills
- Use cases from industry leaders on how to perform deeper, more advanced analytics directly in Hadoop, more efficiently and cost-effectively
Chris Selland, VP of Marketing & Business Development at HP Vertica, and Steve Wooledge, VP of Product Marketing at MapR, explain how you can grow and leverage business intelligence with an optimized, best-of-breed solution for SQL-on-Hadoop.
Making Big Data Analytics with Hadoop fast & easy (webinar slides) (Yellowfin)
Looking to analyze your Big Data assets to unlock real business benefits today, but sick of all the theories, hype, and hoopla?
View these slides from Actian and Yellowfin’s "Big Data Analytics with Hadoop" Webinar to discover how we’re making Big Data Analytics fast and easy.
Hold on as we go from data in Hadoop to dashboard in just 40 minutes.
Learn how to combine Hadoop with the most advanced Big Data technologies, and the world’s easiest BI solution, to quickly generate real business value from Big Data Analytics.
Watch as we use live CDR data stored in Hadoop – quickly connecting, preparing, optimizing and analyzing this data in a tangible real-world use case from the telecommunications industry – to easily deliver actionable insights to anyone, anywhere, anytime.
To learn more about Yellowfin, and to try its intuitive Business Intelligence platform today, go here: http://www.yellowfinbi.com
To learn more about Actian, and its next generation suite of Big Data technologies, go here: http://www.actian.com/
Is your organization at the analytics crossroads? Have you made strides collecting and sharing massive amounts of data from electronic health records, insurance claims, and health information exchanges but found these efforts made little impact on efficiency, patient outcomes, or costs?
3 Benefits of Multi-Temperature Data Management for Data Analytics (MapR Technologies)
SAP® HANA and SAP® IQ are popular platforms for various analytical and transactional use cases. If you’re an SAP customer, you’ve experienced the benefits of deploying these solutions. However, as data volumes grow, you’re likely asking yourself: How do I scale storage to support these applications? How can I have one platform for various applications and use cases?
BDaaS: Big Data as a Service, by Sherya Pal from Saama. The presentation was given at the #doppa17 DevOps++ Global Summit 2017. All copyrights are reserved by the author.
Moustafa Soliman, "HP Vertica: Solving Facebook Big Data Challenges" (Dataconomy Media)
Moustafa Soliman, Business Intelligence Developer from Hewlett Packard presented "HP Vertica - Solving Facebook Big Data Challenges" as part of "Big Data Stockholm" meetup on April 1st at SUP46.
Game Changed – How Hadoop is Reinventing Enterprise Thinking (Inside Analysis)
The Briefing Room with Dr. Robin Bloor and RedPoint Global
Live Webcast on April 8, 2014
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=cfa1bffdd62dc6677fa225bdffe4a0b9
The innovation curve often arcs slowly before picking up speed. Companies that harness a major transformation early in the game can make serious headway before challengers enter the picture. The world of Hadoop features several of these upstarts, each of which uses the open-source foundation as an engine to drive vastly greater performance to a wide range of services, and even create new ones.
Register for this episode of The Briefing Room to hear veteran Analyst Dr. Robin Bloor explain how the Hadoop engine is being used to architect a new generation of enterprise applications. He’ll be briefed by George Corugedo, RedPoint Global CTO and Co-founder, who will showcase how enterprises can cost-effectively take advantage of the scalability, processing power and lower costs that Hadoop 2.0/YARN applications offer by eliminating the long-term expense of hiring MapReduce programmers.
Visit InsideAnalysis.com for more information.
MapR announced a few new releases in 2017, and we want to go over those exciting new products and features that are available now. We’d like to invite our customers and partners to this webinar in which members of the MapR product team will share details about the latest updates.
End to End Machine Learning Open Source Solution Presented in Cisco Developer... (Manish Harsh)
The RAPIDS suite of open source software libraries and APIs gives you the ability to execute end-to-end data science and analytics pipelines entirely on GPUs. Licensed under Apache 2.0, RAPIDS is incubated by NVIDIA® based on extensive hardware and data science experience. RAPIDS utilizes NVIDIA CUDA® primitives for low-level compute optimization, and exposes GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.
Everyone is awash in the new buzzword, Big Data, and it seems as if you can’t escape it wherever you go. But there are real companies with real use cases creating real value for their businesses by using big data. This talk will discuss some of the more compelling current or recent projects, their architecture & systems used, and successful outcomes.
CUTGroup Presentation for Open Indy Brigade Meetup on November 19, 2015 (smarziano)
Sonja Marziano's presentation on the Civic User Testing Group (www.cutgroup.org) for Open Indy Brigade Meetup Civic UX Project Launch on November 19, 2015 (http://www.meetup.com/Open-Indy-Brigade/events/226485356/).
Prasanna Kumar
Email: rajupr1990@gmail.com
Mobile: +91 7331108854/8884601978
Summary:
Total 4 years of IT experience in application and Hadoop development.
Expertise in the Hadoop framework and Big Data concepts (HDFS, MapReduce, Hive, Impala, Pig, Phoenix, HBase, Sqoop, Flume, Oozie, Spark, Scala, Kudu).
Experience in writing Impala and Hive queries and Pig scripts.
Experience in writing Phoenix scripts and working with HBase.
Hands-on with the MapReduce programming model.
Good knowledge of Spark, Scala, and Kudu.
Importing and exporting data from HDFS using Sqoop.
Experience in setting up, performance benchmarking, and monitoring Hadoop clusters.
Good working knowledge of UNIX commands and shell programming.
Experience in Oracle ADF.
Experience across the SDLC, including creating and maintaining project documents.
Excellent communication, analytical, business, and interpersonal skills. Self-motivated with a proactive, resourceful approach to problem solving and the ability to work independently and to lead or be part of a team.
Work Experience:
Worked as a Software Engineer at Tibco Software from Dec 2015 till date.
Worked as a Software Engineer at SLK Software Services from Apr 2014 to Nov 2015.
Worked as an Associate Software Engineer at 3i InfoTech from Sep 2012 to Mar 2014.
Educational Qualifications:
BE/B.TECH (ECE) from Sri Venkateswara University.
Technical Proficiencies:
Frameworks : Hadoop, Oracle ADF
Hadoop Ecosystem : HDFS, Hive, Impala, Phoenix, Pig, Map Reduce, Sqoop, Flume
Programming skills : Java, SQL
NoSQL : HBase
J2EE Technologie : Oracle ADF
Servers : Tomcat, Web Logic
Scripting language : Shell Script
DBMS : Oracle and MySQL
Version Tools/ IDE : SVN, Eclipse, JDeveloper11g
Operating Systems : Windows, UNIX
Professional Experience:
Project #1:
Title : Data Science D2O
Client : Nielsen
Environment : Impala, Hive, Phoenix, HDFS, Map Reduce, HBase
Role : Developer
Description: This project will digitize data science to align with the Nielsen vision of becoming more digital. All Data Science tools depend on Global Factory Data being available in NDX. NDX is uniquely positioned to deliver local and global insights into consumer behavior and product sales across categories in nearly 100 countries. Nielsen’s powerful combination of deep data and insights arms clients with actionable intelligence for their business planning. Our tools provide clients with timely, flexible analytics, presenting a holistic view of the marketplace, infusing digital into data science to unleash innovation and provide cutting-edge solutions for our clients. The major implementations in Data Science are CVCalc and Replica.
CVCALC:
Implementation of a new robust methodology for CV calculation based on a modelling approach.
Improved data stability through minimized sample changes during sample re-designs and annual updates, and improved data precision through more robust variance calculations.
Integrated the tool with NDX and other tools within it, improving calculation efficiency.
REPLICA:
Replica is a tool for calculating Relative Standard Error, using a variance engine based on bootstrap principles. It simulates the variation associated with sample participation by forming replicated samples in which sampling units are given variable weights to participate in the sample.
This tool allows standard decisions in compliance with our WBS, in terms of sample design and MBD reporting, and will therefore translate into better quality in our client deliverables.
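The bootstrap idea behind such a variance engine can be illustrated with a generic sketch (an assumption for illustration, not Nielsen's actual Replica engine): resample the units with replacement to form replicated samples, recompute the estimate per replicate, and take the ratio of the replicates' standard deviation to the point estimate as the relative standard error.

```python
import random
import statistics

# Generic bootstrap estimate of Relative Standard Error (RSE).
# Illustrative only; the estimator and data are hypothetical.
def bootstrap_rse(units, estimator, n_replicates=500, seed=42):
    rng = random.Random(seed)
    point = estimator(units)
    replicates = []
    for _ in range(n_replicates):
        # Each replicated sample draws units with replacement, so a
        # unit's participation varies from replicate to replicate.
        sample = [rng.choice(units) for _ in units]
        replicates.append(estimator(sample))
    return statistics.stdev(replicates) / point

units = [10.0, 12.0, 9.0, 11.0, 10.5, 9.5, 11.5, 10.0]
rse = bootstrap_rse(units, lambda xs: sum(xs) / len(xs))
```

For a small, tight sample like this, the RSE of the mean comes out as a few percent; a production engine would additionally respect the sample design (strata, weights) when forming replicates.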
Responsibilities:
Interaction with client in gathering and understanding the requirements
Involved in writing Impala queries to load and process data in Hadoop file system.
Involved in writing Phoenix scripts to load and process data in HBase.
Writing shell scripts that are invoked by Tibco BW.
Writing MapReduce code for bulk data loading to Phoenix.
Involved in integrating Hive and HBase for the purpose of generating reports.
Involved in team level meetings for knowledge transfer.
Project #2:
Title : Research and Analysis in Sales and Operations.
Client : Saint-Gobain
Environment : HDFS, Map Reduce, Hive, Pig, Sqoop, Cloudera
Role : Developer
Description: The project is based on sales and operations information for the business analysts. The reports show the current month, prior month, previous year, and year-to-date sales information for different business units and different products. We get the flat files from SAP BI.
Saint-Gobain is a French multinational company founded in 1665 in Paris. Originally a mirror manufacturer, it now produces a wide variety of construction and high-performance materials. Saint-Gobain is organized into four major sectors: Building Distribution, Construction Products, Innovative Materials, and Packaging. Each sector is further organized into Business Units (BUs) that serve specific markets within that sector.
Responsibilities:
Analyzed the functional specifications based on project requirements.
Involved in loading data from UNIX file system to HDFS.
Responsible for uploading dataset into Hadoop cluster.
Supported MapReduce programs running on the cluster.
Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
Created partitioned and bucketed tables using data from various regions.
Developed Sqoop scripts for data interchange between HDFS, Hive, and MySQL.
Project#3:
Title : Global Force Alliance Insurance
Role : Developer
Environment : Oracle ADF, Oracle10g, Jdeveloper11g
Description: Global Force Alliance is one of the reputed insurance companies in the UK. It provides a wide range of insurance policies such as Life Insurance, Fire Insurance, Motor Vehicle Insurance, Accidental and Healthcare Insurance, etc. The main modules in this project are Production Processing, Production Definition, Customer Services, Customer Policy, Cashier, Billing, and Underwriting. Production Processing is the entry point to start the business processes. The insurance system automates the management of insurance activities: Branch Manager Details, Agent Commission, Customer Policy Details, and Agent Details.
Responsibilities:
Interaction with client in gathering and understanding the requirements.
Created Database structure and extended the existing objects for customizations.
Designed and implemented ADF Business Components using EO, VO, AM, VL, and Associations.
Developed User Interface using ADF Faces and ADF Task flows.
Developed Pages using ADF, JSF Components.
Application Testing, Debugging and Deployment.