Transform Your Business with Big Data and Hortonworks


Published in: Technology

  • Big Data is critical for organizations just to keep up with the masses. In most retail organizations, internal data is hard to use for understanding your customers and demand. Publications state that one-third of retailers are in the dark regarding data that could be available to them. The silo approach within organizations is the primary cause of the broken data pipeline. The main hurdles are:
      * Lack of data sharing: a major obstacle to measuring ROI
      * Misuse of available data in marketing communications: not able to personalize directly to your customer
      * Linking data at the customer level: needed to thoroughly understand user behavior
      * Infrequent data collection: only extracting from logs and online serving systems used within your traditional reporting ecosystem
      * Not enough customer data: not capturing the details of the customer (including when a product was viewed, key indicators of why a user looks at one product versus another, and so on)
  • Flight Cost Variant Determination. Flight Cost is one of the algorithmic methods used to increase or decrease revenue based on page views, consumer marketing, and the time a consumer spends on a particular one-way or round-trip flight. The goal is not only to provide alternatives but also to increase or decrease the cost while other consumers are viewing the same flights. This is determined by sales from all related airlines and competitors during the flight availability. The method can be extended to use other sources as well.
    Destinations: web applications, mobile applications, Hadoop, RDBMS.
    In the solution architecture shown, the in-memory solution processes views, marketing, customer behavior, time, and competitor results to derive an increased or decreased price for a given one-way or round-trip flight. This allows the travel company to determine the proper pricing by feeding these measures into an algorithm. The architecture also allows the travel company to try out other predictive models at any point in time to see whether one model outperforms another. Those models could use similar measures and outcomes as well as new measures derived from the predictive models themselves. Overall, this is a win for the travel company: it never loses revenue from the original "bread and butter" model it always applies. Fascinating, right?
    As the outgoing destinations show, this provides consistent results on all platforms, giving a precise understanding of how the travel company is generating results overall. The solution can provide endless results based on predictive models that can be applied in real time. Any day, any time, any millisecond.
  • Pactera offers complete life cycle solutions within your organization. We offer a free 4-hour executive and technical workshop within your organization. We just ask you to fill out a 1-page questionnaire to help us understand your expectations.
    The executive workshop covers strategy, planning, and your current and future goals.
    The technical workshop is a deep dive involving end-to-end management and a proper solution architecture based on your current and upcoming goals. Once the workshops are complete, we will provide you with an assessment of the outcome.
    We also offer a 2-4 week proof of concept to ensure your project is put into action. Finally, we offer full lifecycle services in the following areas:
      * Benchmark & Monitoring
      * Integrations & Migrations
      * Implementation & Architecture
      * Project Management
      * Analytics
      * Reporting
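The flight-cost notes above describe a price that moves up or down with views, competitor pricing, time, and availability. The following is only a minimal sketch of that idea, not the presenters' algorithm: the `FlightSignals` fields, the weights, the thresholds, and the `max_swing` cap are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlightSignals:
    """Hypothetical per-flight inputs; the field names are illustrative only."""
    base_price: float          # current listed fare
    views_last_hour: int       # page views of this itinerary across sessions
    competitor_price: float    # lowest comparable competitor fare
    seats_left: int            # remaining availability
    hours_to_departure: float  # time left before the flight departs


def adjust_price(s: FlightSignals, max_swing: float = 0.10) -> float:
    """Return a fare within +/- max_swing of the base price.

    Every weight and threshold below is a made-up placeholder.
    """
    score = 0.0

    # Demand: many views push the price up, very few push it down.
    if s.views_last_hour >= 100:
        score += 0.4
    elif s.views_last_hour < 10:
        score -= 0.2

    # Market pricing: drift toward the competitor's fare.
    if s.competitor_price > s.base_price:
        score += 0.3
    elif s.competitor_price < s.base_price:
        score -= 0.3

    # Availability: scarce seats support a higher fare.
    if s.seats_left <= 5:
        score += 0.2

    # Time: a flight leaving soon with many unsold seats is discounted.
    if s.hours_to_departure < 24 and s.seats_left > 20:
        score -= 0.3

    # Clamp the score to [-1, 1], then scale it to the allowed swing.
    score = max(-1.0, min(1.0, score))
    return round(s.base_price * (1.0 + max_swing * score), 2)
```

Capping the adjustment with `max_swing` keeps any experimental scoring rule close to the base fare, which fits the notes' point about testing alternative models without giving up the original one.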

    1. CONSULTING SOLUTIONS OUTSOURCING PARTNER FOR A NEW ERA
       Transform Your Business with Big Data and Hortonworks
       Tom Kersnick – Pactera – Director Big Data Solutions
       Robby Richardson – Hortonworks – Enterprise Account Manager
    2. Topics
       1 Hortonworks Intro
       2 Who is Hortonworks?
       3 Hortonworks HDP: Enterprise Hadoop Distribution
       4 Hadoop 2.0: The Enterprise Generation
       5 Pactera Intro
       6 Big Data Initiatives
       © Pactera. Confidential. All Rights Reserved.
    3. Hortonworks Snapshot
       • We distribute the only 100% Open Source Enterprise Hadoop Distribution: Hortonworks Data Platform
       • We engineer, test & certify HDP for enterprise usage
       • We employ the core architects, builders and operators of Apache Hadoop
       • We drive innovation within Apache Software Foundation projects
       • We are uniquely positioned to deliver the highest quality of Hadoop support
       • We enable the ecosystem to work better with Hadoop
       Develop, Distribute, Support: we develop, distribute and support the ONLY 100% open source Enterprise Hadoop distribution
       Endorsed by Strategic Partners
       Headquarters: Palo Alto, CA
       Employees: 200+ and growing
       Investors: Benchmark, Index, Yahoo
    4. Rapid Customer Growth
    5. Hortonworks HDP: Enterprise Hadoop 1.x Distribution
       HORTONWORKS DATA PLATFORM (HDP) diagram:
       OPERATIONAL SERVICES: OOZIE, AMBARI
       DATA SERVICES: HIVE (HCATALOG), PIG, HBASE; LOAD & EXTRACT: SQOOP, FLUME, NFS, WebHDFS
       HADOOP CORE: HDFS, MAP REDUCE
       PLATFORM SERVICES: Enterprise Readiness (High Availability, Disaster Recovery, Security and Snapshots)
       Runs on: OS, Cloud, VM, Appliance
       Hortonworks Data Platform (HDP), Enterprise Hadoop
       • The ONLY 100% open source and complete distribution
       • Enterprise grade, proven and tested at scale
       • Ecosystem endorsed to ensure interoperability
    6. Hadoop 2.0… The Enterprise Generation
       Business value: Big Data (Transactions, Interactions, Observations) on a single platform with multiple uses: BATCH, INTERACTIVE, ONLINE
       1.0: Architected for the Large Web Properties
       2.0: Architected for the Broad Enterprise
       Enterprise Requirements | Hadoop 2.0 Features
       Mixed workloads | YARN
       Interactive Query | Hive on Tez
       Reliability | Full Stack HA
       Point in time Recovery | Snapshots
       Multi Data Center | Disaster Recovery
       ZERO downtime | Rolling Upgrades
       Security | Knox Gateway
    7. HDP: Enterprise Hadoop 2.0 Distribution
       HORTONWORKS DATA PLATFORM (HDP) diagram:
       OPERATIONAL SERVICES: OOZIE, AMBARI, FALCON*
       DATA SERVICES: HIVE & HCATALOG, PIG, HBASE; LOAD & EXTRACT: SQOOP, FLUME, NFS, WebHDFS, KNOX*
       HADOOP CORE: HDFS, YARN*, MAP REDUCE, TEZ*, OTHER
       PLATFORM SERVICES: Enterprise Readiness (High Availability, Disaster Recovery, Rolling Upgrades, Security and Snapshots)
       Runs on: OS/VM, Cloud, Appliance
       Hortonworks Data Platform (HDP), Enterprise Hadoop
       • The ONLY 100% open source and complete distribution
       • Enterprise grade, proven and tested at scale
       • Ecosystem endorsed to ensure interoperability
    8. Seamless Interoperability with Microsoft Tools
       • Integrated with Microsoft tools for native big data analysis
         » Bi-directional connectors for SQL Server and SQL Azure through SQOOP
         » Excel ODBC integration through Hive
       • Addressing demand for Hadoop on Windows
         » Ideal for Windows customers with Hadoop operational experience
       • Enables most common Hadoop workloads in the Enterprise
         » Data refinement and ETL offload for high-volume data landing
         » Data exploration for discovery of new business opportunities
         » Data enrichment for fine-tuned delivery and recommendation engines
       Diagram layers: APPLICATIONS (Microsoft Applications); DATA SYSTEMS (HORTONWORKS DATA PLATFORM for Windows); DATA SOURCES (MOBILE DATA; OLTP, POS SYSTEMS; Traditional Sources: RDBMS, OLTP, OLAP; New Sources: web logs, email, sensor data, social media)
    9. Transferring Our Hadoop Expertise to You
       The expert source for Apache Hadoop training & certification
       • World class training programs designed to help you learn fast
       • Role-based hands-on classes with 50% lab time
       • Certification to demonstrate Hadoop Expertise in Development and Administration
       • Expert consulting services
       • Programs designed to transfer knowledge
       • Industry leading Hadoop Sandbox
       • Free download
       • Fastest way to learn Apache Hadoop
       • Personal, portable Hadoop environment
    10. Hortonworks Summary
        • Leading the Innovation in Core Hadoop
        • Addressing the requirements for Enterprise usage
        • Enabling interoperability of the ecosystem
        • No lock-in. 100% Open Source.
        • Best in industry support with flexible pricing model
        • Find out » »
    11. Big Data is Critical
        Challenges to Using Big Data
        Given that nearly one-third of businesses are in the dark about their available data, it makes sense that silos are the primary hurdle in using this information.
        • Lack of sharing data is an obstacle to measuring marketing ROI: 51%
        • Not using data effectively to personalize marketing communications: 45%
        • Not able to link data together at the individual customer level: 42%
        • Data collected infrequently or not quickly enough: 39%
        • Too little or no customer/consumer data: 29%
    12. What Initiatives Are Using Big Data
    13. Keys to a Successful Big Data Initiative
        Define the Impact
        • Short term vs. long term measures
        What cannot be answered today?
        • This is your starting point
        Create User Centric Internal | External Applications
        • Decision support framework
        Predicting the Consumer
        • Algorithms, Models, Testing, and More Testing!
    14. Obstacles to Define Big Data ROI
        Not enough skilled resources for adaptation
        • Advance competencies
        Traditional IT Architectures cause limitations
        • Identifying the right technologies
        • Adapting to particular needs
        • Assemble business use cases
        • Silos
        Optimizing Solutions
        • Strong internal use cases
        • Inability to effectively automate data
    15. Solution Architecture using Multiple Ecosystems
        Diagram labels: incoming, outgoing, Real Time In-Memory Solution, EDW, Hadoop, Sand box, Models, Algorithms, Simulations (numbered 1-9 as below)
        1. Data feeds into a Real-Time Memory solution that will ingest data into EDW, Hadoop, and other platforms such as mobile, APIs, etc.
        2. ELT streaming into In-Memory Solution to provide visibility to Real-Time Social, Mobile, and Shell approaches to Algorithms, Models, and Simulations.
        3. In-Memory Real-Time Solution such as YARN or Storm to digest data to EDW, Hadoop, Social Media, and other such platforms.
        4. EDW for Structured Information from Sources in 1.
        5. Hadoop for semi-structured and unstructured data. Solution architecture including Sand Box availability.
        6. Shell UI Interfaces utilizing data from Real-Time in-memory solution as well as EDW, Hadoop, etc. for Models, Algorithms, and Simulations.
        7. Structured and Unstructured Reporting in reporting interfaces.
        8. Deep Dive analytics in Hadoop and Real-time Streaming.
        9. Real-Time customer interaction for Social and other similar platforms.
    16. Predictive Analysis Use Case for Online Travel Company
    17. Flight Cost by Variants Determination
        Data feeds use real-time in-memory streaming to execute matching algorithms, which determine the views within a session of particular one-way and round-trip flights. Predictive analytics algorithms then determine how to increase or decrease prices based on views, market pricing, time, and availability.
        Diagram labels: incoming (http logs, partners, custom); Real Time In-Memory Solution (Storm); outgoing destinations (rdbms, hadoop, application, mobile)
    18. Solution Architecture using YARN
        • Created to manage resource needs across all uses
        • Ensures predictable performance & QoS for all apps
        • Enables apps to run "IN" Hadoop rather than "ON"
          » Key to leveraging all other common services of the Hadoop platform: security, data lifecycle management, etc.
        Applications Run Natively IN Hadoop: BATCH (MapReduce), INTERACTIVE (Tez), ONLINE (HBase), STREAMING (Storm, S4,…), GRAPH (Giraph), IN-MEMORY (Spark), HPC MPI (OpenMPI), OTHER (Search) (Weave…)
        YARN (Cluster Resource Management)
        HDFS2 (Redundant, Reliable Storage)
    19. Pactera Big Data Capability
        Big Data Solution Architecture
        • In-Memory Solutions
        • Scalable Distributed Platforms
        Next Generation Analytics
        • Models, Algorithms, and Simulations
        • Visualization
        Improving Operational Ability
        • Help companies drive more operational efficiencies from existing investments.
        • Moving from the realm of data scientists into everyday business transactions and encounters.
        New Business Processes
        • Impact on both customer intelligence and operational efficiency by making everything immediately actionable.
        • Armed with immediate decision-making capability and intelligence, companies will be able to implement new business processes that will change how business is done.
        • We ask the Right Questions
    20. How Pactera can help with Big Data
        Diagram labels: Implementation and Architecture; Benchmark and Monitoring; POC (2-4 Weeks)
        Workshop (4 Hours)
        Executive Workshop: Strategies, Planning, and Expectations
        • Big Data strategy on what tomorrow will look like
        • Using Big Data to establish market dominance
        • Big Data project takeaways
        • Roadblocks to implementing Big Data analytics
        • Defining an ROI for Big Data
        • Getting the right ROI on Big Data
        Technical Workshop: End-To-End Management
        • System tuning/auto-tuning and configuration management
        • Dealing with both structured and unstructured data
        • Monitoring, diagnosis, and automated behavior detection
        Technical Workshop: Solution Architecture
        • Processor, memory, and system architectures for data analysis
        • Benchmarks, metrics, and workload characterization for big data
        • Availability, fault tolerance and recovery issues
        • Data management and analytics for vast amounts of unstructured data
        Proof of Concept (2-4 Weeks)
        Projects:
        • Benchmark & Monitoring
        • Integrations & Migrations
        • Implementation & Architecture
        • Project Management
        • Analytics
        • Reporting
    21. Thank You
        Tom Kersnick
        Robby Richardson
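As a rough illustration of steps 4 and 5 on slide 15 (structured information goes to the EDW, semi-structured and unstructured data goes to Hadoop), the routing decision could be sketched as below. This is an assumption-laden toy, not the architecture's actual ingest logic: the `EDW_COLUMNS` schema, the `route` function, and the destination labels are hypothetical names introduced only for this example.

```python
from typing import Any

# Hypothetical fixed schema that the warehouse tables would expect.
EDW_COLUMNS = {"customer_id", "order_id", "amount"}


def route(record: Any) -> str:
    """Return "edw" for records that match the fixed schema, else "hadoop".

    Anything that is not a dict with exactly the expected columns
    (raw log lines, partial records, free text) is treated as
    semi-structured or unstructured data.
    """
    if isinstance(record, dict) and set(record) == EDW_COLUMNS:
        return "edw"
    return "hadoop"
```

In a real deployment this check would run inside the streaming layer as records arrive; here it is a plain function so the rule itself is easy to read.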