[Webinar] Getting to Insights Faster: A Framework for Agile Big Data

In this slide deck, Infochimps Director of Product Tim Gasper discusses how Infochimps tackles business problems for customers by deploying a comprehensive Big Data infrastructure in days, sometimes in just hours. Tim explains how Infochimps is now taking that same aggressive approach to deliver faster time to value by helping customers develop analytic applications with impeccable speed.

  • quick stories about the transformative effect big data can have on a business... or the world
    an app is a use case! big data is not a toy. exploration is great... but to what end? focus leads to value faster.
    diagram of all the different use cases and industries that big data affects
    THIS WAS A LITTLE LONG, KEEP IT SHORT AND SWEET
  • where are you in terms of adoption of big data applications?
      • already applications in production
      • apps currently under development
      • planning and evaluation phase
      • researching / early exploration
      • I don’t know / No current plans
  • our use of the terms analytics and analysis is extremely broad. i would consider it to cover both simple statistics and more advanced modeling. when i want to call out modeling, i usually use the term "modeling" specifically ... or i will use the phrase "advanced analytics" to differentiate it from simpler analytics. the phrase "analytic application" essentially means data-oriented, use-case-driven applications.
  • Have you identified your first big data application use case (or next one)?
      • Yes
      • No
      • I don’t know
  • re-emphasize iterative design here… it’s an organization change and a technology change
    one of Jim’s diagrams that has traditional data analysis application cycles, including the long time spent upfront doing data modeling and ETL transformation
    build the diagram from one step to the next via animations
    this is problematic for three reasons:
      • time to value is slower
      • takes longer to determine first checkpoint of success or failure of the project
      • difficult to iterate
  • diagram of the four customers and how fast they developed apps and how few developers it took to create them
    DON’T DWELL HERE, DON’T TALK TO EVERY SINGLE USE CASE… STAY HIGH LEVEL
  • what does HGST do? saying they are a part of western digital isn’t enough.
  • Poll 3: What is your biggest challenge to realizing the value of Big Data applications?
      • Talent gap/experience
      • Cost of capital investment
      • Big Data technology risk
      • Failed prior projects
      • Other/N/A
  • java compiling etc. versus scripting approaches… we really like using scripting tools
  • wukong and ironfan are both open source, and we’ve contributed them back!
    - - - -
    similar to the slide that shows how Wukong is the DSL for big data app dev, and Ironfan is the DSL for big data infrastructure dev, except incorporate the broader picture of Tachyon the orchestrator and the Deploy Pack application code vessel
    So now let’s drill in and look at how we actually deliver a solution
    The Problem
    There are two complementary ways to process Big Data: batch processing and real-time (or stream) processing. These are traditionally viewed as very different approaches to solving problems, especially in a Big Data context, where the toolsets for each kind of processing differ greatly. Typically, when working across platforms, several issues slow down analytic development:
      • You Need to Run the Whole Thing – the entire infrastructure has to be running in order to test small changes.
      • You Will Wait 10 Minutes Every Time You Make a Mistake – compiling Jar files, transferring code, launching jobs, and finding log files is time consuming.
      • You Will Disrupt Production Traffic – if you are doing any testing at scale.
      • Hadoop Does Not Understand Storm and Storm Does Not Understand Hadoop – same language, different paradigms, different base classes.
    The Solution
    Wukong is a Domain Specific Language (DSL) designed specifically for data analytics, processing, and flow. It abstracts the platform that the analytics are running on (like Hadoop, Storm, or your local command line) and allows you to focus on writing analytics.
    A simple Wukong script can be written in a few lines in a plain text file on your hard drive. It can then be run as a simple command line application, used as a large Hadoop job, or used as part of a real-time Storm topology. The same analytics can be leveraged over and over again across your enterprise.
    Wukong enables its users to:
      • Write and test code locally – from the command line
      • Avoid Disrupting Others – your deploy pack
      • Debug Rapidly – see results in real-time
      • Seamlessly move between contexts – like real-time (with Storm) and batch (with Hadoop)
    This allows for very rapid iteration of analytics, and allows your data scientists to be as agile as your business demands.
    Questions
      • If you develop real-time analytics, how would you run those against historical data?
      • Does every developer in your organization have their own Hadoop cluster?
    References
      • http://www.infochimps.com/infochimps-cloud/tools/wukong/
      • https://github.com/infochimps-labs/wukong/tree/3.0.0/
  • need to make shape be all the way up through project plan
    platform rollout has a lot more to it, including QA/testing, analytics development, production rollout of first application, training, acceptance/success testing
    production should have infrastructure support, application/analytics support, SLA, managed services (training and acceptance testing could be part of the bottom part)
  • probably the diagram that shows the loop from local to cloud... except updated and made more powerful... maybe have the animations build as well
  • Call on the audience to figure out their first application and begin the path toward success by following the framework in this webinar
    If big data projects are already underway, are you finding business value? Do you feel like you are iterating through use cases? Are your personnel utilizing their existing talents and strengths?
  • I invite you to let us know what your use case is, and we can help you evaluate which tools and architecture are appropriate to solve it. Now we are open to questions!
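
The Wukong note above describes writing an analytic once and running it locally, in batch, or in a stream. The sketch below illustrates that idea only; it is plain Python and is not Wukong's API. The names `word_lengths`, `run_batch`, and `run_stream` are hypothetical, chosen for illustration.

```python
import json
from typing import Callable, Iterable, Iterator, List

# A "processor" takes one input record and yields zero or more output records.
# It knows nothing about where it runs (command line, batch job, or stream).
Processor = Callable[[str], Iterator[str]]


def word_lengths(line: str) -> Iterator[str]:
    """Hypothetical analytic: emit one JSON record per word in the line."""
    for word in line.split():
        yield json.dumps({"word": word.lower(), "length": len(word)})


def run_batch(processor: Processor, lines: Iterable[str]) -> List[str]:
    """Batch-style runner: process a finite data set and collect every result."""
    results: List[str] = []
    for line in lines:
        results.extend(processor(line))
    return results


def run_stream(processor: Processor, events: Iterable[str]) -> Iterator[str]:
    """Stream-style runner: emit results lazily as each event arrives."""
    for event in events:
        yield from processor(event)
```

Because the analytic is just a function over records, the same `word_lengths` can be tested on a few lines of sample text before it is handed to either runner, which is the local-first workflow the note argues for.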

[Webinar] Getting to Insights Faster: A Framework for Agile Big Data: Presentation Transcript

  • Getting to Insights Faster: A Framework for Agile Big Data @TimGasper Director of Product Infochimps, a CSC Big Data Business
  • Agenda (1) IT’S ALL ABOUT THE APP (2) WHAT IS A BIG DATA APP (3) TRADITIONAL VS AGILE APPROACH (4) ENABLERS OF AGILE BIG DATA (5) DEMONSTRATION
  • What problem are you trying to solve?
  • It’s all about the apps.
  • Poll Question 1
  • What is a Big Data app? ? + Critical Business Problems = Impactful Analytic Applications
  • Smart Meter Monitoring for Customer Value Add; Predictive Inventory Levels to Minimize Warehousing Costs; Personalized Medicine Treatment Programs; Trade Options and Futures Pricing Platform; Customer Churn Analysis for Increased Customer Lifetime Value. Source: PARC
  • Poll Question 2
  • It’s all about the apps.
  • Source: Tableau
  • Predictive Manufacturing + Smart Manufacturing & Energy; Ad Publisher Campaign Analytics; 360 Customer Experience Management; Social Media Monitoring & Analytics
  • The Traditional Way
    Data Warehouse Project: 12-24 Months to Reach Production
    Stages: Business Discovery; Info Discovery; Logical Data Model; Physical Data Model; System Staging; Data Ingestion, Transformation, ETL; Application Development; Analytics; Production Staging
  • Big Data: A New Hope
    Data Warehouse Project: 12-24 Months to Reach Production
    Stages: Business Discovery; Info Discovery; Logical Data Model; Physical Data Model; System Staging; Data Ingestion, Transformation, ETL; Application Development; Production Staging; Analytics
    Big Data Project: 3-6 Months to Reach Production
    Stages: Business Discovery; Info Discovery; Sys. Stag.; Initial Data Ingest; repeated iterations of Schema on Read, Analytics, and App Dev; Prod. Stag.
  • Application Development Timelines
      • 6 Months, 2 Developers
      • 5 Months, 2 Developers
      • 3 Months, 1 Developer
      • 4 Months, 2 Developers
  • Speed to Value: A Case Study
    HGST, a Western Digital company, is improving customer support and product quality by collecting, analyzing, and acting on massive quantities of machine and sensor data.
      • Greatly diminished operational burden with ability to focus on analysis and driving business action
      • Fast project delivery and success
      • Expertise with Big Data technologies like Hadoop
    KEY STATS
      • Industry: Storage Technology
      • Solution: Machine Data Analysis Engine
      • Channel: B2B
      • Cloud Services: Cloud::Queries, Cloud::Hadoop
      • Users: Application Developers, Data Scientists, Analysts
      • Deployment: Amazon Web Services
  • Poll Question 3
  • Enablers of Agile Big Data
    1. Managed infrastructure means focusing on Big Data apps
    2. The community tech itself and what it enables
    3. Our customer engagement framework for choosing use cases that have impact and designing successful solutions
    4. Agile, iterative analytics app dev lifecycle
    5. Our application reference design framework for kick starting application development
  • A Managed Platform
  • Technologies Under the Hood PART 1
    HADOOP
      • Java MapReduce
      • Streaming MapReduce
      • SQL on Hadoop, Pig, Hive
    NOSQL DATABASES
      • HBase/Accumulo
      • Elasticsearch
      • Cassandra, MongoDB
    STREAM PROCESSING, MESSAGE QUEUES
      • Storm
      • Kafka
  • Technologies Under the Hood PART 2
    HADOOP INTERFACES
      • Hue
      • Command Line
    STATISTICAL TOOLS
      • R, SAS, SPSS
    BUSINESS INTELLIGENCE AND DATA VIZ
      • Legacy: Cognos, Biz Objects, OBIEE, Microsoft BI
      • New Gen: Tableau, Qlikview, SiSense, Kibana
  • Our Unique Toolset Addition
    SaaS
    Develop & Test Locally with App/Analytics Scripting & “Deploy Pack” Orchestration
    PaaS
      • Real-time Analytics with Cloud::Streams
      • Interactive Analytics with Cloud::Queries
      • Batch Analytics with Cloud::Hadoop
    Abstract to any cloud with Orchestration DSL
    IaaS
      • Public Cloud
      • Virtual Private Cloud
      • Private Cloud
  • Customer Engagement Framework Service Requirements Week 1-2 Discovery Design & Build Week 3-4 Technical Design Production Ongoing Iterative App Development Week 5-8+ Platform Rollout Build Data Flows Interview Key Business Stakeholders Define Business Benefits Design Data Flows Interview Key Technical Stakeholders Define Target Use Case Define Architecture Define Objectives & Challenges Develop High-Level Approach & Costs Identify Data Sources Agree to Project Plan/Rollout Real-Time Data Flow Architecture Validation Standup / Connect Environment Tuning Solution Historical Data
    MAJOR ACTIVITIES
      • Run 2-4 hour Design Thinking Workshop
      • Review current state metrics
      • Review business pain points & opportunities
      • Review application & infrastructure environment
      • Define target use case
      • Identify data sources for target use case
      • Develop high level tech approach and costs
      • Define high level benefits
      • Develop initial case for action
      • Develop go forward plan
      • Develop Data Model
      • Technical architecture & integration design
      • Stand up environment
      • Dashboard design workshops
      • Data mapping
      • Build prototype dashboard
      • Configure prototype application
      • Data load
      • Run solution iterations
      • Analytical modeling
  • Agile Iteration for App Dev ::
  • App Reference Design Framework
      • A use-case-driven reference design
      • A code repository with:
          ◦ Domain-specific sample data sets/sources
          ◦ Sample data flows
          ◦ Sample data processors/analytics
          ◦ Simple data visualization
  • App Reference Designs: Predictive Manufacturing + Smart Manufacturing & Energy; Ad Publisher Campaign Analytics; 360 Customer Experience Management; Social Media Monitoring & Analytics
  • Social Media App Reference Design
  • Demonstration
  • Big Data Benefits
    New Use Cases; New Analytics and Analytical Techniques; More Data; Time to Value; Faster Iteration; Faster Data; Increased Flexibility
    ENABLED BY
      • Unstructured data and semi-structured data allow for faster path to data integration
      • Real-time analysis and batch analysis with scripting tools
      • Schema on read for app-driven data models and data structures
      • Local to cloud, small data to big data… tools can talk to each other
  • What is Your First Big Data App?
  • Learn More » sales@infochimps.com 1-855-328-2386 Request a Demo: http://infochimps.com/demo Q&A
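
The slides name "schema on read" as an enabler of app-driven data models. As a minimal sketch of that idea (not taken from the deck), the Python below stores raw JSON events untouched and lets each application apply its own field list and types only when it reads. The sample events, `read_with_schema`, and both schema names are invented for illustration.

```python
import json
from typing import Any, Callable, Dict, Iterable, Iterator, Optional

# A read-time schema: the fields an application wants and how to type them.
Schema = Dict[str, Callable[[Any], Any]]

# Invented sample data: raw events are kept exactly as they were ingested,
# even though the records do not all share the same fields.
RAW_EVENTS = [
    '{"device": "drive-17", "temp_c": "41.5"}',
    '{"device": "drive-17", "temp_c": "43.0", "firmware": "A1"}',
    '{"device": "drive-22", "firmware": "B4"}',
]


def read_with_schema(
    raw_lines: Iterable[str], schema: Schema
) -> Iterator[Dict[str, Optional[Any]]]:
    """Parse each raw record and project it onto the caller's schema.

    Fields missing from a record come back as None instead of failing,
    so new fields can appear in the data without a migration step.
    """
    for line in raw_lines:
        record = json.loads(line)
        row: Dict[str, Optional[Any]] = {}
        for field, cast in schema.items():
            value = record.get(field)
            row[field] = cast(value) if value is not None else None
        yield row


# Two hypothetical applications read the same raw data with different schemas.
TEMPERATURE_APP: Schema = {"device": str, "temp_c": float}
FIRMWARE_APP: Schema = {"device": str, "firmware": str}
```

Calling `list(read_with_schema(RAW_EVENTS, TEMPERATURE_APP))` and then the same with `FIRMWARE_APP` gives two different views of one stored data set, in contrast to the traditional cycle where a logical and physical data model must be fixed before ingestion.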