Go Beyond 'Debug': Wire Tap your App for Knowledge with Hadoop
 


Today, application developers devote roughly 80% of their code to persisting roughly 20% of the total data flowing through their applications. That means two things:

* 80% of the data flowing through our applications is at best lost in rolling log files and at worst never collected, without ever being analyzed or accounted for.
* Application-level database programming, licensing, storage, administration, and ETL processing have maxed out IT budgets and have constrained app development teams from keeping pace with the rate of change in the business.

The other 80% of the data is “Event Data” that can no longer be ignored if you want to stay competitive. Changes to application state are already stored as a sequence of events in application and middleware logs. In fact, since this data never held value to anyone but the developer in the past, a lot of potentially valuable information is often never collected.

With Hadoop, we can:

* store and query these events – Transaction tracing,
* use the event log to reconstruct the application domain at any point in time – ETL,
* use the same event log to construct new domains we haven't planned for – ELT, and
* automatically adjust our data domains to cope with retroactive changes – ???

In this talk, we will demonstrate how capturing all event data could dramatically simplify data collection and management within the enterprise.
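As a rough illustration of reconstructing the application domain "at any point in time" from an event log, here is a minimal sketch. The event shape (timestamp, account id, balance delta), the `EVENTS` sample, and the `replay` function are all hypothetical and are not taken from the talk.

```python
from collections import defaultdict

# Hypothetical event log: (timestamp, account_id, balance_delta).
EVENTS = [
    (1, "acct-1", 100),
    (2, "acct-2", 50),
    (3, "acct-1", -30),
    (4, "acct-2", 25),
]


def replay(events, as_of):
    """Rebuild per-account balances from all events with timestamp <= as_of."""
    state = defaultdict(int)
    for timestamp, account_id, delta in sorted(events):
        if timestamp > as_of:
            break  # events are sorted, so everything after this is too new
        state[account_id] += delta
    return dict(state)
```

Calling `replay(EVENTS, as_of=2)` and `replay(EVENTS, as_of=4)` yields two different views of the same domain from one log, which is the property the abstract relies on.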


    Presentation Transcript

    • © Hortonworks Inc. 2012. Go Beyond Debug: Wire Tap your App for Knowledge with Hadoop
      Tom McCuch, Solution Engineering @ Hortonworks (Twitter: tmccuch)
      Oleg Zhurakousky, Principal Architect @ Hortonworks (Twitter: z_oleg)
    • The Application Development Dilemma
      • Today, application developers devote roughly 80% of their code to persisting roughly 20% of the total data flowing through their applications
        – 80% of the data flowing through our applications is at best lost in rolling log files and at worst never collected, without ever being analyzed or accounted for
        – For the remaining 20% we do currently collect, application-level database programming, licensing, storage, administration, and ETL processing have maxed out IT operations budgets and have constrained app development teams from keeping pace with the rate of change in the business
    • Example: Data Available During Ingest
      • Record count
      • Highest/Lowest record length
      • Average record length
      • Compression ratio
      But with a little more work...
      • Field parsing
        – Unique values
        – Unique values per field
        – Access to values of each field independently from the record
        – Relatively fast field-based searches, without indexing
        – Value encoding
        – Etc.
      These are cross-cutting concerns!
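The first four measurements on this slide can be gathered in a single pass over a batch of records. The sketch below is one possible way to do it; the `ingest_stats` name, the newline-joined batch layout, and the choice of zlib as the compression codec are assumptions, since the slide names none of them.

```python
import zlib


def ingest_stats(records):
    """Summarize a batch of text records as it is ingested."""
    if not records:
        raise ValueError("records must not be empty")
    lengths = [len(record) for record in records]
    # Assumption: the batch is stored newline-delimited and compressed with zlib.
    raw = "\n".join(records).encode("utf-8")
    compressed = zlib.compress(raw)
    return {
        "record_count": len(records),
        "min_length": min(lengths),
        "max_length": max(lengths),
        "avg_length": sum(lengths) / len(lengths),
        "compression_ratio": len(raw) / len(compressed),
    }
```

None of these numbers requires parsing the records, which is why they come almost for free at ingest time.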
    • How do we address cross-cutting concerns without disturbing the existing process flow?
    • Wire Tap Defined
    • Wire Tap is an Enterprise Integration Pattern
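To make the pattern concrete: a wire tap sends a copy of each message on a channel to a secondary receiver while the primary flow continues unchanged. The following is a minimal in-process sketch, not the implementation used in the talk; the `Channel` class and its method names are hypothetical.

```python
class Channel:
    """A point-to-point channel that delivers each message to one handler."""

    def __init__(self, handler):
        self._handler = handler
        self._taps = []

    def wire_tap(self, tap):
        """Register a secondary receiver that sees a copy of every message."""
        self._taps.append(tap)

    def send(self, message):
        for tap in self._taps:
            try:
                tap(dict(message))  # shallow copy, so a tap cannot replace keys in the original
            except Exception:
                pass  # a failing tap must never break the primary flow
        return self._handler(message)
```

The handler is untouched by the tap, which is the point of the pattern: taps can be added or removed without modifying the existing process flow. A production tap would typically hand the copy off asynchronously rather than call the receiver inline as this sketch does.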
    • Other Enterprise Integration Patterns
      • Transformer: Convert payload or modify headers
      • Filter: Discard messages based on boolean evaluation
      • Router: Determine next channel based on content
      • Splitter: Generate multiple messages from one
      • Aggregator: Assemble a single message from multiple
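Each of these five patterns can be sketched as a small function over a message. In the sketch below a message is assumed to be a dict with `headers` and `payload` keys; that shape, the function names, and the specific rules (uppercasing, routing on "ERROR") are illustrative assumptions rather than anything shown in the slides.

```python
def transformer(message):
    """Convert the payload (here: uppercase it), leaving headers untouched."""
    return {**message, "payload": message["payload"].upper()}


def message_filter(message):
    """Return True to keep the message, False to discard it."""
    return bool(message["payload"].strip())


def router(message):
    """Pick the next channel name based on content."""
    return "errors" if "ERROR" in message["payload"] else "events"


def splitter(message):
    """Generate one message per line of the original payload."""
    return [{**message, "payload": line} for line in message["payload"].splitlines()]


def aggregator(messages):
    """Assemble a single message from several, joining their payloads."""
    return {
        "headers": messages[0]["headers"],
        "payload": "\n".join(m["payload"] for m in messages),
    }
```

Because each function takes and returns plain messages, they compose freely, for example `aggregator(splitter(transformer(message)))`.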
    • The Business Case
    • © Hortonworks Inc. 2013. 6 Key Hadoop Data Types
      1. Sentiment: Understand how your customers feel about your brand and products – right now
      2. Clickstream: Capture and analyze website visitors’ data trails and optimize your website
      3. Sensor/Machine: Discover patterns in data streaming automatically from remote sensors and machines
      4. Geographic: Analyze location-based data to manage operations where they occur
      5. Server Logs: Research logs to diagnose process failures and prevent security breaches
      6. Text: Understand patterns in text across millions of web pages, emails, and documents
    • 20 Apache Hadoop Enterprise Use Cases (Vertical, then Use Case: Data Type)
      • Financial Services
        – New Account Risk Screens: Text, Server Logs
        – Fraud Prevention: Server Logs
        – Trading Risk: Server Logs
        – Maximize Deposit Spread: Text, Server Logs
        – Insurance Underwriting: Geographic, Sensor, Text
        – Accelerate Loan Processing: Text
      • Telecom
        – Call Detail Records (CDRs): Machine, Geographic
        – Infrastructure Investment: Machine, Server Logs
        – Next Product to Buy (NPTB): Clickstream
        – Real-time Bandwidth Allocation: Server Logs, Text, Sentiment
        – New Product Development: Machine, Geographic
      • Retail
        – 360° View of the Customer: Clickstream, Text
        – Analyze Brand Sentiment: Sentiment
        – Localized, Personalized Promotions: Geographic
        – Website Optimization: Clickstream
        – Optimal Store Layout: Sensor
      • Manufacturing
        – Supply Chain and Logistics: Sensor
        – Assembly Line Quality Assurance: Sensor
        – Proactive Maintenance: Machine
        – Crowdsourced Quality Assurance: Sentiment
    • Fraud Prevention (Financial Services; Data: Server Logs)
      Business Problem
      • Financial institutions are always at risk of fraud
      • Fraudsters test bank systems for vulnerabilities
      • This testing leaves subtle patterns that often go undetected by bank employees or law enforcement
      • Fraud losses cost banks millions
      Solution
      • HDP reduces the cost to detect fraudulent activity
      • HDP stores more types of data for longer
      • Analysis of data in the “data lake” exposes fraudulent patterns that would have gone undetected
    • Credit Request Process Flow - Before
      Credit Request Processing
      • Credit Request arrives on a Gateway
      • Credit Request is sent over a Channel
      • Credit Request Processor
        • Receives Request
        • Processes the Request
        • Issues a Response
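The "before" flow on this slide (gateway, channel, processor) could be sketched as follows. This is only an illustration of the three roles: the `Gateway` class, the `drain` helper, the use of `queue.Queue` as the channel, and the approval rule are all assumptions and do not come from the demo.

```python
import queue


def credit_request_processor(request):
    """Receive a request, process it, and issue a response."""
    approved = request["amount"] <= 5000  # hypothetical approval rule
    return {"request_id": request["request_id"], "approved": approved}


class Gateway:
    """Entry point: accepts credit requests and puts them on the channel."""

    def __init__(self, channel):
        self._channel = channel

    def submit(self, request):
        self._channel.put(request)


def drain(channel, processor):
    """Deliver every queued request to the processor and collect the responses."""
    responses = []
    while not channel.empty():
        responses.append(processor(channel.get()))
    return responses
```

In this version the only thing that survives a request is its response; everything else that passed over the channel is discarded once processing completes.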
    • Cross-Cutting Concerns
      • Credit Scoring
      • Fraud Detection
      • Gathering Data Available during Credit Request Process Flow
    • Demo
    • Credit Request Processing Flow - After HDP
    • Example: HTTP Header Collection
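The slide does not show how the headers were collected in the demo. As one possible illustration, the sketch below taps a Python WSGI application; WSGI itself is an assumption here, and `header_tap` and `sink` are hypothetical names. It relies only on the WSGI convention that request headers appear in `environ` under `HTTP_`-prefixed keys.

```python
def header_tap(app, sink):
    """Wrap a WSGI app so request headers are copied to `sink` on every call."""

    def tapped(environ, start_response):
        # WSGI exposes request headers as HTTP_* keys, e.g. HTTP_USER_AGENT.
        headers = {
            key[5:].replace("_", "-").title(): value
            for key, value in environ.items()
            if key.startswith("HTTP_")
        }
        try:
            sink(headers)
        except Exception:
            pass  # collection problems must not affect the request
        return app(environ, start_response)

    return tapped
```

The wrapped application is called with the same arguments it would have received anyway, so header collection stays out of the request-handling code. Note that WSGI carries Content-Type and Content-Length outside the `HTTP_` namespace, so this sketch does not capture them.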
    • Example: Data Available During Ingest
      • Record count
      • Highest/Lowest record length
      • Average record length
      • Compression ratio
      But with a little more work...
      • Field parsing (unstructured data is not all that unstructured…)
        – Unique values
        – Unique values per field
        – Access to values of each field independently from the record
        – Relatively fast field-based searches, without indexing
        – Value encoding
        – Etc.
      These are cross-cutting concerns!
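Field parsing as described on this slide might look like the sketch below for delimited records. The comma delimiter, the positional field indexing, and both function names are assumptions made for illustration; the slide does not specify a record format or an encoding scheme.

```python
from collections import defaultdict


def unique_values_per_field(records, delimiter=","):
    """Map each field position to the set of distinct values seen there."""
    values = defaultdict(set)
    for record in records:
        for index, field in enumerate(record.split(delimiter)):
            values[index].add(field)
    return dict(values)


def encode_values(values):
    """Value encoding: assign each distinct value a small integer code."""
    return {value: code for code, value in enumerate(sorted(values))}
```

Once the distinct values of a field are known, checking whether any record contains a given value becomes a set lookup rather than a scan, which is one way to read the slide's "relatively fast field-based searches, without indexing".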
    • Demo
    • Thank You! Questions & Answers
      Follow: @tmccuch, @z_oleg, @hortonworks