20130117 - Big Data Architectures

Presented at the Northeast briefing "Big Data Made Real", 17 January 2013, at Microsoft, Cambridge MA

  1. What is Big Data?
     A new generation of technologies and architectures designed to economically extract value from very large volumes of a wide variety of data by enabling high-velocity capture, discovery and/or analysis.
  2. The Five V's
     Big Data's impact can be expressed by the Five V's: Velocity, Variety, Volume, plus Visualization and Value.
  3. Demo Context
     An e-commerce site is fed by outsourced ad servers; the ads appear on a wide range of sites with various offers.
     A massive amount of data is generated by these servers:
     • Web logs and click-stream data from the e-commerce site
     • Ad logs and click-stream data from the ad servers
     • The resulting relational transactions on the site
     Goal: maximize traffic analysis for business value.
     • Velocity demo: pinpoint activity in real time and react
     • Variety demo: examine historical trends across sources
     • Visualization demo: enable ad-hoc data analysis for insights
  4. Velocity Architecture
     Diagram: web servers and ad servers feeding log files.
     How do we identify when ad clicks result in site traffic?
     • A high-volume stream of log activity comes in: web logs and ad server logs
     • Real-time stream analysis pinpoints data as it happens
     • A persistent query simultaneously joins structured and unstructured data
     • Uses include A/B testing, offer improvement, dynamic site behavior, and fraud detection
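The persistent-query idea on this slide can be illustrated without StreamInsight itself. The sketch below is plain Python, not StreamInsight's API. It assumes a hypothetical `Event` record (source label, cookie, timestamp in seconds), events arriving in timestamp order, and a 30-minute attribution window. Under those assumptions, it joins each web-server hit to an earlier ad click from the same cookie.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Event:
    source: str  # "ad" for ad-server clicks, "web" for site hits (hypothetical labels)
    cookie: str  # visitor identifier shared by both log streams
    ts: int      # event time in seconds


def attribute_clicks(events: Iterable[Event], window: int = 1800) -> Iterator[Tuple[str, int, int]]:
    """Yield (cookie, click_ts, visit_ts) whenever a web hit follows an ad click
    from the same cookie within `window` seconds. Assumes events arrive in ts order."""
    pending = defaultdict(deque)  # cookie -> timestamps of not-yet-attributed ad clicks
    for ev in events:
        clicks = pending[ev.cookie]
        # Expire clicks that have fallen out of the time window.
        while clicks and ev.ts - clicks[0] > window:
            clicks.popleft()
        if ev.source == "ad":
            clicks.append(ev.ts)
        elif ev.source == "web" and clicks:
            # Credit the oldest click still inside the window, at most once.
            yield ev.cookie, clicks.popleft(), ev.ts
        if not clicks:
            del pending[ev.cookie]  # drop per-cookie state once it is empty
```

Each click is credited at most once, and per-cookie state is discarded as soon as it empties. A production streaming engine would also have to cope with late or out-of-order events, which this sketch deliberately ignores.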
  5. DEMO: StreamInsight
  6. Variety Architecture
     Diagram: web servers and ad servers feeding log files, processed with M/R.
     How do we perform historical analysis on unstructured data?
     • Ad servers and web servers generate log files in different formats, which makes them hard to analyze together
     • Map/Reduce processing lets us execute a query across varied data formats stored in Hadoop
     • Hive provides a traditional query interface to Map/Reduce
     • Correlate and connect high-variety data for trend analysis
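To make the variety problem concrete, here is a minimal Python sketch that maps two differently formatted logs onto one common record shape before any querying happens. Both formats are assumptions for illustration. The web log is taken as space-delimited `date time action page_uri cookie` (matching the columns of the Hive table on the next slide). The ad-server log is a hypothetical CSV layout `timestamp,cookie,ad_id,event`. The `parse_*` helpers and `normalize` are illustrative names, not part of Hadoop or Hive.

```python
import csv
import io


def parse_web_line(line):
    # Assumed space-delimited web log: date time action page_uri cookie
    date, time, action, page_uri, cookie = line.split()
    return {"source": "web", "cookie": cookie, "ts": f"{date} {time}",
            "action": action.lower(), "target": page_uri}


def parse_ad_line(line):
    # Hypothetical CSV ad-server log: timestamp,cookie,ad_id,event
    ts, cookie, ad_id, event = next(csv.reader(io.StringIO(line)))
    return {"source": "ad", "cookie": cookie, "ts": ts,
            "action": event.lower(), "target": ad_id}


PARSERS = {"web": parse_web_line, "ad": parse_ad_line}


def normalize(source, lines):
    """Convert raw lines from one log source into common records, skipping blanks."""
    parse = PARSERS[source]
    return [parse(line) for line in lines if line.strip()]
```

Once both sources share the `cookie`, `ts`, `action`, and `target` fields, they can be correlated on `cookie`, which is the kind of cross-source trend analysis the slide describes.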
  7. Hive HQL Queries
     Access Azure blob storage via a Hive “view” (an external table) and aggregate session data:

     ```sql
     CREATE EXTERNAL TABLE logs (
       date1    STRING,
       time1    STRING,
       action   STRING,
       page_uri STRING,
       cookie   STRING)
     ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
     STORED AS TEXTFILE
     LOCATION 'asv://logs/logs/';

     CREATE TABLE log_summary AS
     SELECT l.cookie
           ,MAX(regexp_replace(cookie, '[-]', '') % 36) AS geo_hash
           ,MAX(l.time1) AS time1
           ,l.page_uri
           ,MAX(CASE LOWER(action) WHEN 'click' THEN concat(l.date1, ' ', l.time1) ELSE NULL END) AS click_time
           ,MIN(CASE LOWER(action) WHEN 'view' THEN concat(l.date1, ' ', l.time1) ELSE NULL END) AS view_time
           ,MAX(l.date1) AS date1
     FROM logs l
     GROUP BY l.cookie, l.page_uri;
     ```
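For readers less familiar with HiveQL, the Python sketch below spells out what the `log_summary` aggregation computes for each `(cookie, page_uri)` group. It is an in-memory illustration under the assumption that rows are dicts keyed by the `logs` column names. It omits `geo_hash`, because that expression depends on Hive's implicit string-to-number casting, and the sketch does not attempt to reproduce that. As in Hive, `MAX`/`MIN` on strings compare lexicographically and ignore NULLs.

```python
from collections import defaultdict


def log_summary(rows):
    """rows: iterable of dicts with date1, time1, action, page_uri, cookie.
    Mirrors GROUP BY l.cookie, l.page_uri from the Hive query (geo_hash omitted)."""
    groups = defaultdict(list)
    for r in rows:
        groups[(r["cookie"], r["page_uri"])].append(r)

    summary = {}
    for key, rs in groups.items():
        # CASE LOWER(action) WHEN 'click' / 'view' THEN concat(date1, ' ', time1)
        clicks = [f'{r["date1"]} {r["time1"]}' for r in rs if r["action"].lower() == "click"]
        views = [f'{r["date1"]} {r["time1"]}' for r in rs if r["action"].lower() == "view"]
        summary[key] = {
            "time1": max(r["time1"] for r in rs),
            "click_time": max(clicks) if clicks else None,  # MAX ignores NULLs
            "view_time": min(views) if views else None,      # MIN ignores NULLs
            "date1": max(r["date1"] for r in rs),
        }
    return summary
```

Note that `time1` and `date1` are aggregated independently, exactly as in the query, so for multi-day sessions they need not come from the same row.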
  8. DEMO: Azure HDInsight
  9. Hadoop Ecosystem Overview
     Hadoop is an open-source framework for building large-scale, distributed, data-intensive applications.
     • Hadoop is HDFS, the kernel and M/R
     • MapReduce brings the code to the data: where possible, tasks are scheduled on the nodes that already hold the data blocks instead of moving the data to the computation
     • An open set of tools exists to extend its functional uses and representations
  10. Map/Reduce Distributes Processing of Operations
      The "Map" step: the mappers are responsible for reading the input data and emitting key/value pairs. The input file can be CSV, XML, or any format, as long as it can be converted into k/v pairs.
      The "Reduce" step: each reducer executes a function on all values for a given key. The framework ensures that all values for the same key are sent to the same reducer.
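The two steps on this slide, plus the shuffle the framework performs between them, can be simulated in a few lines of single-process Python. This is not the Hadoop API; `mapper`, `reducer`, and `map_reduce` are illustrative names. The input is assumed to be the space-delimited `date time action page_uri cookie` log format, and the job counts hits per page.

```python
from collections import defaultdict
from itertools import chain


def mapper(line):
    # Assumed input format: "date time action page_uri cookie"
    fields = line.split()
    if len(fields) == 5:        # skip malformed lines
        yield fields[3], 1      # key = page_uri, value = one hit


def reducer(key, values):
    # Runs once per key, over every value emitted for that key.
    return key, sum(values)


def map_reduce(lines, mapper, reducer):
    # Map: each input line becomes zero or more (key, value) pairs.
    pairs = chain.from_iterable(mapper(line) for line in lines)
    # Shuffle: gather all values for the same key, as the framework does
    # before handing them to a single reducer.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Reduce: one call per key.
    return dict(reducer(k, vs) for k, vs in sorted(grouped.items()))
```

In Hadoop, the map calls run in parallel on the nodes holding each input split, and the shuffle moves pairs across the network to the reducers. The in-memory `grouped` dictionary stands in for that step here.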
  11. Visualization Architecture
      Diagram: web servers and ad servers feeding log files, processed with M/R.
      How do we perform ad-hoc data discovery and visualization?
      • Ad servers and web servers generate log files in different formats, which makes them hard to analyze together
      • Map/Reduce processing lets us execute a query across varied data formats stored in Hadoop
      • Hive provides a traditional query interface to Map/Reduce
      • Correlate and connect high-variety data for trend analysis
  12. DEMO: Excel & Hive Adapter
  13. Summary
      • Big Data & Analytics projects are often additive:
        - New capabilities are layered on top of existing data & apps
        - Analytics can drive applications in new ways
      • Visualizations put Big Data in the hands of the business
  14. We are BlueMetal Architects
  15. Take the next steps – Imagine, Define, Build
  16. Take the next steps - our offerings
      • Envisioning & Strategy Briefing: Big Data, Analytics & Collaboration
      • Envisioning Session: Data is the App – Envisioning the Next Generation, Data-Driven Enterprise
      • Architecture Design Session: Big Data & Analytics
      • Healthcare / Life Sciences: Strategy Briefing or Architecture Design Session – Big Data Architecture, Cloud & Use-Case-Driven Analytics and Applications, Portal, M-Health and UX Design for Providers, Patients, Pharma & Biotechnology
      • Financial Services: Strategy Briefing or Architecture Design Session – Big Data & Analytics for Banking, Capital Markets, Retail Brokerage or Insurance
  17. Thank You
  18. Who We Are
      Diagram: Design (Differentiation); UX, Data, Social (Specialization); Code (Foundation).
  19. Who We Are
      • Design (Differentiation): Strategy, Analysis, Creative
      • UX (Specialization): Desktop, Mobile, Web Client
      • Data (Specialization): Analytics, Big Data, Core SQL
      • Social (Specialization): Web Content, Intranets, Collaboration
      • Code (Foundation): .NET, Java
      • Services (Foundation): On-Premise, PPP, Cloud
