Big Data Easy BI
 


Currently, the Yahoo EC Taiwan team provides business performance metrics to users by acquiring data from the Web production and back-office ERP systems. The reporting system is built using traditional BI technologies such as RDBMS, ETL tools, OLAP tools, home-made reporting tools, stored procedures, web pages, and so on. As user browsing data plays a growing role in daily business decisions, the ability to provide analytics on this Big Data is becoming increasingly important. Traditional RDBMSs are reaching their limits in processing big data while connected to an OLAP tool. We started by studying the feasibility of connecting MicroStrategy with Hive 0.9 and created a prototype system to test two scenarios: ad-hoc queries to Hive, and the performance of a predefined MicroStrategy Intelligent Cube for ad-hoc analytics. We ran performance tests on ad-hoc queries via HiveQL and on queries against the MicroStrategy cube, and will share the results in the session. Based on our test results, we will be able to provide the following applications to different types of users. A) Ad-hoc queries running against Hadoop allow well-trained data analysts or power users to perform deeper analysis on data within Hadoop. B) OLAP reports running against a MicroStrategy Intelligent Cube provide quicker response times for ad-hoc analytics on the predefined data in the cube.


    Big Data Easy BI Presentation Transcript

    • BIG DATA, EASY BI Data Team, eCommerce Engineering, Yahoo! Taiwan
    • Agenda (7/11/13)
        §  Background Introduction
        §  Solution
        §  Demonstration
    • Tim Hsu
        •  Senior Data Engineer at Yahoo!
        •  Data modeling, BI application design
        •  Wants to provide an integrated, easy-to-use BI system to Yahoo! EC users
    • Neal Lee
        •  Data Engineer at Yahoo!
        •  Aims to build an easy-to-use self-service BI platform connecting to Hadoop
    • Background Introduction
    • Who Are We?
        §  APAC is the best region where Yahoo! runs EC business
        §  Major EC properties
            ›  2001 Auction
            ›  2004 Shopping Mall
            ›  2008 Store Market
        §  Yahoo! is the leading eCommerce company in Taiwan
        [Chart: 2011 Taiwan Retail Revenue, in MM USD, axis 0 to 5,000. Retailers shown: EHS; National 3C Chains; Fubon momo TV shopping; FarEastern Dept store; TK 3C; PC Home; SOGO Dept Store; Y!EC; RT Mart (hyper mart); FamiMart; PxMart (hyper mart); Carrefour; ShinKwan Mitsukoshi; 7-Eleven]
    • Types of End Users in Yahoo! EC Taiwan: GM, BU Heads; Business Analysts; Marketers, Data Analysts; Category Managers; Suppliers, Sellers
    • BI Needs for Different Types of Users [Figure: User Types & Analytical Needs. Labels: Summarized, Sophisticated; Business analytics, Technical analytics; Data Scale (Low, High). User types: GM, BU; Business Analysts; Marketers, Data Analysts; Category Managers; Suppliers, Sellers]
    • Ad hoc Reports Challenges [Diagram labels: ERP Transactions; Web Logs (Browsing, Purchase); DW/DM; Performance Reports; Management Reports; Traffic Reports; PHP, ASP.NET; MicroStrategy; Hyperion; SQL, Stored Procedure, Pig, HiveQL; PHP, Web Services API]
    • Yahoo! EC Taiwan Needs … One unified data platform for retrieving information in an easy and efficient way.
    • Solution
    • Where We Are Going … [Layers: Business Intelligence Application; Business Intelligence Platform; Data Storage; Data Process; Data Source]
    • Architecture [Diagram labels as extracted: Auction, Shopping, Store; Instrumentation (x3); Auction Backend, Shopping ERP, Store ERP; E T L; Oracle RAC; Listing, Member, Revenue, Seller, Sales, Supplier; F E T L; Yahoo! Grid; Page View, Click, Event, Session; ETL; Beacon Servers; Data Highway; Users; Hive; MicroStrategy; SQL Engine]
    • Unified BI Platform [Diagram labels: Oracle RAC; In Memory Caches]
    • Performance Test
        §  Use case: Visitor distribution by demographic and device preference
        §  Source data: 293 TB of web logs over 60 days
        §  Transformed cube: 2.3 GB, 60.5M rows
        §  Test environment
            ›  MicroStrategy Server: 8 cores at 2.5 GHz, 16 GB RAM, v9.2.1
            ›  Hive Server: 4 cores at 2.5 GHz, 4 GB RAM, v0.9
            ›  Hadoop clusters: 300+ nodes, v0.23
    • Test Cases
        §  Case C1: Cross tab with date slice
        §  Case C2: Dynamic prompt on date
        §  Case C3: Dynamic data grouping (Browser)
        §  Case C4: 80/20 Analysis
        §  Case C5: Data grouping & charting
    • Test Result: average response time (sec), charted by concurrent users (CU) and by data volume in cube
        Data volume   10 CU   25 CU   50 CU   100 CU
        20 Days       1.8     3.5     6.1     11.9
        40 Days       3.1     6.8     12.1    24.5
        60 Days       4.7     9.6     19.2    36.1
    • Demonstration
    • Use Cases in Demonstration
        §  Dynamic OLAP analysis using in-memory cubes
        §  Direct access to Hadoop through Hive
        §  Self-service Business Intelligence
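The figures on the Performance Test and Test Result slides lend themselves to a quick arithmetic check. The sketch below is not part of the original deck: it simply transcribes the slide numbers and derives three quantities, namely how much smaller the cube is than the source logs, the approximate cube bytes per row, and how response time grows relative to the 10-user case. The function names are hypothetical, and decimal units (1 TB = 1,000 GB) are an assumption, since the slides do not say which convention they use.

```python
# Figures transcribed from the "Performance Test" and "Test Result" slides.
SOURCE_TB = 293        # raw web logs covering 60 days
CUBE_GB = 2.3          # size of the transformed cube
CUBE_ROWS = 60.5e6     # number of rows in the cube

# Average response time in seconds:
# {data volume in days: {concurrent users: seconds}}
RESP = {
    20: {10: 1.8, 25: 3.5, 50: 6.1, 100: 11.9},
    40: {10: 3.1, 25: 6.8, 50: 12.1, 100: 24.5},
    60: {10: 4.7, 25: 9.6, 50: 19.2, 100: 36.1},
}


def reduction_ratio(source_tb, cube_gb, gb_per_tb=1000):
    """How many times smaller the cube is than the source logs.

    Assumes decimal units (1 TB = 1,000 GB); the slides do not specify.
    """
    return source_tb * gb_per_tb / cube_gb


def bytes_per_row(cube_gb, rows):
    """Approximate cube size per row, in bytes (decimal GB assumed)."""
    return cube_gb * 1e9 / rows


def slowdown(days, users, base_users=10):
    """Response time at `users` relative to the `base_users` case."""
    return RESP[days][users] / RESP[days][base_users]


if __name__ == "__main__":
    print(f"cube is about {reduction_ratio(SOURCE_TB, CUBE_GB):,.0f}x smaller than the logs")
    print(f"about {bytes_per_row(CUBE_GB, CUBE_ROWS):.0f} bytes per cube row")
    for days in RESP:
        print(f"{days} days: 100 CU is {slowdown(days, 100):.1f}x slower than 10 CU")
```

With the slide's numbers, a tenfold increase in concurrent users (10 to 100) raises the average response time by roughly 6.6 to 7.9 times, depending on the data volume in the cube, so the growth is somewhat less than proportional to the number of users.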