The Birth of eBay: Initial Business Model and Target Users
Build an equitable electronic marketplace for Americans to buy and sell their stuff
eBay Facts
450+ Million Registered Users
Over 2 Billion Photos
220+ Million Active Item Listings for Sale
50,000 Categories
2 Petabytes Stored
25 Petabytes Processed Daily
300+ Features per Quarter
100,000 Lines of Code Rolled Out Every 2 Weeks
48 Billion SQL Calls per Day
5.5 Billion API Calls per Month
> 4.4 GB Source Code
16 Years After:
Global Presence in 33 International Markets
10+ Million New Items Added per Day
$2,000+ USD Trading Value per Second
Analytical Data Platforms
EDW/ODW (Primary & Secondary): Enterprise-class System; Analyze & Report; 500+ concurrent users; Production Data Warehousing; Large Concurrent User-base; Operational Analytics; Transactional Analytics; High volume ad hoc queries
Singularity: Low End Enterprise-class System; Discover & Explore; 20-50 concurrent users; Deep, Seasonal, Consumable Data Sets; Contextual-Complex Analytics; Trending and Forecast Analysis (large history), e.g. "Compare User Activity against last year"
Hadoop: Developer System; Structure the Unstructured; Detect Patterns; >5 concurrent users; Image Fingerprinting; Image Classification; Pattern Recognition; Detect Counterfeits & SNADs (items Significantly Not As Described)
Closed-loop, active Data Warehouse
Raw data flows from the Site Databases (www.ebay.com) into the Enterprise DW as daily and hourly feeds. The DW turns it into knowledge (integrated, aggregated, augmented) for Analytical Reporting, and that knowledge becomes wisdom (informed, fact-based actions) for Trust & Safety, Customer Support, Marketing, and real-time creative, which act back on the site.
APD Resource Distribution
Chennai, India: Cognizant Technology Solutions (onshore/offshore model).
Shanghai, CN: DW Core Team; APD Ops anchor point for China-based outsourcers (HP, DX). Core competencies: DW development, business system analysis, quality assurance, architecture, project management office, and production support.
Seattle, WA: DW Core Team and anchor point for India-based outsourcing. Core competencies: VLDB and highly efficient, scalable architecture (Next Gen).
San Jose, CA: BU dedicated teams (IMS, DMS, MRM, UBI), DW Core, and Arch & Ops. Core competencies: rapid development, VLDB, MPP, business analysis, and DW development.
DBQL (Database Query Log) and Table Usage Info are Teradata dictionary tables.
DBQL: contains the details of each query, such as runtime, CPU cost, query band, etc.
Table Usage Info: which table(s) each query uses.
The Data Flow Analysis Engine analyzes the raw DBQL and Table Usage Info data to derive table dependency metadata:
At the batch script (job) level, which table(s) are the output tables of each script (job)
Which table(s) are the input tables of each script (job)
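A minimal Python sketch of the job-level input/output derivation above. It assumes the DBQL and Table Usage Info rows have already been joined into one (query, table) record each, that every batch script tags its queries with a query band carrying a hypothetical JobName key, and that each access is already classified as read or write. TableAccess, job_io_tables, and the JobName key are illustrative names, not the engine's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TableAccess:
    """One (query, table) row, assumed pre-joined from DBQL and Table Usage Info."""
    query_id: int
    query_band: str   # e.g. "JobName=stg_load;" (the JobName key is hypothetical)
    table_name: str
    access: str       # "read" or "write" (assumed derived from the statement type)


def parse_query_band(query_band: str) -> dict:
    """Split a Teradata-style query band ("k1=v1;k2=v2;") into a dict."""
    pairs = {}
    for part in query_band.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            pairs[key.strip()] = value.strip()
    return pairs


def job_io_tables(rows):
    """Return {job: (sorted input tables, sorted output tables)}."""
    inputs, outputs = defaultdict(set), defaultdict(set)
    for row in rows:
        job = parse_query_band(row.query_band).get("JobName")
        if job is None:
            continue  # ad hoc query, not issued by a batch script (job)
        bucket = outputs if row.access == "write" else inputs
        bucket[job].add(row.table_name)
    result = {}
    for job in set(inputs) | set(outputs):
        # A table the job both reads and writes (e.g. a work table it rebuilds)
        # is kept as output only, so it does not become a self-dependency.
        result[job] = (sorted(inputs[job] - outputs[job]), sorted(outputs[job]))
    return result


rows = [
    TableAccess(1, "JobName=stg_load;", "src.orders", "read"),
    TableAccess(1, "JobName=stg_load;", "stg.orders", "write"),
    TableAccess(2, "JobName=dw_load;", "stg.orders", "read"),
    TableAccess(2, "JobName=dw_load;", "dw.orders_fact", "write"),
    TableAccess(3, "", "dw.orders_fact", "read"),  # ad hoc user query
]
print(job_io_tables(rows))
```

For the sample rows, dw_load maps to (['stg.orders'], ['dw.orders_fact']) and the query without a JobName is ignored. Keying on the query band rather than the database user is a design choice in this sketch: one service account often runs many jobs, so the user alone may not identify the script.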
DFD Metadata holds the Analysis Engine's results, including:
The DFD dependency metadata of each table; with this metadata, we can draw a DFD for any table using Graphviz.
Each script (job) is a node of the diagram.
The dependencies between scripts (jobs) define the edges between nodes.
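The node and edge rules above can be sketched as Graphviz DOT generation, assuming each job's input and output tables are already known as a dict {job: (input tables, output tables)}. The dict shape, function names, and styling are assumptions. As a simplification, the gray fill marks the job that populates the target table, and tables read but not written by any job in scope are drawn as rounded rectangles.

```python
def job_edges(job_io):
    """Return sorted (upstream_job, downstream_job, table) triples.

    An edge exists when the upstream job writes a table the downstream job reads.
    """
    writers = {}
    for job, (_, outs) in job_io.items():
        for table in outs:
            writers.setdefault(table, set()).add(job)
    edges = set()
    for job, (ins, _) in job_io.items():
        for table in ins:
            for upstream in writers.get(table, ()):
                if upstream != job:
                    edges.add((upstream, job, table))
    return sorted(edges)


def to_dot(job_io, target_table):
    """Render a DOT digraph: jobs are boxes, job-to-job edges are labelled with tables."""
    written = {t for _, outs in job_io.values() for t in outs}
    lines = ["digraph dfd {", "  rankdir=LR;", "  node [shape=box];"]
    for job in sorted(job_io):
        # Gray fill marks the job that populates the target table.
        fill = " style=filled fillcolor=gray" if target_table in job_io[job][1] else ""
        lines.append(f'  "{job}" [label="{job}"{fill}];')
    for job in sorted(job_io):
        for table in job_io[job][0]:
            if table not in written:
                # Table produced outside this scope: rounded rectangle.
                lines.append(f'  "{table}" [style=rounded];')
                lines.append(f'  "{table}" -> "{job}";')
    for upstream, downstream, table in job_edges(job_io):
        lines.append(f'  "{upstream}" -> "{downstream}" [label="{table}"];')
    lines.append("}")
    return "\n".join(lines)


job_io = {
    "stg_load": (["src.orders"], ["stg.orders"]),
    "dw_load": (["stg.orders"], ["dw.orders_fact"]),
}
dot = to_dot(job_io, "dw.orders_fact")
print(dot)
```

Saving the printed text to dfd.dot and running `dot -Tsvg dfd.dot -o dfd.svg` renders the diagram.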
The DFD Repository is the collection of DFDs, which we organize and display online.
Data Flow Analysis Engine -> DFD Metadata -> DFD Repository
How to Read a DFD
Step2: step numbers are ordered by job start time.
Job Start/End Time (HH:MM:SS).
The name of the script (job) that populates the table in the step.
The output table of Step1, which is also the input table of Step2.
Rounded-corner rectangle: upstream tables from other subject areas.
Blue line: the process critical path.
Gray background: highlights the target table of the diagram.
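The step numbering and the blue critical path can be sketched as follows, under two assumptions the slide does not state: start and end times fall on the same day, and the critical path is the chain found by walking back from the target job and always following the upstream job that finished last (the one the downstream job actually waited on). A longest-duration path through the job graph would be another reasonable definition. All names and times are hypothetical.

```python
def to_seconds(hhmmss: str) -> int:
    """Convert "HH:MM:SS" to seconds since midnight (runs crossing midnight not handled)."""
    hours, minutes, seconds = (int(x) for x in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def number_steps(jobs):
    """Number jobs 1..n by start time; jobs maps name -> (start, end)."""
    order = sorted(jobs, key=lambda name: (to_seconds(jobs[name][0]), name))
    return {name: step for step, name in enumerate(order, start=1)}


def critical_path(jobs, upstream, target):
    """Walk back from target, each time following the upstream job that finished last.

    upstream maps a job to the jobs whose output tables it reads.
    """
    path, seen = [target], {target}
    current = target
    while upstream.get(current):
        current = max(upstream[current], key=lambda name: to_seconds(jobs[name][1]))
        if current in seen:  # guard against a cycle in bad metadata
            break
        seen.add(current)
        path.append(current)
    return list(reversed(path))


jobs = {
    "stg_orders": ("01:00:00", "01:20:00"),
    "stg_users": ("01:05:00", "01:50:00"),
    "dw_orders": ("01:55:00", "02:30:00"),
    "agg_sales": ("02:35:00", "03:00:00"),
}
upstream = {"dw_orders": ["stg_orders", "stg_users"], "agg_sales": ["dw_orders"]}
print(number_steps(jobs))
print(critical_path(jobs, upstream, "agg_sales"))
```

Here the steps are numbered stg_orders=1, stg_users=2, dw_orders=3, agg_sales=4, and the critical path is stg_users -> dw_orders -> agg_sales, because stg_users (ending 01:50:00) finished after stg_orders (ending 01:20:00).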
JOBTRACK OVERVIEW
Data sources: ETL job runtime info from all ETL servers; UC4; Teradata query log from TD1/TD2/TD3/TD5; MDR table usage info; ETL job status.
JobTrack Repository: table usage; master data flow.
Applications: table dependency; query pattern; query user behavior; user query/batch job enhancement; …
JOBTRACK FEATURES
Automation for any table and any ETL job
Real time + history + forecast
All information on one page
Not only data flow: all the data information you need