Sourcing data primarily Java Applications using Perl, Scala, Python…
Data Access Frameworks
HBase – for EDW data
Pig – data pipelines
Hive – ad hoc queries
MQL – Mobius Query Language
Monitoring & Alerting
HUE/Mobius – lifecycle of user jobs
UC4 – scheduling
Oozie – user workflow and data pipelines
Mahout – data mining
Built to support multiple groups
Job invocation uses the group name
Allocations based on investment
Minimum share of mappers and reducers
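One way to read the allocation rules above is as a two-step split: each group first receives its guaranteed minimum of slots, and whatever is left over is divided in proportion to investment. The sketch below is a simplified illustration under that assumption, not the scheduler's actual algorithm; the `allocate_slots` helper, the group names, and the figures are hypothetical.

```python
def allocate_slots(total_slots, groups):
    """Split map (or reduce) slots across groups.

    groups maps a group name to (min_share, investment).
    Every group gets its minimum share first; the remainder is
    divided in proportion to investment (rounded down, so a few
    slots may stay unassigned).
    """
    min_total = sum(min_share for min_share, _ in groups.values())
    if min_total > total_slots:
        raise ValueError("minimum shares exceed cluster capacity")

    remaining = total_slots - min_total
    total_investment = sum(inv for _, inv in groups.values())

    allocation = {}
    for name, (min_share, investment) in groups.items():
        extra = remaining * investment // total_investment if total_investment else 0
        allocation[name] = min_share + extra
    return allocation


# Hypothetical groups: (minimum share, relative investment)
example = allocate_slots(100, {"search": (10, 3), "research": (10, 1)})
```

With these made-up numbers, `search` ends up with 70 slots and `research` with 30.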
Auth & Auth
HUE – custom module to use corp. credentials
CLI* – custom PAM module
Security* – implement token interface to replace Kerberos with SAML
* Work in Progress
Data Sourcing Patterns
Diagram labels: Click Stream, EDW, Images, Search Indices, Analytics, Reporting, Algorithmic Models
Acquisition

Description: Click Stream
Source: Session, Event, Session Container
Preparation/Format: Session/Event streamed as LZO/Text; SessionContainer generates SequenceFiles
Pattern (Session/Event data): build an index and use LzoTextInputFormat for splits, based on the work done by Johan Oskarsson/Twitter
Pattern (Session Container): ‘Value to Type Conversion’ pattern; secondary sort with reduce-side join

Description: EDW
Source: Item, Transaction, User, Feedback, Bids
Preparation/Format: streamed as GZIP/Text; generate SequenceFile/HBase snapshot from the previous day's snapshot and the current day's data
Pattern: Hive StorageHandlers to point to SequenceFile/HBase snapshot
TotalOrderPartitioner with RandomSampler to identify partition ranges for reducers
Create HBase regions using HFiles
Update RegionServers using the Ruby script loadtable.rb
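The idea behind total-order partitioning with sampled ranges can be sketched outside Hadoop: draw a random sample of keys, pick evenly spaced cut points from the sorted sample, and route each key to the range it falls in. This is a plain-Python illustration of the concept, not Hadoop's implementation; the function names and the sample keys are hypothetical.

```python
import bisect
import random


def sample_split_points(keys, num_partitions, sample_size, seed=0):
    """Pick num_partitions - 1 cut points from a random sample of keys."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    step = len(sample) / num_partitions
    # Evenly spaced positions in the sorted sample approximate the
    # key distribution, so each range should receive similar volume.
    return [sample[int(step * i)] for i in range(1, num_partitions)]


def partition_for(key, split_points):
    """Return the index of the range (reducer) that owns this key."""
    return bisect.bisect_right(split_points, key)


# Hypothetical row keys for illustration only
keys = [f"item{n:05d}" for n in range(1000)]
splits = sample_split_points(keys, num_partitions=4, sample_size=100)
owners = [partition_for(k, splits) for k in sorted(keys)]
```

Because the partition index never decreases as keys increase, each reducer's sorted output covers one contiguous key range, which is what makes it possible to write one file per region.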
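The secondary sort with reduce-side join named in the click stream pattern can be sketched in plain Python. The assumption here is that session records and event records share a session ID: a composite key makes the session record sort ahead of its events, and grouping uses the session ID alone. The record fields (`type`, `session_id`, `guid`, `timestamp`, `name`) are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def map_record(record):
    """Emit (composite_key, record); tag 0 sorts sessions before events."""
    if record["type"] == "session":
        return (record["session_id"], 0, 0), record
    return (record["session_id"], 1, record["timestamp"]), record


def reduce_side_join(records):
    # "Shuffle": sort on the full composite key (the secondary sort).
    shuffled = sorted((map_record(r) for r in records), key=itemgetter(0))
    joined = []
    # "Grouping comparator": group on the natural key (session ID) only.
    for session_id, group in groupby(shuffled, key=lambda kv: kv[0][0]):
        session, events = None, []
        for (_, tag, _), rec in group:
            if tag == 0:
                session = rec  # arrives first thanks to the tag
            else:
                events.append(rec["name"])  # already in timestamp order
        joined.append({
            "session_id": session_id,
            "guid": session["guid"] if session else None,
            "events": events,
        })
    return joined


records = [
    {"type": "event", "session_id": "s1", "timestamp": 20, "name": "bid"},
    {"type": "session", "session_id": "s1", "guid": "g-1"},
    {"type": "event", "session_id": "s1", "timestamp": 10, "name": "view"},
    {"type": "event", "session_id": "s2", "timestamp": 5, "name": "search"},
]
result = reduce_side_join(records)
```

The reducer never buffers a whole group to find the session record, since the sort order guarantees it arrives before any of its events.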
Search Use Case – Machine Learned Ranking
Diagram labels: ClickStream, Items, Users, Feedback, Classifiers, Ranking Function, Great Search Results
Enhance search relevance for eBay’s items.
Build a ranking function that takes multiple factors into account like price, listing format, seller track record, relevance.
Ability to add new factors to validate hypotheses
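A multi-factor ranking function of this kind can be sketched as a weighted sum over per-item factor scores. The slides describe a machine-learned function; the hand-set linear weights below are only a stand-in to show the shape of the idea. The factor names, weights, and scores are hypothetical, and scores are assumed to be normalized to the range 0 to 1.

```python
def rank_score(item, weights):
    """Weighted sum of the item's factor scores; missing factors count as 0."""
    return sum(weight * item.get(factor, 0.0) for factor, weight in weights.items())


def rank_items(items, weights):
    """Return items ordered from highest to lowest score."""
    return sorted(items, key=lambda item: rank_score(item, weights), reverse=True)


# Hypothetical items with normalized factor scores
items = [
    {"id": "a", "relevance": 0.9, "price": 0.2, "seller_track_record": 0.5},
    {"id": "b", "relevance": 0.6, "price": 0.9, "seller_track_record": 0.9},
]

# Testing a hypothesis amounts to adding a factor (or changing a weight)
# and comparing the resulting orderings.
relevance_only = rank_items(items, {"relevance": 1.0})
multi_factor = rank_items(
    items, {"relevance": 0.5, "price": 0.3, "seller_track_record": 0.2}
)
```

In this made-up example, item `a` leads on relevance alone, while item `b` leads once price and seller track record are weighed in.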
Research Use Case – Description Data Mining
Extend catalog coverage
Leverage data mining/machine learning techniques to turn inventory descriptions into name-value pairs in a completely unsupervised way
Listing description:
BARBIE 1999 "PREMIERE NIGHT" Home Shopping Special Edition
Gorgeous Doll With Beautiful Blond Hair / In A Gown Of Purple And Silver
New / Never Removed From Box / Doll Is In Mint Condition / Remember This Beauty Is 11 Years Old
Free Shipping To US Only / Will Ship International / Please E-mail For Cost
Feel Free To Ask Me Any Questions Or Concerns
Smoke - Free Environment
Free Shipping

Extracted name-value pairs:
Year: 1999
Model: premiere night
Edition: home shopping special
Hair: blond
Gown: purple and silver
Condition: new / never removed from box / mint
Platform Details
Metrics: Job Statistics, System/Disk Consumption, Utilization
Infrastructure: Publish/Subscribe, ETL tools, low latency data movement
Development: Tools, Environment, IDE
Architecture: Schemas, Metadata, Governance, Policies
Operations: Administration, Configuration, Monitoring
Reporting: Visualization, BI Generation, Information delivery
Security: User & Group Management, Auth & Auth

Clusters Details
Exploratory: Strategic investment, 1000-5000 nodes
Production: Site facing, low latency, high availability
Use Case Specific: Advertising, Trust & Safety, Merchandizing