Security* - Implement token interface to replace Kerberos with SAML.
* Work in Progress
Data Sourcing Patterns

Diagram labels: Click Stream, EDW, Images, Search Indices, Analytics, Reporting, Algorithmic Models

Source: Click Stream
  Description: Session, Event, Session Container
  Acquisition format: Session/Event streamed as LZO/Text
  Preparation: SessionContainer generates SequenceFiles
  Pattern (Session/Event data): Build an index and use LzoTextInputFormat for splits, based on the work done by Johan Oskarsson/Twitter
  Pattern (Session Container): 'Value to Type Conversion' pattern; secondary sort with reduce-side join

Source: EDW
  Description: Item, Transaction, User, Feedback, Bids
  Acquisition format: Streamed as GZIP/Text
  Preparation: Generate SequenceFile/HBase snapshot from the previous day's snapshot and the current day's data
  Pattern: Hive StorageHandlers to point to the SequenceFile/HBase snapshot
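The "secondary sort with reduce-side join" pattern named above can be sketched outside Hadoop in a few lines. This is a minimal in-memory simulation, not the deck's actual job: the field names (`session_id`, `user`, `event`) and all function names are hypothetical. The idea is that each record is keyed by a composite (join key, tag), the sort uses the full composite key, and grouping uses only the join key, so the session record reaches the reducer ahead of its events and nothing has to be buffered.

```python
from itertools import groupby
from operator import itemgetter

# Tag values control sort order within a join key: the session record (0)
# must reach the reducer before its events (1).
SESSION_TAG, EVENT_TAG = 0, 1

def map_phase(sessions, events):
    """Emit ((join_key, tag), record) pairs, as two tagged mappers would."""
    for s in sessions:
        yield (s["session_id"], SESSION_TAG), s
    for e in events:
        yield (e["session_id"], EVENT_TAG), e

def shuffle(pairs):
    """Sort on the full composite key, then group on the join key only."""
    ordered = sorted(pairs, key=itemgetter(0))
    for session_id, group in groupby(ordered, key=lambda kv: kv[0][0]):
        yield session_id, [(key[1], rec) for key, rec in group]

def reduce_phase(grouped):
    """Join each event to the session record that precedes it (inner join)."""
    for _session_id, tagged in grouped:
        session = None
        for tag, rec in tagged:
            if tag == SESSION_TAG:
                session = rec
            elif session is not None:
                yield {**session, **rec}

def join(sessions, events):
    return list(reduce_phase(shuffle(map_phase(sessions, events))))

# Hypothetical sample data for illustration only.
sessions = [{"session_id": "s1", "user": "u1"},
            {"session_id": "s2", "user": "u2"}]
events = [{"session_id": "s2", "event": "bid"},
          {"session_id": "s1", "event": "view"},
          {"session_id": "s3", "event": "search"}]
joined = join(sessions, events)
```

In a real MapReduce job the same effect would come from a custom partitioner and grouping comparator on the join key plus a sort comparator on the composite key; the event for `s3` is dropped here because it has no matching session.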
TotalOrderPartitioner with RandomSampler to identify partition ranges for reducers.
Create HBase regions using HFiles
Update RegionServers using the Ruby script loadtable.rb
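The first step above, sampling keys to choose partition ranges, can be illustrated with a small sketch. It is a plain-Python stand-in for the idea, not Hadoop's implementation; `sample_split_points`, `partition`, and the row-key format are hypothetical names chosen for the example. A random sample of keys is sorted, evenly spaced split points are taken from it, and each key is routed to the reducer whose range contains it, so sorted reducer outputs concatenate into one globally sorted sequence.

```python
import bisect
import random

def sample_split_points(keys, num_reducers, sample_size, seed=0):
    """Pick num_reducers - 1 split points from a random sample of keys.

    Assumes keys is a non-empty list.
    """
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    step = len(sample) / num_reducers
    return [sample[int(step * i)] for i in range(1, num_reducers)]

def partition(key, split_points):
    """Return the index of the reducer whose key range contains key."""
    return bisect.bisect_right(split_points, key)

# Hypothetical row keys, shuffled to mimic unsorted input.
keys = [f"row{i:05d}" for i in range(1000)]
random.Random(1).shuffle(keys)

splits = sample_split_points(keys, num_reducers=4, sample_size=100)
buckets = [[] for _ in range(4)]
for k in keys:
    buckets[partition(k, splits)].append(k)

# Each reducer sorts only its own range; concatenation is globally sorted.
merged = [k for bucket in buckets for k in sorted(bucket)]
```

Because every reducer owns a contiguous, non-overlapping key range, each one can write a file that maps directly onto one region's key range, which is what makes the approach a natural fit for building regions from HFiles.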
Leverage data mining/machine learning techniques to turn inventory into name-value pairs in a completely unsupervised way
Listing text:
BARBIE 1999 "PREMIERE NIGHT" Home Shopping Special Edition Gorgeous Doll With Beautiful Blond Hair / In A Gown Of Purple And Silver New / Never Removed From Box / Doll Is In Mint Condition / Remember This Beauty Is 11 Years Old Free Shipping To US Only / Will Ship International / Please E-mail For Cost Feel Free To Ask Me Any Questions Or Concerns Smoke - Free Environment Free Shipping

Extracted name-value pairs:
Year: 1999
Model: premiere night
Edition: home shopping special
Hair: blond
Gown: purple and silver
Condition: new / never removed from box / mint
Platform / Details
Metrics: Job Statistics, System/Disk Consumption, Utilization
Infrastructure: Publish/Subscribe ETL tools, low latency data movement
Development: Tools, Environment, IDE
Architecture: Schemas, Metadata, Governance, Policies
Operations: Administration, Configuration, Monitoring
Reporting: Visualization, BI Generation, Information delivery
Security: User & Group Management, Auth & Auth

Clusters / Details
Exploratory: Strategic investment, 1000-5000 nodes
Production: Site facing, low latency, high availability
Use Case Specific: Advertising, Trust & Safety, Merchandizing