Transforming Unstructured Web into Actionable Insights Using AI - Abhimanyu - Ugam Solutions

The volume of data grows by the day, and online retailers are struggling to stay afloat in this turbulent big-data environment. Billions of products are listed by various online retailers, and the depth of product information varies from one retailer to the next. Mining unstructured data at this scale is a grueling task. With thousands of new products added every minute, the competitive landscape changes very quickly, and it is daunting for retailers to keep track of the competition and make informed decisions about their business. At the core of the problem is building a knowledge base of ‘Matching Products’ across retailers, which is a very complex problem to solve. This session covers our efforts to apply machine learning to big data to solve this problem.

* ML in product classification
* NLP/ML in Attribute Extraction
* CV in Image processing

Published in: Software

  1. Private & Confidential. Transforming unstructured web into actionable insights using AI. March 2018
  2. Using AI to accelerate Digital Transformation
  3. Ugam is a data and analytics company helping leading corporations to improve business decisions. Analytics applications, Analytics services, E-commerce operations. 17 years. Manufacturer, Distributor, B2C, B2B, Manufacturer, Retailer.
  4. Problem Definition & Business Impact
  7. Volume, Variety, Velocity, Structure
      • Volume: Amazon: 562 million in 2018, 372 million in 2017; ~20K every hour
      • Variety: every retailer has a different site-cat-path; photo, video, social, mobile
      • Velocity: periodic, near real time, real time
      • Structure: unstructured data representation; schema varies per retailer
      • 200K+ categories, 500K+ brands, 800K+ attributes, 8M+ sellers, 400 million products
      • Processing performance, curse of modularity, class imbalance, curse of dimensionality, feature engineering, heterogeneity & noise
  8. What - How - Why: The Holy Grail
      • Cleaning, Deduping, Classification, Attribution, Compression, Matching
      • Retailers: Price Intelligence & Optimization; Assortment Intelligence; Product Content solutions; Analytics for Merchandising & Marketing Decisions
      • Brands: Dynamic Pricing; MAP Monitoring
      • Data Aggregation, Data Synthesis, Data Analysis, Data Delivery
  9. Category Research, Hierarchical Classification, Multiclass Linear SVM, Convolutional NN, Ensemble
  10. Bootstrap Aggregating for improved performance
      • Original Data Set
      • Multiple Data Sets: D1, D2, Dn-1, Dn
      • Multiple Classifiers: C1, C2, Cn-1, Cn
      • Combining Classifiers: Ensemble ⅀
      • Clothing, Laptops, Electronics, Toys, Handbags & Luggage, Health, Beauty, Antiques, Kitchen, Miscellaneous, Personal care, Baby
  11. Category Research. Text Attributes: CNN, Sequence Labeling. Image Feature Extraction: CNN.
      • Black Shoe; Black High Heel; Black Pointed-toe stiletto; Black studded leather pointed-toe Christian Louboutin 6” glided heel stiletto for night out
      • Type: Casual; Heel Height: 0.5 Inch; Heel Type: Flat; Material: PVC; ASIN: B077BMVXLQ; Brand: Footsoul
      • Managed Attributes; Unmanaged Attributes
  12. Info Bundle delivered through Image Processing API. Pre-classified input image.
  13. Feature: Libraries/Functions; Data used for training
      • Object identification, Image clustering: TensorFlow, Keras (CNN), Caffe; Internal product database, CIFAR-100, CIFAR-10
      • Foreground extraction / Edge & contour: OpenCV, Keras (CNN); KITTI vision benchmark, GTI image database
      • Template matching / Brand detection: Keras, OpenCV; Internal product database, CIFAR-100, KITTI, Gait dataset
      • Text/Color extraction: TensorFlow, Tesseract, OpenCV; Internal product database, CIFAR-10, CIFAR-100
      Merchandise Category: Managed Features; Coverage achieved
      • Hardline (Consumer Electronics, etc.): Brand, Color, Product; up to 95%
      • Soft line (Apparel): up to 80%
      Merchandise Category: Unmanaged; Coverage achieved
      • Hardline (Consumer Electronics, etc.): MPN, UPC; up to 80%
      • Soft line (Apparel): up to 70%
  14. Product Matching
      • 01 Getting Classification done: correct classification gives us the right set of attributes
      • 02 Attribute Extraction: maximizing attribute coverage; Brand, MPN/UPC, category-specific enforcer attribute
      • 03 Compression / Clustering: allows us to work at scale; Hierarchical Agglomerative Clustering
      • 04 Associations: associative rule matching; exact and similar matches
  16. Reinforcement: Validation of attributes - Tool
  17. Matching Engine - Tool
  19. www.ugamsolutions.com Disclaimer: The information set out in this presentation is produced by Ugam Solutions (“the Company” or “Ugam”) and is being made available AS IS to recipients solely for information purposes. This presentation and its contents are strictly confidential to Ugam and may not be used, reproduced, redistributed, transmitted, passed on, or published, in whole or in part, to any other person for any purpose whatsoever.
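
Slide 10 describes bootstrap aggregating: several data sets are resampled from the original, one classifier is trained on each, and their outputs are combined. The sketch below illustrates that idea only. The training titles, the word-count base classifier, and the choice of 15 models are all assumptions made for illustration; the slides name linear SVMs and CNNs as the actual model families, and nothing here reflects Ugam's implementation.

```python
import random
from collections import Counter

# Hypothetical training data: (product title, category) pairs.
DATASET = [
    ("dell inspiron laptop 15 inch", "Laptops"),
    ("lenovo thinkpad laptop", "Laptops"),
    ("hp pavilion laptop", "Laptops"),
    ("apple macbook laptop", "Laptops"),
    ("lego building blocks toy", "Toys"),
    ("remote control car toy", "Toys"),
    ("plush teddy bear toy", "Toys"),
    ("wooden puzzle toy", "Toys"),
    ("leather tote handbag", "Handbags & Luggage"),
    ("canvas shoulder handbag", "Handbags & Luggage"),
    ("quilted crossbody handbag", "Handbags & Luggage"),
    ("mini clutch handbag", "Handbags & Luggage"),
]

def train(samples):
    """Toy base classifier (an assumption): per-category word counts."""
    model = {}
    for title, category in samples:
        model.setdefault(category, Counter()).update(title.lower().split())
    return model

def predict(model, title):
    """Pick the category whose training words overlap most with the title."""
    words = title.lower().split()
    scores = {cat: sum(counts[w] for w in words) for cat, counts in model.items()}
    # Sorting first makes ties resolve alphabetically, so results are deterministic.
    return max(sorted(scores), key=scores.get)

def bagging_fit(dataset, n_models=15, seed=0):
    """Train one classifier per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = [rng.choice(dataset) for _ in dataset]
        models.append(train(sample))
    return models

def bagging_predict(models, title):
    """Combine the classifiers by majority vote."""
    votes = Counter(predict(m, title) for m in models)
    return votes.most_common(1)[0][0]

MODELS = bagging_fit(DATASET)
print(bagging_predict(MODELS, "lightweight laptop"))
```

A single bootstrap sample can miss a category entirely, in which case that one classifier cannot predict it; the vote across many resampled classifiers is what smooths such gaps out.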
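
Slide 11 lists sequence labeling as one route to text attributes. As a rough illustration of what a sequence labeler produces, the sketch below tags each token of a product title with a BIO label and then collects the tagged spans into attributes. The lexicon lookup is a stand-in for the learned model mentioned on the slide, and the lexicon entries, label names, and shortened title are assumptions for this example.

```python
# Hypothetical lexicon mapping lower-cased token phrases to attribute labels.
LEXICON = {
    ("black",): "Color",
    ("leather",): "Material",
    ("christian", "louboutin"): "Brand",
    ("stiletto",): "Heel Type",
}

def label_tokens(tokens):
    """Assign a BIO tag to every token using longest-match lexicon lookup."""
    tags = ["O"] * len(tokens)
    max_len = max(len(phrase) for phrase in LEXICON)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = tuple(t.lower() for t in tokens[i:i + n])
            if phrase in LEXICON:
                label = LEXICON[phrase]
                tags[i] = "B-" + label            # beginning of an attribute span
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label        # inside the same span
                matched = n
                break
        i += matched or 1
    return tags

def collect_attributes(tokens, tags):
    """Turn BIO-tagged tokens into an attribute dictionary."""
    attributes = {}
    for token, tag in zip(tokens, tags):
        if tag == "O":
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "B":
            attributes[label] = token             # start a new attribute value
        else:
            attributes[label] += " " + token      # extend the current value
    return attributes

def extract_attributes(title):
    tokens = title.split()
    return collect_attributes(tokens, label_tokens(tokens))

# Title shortened from the example on slide 11.
TITLE = "Black studded leather pointed-toe Christian Louboutin stiletto"
print(extract_attributes(TITLE))
```

In a trained sequence labeler the tags would come from a model rather than a lookup table, but the decoding step from tags to attribute values would look much the same.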
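
Slide 14 names Hierarchical Agglomerative Clustering as the compression step that makes matching workable at scale. A minimal sketch of the technique follows: every title starts as its own cluster, and the two most similar clusters are merged repeatedly until no pair is similar enough. Jaccard similarity on title tokens, average linkage, the 0.5 threshold, and the sample titles are assumptions chosen for illustration, not details taken from the slides.

```python
def jaccard(a, b):
    """Jaccard similarity of two non-empty token sets."""
    return len(a & b) / len(a | b)

def agglomerative_clusters(titles, threshold=0.5):
    """Group titles bottom-up, merging while average similarity >= threshold."""
    token_sets = [set(t.lower().split()) for t in titles]
    clusters = [[i] for i in range(len(titles))]  # start with singleton clusters
    while len(clusters) > 1:
        best = None  # (similarity, index_a, index_b) of the closest pair
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean similarity over all cross-cluster pairs.
                sims = [jaccard(token_sets[i], token_sets[j])
                        for i in clusters[a] for j in clusters[b]]
                sim = sum(sims) / len(sims)
                if best is None or sim > best[0]:
                    best = (sim, a, b)
        if best[0] < threshold:
            break  # no remaining pair is similar enough to merge
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]
    return [[titles[i] for i in cluster] for cluster in clusters]

# Hypothetical listings of the same products from different retailers.
TITLES = [
    "Apple iPhone 8 64GB Space Gray",
    "Apple iPhone 8 64GB Gray",
    "Samsung Galaxy S9 64GB Black",
    "Samsung Galaxy S9 Black 64GB Unlocked",
]
for group in agglomerative_clusters(TITLES):
    print(group)
```

This naive version compares every pair of clusters on each pass, so it only suits small batches; candidate matching would then run within each resulting cluster rather than across the whole catalog.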
