Nile - eCommerce Sales Analytics
Get real-time insights into sales data - Ritu Rathore
Use Cases
✓ Record historical sales data
✓ Daily/Monthly volumes by category
✓ Month-to-month comparison of product revenue trends
✓ Product popularity by location (zipcode / region)
✓ Real-time view of product sales by minute
Data Model
Order (contains many Items)
--------------------------
Order_id
Date
Customer_id
Zipcode
Items [...]
Item
---------------
Product_id
Qty
Price
Product (belongs to many Categories)
-----------------
Product_id
Name
Categories [...]
Category
--------------------
Category_id
Name
Order
Id: AZ1721
Placed On: 12/24/2014
Customer_id: 9952357
Zipcode: 95054
Items
Harry Potter
Quantity: 1 Price: $ 19.99
Data Engineering for Dummies
Quantity: 3 Price: $ 24.99
Product Catalog
Id: 5655
Name: Harry Potter
Categories: DVD, Fiction, Fantasy
Id: 7999999
Name: Data Engineering for Dummies
Categories: Book, Non-Fiction, Engineering
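The data model and the sample order above could be expressed as plain Python dataclasses. This is only a sketch: the class and field names mirror the slide, while the `total()` helper, the use of `Decimal`, the reading of "Price" as a per-unit price, and storing categories as a list of names are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class Product:
    product_id: int
    name: str
    # Assumption: categories are kept as names; the slide also lists a Category_id.
    categories: list[str] = field(default_factory=list)


@dataclass
class Item:
    product_id: int
    qty: int
    price: Decimal  # assumed to be the unit price


@dataclass
class Order:
    order_id: str
    date: str
    customer_id: int
    zipcode: str
    items: list[Item] = field(default_factory=list)

    def total(self) -> Decimal:
        """Hypothetical helper: sum of qty * unit price over all items."""
        return sum((i.qty * i.price for i in self.items), Decimal("0"))


catalog = [
    Product(5655, "Harry Potter", ["DVD", "Fiction", "Fantasy"]),
    Product(7999999, "Data Engineering for Dummies",
            ["Book", "Non-Fiction", "Engineering"]),
]

order = Order(
    order_id="AZ1721",
    date="12/24/2014",
    customer_id=9952357,
    zipcode="95054",
    items=[
        Item(5655, 1, Decimal("19.99")),
        Item(7999999, 3, Decimal("24.99")),
    ],
)
```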
Data Pipeline
[Diagram components]
● Product Catalog
● Sales Order
● Bash script to load catalog & sales orders
● Py script to simulate sales
● Kafka
Key Takeaways
Ingestion
● Multi-line records converted to TSV
● Kafka
o Reliable chunking of messages for the batch storage and speed layers
o Decouples the transactional and analytical systems
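The multi-line-record-to-TSV step from the ingestion notes might look roughly like the following. The raw input layout is an assumption based on the sample order shown earlier in the deck; the function name `order_record_to_tsv`, the regular expression, and the column order are hypothetical and not taken from the project.

```python
import re

# Assumed item detail line, e.g. "Quantity: 1 Price: $ 19.99".
ITEM_RE = re.compile(
    r"Quantity:\s*(?P<qty>\d+)\s+Price:\s*\$\s*(?P<price>\d+(?:\.\d+)?)"
)


def order_record_to_tsv(record: str) -> list[str]:
    """Flatten one multi-line order record into one TSV row per item."""
    lines = [ln.strip() for ln in record.strip().splitlines() if ln.strip()]

    # Header lines are "Key: value" pairs up to the "Items" marker.
    header = {}
    idx = 0
    while lines[idx] != "Items":
        key, _, value = lines[idx].partition(":")
        header[key.strip()] = value.strip()
        idx += 1
    idx += 1  # skip the "Items" marker itself

    rows = []
    # Items come in pairs: a product name line, then a quantity/price line.
    for name, detail in zip(lines[idx::2], lines[idx + 1::2]):
        match = ITEM_RE.fullmatch(detail)
        if match is None:
            raise ValueError(f"unrecognized item line: {detail!r}")
        rows.append("\t".join([
            header["Id"], header["Placed On"], header["Customer_id"],
            header["Zipcode"], name, match["qty"], match["price"],
        ]))
    return rows


SAMPLE = """
Id: AZ1721
Placed On: 12/24/2014
Customer_id: 9952357
Zipcode: 95054
Items
Harry Potter
Quantity: 1 Price: $ 19.99
Data Engineering for Dummies
Quantity: 3 Price: $ 24.99
"""

tsv_rows = order_record_to_tsv(SAMPLE)
```

Emitting one row per item keeps each line independent, which suits both line-oriented Kafka messages and Hive tables with tab-delimited fields.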
Batch Layer
● Uses Hive to do a replicated join
o Buffers the product catalog across all nodes
o Map-side join eliminates the reduce phase
● Product-to-category many-to-many relationship is flattened
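The idea behind the replicated (map-side) join and the flattening step can be illustrated in plain Python. This is not the project's Hive query: the small catalog is held in memory as a dict, standing in for the copy that Hive would buffer on every node, and each order item is joined locally without any shuffle. The names `CATALOG` and `map_side_join` and the tuple layout are assumptions for illustration.

```python
from typing import Iterable, Iterator

# Small side of the join, replicated to every worker:
# product_id -> (name, categories). Values come from the sample catalog.
CATALOG = {
    5655: ("Harry Potter", ["DVD", "Fiction", "Fantasy"]),
    7999999: ("Data Engineering for Dummies",
              ["Book", "Non-Fiction", "Engineering"]),
}


def map_side_join(
    order_items: Iterable[tuple[str, int, int, str]],
    catalog: dict[int, tuple[str, list[str]]],
) -> Iterator[tuple[str, str, str, int, str]]:
    """Join each (order_id, product_id, qty, price) row against the catalog.

    The many-to-many product/category relationship is flattened by
    emitting one output row per category of the matched product.
    """
    for order_id, product_id, qty, price in order_items:
        entry = catalog.get(product_id)
        if entry is None:
            continue  # inner join: drop items with no catalog match
        name, categories = entry
        for category in categories:
            yield (order_id, name, category, qty, price)


items = [("AZ1721", 5655, 1, "19.99"), ("AZ1721", 7999999, 3, "24.99")]
joined = list(map_side_join(items, CATALOG))
```

Because each item is repeated once per category, per-category aggregates read directly off the flattened rows, but summing quantities across all rows would count each item several times.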
Speed Layer with Storm
● Kafka Spout: emits sales orders
● Bolt: splits orders into products
● Bolt: fields grouping on product, writes to HBase
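The topology can be simulated in a few lines of Python to show what the fields grouping buys. This is a conceptual sketch, not Storm code: `split_order` plays the first bolt, `task_for` mimics a fields grouping by hashing the product name to a task index, and in-memory counters stand in for the HBase writes. The task count, the function names, and the second order in the usage example are hypothetical.

```python
import zlib
from collections import Counter

NUM_TASKS = 4  # hypothetical parallelism of the counting bolt


def split_order(order: dict):
    """First bolt: emit one (product, qty) tuple per order item."""
    for product, qty in order["items"]:
        yield product, qty


def task_for(product: str, num_tasks: int = NUM_TASKS) -> int:
    """Fields grouping: the same product always routes to the same task."""
    return zlib.crc32(product.encode("utf-8")) % num_tasks


def run_topology(orders, num_tasks: int = NUM_TASKS) -> list[Counter]:
    """Route split tuples to per-task counters (stand-ins for HBase writes)."""
    counters = [Counter() for _ in range(num_tasks)]
    for order in orders:
        for product, qty in split_order(order):
            counters[task_for(product, num_tasks)][product] += qty
    return counters


orders = [
    {"id": "AZ1721", "items": [("Harry Potter", 1),
                               ("Data Engineering for Dummies", 3)]},
    {"id": "hypothetical-2", "items": [("Harry Potter", 2)]},
]
per_task = run_topology(orders)
merged = sum(per_task, Counter())
```

Since every tuple for a given product lands on one task, that task can keep a running count for the product without coordinating with the others.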
Precomputed Views
Row key <product>_YYMMDD is designed to reduce hotspotting
(the sample rows below are keyed by category and use YYYY-MM-DD dates)
Jazz_2014-12-29 column=f:c1, timestamp=1423357346057, value=1203
Music_2014-12-29 column=f:c1, timestamp=1423357346057, value=1235
Fiction_2014-12-29 column=f:c1, timestamp=1423357346057, value=1201
Arts_2014-12-29 column=f:c1, timestamp=1423357346057, value=1317
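A short sketch of how such row keys could be built and read back. HBase keeps rows sorted by key, so a key that leads with the name rather than the date spreads a day's writes across the key space and also allows a prefix scan per name. The dict below only stands in for the HBase table; `row_key` and `prefix_scan` are hypothetical helpers, and the date format follows the sample rows (YYYY-MM-DD).

```python
from bisect import bisect_left
from datetime import date


def row_key(name: str, day: date) -> str:
    """Build a '<name>_<YYYY-MM-DD>' key, matching the sample rows."""
    return f"{name}_{day.isoformat()}"


def prefix_scan(table: dict[str, int], prefix: str) -> list[tuple[str, int]]:
    """Return all rows whose key starts with prefix, in sorted key order."""
    keys = sorted(table)
    start = bisect_left(keys, prefix)
    result = []
    for key in keys[start:]:
        if not key.startswith(prefix):
            break  # sorted order: no later key can match
        result.append((key, table[key]))
    return result


# Values taken from the sample rows on the slide.
day = date(2014, 12, 29)
views = {
    row_key("Jazz", day): 1203,
    row_key("Music", day): 1235,
    row_key("Fiction", day): 1201,
    row_key("Arts", day): 1317,
}
jazz_rows = prefix_scan(views, "Jazz_")
```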
Ritu Rathore
https://github.com/rituraja/Nile
Make it Simple - Life needs Perception, Systems need Abstraction!
➢ Masters in Computer Science
➢ Past Projects:
○ Analytics on wireless device event stream
○ Web application
○ Visualizing results from a load testing tool
○ Reporting for Wells Fargo

Insight project
