In-Store Analysis with Hadoop

While user tracking with WebTrends, comScore, Google Analytics etc. is a de facto standard in the online world, tracking visitors in the real world is still fragmented. Seen broadly, potential tracking data is produced by various sensors. For a real ‘bricks and mortar’ store, possible sensors include customer frequency counters at the doors, the cashier system, free WiFi access points, video capture, temperature, background music, smells and many more. Many of those sensors would need additional hardware and software, though for a few, solutions are already available, e.g. video capturing with face or even eye recognition. The most interesting sensor data that doesn’t require additional hardware and software could come from the WiFi access points, especially given that many visitors will have WiFi-enabled mobile phones. This talk demonstrates how WiFi access point log files can be used to answer different questions for a particular store.

Published in: Technology
  • This talk was held at the 7th meeting on May 13 at IBM Zurich by Nils Kübler, YMC AG.

Transcript of "In-Store Analysis with Hadoop"

  1. Case Study: Retail In-Store Analysis with Hadoop. Nils Kübler, YMC, May 13th 2013
  2. Introduction: What is the status quo? What could be possible? (Image: CC 2.0 by Franck BLAIS | http://flic.kr/p/cwVnSy)
  3. Status Quo: What is the KPI in retail? → Revenue per m² (square metre)
  4. How to bring in more metrics? Possible sensors for a real store:
     ● customer frequency counters at doors
     ● the cashier system
     ● free WiFi access points
     ● video capturing
     ● temperature
     ● ...
     For many of these sensors additional hardware and software is needed: ⇒ Let's use the free WiFi access points
  5. What type of questions could we ask?
     ● How many people visited the store? → unique visitors?
     ● How many visits did we have?
     ● What is the average visit duration?
     ● How many people are new vs. returning?
     ● ...
  6. Preparation: How do we answer these questions? (Image: CC 2.0 by Ian Carroll | http://flic.kr/p/6NWoGm)
  7. Traditional Data Management Approach. From a high level of abstraction the answer is simple. We need a data management system with three pieces:
     1. ingest
     2. store
     3. process
  8. Blueprint for a Data Management System with Hadoop. We take this basic architecture and replace the generic terms while mapping it onto the Hadoop ecosystem. With this Hadoop architecture a data scientist should be able to answer the questions without any programming environment. He/she can also use familiar BI, analysis and reporting tools.
  9. Setup: What do we need? (Image: CC 2.0 by Perry French | http://flic.kr/p/8wDMJS)
  10. Ingredients
     1. 2 WiFi access points to simulate two different stores
     2. Flume to move all log messages to HDFS
     3. A 4 node CDH4 cluster
     4. Pentaho Data Integration's graphical designer for data transformation, parsing, filtering and loading to the warehouse
     5. Hive as data warehouse system on top of Hadoop to project structure onto data
     6. Impala for querying data from Hive in real time
     7. MS Excel to visualize results
  11. Ingest
     ● 2 WiFi routers with OpenWRT installed: one Buffalo and one Fonera
     ● Installed 4 days before the Hackathon, to have some log data
     ● Syslogs are collected on a central syslog server
     ● A Flume node collects the syslogs and stores them on HDFS, without any manual intervention (no transformation, no filtering)
     ● (Flume can also be run as a syslog server)
  12. Parsing, Transformation, Filtering, Load
     ● Raw log data needs to be transformed to CSV
     ● Many open-source BI tools help with that: Palo, SpagoBI, Pentaho, Talend
     ● We used Pentaho
     ● Design a MapReduce job for distributed transformation of the log data with
       ○ Regular expression to match line and split columns
       ○ Filter empty lines
       ○ UDF to create CSV and Unix timestamp
     ● From this data we can easily generate a Hive schema and store the data to our Hive data warehouse.
     Sample rows:
     1358765267,2013,1,21,11,47,47,+01:00,buffalo,hostapd,wlan0,10:68:3f:40:20:2d,IEEE 802.1X,authorizing port
     1358765267,2013,1,21,11,47,47,+01:00,buffalo,hostapd,wlan0,10:68:3f:40:20:2d,WPA,pairwise key handshake completed (RSN)
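
The parsing step on slide 12 (regular expression, empty-line filter, CSV with Unix timestamp) might look roughly like the following Python sketch. The talk used Pentaho and a MapReduce job, not Python, and the raw syslog layout is not shown on the slides: the input format below (ISO 8601 timestamp with offset, then host, program, interface, `STA <mac>`, module, message) is an assumption chosen so that the output matches the first sample row. The names `LOG_LINE` and `to_csv_row` are hypothetical.

```python
import re
from datetime import datetime

# ASSUMPTION: raw lines look like
#   2013-01-21T11:47:47+01:00 buffalo hostapd: wlan0: STA <mac> <module>: <message>
# The slides only show the CSV output, not the raw syslog format.
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})"
    r"(?P<tz>[+-]\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+"
    r"(?P<program>[^:\s]+):\s+"
    r"(?P<iface>[^:\s]+):\s+STA\s+"
    r"(?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2})\s+"
    r"(?P<module>[^:]+):\s+"
    r"(?P<message>.*)$"
)


def to_csv_row(line):
    """Return one CSV row for a log line, or None for empty/unmatched lines."""
    line = line.strip()
    if not line:                      # filter empty lines
        return None
    m = LOG_LINE.match(line)
    if m is None:                     # skip lines that do not fit the pattern
        return None
    when = datetime.fromisoformat(m["ts"] + m["tz"])
    fields = [
        int(when.timestamp()),        # Unix timestamp (UTC-based)
        when.year, when.month, when.day,
        when.hour, when.minute, when.second,
        m["tz"], m["host"], m["program"], m["iface"],
        m["mac"], m["module"], m["message"],
    ]
    # Plain join: assumes no field contains a comma; a real job would
    # want proper CSV quoting.
    return ",".join(str(f) for f in fields)


if __name__ == "__main__":
    raw = ("2013-01-21T11:47:47+01:00 buffalo hostapd: wlan0: "
           "STA 10:68:3f:40:20:2d IEEE 802.1X: authorizing port")
    print(to_csv_row(raw))
```

Splitting the timestamp into year, month, day and so on mirrors the column layout of the sample rows, which makes time-based grouping in Hive or Impala straightforward.
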
  13. Process
     ● Data can now be processed either by Hive or Impala
     ● Create an intermediate table with messages like login/logout, including the visit duration.
     ● We used Impala to query our data ad hoc to answer our questions:
       ○ How many people visited the store (unique visitors)?
       ○ How many visits did we have?
       ○ What is the average visit duration?
       ○ How many people are new vs. returning?
     ● The output was then loaded into Excel to create some nice graphs.
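
The four questions on slide 13 were answered with Impala queries that the slides do not show. As a rough illustration of the logic only, here is a Python sketch that pairs login/logout events into visits and derives the four metrics per store. Everything in it is an assumption: the event tuple layout, the function names `build_visits` and `store_metrics`, the sample events, and in particular the definition of "returning" (more than one visit in the observed window), which the talk does not spell out.

```python
from collections import defaultdict


def build_visits(events):
    """Pair each login with the next logout of the same (store, mac).

    events: iterable of (unix_ts, store, mac, kind), kind is "login" or
    "logout" (hypothetical layout of the intermediate data).
    Returns a list of (store, mac, start_ts, duration_seconds).
    """
    open_logins = {}
    visits = []
    for ts, store, mac, kind in sorted(events):
        key = (store, mac)
        if kind == "login":
            # Keep the earliest unmatched login for this device.
            open_logins.setdefault(key, ts)
        elif kind == "logout" and key in open_logins:
            start = open_logins.pop(key)
            visits.append((store, mac, start, ts - start))
    return visits


def store_metrics(visits):
    """Compute visits, unique visitors, average duration, new vs. returning."""
    per_store = defaultdict(list)
    for store, mac, _start, duration in visits:
        per_store[store].append((mac, duration))

    result = {}
    for store, rows in per_store.items():
        visits_per_mac = defaultdict(int)
        for mac, _duration in rows:
            visits_per_mac[mac] += 1
        result[store] = {
            "visits": len(rows),
            "unique_visitors": len(visits_per_mac),
            "avg_duration_s": sum(d for _, d in rows) / len(rows),
            # ASSUMPTION: "returning" means more than one visit in the window.
            "returning": sum(1 for c in visits_per_mac.values() if c > 1),
            "new": sum(1 for c in visits_per_mac.values() if c == 1),
        }
    return result


if __name__ == "__main__":
    # Made-up events for illustration, not data from the talk.
    sample = [
        (100, "buffalo", "aa", "login"), (400, "buffalo", "aa", "logout"),
        (200, "buffalo", "bb", "login"), (500, "buffalo", "bb", "logout"),
        (1000, "buffalo", "aa", "login"), (1600, "buffalo", "aa", "logout"),
        (300, "fonera", "cc", "login"), (360, "fonera", "cc", "logout"),
    ]
    for store, metrics in store_metrics(build_visits(sample)).items():
        print(store, metrics)
```

A MAC address stands in for a visitor here, so one person carrying two devices would count twice.
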
  14. Results: Now, what did we get? (Image: CC 2.0 by Qi Wei Fong | http://flic.kr/p/7w8vfq)
  15. Visits for stores Buffalo and Fonera
     ● about 85% of the visits were detected in the Buffalo store
     ● about 15% in the Fonera store
     ● Is the Buffalo store in a better location?
  16. Unique visitors
     ● 135 visits in the Buffalo store by only 9 unique visitors
     ● 24 visits in the Fonera store by 5 unique visitors
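
The figures on slides 15 and 16 can be cross-checked with a little arithmetic. The short Python sketch below uses only the visit and visitor counts stated on the slides; the derived "visits per unique visitor" ratio is not a figure from the talk, just a simple quotient of those counts.

```python
# Counts as stated on the slides.
stores = {
    "Buffalo": {"visits": 135, "unique_visitors": 9},
    "Fonera": {"visits": 24, "unique_visitors": 5},
}

total_visits = sum(s["visits"] for s in stores.values())


def summarize(name):
    """Return (share of all visits in %, visits per unique visitor)."""
    s = stores[name]
    share = 100 * s["visits"] / total_visits
    per_visitor = s["visits"] / s["unique_visitors"]
    return round(share, 1), round(per_visitor, 1)


for name in stores:
    share, per_visitor = summarize(name)
    print(f"{name}: {share}% of visits, {per_visitor} visits per unique visitor")
```

With 159 visits in total, the shares come out at 84.9% and 15.1%, consistent with the "about 85%" and "about 15%" reported for the two stores.
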
  17. New vs. returning users
     ● more returning than new users in both stores
     ● Fonera didn't see a new visitor at all over the past four days
  18. Visit duration over the past 4 days
     ● Buffalo has more evenly distributed durations
     ● Fonera shows some peaks
     ● visitors tend to stay in the Buffalo shop much longer
  19. Conclusion
     ● Analysing WiFi router log files could be done with a traditional RDBMS approach as well.
     ● Answering such questions based on WiFi router log files can be done without programming software.
     ● Given that one can quickly ramp up a test cluster with a few nodes, similar problems can be solved within one day with a handful of engineers.
     ● It could be possible to track people's paths based on WiFi router signals using triangulation.
  20. Thank you. Blog Series: http://bitly.com/bundles/nkuebler/1 (Image: CC 2.0 by Aurelien Guichard | http://flic.kr/p/cjg9yw)
