CC 2.0 by Per Olesen | http://flic.kr/p/7pVCgZ
CC 2.0 by Franck BLAIS | http://flic.kr/p/cwVnSy
CC 2.0 by John Steven Fernandez | http://flic.kr/p/a8uTzz
CC 2.0 by Ian Carroll | http://flic.kr/p/6NWoGm
CC 2.0 by Perry French | http://flic.kr/p/8wDMJS
CC 2.0 by John Mitchell | http://flic.kr/p/5UaPg8
March 8, 2013

Before we started designing a blueprint
solution, we first asked ourselves:

1  Who would be asked to answer
   questions like this?
2  Who is this person?
3  What tools does this person expect to
   use?
4  And what is a typical skill set of this
   person?
5  How do they work?

Preparation

How do we answer these questions?



From a high level of abstraction, the
answer is simple. We need a data
management system with three pieces:
ingest, store and process.


Data Source -> Data Ingestion -> Data Storage -> Data Processing




Traditional Data Management System Approach
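
As a minimal sketch of the three pieces named on this slide (ingest, store, process), the following Python models each stage as a small function or class. The names `ingest`, `Store` and `process`, and the sample records, are illustrative assumptions and not part of the original setup; an in-memory list stands in for a real storage layer.

```python
from typing import Callable, Iterable, Iterator, List


def ingest(source: Iterable[str]) -> List[str]:
    """Ingest: collect raw records as-is, dropping only the line terminator."""
    return [record.rstrip("\n") for record in source]


class Store:
    """Store: an append-only, in-memory stand-in for a storage layer."""

    def __init__(self) -> None:
        self._records: List[str] = []

    def append(self, records: Iterable[str]) -> None:
        self._records.extend(records)

    def scan(self) -> Iterator[str]:
        return iter(self._records)


def process(store: Store, predicate: Callable[[str], bool]) -> int:
    """Process: answer a question by counting the records that match."""
    return sum(1 for record in store.scan() if predicate(record))


# Hypothetical sample records, only to show the three stages working together.
store = Store()
store.append(ingest(["ap1 join\n", "ap2 leave\n", "ap1 join\n"]))
joins = process(store, lambda record: record.endswith("join"))
print(joins)  # 2
```

Keeping the stages separate mirrors the diagram: each piece can later be swapped for a concrete component without touching the other two.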


So, how do we answer these questions as a

We take this basic architecture and replace the
generic terms by mapping them onto the Hadoop
ecosystem.

Data Source -> Flume -> HDFS -> Hive, Impala -> BI/Analysis/Reporting


With this Hadoop architecture, a data scientist
should be able to answer the questions without any
programming environment. They can also use
familiar BI, analysis and reporting tools.


Blueprint for a Data Management System with Hadoop


1       Two WiFi access points running OpenWRT, a Linux-based
        firmware for routers, to simulate two different stores
2       Flume to move all log messages to HDFS without any
        manual intervention (no transformation, no filtering)
3       A 4-node CDH4 cluster
4       Pentaho Data Integration's graphical designer for data
        transformation, parsing, filtering and loading into the
        warehouse
5       Hive as the data warehouse system on top of Hadoop to
        project structure onto the data
6       Impala for querying data from Hive in real time
7       A tool to visualize the results

Setup

Ingredients
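
The parsing and filtering step in the list above can be illustrated with a short Python sketch. The slides do not show the access points' log format, so the line layout used here (timestamp, store, device identifier, action) is purely hypothetical, as are the names `Event` and `parse_line`. In the original setup this work was done in Pentaho Data Integration's graphical designer, not in code.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class Event(NamedTuple):
    timestamp: datetime
    store: str
    device: str
    action: str  # "step-in" or "step-out"


def parse_line(line: str) -> Optional[Event]:
    """Parse one log line of the assumed form
    '<ISO timestamp> <store> <device> <action>'.
    Lines that do not match are filtered out by returning None."""
    parts = line.split()
    if len(parts) != 4 or parts[3] not in ("step-in", "step-out"):
        return None
    try:
        timestamp = datetime.strptime(parts[0], "%Y-%m-%dT%H:%M:%S")
    except ValueError:
        return None
    return Event(timestamp, parts[1], parts[2], parts[3])


# Hypothetical sample lines: one well-formed, one noise line.
lines = [
    "2013-03-04T10:15:00 store1 aa:bb:cc:dd:ee:ff step-in",
    "kernel: unrelated message",
]
events = [event for event in map(parse_line, lines) if event is not None]
print(len(events))  # 1
```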
CC 2.0 by Qi Wei Fong | http://flic.kr/p/7w8vfq

The plot indicates that about 85% of the visits were detected in store
number one and about 15% in store number two. One might draw the
conclusion that store number one is in a much better location with more
occasional customers.




But let’s gain more insights by analysing the number of unique visitors.




Analysis Result

Visits for stores number one & two
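
A share-of-visits figure like the one on this slide can be computed by counting visits per store and dividing by the total. The sketch below assumes visits are available as `(store, device)` pairs; the function name `visit_share` and the sample data are hypothetical and were chosen only so that the split comes out at 85% and 15%.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def visit_share(visits: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Return each store's fraction of all visits."""
    counts = Counter(store for store, _device in visits)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {store: count / total for store, count in counts.items()}


# Hypothetical data: 17 visits in store one, 3 in store two.
sample = [("one", "device-a")] * 17 + [("two", "device-b")] * 3
shares = visit_share(sample)
print(shares)
```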

This plot gives us more details about the customers. It turns out
that the 135 visits in store number one were caused by just 9
unique visitors, while store number two saw 5 unique
visitors.




Analysis Result

Unique visitors
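
The figures on this slide and the previous one allow a rough visits-per-visitor calculation. Store one's numbers (135 visits, 9 unique visitors) come from the slides. Store two's visit count is not stated, so the sketch estimates it from the approximate 85%/15% split; treat that value as a rough assumption, not a reported result.

```python
def visits_per_visitor(visits: float, unique_visitors: int) -> float:
    """Average number of visits per unique visitor."""
    return visits / unique_visitors


# Store one: both numbers are taken from the slides.
store_one = visits_per_visitor(135, 9)

# Store two: visit count estimated from the approximate 85% / 15% split.
total_visits = 135 / 0.85
store_two_visits = total_visits * 0.15
store_two = visits_per_visitor(store_two_visits, 5)

print(store_one)                                    # 15.0
print(round(store_two_visits), round(store_two, 1)) # 24 4.8
```

Under these assumptions the same few devices account for store one's lead, which is why the raw visit count alone says little about location quality.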

This plot indicates that we have more returning than new users in both
stores. In store number two we didn’t see a new user over the past 4 days
at all.




It’s probably a good idea to start a marketing campaign which aims at
new customers, e.g. to give out vouchers for the first purchase.


Analysis Result

New vs. returning users
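
The slides do not define how "new" and "returning" were told apart. One plausible definition, used in this sketch as an explicit assumption, is that a device counts as new when its first recorded visit falls inside the reporting window, and as returning when it was already seen before the window started. The function name and the sample dates are hypothetical.

```python
from datetime import date
from typing import Dict, Iterable, Tuple


def new_vs_returning(
    visits: Iterable[Tuple[str, date]], window_start: date
) -> Dict[str, int]:
    """Count devices seen on or after window_start, split by whether
    their first recorded visit lies inside the window (new) or before
    it (returning)."""
    visits = list(visits)
    first_seen: Dict[str, date] = {}
    for device, day in visits:
        if device not in first_seen or day < first_seen[device]:
            first_seen[device] = day
    in_window = {device for device, day in visits if day >= window_start}
    new = {device for device in in_window if first_seen[device] >= window_start}
    return {"new": len(new), "returning": len(in_window - new)}


# Hypothetical visits: "a" returns, "b" is new, "c" was not seen in the window.
sample = [
    ("a", date(2013, 3, 1)),
    ("a", date(2013, 3, 5)),
    ("b", date(2013, 3, 6)),
    ("c", date(2013, 2, 20)),
]
result = new_vs_returning(sample, window_start=date(2013, 3, 4))
print(result)  # {'new': 1, 'returning': 1}
```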

The plot for the last 4 days shows that the visit duration
in store number one was evenly distributed, while the distribution
in store number two shows some peaks.




We can also see that visitors tend to stay in store number one
much longer.


Analysis Result

Visit duration over the past 4 days
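
Visit durations can be derived by pairing each step-in with the next step-out of the same device. The sketch below assumes events arrive as `(timestamp, device, action)` tuples sorted by time; the tuple layout, the action labels and the sample timestamps are illustrative assumptions and do not reflect the original log format.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple


def visit_durations(
    events: Iterable[Tuple[datetime, str, str]]
) -> List[timedelta]:
    """Pair each step-in with the following step-out of the same device.
    Events must be sorted by timestamp; unmatched step-outs are ignored."""
    open_visits: Dict[str, datetime] = {}
    durations: List[timedelta] = []
    for timestamp, device, action in events:
        if action == "step-in":
            open_visits[device] = timestamp
        elif action == "step-out" and device in open_visits:
            durations.append(timestamp - open_visits.pop(device))
    return durations


# Hypothetical events for two devices on the same day.
sample = [
    (datetime(2013, 3, 4, 10, 0), "a", "step-in"),
    (datetime(2013, 3, 4, 10, 5), "b", "step-in"),
    (datetime(2013, 3, 4, 10, 6), "b", "step-out"),
    (datetime(2013, 3, 4, 10, 30), "a", "step-out"),
]
durations = visit_durations(sample)
print([int(d.total_seconds() // 60) for d in durations])  # [1, 30]
```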

There is a lot of useful information that can be derived
from this plot.




1.  There is a repeating pattern of step-ins and step-outs
    within a short period of time.
2.  There was a step-out of store number one and a step-in
    into store number two within just 28 seconds.

Analysis Result

Avg. Duration Between Visits of one particular user
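
The gap between visits of a single user can be measured as the time from a step-out to the next step-in. The sketch below reproduces the 28-second hop from store one to store two mentioned on this slide; the timestamps themselves are invented for illustration, and the event layout `(timestamp, store, action)` is an assumption.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple


def gaps_between_visits(
    events: Iterable[Tuple[datetime, str, str]]
) -> List[Tuple[str, str, timedelta]]:
    """For one device's time-sorted events, return
    (store left, store entered, time in between) for each
    step-out that is followed by a step-in."""
    gaps: List[Tuple[str, str, timedelta]] = []
    last_out: Optional[Tuple[datetime, str]] = None
    for timestamp, store, action in events:
        if action == "step-out":
            last_out = (timestamp, store)
        elif action == "step-in" and last_out is not None:
            gaps.append((last_out[1], store, timestamp - last_out[0]))
            last_out = None
    return gaps


# Hypothetical timestamps chosen to match the 28-second gap on the slide.
sample = [
    (datetime(2013, 3, 4, 12, 0, 0), "one", "step-in"),
    (datetime(2013, 3, 4, 12, 10, 0), "one", "step-out"),
    (datetime(2013, 3, 4, 12, 10, 28), "two", "step-in"),
]
gaps = gaps_between_visits(sample)
print(gaps[0][2].total_seconds())  # 28.0
```

Averaging the returned gaps per user would give a figure comparable to the one named in the slide title.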
CC 2.0 by Aurelien Guichard | http://flic.kr/p/cjg9yw



1  Presentation, Video and Post Series
   •  http://bit.ly/YgtIMK
2  http://sentric.ch
3  http://www.bigdata-usergroup.ch
4  http://about.me/jpkoenig




Links

WMFRA # 46: Case Study - In-Store Analysis
