3. Data Mining Goals
Analyze QVC airtime and sales history to determine the best times to sell
certain products on air
Determine which states make the most purchases in order to better
target QVC's sales geographically
Determine which brands and products sell the best
5. Clean the Data
To get the data into a format that could be loaded into HDFS, it first
needed to be cleaned
We used a combination of Excel and PowerShell to do this
Quotes needed to be removed, and dates needed to be formatted as
YYYY-MM-DD rather than MM/DD/YYYY
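The cleaning itself was done in Excel and PowerShell. As a rough illustration of the same two steps (dropping the quotes and reformatting the dates), here is a minimal Python sketch. The function names, the `delimiter` parameter, and the sample rows are hypothetical and are not taken from the QVC data.

```python
import csv
import io
from datetime import datetime


def clean_field(value: str) -> str:
    """Reformat MM/DD/YYYY dates as YYYY-MM-DD; return other values unchanged."""
    text = value.strip()
    try:
        return datetime.strptime(text, "%m/%d/%Y").strftime("%Y-%m-%d")
    except ValueError:
        # Not a date in MM/DD/YYYY form, so keep the value as it is.
        return text


def clean_csv(raw_text: str, delimiter: str = ",") -> str:
    """Drop the quoting around fields and normalize any date fields."""
    reader = csv.reader(io.StringIO(raw_text))  # csv.reader strips the quotes
    lines = []
    for row in reader:
        lines.append(delimiter.join(clean_field(field) for field in row))
    return "\n".join(lines)


# Hypothetical sample rows, for illustration only.
sample = '"1001","01/15/2015","Kitchen"\n"1002","12/03/2014","Jewelry"\n'
print(clean_csv(sample))
```

Because the output is unquoted, a field that itself contains a comma would break the row; in that case a different `delimiter` (a tab, for example) would be the safer choice.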
6. Process the Data
A mixture of the Hadoop tools Hive and Impala was used
We ran a combination of queries on the tables including joins and distinct
queries to get an idea of the data we were working with
These queries generated the Excel files that we further analyzed in Tableau
In a real-world situation, one would not be limited to a single tool
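To give a feel for the join and distinct queries mentioned above, here is a small sketch that runs similar SQL against an in-memory SQLite database from Python. SQLite is only a stand-in for Hive and Impala here, and the table names, column names, and rows are all hypothetical; the actual QVC schema is not shown in these slides. The `SELECT DISTINCT`, `JOIN`, and `GROUP BY` statements used should look much the same in HiveQL or Impala SQL.

```python
import sqlite3

# In-memory database with two hypothetical tables standing in for the
# airtime and sales data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE airtime (product_id INTEGER, air_date TEXT, air_hour INTEGER);
CREATE TABLE sales (product_id INTEGER, state TEXT, amount REAL);
INSERT INTO airtime VALUES (1, '2015-01-15', 20), (2, '2015-01-15', 9);
INSERT INTO sales VALUES (1, 'PA', 40.0), (1, 'OH', 25.0), (2, 'PA', 10.0);
""")

# Distinct query: which states appear in the sales data?
states = [row[0] for row in conn.execute(
    "SELECT DISTINCT state FROM sales ORDER BY state"
)]

# Join: total sales for each on-air hour, highest first.
totals = conn.execute("""
    SELECT a.air_hour, SUM(s.amount) AS total
    FROM airtime a
    JOIN sales s ON s.product_id = a.product_id
    GROUP BY a.air_hour
    ORDER BY total DESC
""").fetchall()

print(states)
print(totals)
```

Result sets like `totals` are the kind of output that could be exported to a spreadsheet and then explored further in a visualization tool.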