Data setup for fast, error free analysis (webinar)

Webinar on 5th April 2018
Q Research Software

Published in: Technology

  1. Set up data for fast, error-free analysis in Q. QRESEARCHSOFTWARE.COM WEBINAR
  2. Other resources for learning about data cleaning
       • Online Training (within Q: Help > Online Training)
       • Lots of resources on the wiki (wiki.q-researchsoftware.com):
           • Technical detail about how to clean (use Search)
           • Video library (more about this coming soon)
       • We will send you a detailed eBook on data tidying and cleaning in the next couple of weeks.
       • support@q-researchsoftware.com (Help > Email Support)
  3. Conceptualizing the data analysis workflow
     [Workflow diagram] Extracting metadata-rich source data; Data shaping; Tidy raw data set(s); Data cleaning cycle; Clean data; Housekeeping; Imputation; Weighting; Transformation; Analysis; Reporting; Production
  4. Metadata is the key to understanding the data
     Metadata:
       • ID: Each organization has one value on this variable and no other organizations have the same value.
       • Industry: The industry classification of the firm.
       • Shop: Agree (A) or disagree (D) that “It is important to shop around”
       • Understand: Agree (A) or disagree (D) that “I understand my company's communication needs”
       • Key: Agree (A) or disagree (D) that “Communications technology is key to our business”
       • Interested: Agree (A) or disagree (D) that “I am interested in communications technology”
       • Value: Agree (A) or disagree (D) that “Value for money is more important to us than getting the best technology”
       • Profit ($): An estimate of the gross profit provided by each firm to the industry (excluding fixed costs). Constructed from a series of survey questions about the types of products held, usage levels and bill payments.
       • # Employees: Number of employees of the business
     Data:
     ID  Industry                      Shop  Understand  Key  Interested  Value  Profit ($)  # Employees
      1  Retail Trade                  A     A           A    A           D         9777.47           12
      2  Retail Trade                  A     A           A    D           A         3595.79           12
      3  Cult. and Rec. Services       A     A           A    A           D         2660.15           20
      4  Retail Trade                  A     A           D    A           A         2303.08           30
      5  Manufacturing                 A     D           A    D           D          644.57            6
      6  Mining                        D     A           A    A           D         3517.85           99
      7  Agr., Forest. & Fishing       A     D           A    D           D         6905.25            8
      8  Retail Trade                  D     D           A    A           D         9916.39           60
      9  Health & Community Services   A     A           A    A           A         1855.43           56
     10  Property & Business Services  A     A           A    A           D          765.10            4
     11  Communication Services        D     A           D    D           A          838.13            1
     12  Manufacturing                 A     A           A    A           A         2303.08           30
     13  Manufacturing                 D     D           D    D           D         2151.92            7
     14  Manufacturing                 A     A           A    A           D         1263.65            1
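The split between data and metadata on this slide can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the dictionaries and the `describe` helper are hypothetical and are not Q's internal format. The three rows are taken from the slide's data table.

```python
# Hypothetical structures for illustration; this is not how Q stores metadata.
variable_labels = {
    "Shop": "It is important to shop around",
    "Key": "Communications technology is key to our business",
}
value_labels = {"A": "Agree", "D": "Disagree"}

# Rows 1, 4 and 6 from the slide's data table (a subset of the columns).
data = [
    {"ID": 1, "Industry": "Retail Trade", "Shop": "A", "Key": "A"},
    {"ID": 4, "Industry": "Retail Trade", "Shop": "A", "Key": "D"},
    {"ID": 6, "Industry": "Mining", "Shop": "D", "Key": "A"},
]


def describe(row, variable):
    """Turn a raw code such as 'D' into a readable statement using the metadata."""
    return f"{variable_labels[variable]}: {value_labels[row[variable]]}"


for row in data:
    print(row["ID"], describe(row, "Shop"), "|", describe(row, "Key"))
```

Without the two label dictionaries, the raw "A" and "D" codes cannot be interpreted, which is the point the slide makes.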
  5. Key bits of metadata in Q
       • Variable labels
       • Value labels
       • Multiple response set information
       • Missing data
       • Unique identifiers
     Benefits:
       • Faster project setup
       • Reduces the risk of errors
       • Reduces the time to report data
       • Helps spot changes in definitions
  6. Extracting metadata-rich source data
     “Good” data file types (Search wiki: SPSS Data File Specifications):
       • SPSS format: .sav
       • Triple S: .sss
       • SPSS Dimensions: .mdd
     Metadata-“poor” data file types (Search wiki: Setting Up Files With No Metadata & Excel and CSV Data File Specifications):
       • Excel files: .xls or .xlsx
       • SQL databases
       • Text format: .txt, .tab, .tsv
       • CSV files: .csv
  7. The first aim: Getting a tidy raw data set into your project
     Data shaping takes data from the wrong shape to the right shape: tidy raw data set(s).
     Right shape:
       • Rows and columns
       • Row = unit of analysis
       • Column = variable
       • Column has a name
     Ideally with:
       • Unique identifier
       • Associated metadata
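The "right shape" criteria on this slide can be checked mechanically. Below is a minimal sketch in plain Python, assuming the data is held as a list of dictionaries (one per row); the `check_tidy` function and its messages are hypothetical and are not part of Q.

```python
def check_tidy(rows, id_column):
    """Return a list of problems found; an empty list means the checks passed."""
    if not rows:
        return ["no rows"]
    problems = []
    columns = list(rows[0].keys())
    # Every column should have a name.
    if any(not str(name).strip() for name in columns):
        problems.append("unnamed column")
    # Every row should have the same columns, in the same order.
    if any(list(row.keys()) != columns for row in rows):
        problems.append("rows have different columns")
    # Ideally, one column uniquely identifies each unit of analysis.
    ids = [row.get(id_column) for row in rows]
    if len(set(ids)) != len(ids):
        problems.append("duplicate identifiers")
    return problems


rows = [
    {"ID": 1, "Industry": "Retail Trade"},
    {"ID": 2, "Industry": "Retail Trade"},
    {"ID": 2, "Industry": "Mining"},  # repeated ID, so the check should flag it
]
print(check_tidy(rows, "ID"))
```

The checks cover only the structural points listed on the slide; whether a row really is the unit of analysis still needs a human decision.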
  8. Data set reshaping tools in Q
     Every way of reshaping listed here can be done entirely by code (R Data Set). Where it can also be done by clicking buttons, the steps follow the colon.
       • Aggregation: by code only
       • Sorting: Sort columns on the Data tab
       • Filtering: Filter the whole report and/or Data tab
       • Deleting: Delete rows on the Data tab; if required, export as an SPSS data file
       • Partitioning (splitting): Delete rows on the Data tab, then export as an SPSS file. Repeat with different rows deleted.
       • Sampling: Create a filter using random numbers, then see Deleting.
       • Stretching: by code only
       • Stacking: Tools > Stack SPSS .sav Data File (using Tools > Save Data as SPSS.sav Data File first, if necessary)
       • Widening (flattening): by code only
       • Merging data by case (appending): Tools > Merge Data Files > Add New cases
       • Deduping (deduplicating): Create a new R variable with the expression duplicated(variableName), then see Deleting
       • Merging by variable (augmenting): Tools > Merge Data Files > Add New Variables
       • String splitting: Create > Variables R Variable
       • Creating a Distance Matrix: Create > Correlations > Distance Matrix
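The deduping row relies on R's `duplicated()`, which flags the second and later occurrences of each value. The logic can be sketched in plain Python as follows; this is an illustration outside Q, and the Python `duplicated` function and the sample rows are hypothetical.

```python
def duplicated(values):
    """Mimic R's duplicated(): True for every repeat of an earlier value."""
    seen = set()
    flags = []
    for value in values:
        flags.append(value in seen)
        seen.add(value)
    return flags


# Hypothetical rows in which respondent 2 appears twice.
rows = [
    {"ID": 1, "Industry": "Retail Trade"},
    {"ID": 2, "Industry": "Mining"},
    {"ID": 2, "Industry": "Mining"},
]

flags = duplicated([row["ID"] for row in rows])
# Keep the first occurrence of each ID and drop the flagged repeats.
deduped = [row for row, is_dup in zip(rows, flags) if not is_dup]
print(flags)
print(deduped)
```

The first occurrence is never flagged, so deleting the flagged rows keeps exactly one row per identifier.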
  9. Stacking
          Likelihood to recommend   This brand is fun         This brand is exciting
     ID   Apple  Microsoft  IBM     Apple  Microsoft  IBM     Apple  Microsoft  IBM
      1   6      9          7       1      0          0       1      1          0
      2   8      7          7       1      0          0       1      0          0
      3   0      9          8       0      1          0       0      0          0
      4   0      0          0       0      0          0       0      0          0
  10. Stacking
     From: one row per respondent
          Likelihood to recommend   This brand is fun         This brand is exciting
     ID   Apple  Microsoft  IBM     Apple  Microsoft  IBM     Apple  Microsoft  IBM
      1   6      9          7       1      0          0       1      1          0
      2   8      7          7       1      0          0       1      0          0
      3   0      9          8       0      1          0       0      0          0
      4   0      0          0       0      0          0       0      0          0
     To: one row per brand per respondent
     ID  Brand      Likelihood to recommend  This brand is fun  This brand is exciting
      1  Apple      6                        1                  1
      1  Microsoft  9                        0                  1
      1  IBM        7                        0                  0
      2  Apple      8                        1                  1
      2  Microsoft  7                        0                  0
      2  IBM        7                        0                  0
      3  Apple      0                        0                  0
      3  Microsoft  9                        1                  0
      3  IBM        8                        0                  0
      4  Apple      0                        0                  0
      4  Microsoft  0                        0                  0
      4  IBM        0                        0                  0
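The stacking step on this slide can be sketched outside Q in plain Python. This is only an illustration, not how Tools > Stack SPSS .sav Data File works internally. The `stack` function and the list-based layout are hypothetical, and the order of the column groups (likelihood, fun, exciting) is inferred from the header of the stacked table.

```python
BRANDS = ["Apple", "Microsoft", "IBM"]
MEASURES = [
    "Likelihood to recommend",
    "This brand is fun",
    "This brand is exciting",
]

# Wide data from the slide: one entry per respondent ID, nine values in
# measure-major order (all brands for the first measure, then the second, ...).
wide = {
    1: [6, 9, 7, 1, 0, 0, 1, 1, 0],
    2: [8, 7, 7, 1, 0, 0, 1, 0, 0],
    3: [0, 9, 8, 0, 1, 0, 0, 0, 0],
    4: [0, 0, 0, 0, 0, 0, 0, 0, 0],
}


def stack(wide_data):
    """Return one row per brand per respondent."""
    stacked = []
    for respondent_id, values in wide_data.items():
        for b, brand in enumerate(BRANDS):
            row = {"ID": respondent_id, "Brand": brand}
            for m, measure in enumerate(MEASURES):
                # Position of this brand's value within this measure's group.
                row[measure] = values[m * len(BRANDS) + b]
            stacked.append(row)
    return stacked


stacked = stack(wide)
for row in stacked:
    print(row)
```

Four respondents and three brands give twelve stacked rows, each carrying the respondent ID so that the rows can still be traced back to the person who answered.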
  11. Widening
     [Same two tables as slide 10: the wide table with one row per respondent and the stacked table with one row per brand per respondent.]
     Widening, which is also known as flattening, is the reverse of stacking.
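Widening can be sketched in plain Python as the reverse operation: stacked rows go in, and one row per respondent comes out. This is a hypothetical illustration, not a Q feature. The `widen` function and the "measure: brand" naming of the new columns are assumptions made for the example, which uses respondent 1's rows from the stacked table.

```python
def widen(stacked_rows, id_column, key_column):
    """Collapse stacked rows into one row per value of id_column."""
    wide = {}
    for row in stacked_rows:
        target = wide.setdefault(row[id_column], {id_column: row[id_column]})
        for name, value in row.items():
            if name not in (id_column, key_column):
                # One new column per measure and brand, e.g. "This brand is fun: IBM".
                target[f"{name}: {row[key_column]}"] = value
    return list(wide.values())


# Respondent 1's three stacked rows (one per brand).
stacked_rows = [
    {"ID": 1, "Brand": "Apple", "Likelihood to recommend": 6,
     "This brand is fun": 1, "This brand is exciting": 1},
    {"ID": 1, "Brand": "Microsoft", "Likelihood to recommend": 9,
     "This brand is fun": 0, "This brand is exciting": 1},
    {"ID": 1, "Brand": "IBM", "Likelihood to recommend": 7,
     "This brand is fun": 0, "This brand is exciting": 0},
]

wide_rows = widen(stacked_rows, "ID", "Brand")
print(wide_rows)
```

Three stacked rows with three measures each become a single row with nine measure columns plus the ID.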
  12. Widening
  13. Conceptualizing the data analysis workflow
      [Workflow diagram] Importing source data; Data shaping; Tidy raw data set(s); Data cleaning cycle; Clean data
      Data cleaning cycle:
        1. Wrong Question Type
        2. Incorrect Base Size
        3. Unusual Values
        4. Too-small categories
        5. Poor Metadata
        6. Multi-variable problems
  14. The Cleaning Cycle
      Dirt:
        #1 Wrong Question Type
        #2 Incorrect Base Size
        #3 Unusual Values
        #4 Too-small Categories
        #5 Poor Metadata
        #6 Multi-variable problems
      Summary Tables
  15. The Cleaning Cycle
      #1 Wrong Question Type
          How to detect: Variables and Questions Tab; Summary Tables
          Cleaning action: Change Question Type & setting
      #2 Incorrect Base Size
          How to detect: Summary Tables
          Cleaning action: Recode; Delete cases; Get new data
      #3 Unusual Values
          How to detect: Summary Tables
          Cleaning action: Recode; Change values of raw data; Delete cases; Back code
      #4 Too-small Categories
          How to detect: Summary Tables
          Cleaning action: Merge
      #5 Poor Metadata
          How to detect: Summary Tables
          Cleaning action: Manually change; Search and replace
      #6 Multi-variable problems
          How to detect: Crosstabs; Sankey Diagrams; Missing Value Patterns; Flatlining; Validation Rules; Nets
          Cleaning action: Recode; Delete cases; Get new data
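Two of the detection methods in this table, summary tables and flatlining checks, can be sketched in plain Python. This is a hypothetical illustration rather than Q's implementation: the function names and the `min_count` threshold are assumptions, and the inputs are the Industry column and three answer rows from the slide 4 data table.

```python
from collections import Counter


def summary_table(values, min_count=2):
    """Count each category and flag those with fewer than min_count cases."""
    counts = Counter(values)
    return {
        category: {"n": n, "too_small": n < min_count}
        for category, n in counts.items()
    }


def is_flatlining(answers):
    """True when a respondent gave the same answer to every item in a grid."""
    return len(answers) > 1 and len(set(answers)) == 1


# Industry column from the slide 4 data table (14 firms).
industry = (
    ["Retail Trade"] * 4
    + ["Manufacturing"] * 4
    + ["Cult. and Rec. Services", "Mining", "Agr., Forest. & Fishing",
       "Health & Community Services", "Property & Business Services",
       "Communication Services"]
)
table = summary_table(industry)
for category, stats in table.items():
    print(category, stats)

# Shop, Understand, Key, Interested and Value answers for firms 1, 9 and 13.
answers_by_id = {1: list("AAAAD"), 9: list("AAAAA"), 13: list("DDDDD")}
flatliners = [i for i, answers in answers_by_id.items() if is_flatlining(answers)]
print(flatliners)
```

A flagged category or respondent is only a prompt to look closer; whether to merge, recode or delete remains a judgment call, as the cleaning actions in the table suggest.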
  16. phone.sav
      Data cleaning cycle:
        1. Wrong Question Type
        2. Incorrect Base Size
        3. Unusual Values
        4. Too-small categories
        5. Poor Metadata
        6. Multi-variable problems
  17. Housekeeping
      Four useful tips for a tidy Variables and Questions tab:
        1. Include questionnaire numbering in the Question labels (makes for quick search)
        2. Hide (H-tag) irrelevant questions/variables
        3. Move questions to the top/bottom using the blue buttons (or see Move Data in the Automate Menu)
        4. Clean variable labels (Tip: use Find/Replace and the asterisk *)
  18. Keep learning more
        • Q wiki: Basic Workflow For Checking and Cleaning a Project
        • eBook on Data Tidying and Cleaning (coming soon!)
        • Subscribe to the Q blog (on website): www.q-researchsoftware.com
