4. “statisticians working in a research environment… may well have to explain that the data are inadequate to answer a particular question.”
Pre-data, post-data, post-analysis
What is the potential of data to generate knowledge/insight?
5. Case 2: What is the value of a new data-collection technology/method?
Despite its safety, cost advantages, and obvious appeal from the patient perspective, the PillCam COLON was approved “for use only in patients who had an incomplete optical colonoscopy.”
In a trial of 700 subjects who underwent both capsule endoscopy and an optical colonoscopy, the capsule failed to identify polyps about 1/3 of the time.
[Cannot be used to treat pathology that may be discovered]
Medical data
13. Information quality depends on the quality of the components g, X, f, U and on the relationships among them
Information quality = the potential of analyzing particular data to achieve the research/project goal (deriving the desired utility)
The potential of a particular dataset to achieve a particular goal using a given data analysis method
InfoQ(f,X,g,U) = U( f(X|g) )
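The formula reads as a composition: analyse X with f in light of the goal g, then score the result with U. A minimal Python sketch, assuming a hypothetical goal of estimating a defect proportion to within a required standard error; the names `info_q`, `defect_proportion`, `make_precision_utility` and the goal fields `event` and `max_se` are illustrative, not part of the InfoQ definition.

```python
import math
from typing import Any, Callable, Dict, Sequence, Tuple

Goal = Dict[str, Any]


def info_q(f: Callable[[Sequence[str], Goal], Any],
           X: Sequence[str],
           g: Goal,
           U: Callable[[Any], float]) -> float:
    """InfoQ(f, X, g, U) = U(f(X | g)): analyse X with f for goal g, then score with U."""
    return U(f(X, g))


def defect_proportion(X: Sequence[str], g: Goal) -> Tuple[float, float]:
    """Hypothetical f: proportion of records equal to g["event"] and its standard error."""
    n = len(X)
    p = sum(1 for x in X if x == g["event"]) / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se


def make_precision_utility(g: Goal) -> Callable[[Tuple[float, float]], float]:
    """Hypothetical U: utility 1.0 if the estimate is precise enough for g, else 0.0."""
    def U(result: Tuple[float, float]) -> float:
        _, se = result
        return 1.0 if se <= g["max_se"] else 0.0
    return U


g = {"event": "defective", "max_se": 0.05}
large_sample = ["ok"] * 90 + ["defective"] * 10
small_sample = ["ok"] * 9 + ["defective"] * 1

print(info_q(defect_proportion, large_sample, g, make_precision_utility(g)))  # 1.0
print(info_q(defect_proportion, small_sample, g, make_precision_utility(g)))  # 0.0
```

Both samples have the same defect proportion (0.1) and are analysed with the same f for the same g; only the amount of data differs, and with it the utility.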
14. Statistical methods for increasing information quality
Study Design (Pre-Data)
• DOE
• Clinical trials
• Survey sampling
• Computer experiments
Post-Data-Collection
• Data cleaning and preprocessing
• Re-weighting, bias adjustment
• Meta-analysis
Randomization, Stratification, Blinding, Placebo, Blocking, Replication, Sampling frame, Link data collection protocol with appropriate design
Recovering “real data” vs. “cleaning for the goal”
Handling missing values, outlier detection, re-weighting, combining results
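Re-weighting is one post-data adjustment: when some strata are over-represented in the sample, each observation can be weighted by its stratum's population share divided by its sample share (post-stratification). A minimal sketch; the function name, strata, values, and population shares are all hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def post_stratified_mean(values: Sequence[float],
                         strata: Sequence[str],
                         population_shares: Dict[str, float]) -> float:
    """Weighted mean in which each stratum counts in proportion to its population share."""
    n = len(values)
    sample_counts = Counter(strata)
    # weight = population share / sample share of the observation's stratum
    weights = [population_shares[s] / (sample_counts[s] / n) for s in strata]
    total = sum(w * v for w, v in zip(weights, values))
    return total / sum(weights)


# Hypothetical sample: stratum "a" is 75% of the sample but only 50% of the population.
values = [10.0, 10.0, 10.0, 20.0]
strata = ["a", "a", "a", "b"]
shares = {"a": 0.5, "b": 0.5}

print(sum(values) / len(values))                     # 12.5 (unweighted)
print(post_stratified_mean(values, strata, shares))  # about 15.0
```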
15. How can information quality be assessed and measured?
“Quality of Statistical Data”
(Eurostat, OECD, NCSES,…)
• Relevance
• Accuracy
• Timeliness and punctuality
• Accessibility
• Interpretability
• Coherence
• Credibility
3 V’s of Big Data
• Volume
• Variety
• Velocity
Marketing Research
• Recency
• Accuracy
• Availability
• Relevance
17. #1 Data Resolution
Measurement scale and level of aggregation
In control charts:
“Process operators might review data hourly or daily; area managers, weekly; site managers, monthly; and business managers, quarterly. The level of temporal aggregation would likely increase as one moves up the management hierarchy”
Zwetsloot & Woodall, 2018
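The choice of aggregation level changes what a chart can show. A minimal sketch, assuming hypothetical hourly readings indexed by hour, in which a one-hour spike that an operator would see in hourly data is diluted in the daily means a manager would review:

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def aggregate(readings: Sequence[Tuple[int, float]], period_hours: int) -> Dict[int, float]:
    """Average (hour, value) readings over consecutive periods of period_hours hours."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for hour, value in readings:
        buckets[hour // period_hours].append(value)
    return {period: mean(vals) for period, vals in sorted(buckets.items())}


# Hypothetical process: 48 hourly readings at 10.0 with a single spike to 58.0 at hour 5.
readings = [(h, 58.0 if h == 5 else 10.0) for h in range(48)]

hourly = aggregate(readings, 1)
daily = aggregate(readings, 24)
print(max(hourly.values()))  # 58.0
print(daily)                 # {0: 12.0, 1: 10.0}
```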
19. #2 Data Structure
Data Types
• Time series, cross-sectional, panel
• Geographic, spatial, network
• Text, audio, video, semantic
• Structured, semi-structured, unstructured
• Discrete, continuous
Data Characteristics
Corrupted and missing values due to study design or data collection mechanism
20. #3 Data Integration
Utility of Linkage
Dangers: Privacy
Increase or decrease InfoQ?
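Linkage can add variables (raising InfoQ) but can also silently drop or duplicate records. A minimal exact-key linkage sketch, assuming two hypothetical tables that share a `serial` field, which reports the records that failed to link so the loss is visible:

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def link_records(left: Sequence[Row], right: Sequence[Row], key: str) -> Tuple[List[Row], List[Row]]:
    """Exact-key linkage: return (linked rows, left rows with no match in right)."""
    index: Dict[Any, List[Row]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    linked: List[Row] = []
    unmatched: List[Row] = []
    for row in left:
        matches = index.get(row[key], [])
        if not matches:
            unmatched.append(row)
        for match in matches:
            # merge the two records; fields from `right` win on name clashes
            linked.append({**row, **match})
    return linked, unmatched


# Hypothetical sales and warranty-claim tables.
sales = [{"serial": 1, "model": "A"}, {"serial": 2, "model": "B"}, {"serial": 3, "model": "A"}]
claims = [{"serial": 1, "failure": "motor"}, {"serial": 3, "failure": "seal"}]

linked, unmatched = link_records(sales, claims, "serial")
print(len(linked), len(unmatched))  # 2 1
```

Here the unmatched sale is simply a unit with no claim; in other linkages unmatched rows are lost data. Each linked row also carries more attributes about a single unit, which is where the privacy risk comes from.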
22. #4 Temporal Relevance
[Figure: timeline t1 to t6 spanning Data Collection, Data Analysis, and Study Deployment]
Collection Timeliness (relevance to g)
Analysis Timeliness (solving the right problem too late)
g: Prospective vs. retrospective; longitudinal vs. snapshot
Nature of X, complexity of f
Forecast
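Both timeliness questions can be checked mechanically once the dates are known. A minimal sketch; the function, parameter names, and dates are hypothetical and are not mapped to specific t1 to t6 points on the timeline:

```python
from datetime import date
from typing import Dict, Union


def timeliness_report(collection_end: date,
                      analysis_end: date,
                      decision_deadline: date,
                      max_data_age_days: int) -> Dict[str, Union[bool, int]]:
    """Check analysis timeliness (results before the decision) and data age at the decision."""
    data_age_days = (decision_deadline - collection_end).days
    return {
        "analysis_on_time": analysis_end <= decision_deadline,
        "data_age_days": data_age_days,
        "data_recent_enough": data_age_days <= max_data_age_days,
    }


report = timeliness_report(
    collection_end=date(2024, 1, 31),
    analysis_end=date(2024, 3, 15),
    decision_deadline=date(2024, 3, 1),
    max_data_age_days=60,
)
print(report)  # {'analysis_on_time': False, 'data_age_days': 30, 'data_recent_enough': True}
```

This hypothetical case is "solving the right problem too late": the data were recent enough, but the analysis finished after the decision.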
23. #5 Chronology of Data & Goal
Compatibility between data availability and the goal
g1: Identifying reliability problems
g2: Recalls (returns for repair)
Retrospective/prospective
Ex-post availability
Endogeneity
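Ex-post availability means that some variables exist only after the moment the goal needs an answer. A minimal sketch, assuming each hypothetical variable has a date on which it becomes available, that keeps only the variables usable at prediction time:

```python
from datetime import date
from typing import Dict, List


def available_at(availability: Dict[str, date], prediction_date: date) -> List[str]:
    """Return the variables already recorded on or before prediction_date."""
    return sorted(name for name, when in availability.items() if when <= prediction_date)


# Hypothetical variables for g2 (deciding on a recall).
availability = {
    "manufacture_date": date(2023, 6, 1),
    "early_field_failures": date(2023, 12, 1),
    "repair_costs_after_recall": date(2024, 9, 1),
}
print(available_at(availability, date(2024, 1, 15)))
# ['early_field_failures', 'manufacture_date']
```

A model for g2 that relies on `repair_costs_after_recall` may fit historical data well but cannot be deployed, because that variable does not yet exist when the recall decision is made.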
24. #6 Generalizability
Statistical generalization, scientific generalization
Definition of g
Choice of X, f, U
Those who purchased the extended warranty are not a random sample of all purchasers. They would be expected to be heavy users of the product and more likely to experience failures. Thus, the results on these units may be somewhat pessimistic.
Are the results also relevant to:
• other components?
• other models?
• users in other countries?
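The warranty example is a selection effect: if failure risk rises with usage and heavy users are the ones who buy the warranty, the warranty subset overstates the failure rate of all purchasers. A deterministic sketch with hypothetical usage figures, an assumed linear usage-to-risk model, and an assumed 500-hour threshold for buying the warranty:

```python
from statistics import mean
from typing import Sequence


def failure_probability(usage_hours: float, hours_to_certain_failure: float = 1000.0) -> float:
    """Assumed model: failure risk grows linearly with usage, capped at 1."""
    return min(usage_hours / hours_to_certain_failure, 1.0)


def expected_failure_rate(usage: Sequence[float]) -> float:
    """Average failure probability over a group of units."""
    return mean(failure_probability(u) for u in usage)


# Hypothetical purchasers; assume only those with >= 500 hours of use buy the extended warranty.
all_purchasers = [100, 200, 300, 400, 500, 600, 700, 800]
warranty_buyers = [u for u in all_purchasers if u >= 500]

print(round(expected_failure_rate(all_purchasers), 2))   # 0.45
print(round(expected_failure_rate(warranty_buyers), 2))  # 0.65
```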
27. #8 Communication
Visual, written, verbal presentations & reports
Knowledge must reach the right person at the right time
• Mentoring
• Manuscript reviewing
• Data made available to others
• EDA and shared visualization dashboards
• Seminars + conferences!
28.
“In the last three years, there has been a concerted effort by those in Washington to reduce government spending and reign in the national debt.
One reason for the budget cuts? Research by two Harvard economists, Ken Rogoff and Carmen Reinhart. The pair found that when a country owes more than 90 percent of their GDP, it slides into recession.”
… Fixing this Excel error transforms high-debt countries from recession to growth
www.marketplace.org/topics/economy/excel-mistake-heard-round-world
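Part of the lesson is procedural: an average over a spreadsheet range that silently omits rows still returns a plausible-looking number. A minimal defensive sketch (hypothetical values, not the Reinhart-Rogoff data; the function name is illustrative) that refuses to average a range that does not cover every row it is supposed to:

```python
from statistics import mean
from typing import Sequence


def checked_average(values: Sequence[float], start: int, stop: int, expected_rows: int) -> float:
    """Average values[start:stop], raising if the range does not cover expected_rows rows."""
    selected = values[start:stop]
    if len(selected) != expected_rows:
        raise ValueError(f"range covers {len(selected)} of {expected_rows} expected rows")
    return mean(selected)


# Hypothetical growth rates (%) for five countries.
growth = [2.5, -0.5, 1.0, 3.0, 2.0]

print(checked_average(growth, 0, 5, expected_rows=5))  # 1.6
try:
    checked_average(growth, 0, 3, expected_rows=5)     # range stops early
except ValueError as err:
    print(err)  # range covers 3 of 5 expected rows
```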
29. InfoQ: Summary
The InfoQ approach streamlines questioning the value of data:
• “Why should we invest in this data?” – management
• Compare the value of potential datasets, analyses, and data collection systems
• Prioritize/rank projects
• Strengthen the functional-analytical relationship
• Useful for developing an analysis plan (prospective) and evaluating an empirical study (retrospective)
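To compare or rank candidate projects, the per-dimension ratings have to be combined into one number. A minimal sketch, assuming (an assumption, not stated on this slide) that each dimension is rated 1 to 5, each rating is rescaled to a desirability in [0, 1], and the desirabilities are combined by a geometric mean, so a project rated at the minimum on any dimension scores zero. Project names, dimension keys, and ratings are hypothetical.

```python
import math
from typing import Dict, List


def desirability(rating: int, low: int = 1, high: int = 5) -> float:
    """Rescale a rating on [low, high] to [0, 1]."""
    return (rating - low) / (high - low)


def infoq_score(ratings: Dict[str, int]) -> float:
    """Geometric mean of the per-dimension desirabilities."""
    d = [desirability(r) for r in ratings.values()]
    return math.prod(d) ** (1 / len(d))


def rank_projects(projects: Dict[str, Dict[str, int]]) -> List[str]:
    """Project names ordered from highest to lowest score."""
    return sorted(projects, key=lambda name: infoq_score(projects[name]), reverse=True)


projects = {
    "warranty_study": {"data_resolution": 3, "temporal_relevance": 5, "generalizability": 3},
    "sensor_pilot": {"data_resolution": 5, "temporal_relevance": 5, "generalizability": 5},
    "legacy_survey": {"data_resolution": 4, "temporal_relevance": 1, "generalizability": 4},
}
print(rank_projects(projects))  # ['sensor_pilot', 'warranty_study', 'legacy_survey']
```

The geometric mean is a deliberate choice here: unlike an arithmetic mean, it does not let strong ratings on some dimensions compensate for a dimension that fails outright.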