Andrew Patricio | T 571.216.2003 | www.dataeffectiveness.com
Data Effectiveness
The Consistency, Accuracy, Relevancy cycle
Council of Great City Schools
Annual Academic, Information Technology
and Research Conference
11 July 2016
Introduction
Agenda
Introduction
What is Data Effectiveness?
Data Reporting Issues
Data Quality Culture
Consistency, Relevancy, Accuracy
Reporting Platform
Managing Data Requests
Self Service Reporting
Summary
Data Effectiveness
Data Effectiveness CGCS Annual Academic, Information Technology and Research Conference 3
DCPS Facts
Led by Chancellor Kaya Henderson since 2010
48000+ students (steady increase for the past 5 years) in 111 schools
• 60 Elementary Schools
• 11 Middle Schools
• 18 Education Campuses (usually Preschool through 8th grade)
• 15 High Schools
• 7 other (Special Ed etc)
6800+ school staff (3600+ teachers)
600+ central office staff
About Me – Andrew Patricio
(former) Deputy Chief for Data Systems
• Nov 2010 to June 2016
• For most of my tenure I was the most
senior technical resource at DCPS
(DCPS Deputy CIO position created this year)
Personal background
• BS in Electrical Engineering
• IT & management consulting
• Current: www.dataeffectiveness.com
• Data Strategy Advising
• Building Reporting Infrastructure
• Helping Improve Data Quality
Data Effectiveness
Data Driven Decision Making
All organizations seek to make decisions based on data
Data Reality
But the reality is that the data we have available is often in poor shape
Getting to Data Driven – Data Prep
Challenge is usually not analysis, it is getting the data ready to analyze
76% of data scientists find data prep the least enjoyable data science task
Source:
http://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#7ee06c277f75
Getting the data is 90% of the work
Getting to Data Driven – Reporting vs Analytics
Steve Levitt, Freakonomics Podcast, 26 June 2014
“Yeah, I think the hardest single thing is that even if you have the desire … to be
data driven, that the existing systems…I never would have thought this before I
started working with companies. I never would have imagined that it is an I.T.
problem that you simply cannot get the data you want, and the data are held in
27 different data sets that have different identifiers … the I.T. support and the
complexity in these big firms blows your mind about how hard it is to do the
littlest, simple things.”
Data analysts are NOT necessarily technologists
Survey results – Reporting vs Analytics structure
80% have dedicated reporting team with analytics function distributed
throughout organization
DCPS Structure
District of Columbia Public Schools | 2015 CGCS CIO Conference
Org chart: Data Reporting Team; Data and Strategy data analysts; Program data analyst, Program data analyst, …
How to handle workload and deal with varying levels of data quality?
Central Data reporting team pulls student
data from backend systems
• Complicated data pulls
• Data reporting self-service support
Data and Strategy data analysts collaborate
with data analysts in different offices
• Assist in vetting data requests
• Volume of requests means capacity is an issue
DCPS is very data driven so reporting tends
to be the bottleneck
Data Driven Decision Making
High performance data analytics…
Requires pragmatic data reporting
…in the real world of data
Data Driven Pipeline
Data Reporting → Effective data → Data Analytics → Effective decisions → Programs / Business → Effective outcomes
Product of business is Effective Outcomes
Product of analytics is Effective Decisions
Product of reporting is Effective Data
What is Data Effectiveness?
Data Effectiveness is primary responsibility of reporting
Data Reporting → Effective data → Data Analytics → Effective decisions → Programs / Business → Effective outcomes
Being effectively data driven starts with Data Effectiveness:
Getting good data, when it is needed, to whoever needs it
Data Reporting Issues
How does Data go wrong?
Data entry issues
• Fat fingering
• Workarounds
• Solving immediate problem without thinking about
long term consequences
• Transactional system is driven by the latest action, not by the
historical data changes that matter for reporting
• Poor understanding of process/policy
• Student Record Duplication
Legacy data
• Different definitions year to year (period to daily absence conversion)
• Poor QA processes (ISA definition incorrect)
• System transitions (Poor data transfer strategy from previous vendors)
Data Issues 1
End of year attendance example (1 particular school)
Date report run    SY13-14 end-of-year Average Daily Attendance (ADA)
July 2014          95%
October 2014       92%
• How could attendance for the year change after the year is over?!
• Initially assumed that there was a bug in second report
• It turned out that the school registrar was changing enrollment dates
from Aug 2013 to Aug 2014, so those students no longer looked like
they were enrolled in the 2013-2014 school year
Result:
• Students who were present in SY13-14 data in June were missing in October,
severely skewing the data
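A minimal sketch of the effect, using made-up numbers. The students, day counts, and school-year window below are hypothetical, and the sketch assumes ADA is days present divided by days enrolled for students whose enrollment date falls inside the school year.

```python
from datetime import date

# Hypothetical school-year window for SY13-14 (illustrative dates only).
SY_START, SY_END = date(2013, 8, 1), date(2014, 7, 31)

def average_daily_attendance(students):
    """ADA over students whose enrollment date falls inside the school year."""
    in_year = [s for s in students if SY_START <= s["enrolled"] <= SY_END]
    present = sum(s["days_present"] for s in in_year)
    membership = sum(s["days_enrolled"] for s in in_year)
    return present / membership

# Hypothetical students, all originally enrolled in Aug 2013.
students = [
    {"id": 1, "enrolled": date(2013, 8, 26), "days_present": 180, "days_enrolled": 180},
    {"id": 2, "enrolled": date(2013, 8, 26), "days_present": 162, "days_enrolled": 180},
    {"id": 3, "enrolled": date(2013, 8, 26), "days_present": 144, "days_enrolled": 180},
]

ada_july = average_daily_attendance(students)

# The registrar re-dates student 1's enrollment to the next school year,
# so the student silently drops out of the SY13-14 calculation.
students[0]["enrolled"] = date(2014, 8, 25)
ada_october = average_daily_attendance(students)

print(f"July run: {ada_july:.0%}, October run: {ada_october:.0%}")
```

Nothing about the school year itself changed between the two runs; only the population that the query considers "enrolled in SY13-14" did.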
Data Issues 2
Example: Enrollment overlaps
Student Information System (SIS) is transactional system, only tracks current state
• For enrollment it doesn’t care about data values in enrollment history
• Only cares about latest enrollment action (admit or withdrawal) and school
• Actual enrollment history in system is merely log of events
• Users can willy-nilly adjust enrollment history with no effect on current status
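Because the history is just an editable log, overlapping enrollment periods can appear without the SIS noticing. A minimal sketch of a check for that, assuming each period is an (admit date, withdraw date, school) tuple for one student and that the withdraw date is exclusive; the sample history is hypothetical.

```python
from datetime import date

def find_overlaps(periods):
    """Return pairs of enrollment periods for one student that overlap in time.

    Each period is (admit_date, withdraw_date, school); withdraw is exclusive.
    """
    ordered = sorted(periods)
    overlaps = []
    for i, (admit_a, withdraw_a, school_a) in enumerate(ordered):
        for admit_b, withdraw_b, school_b in ordered[i + 1:]:
            # Periods are sorted by admit date, so an overlap exists exactly
            # when the later period starts before the earlier one ends.
            if admit_b < withdraw_a:
                overlaps.append(((admit_a, withdraw_a, school_a),
                                 (admit_b, withdraw_b, school_b)))
    return overlaps

# Hypothetical history: the second admit was back-dated into the first period.
history = [
    (date(2011, 8, 24), date(2012, 6, 24), 123),
    (date(2012, 6, 1), date(2012, 10, 10), 456),
    (date(2012, 10, 11), date(3030, 1, 1), 789),
]
overlapping = find_overlaps(history)
```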
Data Issues 3
School Dashboard vs Weekly reports
Idea was to get more regularly updated data to schools
Inconsistencies reduced trust in data
Two different queries implementing the same metric but poor data quality
meant slightly different answers. Example:
• “Current School” on student table used for dashboard queries
• Didn’t always match school based on enrollment history used in reports
Survey Results – data issues
Challenges to data reporting at your district (11 responses)
Area                                                         Total  Max  Min  Avg   Stddev
Requirements - business rules are communicated clearly       36     5    2    3.27  0.96
Siloed data - correlating data from different systems        36     5    1    3.27  1.29
Validity - keeping number of data entry errors low           35     5    1    3.18  1.27
Capacity - managing the number of report requests            33     5    2    3.00  1.35
Efficiency - quick delivery of report when requested         32     4    1    2.91  0.90
Complexity - report requests easily coded                    30     4    1    2.73  1.05
Repeatability - recreating same metric in various reports    28     5    1    2.55  1.44
Veracity - data values match reality                         27     3    1    2.45  0.78
Reliability - data reports do not often need rework          26     4    1    2.36  0.88
Utility - data reports are useful and relevant               25     3    1    2.27  0.86
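As a quick sanity check on how the columns relate, the averages can be recomputed from the totals. This sketch assumes each of the 11 responses gave one rating per area, which the max and min columns suggest but the slide does not state.

```python
RESPONSES = 11  # number of survey responses stated on the slide

# (area, total, reported average) copied from the table above
rows = [
    ("Requirements", 36, 3.27), ("Siloed data", 36, 3.27),
    ("Validity", 35, 3.18), ("Capacity", 33, 3.00),
    ("Efficiency", 32, 2.91), ("Complexity", 30, 2.73),
    ("Repeatability", 28, 2.55), ("Veracity", 27, 2.45),
    ("Reliability", 26, 2.36), ("Utility", 25, 2.27),
]

# Areas whose reported average does not equal total / number of responses.
mismatches = [
    area for area, total, avg in rows
    if round(total / RESPONSES, 2) != avg
]
```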
Fixing Data Quality
How do we make our data more effective given these challenges?
Improve Data Quality long term?
Make data driven decisions today?
Long term – Data Quality Culture
Legacy issues – Once you’ve identified and fixed these, they will not return
The ongoing issue is data entry
• Need to balance flexibility/freedom of entry with validation checks
• Most systems can validate entries against patterns but are not flexible enough to
distinguish between otherwise valid and invalid entries
Data Entry - Front End Validation?
At DCPS, when a school doesn’t have the access to make a needed data change
they enter a data modification request for the tech team to handle
• Strictness of data entry checks needs to be balanced against technical team capacity
Reporting relationship to schools
All central office teams support schools
Difference with reporting team is that schools are really our "data entry team"
rather than our “users”
• Successful data reporting intimately tied to their effectiveness
• A perfect system that schools are not comfortable with will still have bad data quality
Data Reporting → Effective data → Data Analytics → Effective decisions → Programs / Business
“Data Entry Team” rather than “Users”
“Data Entry Team” is a part of things, “Users” are on the other side
Taking this point of view automatically fosters more collaboration
• Connecting the dots for schools by tracing the pathway from a specific data entry error to specific
issue on data report instead of just mandating behavior top down
• Eg duplicate Attendance Intervention entries make situation look worse because of double counting
• Data error reports include step-by-step how-tos specifying how the issue can be fixed
• Working to include direct link to relevant student in SIS to minimize context switching
• Focus groups, feedback sessions
• Getting school staff input on how to make data entry more efficient
Users → Data Entry Team
Fixing Data
Error Correction Cycle
• Feed back errors to schools for them to correct
• Central office team looks for other common data entry errors to either prevent through
front-end validation or add to error reports going to schools
[Diagram components: Aspen (SIS); Reporting Platform; Data Integrity Management System (soon to be Certica Certify); Data Error Report; Central office reporting and data analyst teams. Flows: Fix Data Errors; Improved SIS Validations; Additional Error Patterns]
Data Integrity Management System
Today – C.A.R. cycle
Being Data Driven requires a C.A.R.
The problem is how to build a train as it’s moving down the track. Even when data
quality is not so good, you still have to provide reports and make decisions. You
cannot wait until everything is perfect, because that’s a moving target
Good enough is good enough but what is good enough?
Consistency → Accuracy → Relevancy
Consistency, Accuracy, Relevancy cycle
Goal is to have accurate metrics aligned with business goal
• Cannot talk about accuracy if there isn’t agreement on the value being reported
• Once the value is consistent, you can talk about whether it’s accurate
• Once it’s accurate you can talk about whether it’s relevant to business goal
Metric A (Report 1: 90, Report 2: 81, Report 3: 87)
→ Consistent [DATA] → Metric A (Report 1: 87, Report 2: 87, Report 3: 87)
→ Accurate [INFORMATION] → Metric A (Report 1: 85, Report 2: 85, Report 3: 85)
→ Relevant [KNOWLEDGE] → Metric aligned with goal
Not Relevant → Determine proposed change and go through cycle again
Consistency – DATA
“What numerical value is being shown for this metric?”
Driven by reporting
Consistency means literally just that: a metric has the same value for the same
parameters no matter who pulls it. Matching reality is not the focus at this stage
Factors
• Traceability – same metric in different reports must be traced back to same source
• Same parameters – need to be careful because different metrics could be referred to by
the same common name
• “# of absences” – unexcused? ISA? Truancy?
• Time factor – legitimate changes can be made after report is run
Wkly Unex Abs   Pulled   Difference
100             Oct 12   First pull
95              Oct 19   Data corrected
90              Oct 26   Suspensions approved, don’t count as unexcused
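A consistency check along these lines can be sketched as follows. The structure is an assumption for illustration: each metric is keyed by its name plus the parameters it was pulled with, and the "SY13-14" parameter and "Metric B" are hypothetical. To account for the time factor, the pull date could be made part of the parameters.

```python
def inconsistent_metrics(report_values):
    """Return metrics whose value differs between reports for the same parameters.

    report_values maps (metric, parameters) -> {report_name: value}.
    """
    return {
        key: values
        for key, values in report_values.items()
        if len(set(values.values())) > 1
    }

report_values = {
    ("Metric A", "SY13-14"): {"Report 1": 90, "Report 2": 81, "Report 3": 87},
    ("Metric B", "SY13-14"): {"Report 1": 87, "Report 2": 87, "Report 3": 87},
}
flagged = inconsistent_metrics(report_values)
```

Note that the check says nothing about which value is right; it only reports disagreement, which is all the consistency stage asks for.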
Accuracy – INFORMATION
“Is the numerical value shown for this metric correct?”
Driven by Analytics
Once you have consistency, you can work on accuracy, i.e. does the value reflect
what is actually happening in reality?
Verify by comparing against a manually calculated metric or physical audits
Metric could be “inaccurate” because
• Bug in query – fix
• Wrong or inconsistent business rules – nail down definitions, two different sets of
business rules for same metric could be appropriate (eg one school year vs another)
• Data quality – identify source and reason for poor data quality, make sure to verify
calculations using only good-quality data
Relevancy – KNOWLEDGE
“Is this metric helping to meet our goal?”
Driven by business
Once you have accuracy, then you can determine whether that metric is useful.
With consistency and accuracy, it may be that you are not measuring what you
thought you were. Or what you are measuring doesn’t really impact outcome.
If not, then either business goal or metric needs to change
• Changing metric
• Use new metric – longer to get consistency, cycle could be just as long or longer
• Refine business rules of existing metric – less effort to get consistency, shorter cycle
• Changing business goal
• Effective data in hand is worth two in the bush
• Tail could be wagging the dog but unmeasurable business goal is just a wish
Cycle
As data becomes information becomes knowledge, the data sophistication of the
process grows which requires more/different metrics
Different metrics could be at different points in the cycle
[Figure: multiple Consistency → Accuracy → Relevancy cycles, one for each metric]
Reporting Platform
Single system for operations and reporting
Previously used SIS as both reporting data store and transactional system
• Made querying a bear due to complex data model for transactional system
• All reports required technical team capacity, even simple ones
• Highly normalized = even simple information was stored in a very complicated way
• All business rules were implemented in query code created by contractor
• Difficult to change when rules changed
• Often query code itself was only “documentation”
Example: find Residency Verification
-- 'N' when no value is stored or the value is 438, otherwise 'Y'
select decode(afv.value, null, 'N', 438, 'N', 'Y') as "Residency Verification SY13-14"
from students p, adhoc_fields_values afv, adhoc_fields_drop_downs afdd
where p.pupil_number = afv.pupil_number(+)  -- (+) is Oracle outer join syntax
and afv.adhoc_fields_def_ID(+) = 109
and AFV.ADHOC_FIELDS_DEF_ID = AFDD.ADHOC_FIELDS_DEF_ID(+)
and afv.value = AFDD.FIELD_KEY_VALUE(+)
Reporting platform - Speed
Data model focused on reporting, not on transactions
• space vs speed tradeoff highly biased towards speed
• Virtually unlimited disk space
• Batch processing not real time
• Complete flexibility to organize data optimally for ease of reporting
• Central store for all siloed data (data-warehouse lite)
[Diagram: Example Transactional Data Model]
Simplified Reporting Data Model: Student Demographics, Enrollment, Attendance Base, Assessment, Course Credits, Grad Progress, Course Teachers, School Stats
Reporting platform – Ease of Use
Really nothing more than a dedicated reporting database, not a data warehouse
Data model can be tailored for reporting
• Keeps track of all changes, not just latest data (valid from, valid to)
• Super flat, highly denormalized = easily understood data model
• Redundancy okay so long as we have data traceability
• Same base data stored in multiple formats/structures for different uses
• Fewer joins so can shift technical capacity to more complex business rules
• Can be exposed more directly to data analysts for increased self-service
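The "valid from, valid to" idea can be sketched as an as-of lookup. The row layout, school names, and dates below are hypothetical, and the far-future sentinel for "still current" is an assumption borrowed from the convention the deck describes for withdrawal dates.

```python
from datetime import date

FAR_FUTURE = date(3030, 1, 1)  # assumed sentinel meaning "still current"

# Hypothetical history rows: (student_id, school, valid_from, valid_to)
history = [
    (123, "School A", date(2013, 8, 26), date(2014, 1, 15)),
    (123, "School B", date(2014, 1, 15), FAR_FUTURE),
]

def school_as_of(rows, student_id, as_of):
    """Return the school recorded for a student on a given date, or None."""
    for sid, school, valid_from, valid_to in rows:
        # valid_from is inclusive and valid_to is exclusive, so ranges never overlap.
        if sid == student_id and valid_from <= as_of < valid_to:
            return school
    return None

before_move = school_as_of(history, 123, date(2013, 12, 1))
after_move = school_as_of(history, 123, date(2014, 1, 15))
```

Because every change closes one row and opens another, a report can be rerun for any past date and return what was true then, which a transactional store that keeps only the latest state cannot do.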
Before (querying the transactional SIS):
select decode(afv.value, null, 'N', 438, 'N', 'Y') as "Residency Verification"
from students p, adhoc_fields_values afv, adhoc_fields_drop_downs afdd
where p.pupil_number = afv.pupil_number(+)
and afv.adhoc_fields_def_ID(+) = 109
and AFV.ADHOC_FIELDS_DEF_ID = AFDD.ADHOC_FIELDS_DEF_ID(+)
and afv.value = AFDD.FIELD_KEY_VALUE(+)
After (querying the reporting platform):
select [Residency Verification] from student_demographics_snapshot
Reporting platform - Consistency
Common processing
• Common query code centralized
• Batch ETL so can make multiple passes to pre-calculate higher order metrics
Consistent business rules
• Can have old and new metrics back-calculated as well (eg old vs new truancy rules)
• Calculate metric in one place so one number, right or wrong, is reported
Data Traceability
• Data path from systems of record to reports fully documented
Herding Kittens → One Easy Powerful Cat
SSIS, SQL Server, Perl on virtual machine servers
Reporting Platform Architecture
Sources: STARS (legacy SIS); Aspen (current SIS); SpEd data system; ELL data system; assessment data dumps and files; misc data files; misc systems
ETL: SQL Server Integration Services (SSIS), Perl, manual loads
Reporting Platform (MS SQL Server)
Data Mart (MS SQL Server)
Access: Direct SQL (SQL Server Management Studio)
Reporting Platform Examples – Attendance base table
Based on weekly attendance report
Updated daily
Calculates individual student attendance metrics
Use values from this table whenever reporting on attendance
Metric                     Details
Truancy                    Calculates truancy based on old rules and new rules so can compare trends
Absence Counts             Period and Daily; Unexcused, Excused, In Seat Attendance, Suspension
Attendance Interventions   3, 5, 10, 20 day intervention letters needed and sent; Child & Family Service/Court referrals, Police Pickups
Reporting Platform Examples – enrollment matching
Enrollment admit withdraw matching
• SIS stores enrollment as separate admit and withdraw events
• Need to match admits to withdrawals for the same enrollment period and school
SIS enrollment events:
Date              Type         School
24 August 2011    Admit        123
24 June 2012      Withdrawal   123
24 June 2012      Admit        456
10 October 2012   Withdrawal   456
11 October 2012   Admit        789
Matched enrollment periods:
Admit Date        Withdraw Date     School
24 August 2011    24 June 2012      123
24 June 2012      10 October 2012   456
11 October 2012   1 January 3030    789
Currently enrolled students have a “withdrawal date” in the far future (1/1/3030) so that there is an actual date and not a null to compare against: (today() < [withdraw date]) means “currently enrolled”
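The matching step can be sketched as below. This is an illustration of the idea rather than the platform's actual ETL code (which the deck says runs in SSIS, SQL Server, and Perl); the function name and tuple layout are assumptions, while the events and the 1/1/3030 sentinel come from the slide.

```python
from datetime import date

FAR_FUTURE = date(3030, 1, 1)  # the deck's sentinel withdrawal date for current enrollments

def match_enrollments(events):
    """Pair admit and withdrawal events into (admit, withdraw, school) periods.

    events: iterable of (event_date, event_type, school) for ONE student,
    where event_type is "Admit" or "Withdrawal".
    """
    periods = []
    open_admits = {}  # school -> admit date still waiting for a withdrawal
    # Sort by date, processing withdrawals before admits on the same day.
    ordered = sorted(events, key=lambda e: (e[0], e[1] != "Withdrawal"))
    for event_date, event_type, school in ordered:
        if event_type == "Admit":
            open_admits[school] = event_date
        elif school in open_admits:
            periods.append((open_admits.pop(school), event_date, school))
        # A withdrawal with no open admit is skipped here; in practice it
        # would be flagged as a data quality exception.
    # Any admit without a withdrawal is a current enrollment.
    for school, admit_date in open_admits.items():
        periods.append((admit_date, FAR_FUTURE, school))
    return sorted(periods)

events = [
    (date(2011, 8, 24), "Admit", 123),
    (date(2012, 6, 24), "Withdrawal", 123),
    (date(2012, 6, 24), "Admit", 456),
    (date(2012, 10, 10), "Withdrawal", 456),
    (date(2012, 10, 11), "Admit", 789),
]
periods = match_enrollments(events)

# With the sentinel, "currently enrolled" is a plain date comparison, no null handling.
current = [school for admit, withdraw, school in periods if date.today() < withdraw]
```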
Reporting Platform Examples - Assessment
Generally two ways we need to analyze assessments
• Single view of all assessments for a student – data in columns
• Each row is a single student for a particular school year
Student  A1 Q1  A1 Q2  A1 Q3  A1 Q4  A2 Sem 1  A2 Sem 2  SY
123      90     80     70     100    76        87        SY1415
456      60     70     80     90     65        86        SY1415
• Comparing one run of an assessment with another – data in rows
• Each row is a single assessment for a single student for a particular school year
Student  Assessment  SY      Score
123      A1 Q1       SY1415  90
123      A1 Q2       SY1415  80
123      A1 Q3       SY1415  70
123      A1 Q4       SY1415  100
456      A2 Sem 1    SY1415  65
Key is that both are processed from the exact same data sets at the same time so contain the same data stored in two different structures
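A minimal sketch of deriving the column view from the row view in a single pass, so both structures come from the same data. The function name is hypothetical; the scores are taken from the slide's tables.

```python
from collections import defaultdict

# Row view: one row per student, assessment, and school year.
long_rows = [
    (123, "A1 Q1", "SY1415", 90),
    (123, "A1 Q2", "SY1415", 80),
    (123, "A1 Q3", "SY1415", 70),
    (123, "A1 Q4", "SY1415", 100),
    (456, "A1 Q1", "SY1415", 60),
]

def to_wide(rows):
    """Pivot row-view data into one dict of scores per (student, school year)."""
    wide = defaultdict(dict)
    for student, assessment, school_year, score in rows:
        wide[(student, school_year)][assessment] = score
    return dict(wide)

wide_rows = to_wide(long_rows)
```

Because the column view is computed from the row view rather than loaded separately, the two cannot drift apart.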
Reporting Platform Development
Biggest challenge was how to develop system when we had poor data quality
How could we avoid introducing more errors?
Solution
• Prioritize – Start with standard recurring reports (eg attendance weekly)
• Compartmentalize – Run reports using only students with no data quality issues
Reporting Platform Development
Need to ensure that reporting platform is not introducing new errors. How?
Use only known good data to validate:
[Flow diagram: Aspen (Current SIS) → Reporting Platform → Report query → Sample Report; Compare Sample Report with Standard Report; No discrepancies → Report validated; Discrepancies → Why? Bad data students: filter out students with bad data into exceptions table. Good data students: fix any issues with Reporting platform]
Reporting Platform Development
1. Create Sample Report and compare to Standard Report (eg attendance
weekly)
2. Check for discrepancies
1. If discrepancy is due to mistake in reporting platform or query, fix it
2. If discrepancy is due to bad data, store student id in exceptions table
3. Pull Sample Report again, filtering out exception students so that only “Good
Data” is included in report
4. Continue until no discrepancies
Example “Bad Data” exceptions:
1. Overlapping enrollments
2. Absences outside of enrollment
3. Missing data
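One pass of the comparison in steps 1 to 3 can be sketched as follows. The report contents and the `has_bad_data` check are hypothetical stand-ins; in the real process the check would cover exceptions like those listed above.

```python
def validate_against_standard(sample, standard, has_bad_data):
    """Compare a sample report to the standard report, student by student.

    sample, standard: dict of student_id -> metric value.
    has_bad_data: callable(student_id) -> bool, a hypothetical data quality check.
    Returns (exceptions, platform_issues).
    """
    exceptions = set()       # students excluded because their source data is bad
    platform_issues = set()  # discrepancies on good data: platform or query bug
    for student_id in sample.keys() | standard.keys():
        if sample.get(student_id) == standard.get(student_id):
            continue
        if has_bad_data(student_id):
            exceptions.add(student_id)
        else:
            platform_issues.add(student_id)
    return exceptions, platform_issues

# Hypothetical weekly absence counts per student.
standard = {1: 3, 2: 5, 3: 0, 4: 2}
sample = {1: 3, 2: 4, 3: 1, 4: 2}
known_bad = {2}  # e.g. a student with overlapping enrollments

exceptions, platform_issues = validate_against_standard(
    sample, standard, lambda student_id: student_id in known_bad
)
# The report counts as validated once platform_issues is empty.
```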
Managing Data Requests
Survey results – requestors of reports
Who is asking for data, and how often?
Counts are number of districts who report that frequency for that requester type
Who                                          N/A  Yearly  Monthly  Weekly  Daily  Ad-hoc
Parents/community                            3    4       0        0       1      3
Students                                     6    1       0        0       2      2
Teachers                                     0    1       1        3       4      2
School administrators                        0    0       2        4       4      1
Central office program staff                 0    0       3        3       2      3
District leadership                          0    0       3        5       3      0
State/Fed Dept of Ed or other external org   1    3       2        1       2      2
Capacity vs Demand
Demand for data is ever increasing, people are hungry for data
Needed to do more with the same size team
Two Tracks
•Increase reporting efficiency
•Reduce demand on reporting team
Increase Efficiency
Users make requests via online “Data Request Tool” (DRT)
• Central point of communication with requestors for clarifications
• Tracks implementation notes and report writer assignments
• Report files attached to request along with query code
• One report can be attached to multiple requests to allow for reuse
• “Student Data Current” report available on front end
• Updated daily with common student metrics (absences, GPA, grade level, school, etc)
• User can customize columns/filters to download for themselves
• Example of some columns available:
Student_ID YTD_Unexcused_Absences Total SBT Suspension_Days
School_Name YTD_Excused_Absences Truant - still be truant?
ELL_Status YTD_ISA_Average_Attendance Truant_>=10_days
FARM_Status Membership_days Current_School_Average_Attendance
Student_Race Absences_Towards_Truancy Current_School_Excused_Absences
SPED_Status Suspension_Absences_Days Current_School_ISA_Average_Attendance
Increase Efficiency
“Data Request Tool” (DRT)
Increase Efficiency
Data Librarian is first point of contact for requests to reporting team
• Dedicated FTE position
• Clarifies request requirements
• Is there an already completed report that can fulfill this request?
• Acts as gatekeeper to qualify requests before they hit reporting capacity
[Flow: Program needs data → Standard Report? Student Data Current? → Program enters Data Request → Data Librarian clarifies request → Existing report available? → Report Writer assigned → Report Created → Report Reviewed → Report Delivered]
Self Service Reporting
Self Service Reporting
Goal was to provide self-service reporting to analysts while ensuring consistency
• Giving them raw access to reporting platform is too overwhelming
• Analysts are not database developers/DBAs
• Requires SQL skills: eg would still need joins, aggregations to get meaningful data
• Creating dedicated pull of custom data would mean another thing to maintain
Crawl before we can walk,
Walk before we can run,
Run before we can fly
Self Service Reporting
Solution was to rely on already existing standard reports
• Enrollment Daily, Attendance Weekly, ACGR, Student Data Current
Weekly snapshot of each report was saved into a dedicated “data mart”
• Analysts were already used to seeing these reports so no learning curve
• These were official reports so data was guaranteed to match our official numbers
• Added benefit of saving historically reported official numbers
Not quite flying yet, but closer…
“Data Mart” example - Enrollment
Report #1612 is a daily report of the enrollment for every student in DCPS
• Forms the basis of how enrollment is monitored throughout the year
• Especially important during pre-enrollment for upcoming school year
• Also forms basis for denominator when calculating “percent of total students” metrics
• Data mart data model has exact same columns as DRT with addition of “report date”
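The snapshot idea can be sketched as an append-only copy of the report stamped with its report date. The column names, values, and the year are hypothetical; only the "same columns plus report date" layout comes from the slide.

```python
from datetime import date

def snapshot_report(data_mart, report_rows, report_date):
    """Append a copy of the report to the data mart, stamped with the report date."""
    for row in report_rows:
        data_mart.append({**row, "report_date": report_date})
    return data_mart

data_mart = []
# Two hypothetical runs of the enrollment report, one week apart.
snapshot_report(data_mart, [{"student_id": 123, "school": 456}], date(2015, 8, 24))
snapshot_report(data_mart, [{"student_id": 123, "school": 789}], date(2015, 8, 31))

# Enrollment exactly as it was officially reported on a given date.
as_reported = [r for r in data_mart if r["report_date"] == date(2015, 8, 24)]
```

Earlier snapshots are never updated, so an analyst querying the data mart sees the same numbers that were officially reported at the time.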
[Diagram: Reporting Platform → Enrollment Daily report → Enrollment Data Mart (snapshots: Enrollment DR 8/24, 8/31, 9/7, 9/14, …, 6/13) → Direct SQL → Data Analyst]
Report requests hitting report writers
[Chart: Report Writer Data Requests per Month, Aug to Jul, scale 0 to 120; series SY12-13, SY13-14, SY14-15, SY15-16]
More self-service reporting and standardized reports
• Fewer ad-hoc requests for standard data
• Reporting capacity can be spent on more complex requests
Summary
Data Effectiveness
Data Reporting → Effective data → Data Analytics → Effective decisions → Programs / Business → Effective outcomes
Data Driven Decision Making starts with Data Effectiveness
How to make data more effective
Consistency first, then Accuracy, then Relevancy
Metric A (Report 1: 90, Report 2: 81, Report 3: 87)
→ Consistent → Metric A (Report 1: 87, Report 2: 87, Report 3: 87)
→ Accurate → Metric A (Report 1: 85, Report 2: 85, Report 3: 85)
→ Relevant → Metric aligned with goal
Improve data quality by seeing School Staff as "data entry team" instead of “users”
Users → Data Entry Team
Take Aways
Meet your data where it is today and build to where you want to be
Take some time to do the work today that will help you tomorrow…
•Data Quality Culture
•C.A.R. Cycle
•Data Request Process
There’s flying and then there’s flying. Good enough is probably good enough.
•Reporting Platform
•Data Marts
Questions?
andrew.patricio@dataeffectiveness.com
@dataeffectively
blog.dataeffectiveness.com
Appendices
State of Data at DCPS
DCPS Data Systems Evolution
10+ years ago
•Critical data tracked poorly or not at all
•Heavy manual effort in using data to support operations
•Very difficult to pull ad hoc reports or change existing reports
•Only way to do trend reporting was via Excel
5 years ago
•Systems of record exist but data flow is both one way and not easy
•All critical data is tracked somewhere but in separate systems
•Very ad-hoc reporting, every request was its own query, “reuse” is copy-paste
•Trend reporting done from yearly snapshots
Today
•Consolidated view of student, employee, and school data
•Robust reporting infrastructure increases capacity
•More self-service reporting available
•Data quality better
•All changes tracked means trend reporting much easier
Evolution from poor data capture to
Mainframe replaced by Oracle Forms system (AAL eSIS) → SIS upgraded (Follett Aspen) → Separate reporting database created
Data Capture | Data Quality | Reporting ROI
Reporting requests – Example 1
Average of 80 complex data reporting requests per month
Attendance Weekly
• Summarizes weekly and YTD attendance for every student at DCPS
• Absence stats: In Seat Attendance (ISA), Truancy, Unexcused Absences
• Intervention: 3, 5, 10, 15, 20 day attendance letters/meetings/etc
• Also includes behavior stats: Suspension, Suspension days
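As a rough sketch of what a per-student row in such a report involves, the snippet below summarizes a week and a year to date from daily attendance codes. Several things here are assumptions rather than DCPS definitions: the one-letter codes, the reading of ISA as days present divided by days enrolled, and the choice to count only unexcused absences toward the 3/5/10/15/20-day intervention thresholds.

```python
from bisect import bisect_right

# Intervention thresholds (in absence days) listed on the slide.
THRESHOLDS = [3, 5, 10, 15, 20]


# Hypothetical daily codes: "P" present, "E" excused absence, "U" unexcused absence.
def summarize(codes):
    """Summarize one student's attendance codes for a period."""
    enrolled = len(codes)
    present = codes.count("P")
    unexcused = codes.count("U")
    # Assumed definition: ISA = days present / days enrolled.
    isa = present / enrolled if enrolled else None
    # Highest threshold reached so far, or None if below the first one.
    idx = bisect_right(THRESHOLDS, unexcused)
    reached = THRESHOLDS[idx - 1] if idx else None
    return {"isa": isa, "unexcused": unexcused, "intervention": reached}


ytd = list("PPUPPEPPUPPPUPPPPUPU")   # 20 made-up school days, year to date
week = ytd[-5:]                      # most recent 5 school days

print("week:", summarize(week))
print("ytd: ", summarize(ytd))
```

The same function serves both the weekly and the YTD columns; only the slice of days passed in changes.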
Reporting requests – Example 2
Adjusted Cohort Graduation Rate – calculates individual graduation progress per
credit for all high school students, flags degree of being off track
• Tracks 4 year cohorts across all high schools at DCPS
• Looks at grades in currently scheduled courses as well as credits received
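A minimal sketch of an off-track flag along these lines follows. It combines credits already received with credits for currently scheduled courses the student is passing, then compares the total with an expected pace. The 24-credit requirement, the even pace across four years, the 2-credit cutoff, and the flag labels are all illustrative assumptions; the slides do not state the actual rules.

```python
# All constants below are illustrative assumptions, not DCPS policy.
CREDITS_TO_GRADUATE = 24.0
COHORT_YEARS = 4


def off_track_flag(cohort_year, credits_earned, credits_in_progress_passing):
    """Classify a student's progress at the end of the given cohort year (1-4).

    Projected credits = credits already received + credits for currently
    scheduled courses the student is passing.
    """
    expected = CREDITS_TO_GRADUATE * cohort_year / COHORT_YEARS
    projected = credits_earned + credits_in_progress_passing
    gap = expected - projected
    if gap <= 0:
        return "on track"
    if gap <= 2:
        return "slightly off track"
    return "significantly off track"


# Made-up students: (cohort year, credits earned, passing credits in progress).
for student in [(2, 6.0, 6.0), (3, 11.0, 5.5), (4, 14.0, 4.0)]:
    print(student, "->", off_track_flag(*student))
```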
Example Columns
On Grade Level | On-Track 4 Core Subjects | Enrolled in English III
Grade | On-Track English | Passing English III
On Potential Grad List | On-Track Math | Enrolled in English IV
Total # Credits | On-Track Science | Passing English IV
Credits Needed to Graduate | On-Track Social Studies | Enrolled in Math
English Cumulative | Math Cumulative | Passing Math
Survey Demographics
Survey results – District Sizes
Survey conducted via CGCS CIO Mailing list: 11 Responses Total
Data Sophistication
Data Sophistication Cycle
Results oriented incompatible with data driven?
• In a results-oriented organization the push is to “get things done”, and the velocity of the need often makes it difficult for data systems to keep up.
• As a result, the data-driven side of the organization gets starved of the data it needs.
Data Sophistication Cycle
Data capture system evolves along with process sophistication
Reporting sophistication should keep pace with data quality
Example Data Entry System | Key Data Structure
Notepad | Open entry
Excel | Data cells
MS Access | Data records
Student Information System (SIS) | Normalized data model
Reporting system separate from SIS | Reporting data model
Remaining column headers: Process Sophistication | Data Quality | Reporting Sophistication
Don’t build a formal data warehouse for Excel “data systems”!
Don’t over-engineer the tracking system; it should lead but not exceed process maturity.
Example
• Relative vs Absolute metrics
• High stakes (IMPACT) vs experimental (RTI)
Different groups or initiatives in the same organization could be at different points
in the cycle
Make sure you identify what level a particular need is at; there is no need for data tracking
or reporting to be more sophisticated than the business process in question
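One way to make this check concrete is sketched below. It treats the data entry systems from the example table as an ordered ladder and compares a proposed system with the rung the business process currently warrants. The ordering, the idea of rating a process on the same ladder, and the reading of "lead but not exceed" as "at most one rung ahead" are assumptions made for illustration only.

```python
from enum import IntEnum


class Level(IntEnum):
    """Rungs taken from the example table, assumed to run least to most sophisticated."""
    NOTEPAD = 1
    EXCEL = 2
    MS_ACCESS = 3
    SIS = 4
    REPORTING_SYSTEM = 5


def check_fit(process_level, proposed_system):
    """Compare a proposed tracking system with the process it supports.

    Assumption: "lead but not exceed" is read as "at most one rung above
    the current process maturity"; the slides do not define a number.
    """
    gap = proposed_system - process_level
    if gap > 1:
        return "over-engineered"
    if gap < 0:
        return "lagging"
    return "appropriate"


# An experimental initiative still run out of spreadsheets vs. a mature, high-stakes one.
print(check_fit(Level.EXCEL, Level.REPORTING_SYSTEM))
print(check_fit(Level.SIS, Level.REPORTING_SYSTEM))
```

Because different groups can sit at different points in the cycle, the check would be applied per initiative rather than once for the whole organization.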

Data Effectiveness: How to build a Data Driven and Reporting infrastructure

  • 1. Andrew Pa tricio | T 5 7 1.216.2003 | www.dataeffec tiveness .com Data Effectiveness The Consistency, Accuracy, Relevancy cycle Council of Great City Schools Annual Academic, Information Technology and Research Conference 11 July 2016
  • 3. Agenda Introduction What is Data Effectiveness? Data Reporting Issues Data Quality Culture Consistency, Relevancy, Accuracy Reporting Platform Managing Data Requests Self Service Reporting Summary Data Effectiveness Data Effectiveness CGCS Annual Academic, Information Technology and Research Conference 3
  • 4. DCPS Facts Led by Chancellor Kaya Henderson since 2010 48000+ students (steady increase for the past 5 years) in 111 schools • 60 Elementary Schools • 11 Middle Schools • 18 Education Campuses (usually Preschool through 8th grade) • 15 High Schools • 7 other (Special Ed etc) 6800+ school staff (3600+ teachers) 600+ central office staff Data Effectiveness Data Effectiveness CGCS Annual Academic, Information Technology and Research Conference 4
  • 5. About Me – Andrew Patricio (former) Deputy Chief for Data Systems • Nov 2010 to June 2016 • During most of my tenure was the most senior technical resource at DCPS (DCPS Deputy CIO created this year) Personal background • BS in Electrical Engineering • IT & management consulting • Current: www.dataeffectiveness.com • Data Strategy Advising • Building Reporting Infrastructure • Helping Improve Data Quality Data Effectiveness Data Effectiveness CGCS Annual Academic, Information Technology and Research Conference 5
  • 7. Data Driven Decision Making All organizations seek to make decisions based on data Data Effectiveness Data Effectiveness CGCS Annual Academic, Information Technology and Research Conference 7
  • 8. Data Reality But the reality is that the data we have available is often in poor shape Data Effectiveness Data Effectiveness CGCS Annual Academic, Information Technology and Research Conference 8
  • 9. Getting to Data Driven – Data Prep Challenge is usually not analysis, it is getting the data ready to analyze 76% of data scientists find data prep the least enjoyable data science task Source: http://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#7ee06c277f75 Getting the data is 90% of the work Data Effectiveness Data Effectiveness CGCS Annual Academic, Information Technology and Research Conference 9
  • 10. Getting to Data Driven – Reporting vs Analytics Steve Levitt, Freakonomics Podcast, 26 June 2014 “Yeah, I think the hardest single thing is that even if you have the desire … to be data driven, that the existing systems…I never would have thought this before I started working with companies. I never would have imagined that it is an I.T. problem that you simply cannot get the data you want, and the data are held in 27 different data sets that have different identifiers … the I.T. support and the complexity in these big firms blows your mind about how hard it is to do the littlest, simple things.” Data analysts are NOT necessarily technologists Data Effectiveness Data Effectiveness CGCS Annual Academic, Information Technology and Research Conference 10
  • 11. Survey results – Reporting vs Analytics structure
    80% have a dedicated reporting team, with the analytics function distributed throughout the organization
  • 12. DCPS Structure
    [Diagram boxes: Data Reporting Team; Data and Strategy data analysts; Program data analyst, Program data analyst, …]
    Central Data reporting team pulls student data from backend systems
    • Complicated data pulls
    • Data reporting self-service support
    Data and Strategy data analysts collaborate with data analysts in different offices
    • Assist in vetting data requests
    • Volume of requests means capacity is an issue
    DCPS is very data driven, so reporting tends to be the bottleneck
    How to handle workload and deal with varying levels of data quality?
    District of Columbia Public Schools | 2015 CGCS CIO Conference
  • 13. Data Driven Decision Making
    High performance data analytics…
    Requires pragmatic data reporting
    …in the real world of data
  • 14. Data Driven Pipeline
    [Diagram: Data Reporting → Effective data → Data Analytics → Effective decisions → Programs / Business → Effective outcomes]
    • Product of reporting is Effective Data
    • Product of analytics is Effective Decisions
    • Product of business is Effective Outcomes
  • 15. What is Data Effectiveness?
    Data Effectiveness is the primary responsibility of reporting
    [Diagram: Data Reporting → Effective data → Data Analytics → Effective decisions → Programs / Business → Effective outcomes]
    Being effectively data driven starts with Data Effectiveness: getting good data, when it is needed, to those who need it
  • 16. Data Reporting Issues
  • 17. How does Data go wrong?
    Data entry issues
    • Fat fingering
    • Workarounds
    • Solving immediate problem without thinking about long term consequences
    • Transactional system driven by latest action, not the historical data changes important for reporting
    • Poor understanding of process/policy
    • Student Record Duplication
    Legacy data
    • Different definitions year to year (period to daily absence conversion)
    • Poor QA processes (ISA definition incorrect)
    • System transitions (poor data transfer strategy from previous vendors)
  • 18. Data Issues 1
    End of year attendance example (1 particular school)
    Date report run | SY13-14 End of year Average Daily Attendance (ADA)
    July 2014 | 95%
    October 2014 | 92%
    • How could attendance for the year change after the year is over?!
    • Initially assumed that there was a bug in the second report
    • It turned out that the school registrar was changing enrollment dates from Aug 2013 to Aug 2014, so those students no longer looked like they were enrolled in the 2013-2014 school year
    Result:
    • Students who were present in SY13-14 data in June were missing in October, severely skewing the data
  • 19. Data Issues 2
    Example: Enrollment overlaps
    Student Information System (SIS) is a transactional system; it only tracks current state
    • For enrollment, it doesn’t care about data values in enrollment history
    • Only cares about latest enrollment action (admit or withdrawal) and school
    • Actual enrollment history in the system is merely a log of events
    • Users can adjust enrollment history at will with no effect on current status
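Because the enrollment history is only a log that users can edit freely, overlapping enrollment spans can appear without the SIS noticing. A minimal sketch of how such overlaps might be flagged is shown here; the tuple layout, the function name `find_overlaps`, and the sample dates are illustrative assumptions, not the DCPS implementation.

```python
from datetime import date
from itertools import combinations

def find_overlaps(spans):
    """Return pairs of enrollment spans that overlap in time.

    Each span is a hypothetical (admit_date, withdraw_date, school) tuple
    for a single student. A same-day transfer (withdraw and admit on the
    same date) is not treated as an overlap.
    """
    overlaps = []
    for a, b in combinations(sorted(spans), 2):
        # Two spans overlap when each one starts before the other ends.
        if a[0] < b[1] and b[0] < a[1]:
            overlaps.append((a, b))
    return overlaps

# A clean history: the second enrollment starts the day the first one ends.
clean_history = [
    (date(2011, 8, 24), date(2012, 6, 24), 123),
    (date(2012, 6, 24), date(2012, 10, 10), 456),
]

# The same history after a hand-edited span is added in the middle.
edited_history = clean_history + [(date(2012, 5, 1), date(2012, 9, 1), 789)]

print(find_overlaps(clean_history))
print(find_overlaps(edited_history))
```

Students returned by a check like this are natural candidates for an exceptions list, since any attendance metric computed over overlapping spans risks double counting.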
  • 20. Data Issues 3
    School Dashboard vs Weekly reports
    • Idea was to get more regularly updated data to schools
    • Inconsistencies reduced trust in data
    Two different queries implemented the same metric, but poor data quality meant slightly different answers. Example:
    • “Current School” on student table used for dashboard queries
    • Didn’t always match school based on enrollment history used in reports
  • 21. Survey Results – data issues
    Challenges to data reporting at your district (11 responses)
    Area | total | max | min | avg | stddev
    Requirements - business rules are communicated clearly | 36 | 5 | 2 | 3.27 | 0.96
    Siloed data - correlating data from different systems | 36 | 5 | 1 | 3.27 | 1.29
    Validity - keeping number of data entry errors low | 35 | 5 | 1 | 3.18 | 1.27
    Capacity - managing the number of report requests | 33 | 5 | 2 | 3.00 | 1.35
    Efficiency - quick delivery of report when requested | 32 | 4 | 1 | 2.91 | 0.90
    Complexity - reports requests easily coded | 30 | 4 | 1 | 2.73 | 1.05
    Repeatability - recreating same metric in various reports | 28 | 5 | 1 | 2.55 | 1.44
    Veracity - data values match reality | 27 | 3 | 1 | 2.45 | 0.78
    Reliability - data reports do not often need rework | 26 | 4 | 1 | 2.36 | 0.88
    Utility - data reports are useful and relevant | 25 | 3 | 1 | 2.27 | 0.86
  • 22. Fixing Data Quality
    How do we make our data more effective given these challenges?
    • Improve Data Quality long term?
    • Make data driven decisions today?
  • 23. Long term – Data Quality Culture
  • 24. Data Entry - Front End Validation?
    Legacy issues – Once you’ve identified and fixed these, they will not return
    Ongoing issue is the data entry problem
    • Need to balance flexibility/freedom of entry with validation checks
    • Most systems can validate based on patterns or entries but do not have enough flexibility to differentiate between other valid and invalid entries
    At DCPS, when a school doesn’t have the access to make a needed data change, they enter a data modification request for the tech team to handle
    • Strictness of data entry checks needs to be balanced against technical team capacity
  • 25. Reporting relationship to schools
    All central office teams support schools
    Difference with reporting team is that schools are really our “data entry team” rather than our “users”
    • Successful data reporting is intimately tied to their effectiveness
    • A perfect system that schools are not comfortable with will still have bad data quality
    [Diagram: Data Reporting → Effective data → Data Analytics → Effective decisions → Programs / Business]
  • 26. “Data Entry Team” rather than “Users”
    “Data Entry Team” is a part of things, “Users” are on the other side
    Taking this point of view automatically fosters more collaboration
    • Connecting the dots for schools by tracing the pathway from a specific data entry error to a specific issue on a data report, instead of just mandating behavior top down
      • E.g. duplicate Attendance Intervention entries make the situation look worse because of double counting
    • Data error reports include step-by-step how-tos specifying how the issue can be fixed
      • Working to include direct link to relevant student in SIS to minimize context switching
    • Focus groups, feedback sessions
      • Getting school staff input on how to make data entry more efficient
  • 27. Fixing Data Error Correction Cycle
    • Feed back errors to schools for them to correct
    • Central office team looks for other common data entry errors to either prevent through front-end validation or add to error reports going to schools
    [Diagram elements: Aspen (SIS); Reporting Platform; Data Integrity Management System (soon to be Certica Certify); Data Error Report; Fix Data Errors; Central office reporting and data analyst teams; Improved SIS Validations; Additional Error Patterns]
  • 28. Data Integrity Management System
  • 29. Today – C.A.R. cycle
  • 30. Being Data Driven requires a C.A.R.
    Consistency, Accuracy, Relevancy
    The problem is how to build a train as it’s moving down the track.
    Even when data quality is not so good, you still have to provide reports and make decisions. You cannot wait until everything is perfect, because that’s a moving target.
    Good enough is good enough, but what is good enough?
  • 31. Consistency, Accuracy, Relevancy cycle
    Goal is to have accurate metrics aligned with business goal
    • Cannot talk about accuracy if there isn’t agreement on the value being reported
    • Once the value is consistent, you can talk about whether it’s accurate
    • Once it’s accurate, you can talk about whether it’s relevant to the business goal
    Stage | Metric A in Report 1 | Report 2 | Report 3
    Starting point | 90 | 81 | 87
    Consistent (DATA) | 87 | 87 | 87
    Accurate (INFORMATION) | 85 | 85 | 85
    Relevant (KNOWLEDGE): metric aligned with goal
    Not Relevant: determine proposed change and go through cycle again
  • 32. Consistency – DATA
    “What numerical value is being shown for this metric?”
    Driven by reporting
    Consistency means literally just that: a metric has the same value for the same parameters no matter who pulls it. Matching reality is not the focus at this stage.
    Factors
    • Traceability – same metric in different reports must be traced back to same source
    • Same parameters – need to be careful because different metrics could be referred to by the same common name
      • “# of absences” – unexcused? ISA? Truancy?
    • Time factor – legitimate changes can be made after report is run
    Wkly Unex Abs | Pulled | Difference
    100 | Oct 12 | First pull
    95 | Oct 19 | Data corrected
    90 | Oct 26 | Suspensions approved, don’t count as unexcused
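One way to make the consistency check concrete is to collect the same metric, keyed by its parameters, from every report that shows it and flag any key with more than one distinct value. The sketch below assumes a hypothetical in-memory structure (`reports`, report names, and the `("Metric A", "SY13-14")` key are made up for illustration); the values mirror the Metric A example from the cycle slide.

```python
def inconsistent_metrics(reports):
    """Return metrics whose value differs between reports.

    `reports` maps a report name to {(metric_name, parameters): value}.
    The key includes the parameters because metrics sharing a common
    name (e.g. "# of absences") may really be different metrics.
    """
    seen = {}
    for report_name, metrics in reports.items():
        for key, value in metrics.items():
            seen.setdefault(key, {})[report_name] = value
    # A metric is consistent only if every report shows the same value.
    return {key: values for key, values in seen.items()
            if len(set(values.values())) > 1}

before = {
    "Report 1": {("Metric A", "SY13-14"): 90},
    "Report 2": {("Metric A", "SY13-14"): 81},
    "Report 3": {("Metric A", "SY13-14"): 87},
}
after = {name: {("Metric A", "SY13-14"): 87} for name in before}

print(inconsistent_metrics(before))
print(inconsistent_metrics(after))
```

Note that this only tests agreement between reports. Whether 87 is the right number is the accuracy question, which comes later in the cycle. Pull date would need to be part of the key to account for the time factor described above.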
  • 33. Accuracy – INFORMATION
    “Is the numerical value shown for this metric correct?”
    Driven by Analytics
    Once you have consistency, you can work on accuracy, i.e. does the value reflect what is actually happening in reality? Verify by comparing against a manually calculated metric or physical audits.
    Metric could be “inaccurate” because of
    • Bug in query – fix
    • Wrong or inconsistent business rules – nail down definitions; two different sets of business rules for the same metric could be appropriate (e.g. one school year vs another)
    • Data quality – identify source and reason for poor data quality; make sure to verify calculations using only good-quality data
  • 34. Relevancy – KNOWLEDGE
    “Is this metric helping to meet our goal?”
    Driven by business
    Once you have accuracy, you can determine whether that metric is useful. With consistency and accuracy, it may turn out that you are not measuring what you thought you were, or that what you are measuring doesn’t really impact the outcome.
    If not, then either the business goal or the metric needs to change
    • Changing metric
      • Use new metric – longer to get consistency, cycle could be just as long or longer
      • Refine business rules of existing metric – less effort to get consistency, shorter cycle
    • Changing business goal
      • Effective data in hand is worth two in the bush
      • Tail could be wagging the dog, but an unmeasurable business goal is just a wish
  • 35. Cycle
    As data becomes information becomes knowledge, the data sophistication of the process grows, which requires more/different metrics
    Different metrics could be at different points in the cycle
    [Diagram: repeated Consistency → Accuracy → Relevancy cycles]
  • 36. Reporting Platform
  • 37. Single system for operations and reporting
    Previously used SIS as both reporting data store and transactional system
    • Made querying a bear due to the complex data model of a transactional system
      • All reports required technical team capacity, even simple ones
      • Highly normalized = even simple information was stored in a very complicated way
    • All business rules were implemented in query code created by contractor
      • Difficult to change when rules changed
      • Often query code itself was the only “documentation”
    Example: find Residency Verification
        select decode(afv.value, null, 'N', 438, 'N', 'Y') as "Residency Verification SY13-14"
        from students p, adhoc_fields_values afv, adhoc_fields_drop_downs afdd
        where p.pupil_number = afv.pupil_number(+)
          and afv.adhoc_fields_def_ID(+) = 109
          and AFV.ADHOC_FIELDS_DEF_ID = AFDD.ADHOC_FIELDS_DEF_ID(+)
          and afv.value = AFDD.FIELD_KEY_VALUE(+)
  • 38. Reporting platform - Speed
    Data model focused on reporting, not on transactions
    • Space vs speed tradeoff highly biased towards speed
      • Virtually unlimited disk space
      • Batch processing, not real time
    • Complete flexibility to organize data optimally for ease of reporting
    • Central store for all siloed data (data-warehouse lite)
    [Diagram: Example Transactional Data Model vs Simplified Reporting Data Model (Student Demographics, Enrollment, Attendance Base, Assessment, Course Credits, Grad Progress, Course Teachers, School Stats)]
  • 39. Reporting platform – Ease of Use
    Really nothing more than a dedicated reporting database, not a data warehouse
    Data model can be tailored for reporting
    • Keeps track of all changes, not just latest data (valid from, valid to)
    • Super flat, highly denormalized = easily understood data model
    • Redundancy okay so long as we have data traceability
      • Same base data stored in multiple formats/structures for different uses
    • Fewer joins so can shift technical capacity to more complex business rules
    • Can be exposed more directly to data analysts for increased self-service
    SIS query:
        select decode(afv.value, null, 'N', 438, 'N', 'Y') as "Residency Verification"
        from students p, adhoc_fields_values afv, adhoc_fields_drop_downs afdd
        where p.pupil_number = afv.pupil_number(+)
          and afv.adhoc_fields_def_ID(+) = 109
          and AFV.ADHOC_FIELDS_DEF_ID = AFDD.ADHOC_FIELDS_DEF_ID(+)
          and afv.value = AFDD.FIELD_KEY_VALUE(+)
    Reporting platform query:
        select [Residency Verification] from student_demographics_snapshot
  • 40. Reporting platform - Consistency
    Common processing
    • Common query code centralized
    • Batch ETL so can make multiple passes to pre-calculate higher order metrics
    Consistent business rules
    • Can have old and new metrics back-calculated as well (e.g. old vs new truancy rules)
    • Calculate metric in one place so one number, right or wrong, is reported
    Data Traceability
    • Data path from systems of record to reports fully documented
    [Image captions: Herding Kittens; One Easy Powerful Cat]
  • 41. Reporting Platform Architecture
    SSIS, SQL Server, Perl on Virtual Machine servers
    • Sources: STARS (legacy SIS), Aspen (Current SIS), SpEd Data Sys, ELL data system, Assessment data dumps and files, Misc Data Files, Misc Systems
    • ETL: SQL Server Integration Services (SSIS), Perl, Manual loads
    • Reporting Platform (MS SQL Server)
    • Data Mart (MS SQL Server)
    • Direct SQL (SQL Server Management Studio)
  • 42. Reporting Platform Examples – Attendance base table
    • Based on weekly attendance report
    • Updated daily
    • Calculates individual student attendance metrics
    • Use values from this table whenever reporting on attendance
    Metric | Details
    Truancy | Calculates truancy based on old rules and new rules so can compare trends
    Absence Counts | Period and Daily; Unexcused, Excused, In Seat Attendance, Suspension
    Attendance Interventions | 3, 5, 10, 20 day intervention letters needed and sent; Child & Family Service/Court referrals, Police Pickups
  • 43. Reporting Platform Examples – enrollment matching
    Enrollment admit withdraw matching
    • SIS stores enrollment as separate admit and withdraw events
    • Need to match admits to withdrawals for the same enrollment period and school
    SIS events:
    Date | Type | School
    24 August 2011 | Admit | 123
    24 June 2012 | Withdrawal | 123
    24 June 2012 | Admit | 456
    10 October 2012 | Withdrawal | 456
    11 October 2012 | Admit | 789
    Matched enrollments:
    Admit Date | Withdraw Date | School
    24 August 2011 | 24 June 2012 | 123
    24 June 2012 | 10 October 2012 | 456
    11 October 2012 | 1 January 3030 | 789
    Currently enrolled students have a “withdrawal date” in the far future (1/1/3030) so that there is an actual date and not a null to compare against: (today() < [withdraw date]) as “currently enrolled”
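The matching step can be sketched as a single pass over one student's event log, pairing each admit with the next withdrawal at the same school and filling in the far-future date for open enrollments. This is a minimal sketch under stated assumptions: the event tuples, `match_enrollments`, and `currently_enrolled` are hypothetical names, and real data would need explicit handling of the bad cases noted in the comments.

```python
from datetime import date

# Sentinel "withdrawal date" for students who are still enrolled.
FAR_FUTURE = date(3030, 1, 1)

def match_enrollments(events):
    """Pair admit and withdrawal events into enrollment spans.

    `events` is a list of (date, type, school) tuples for one student,
    where type is "Admit" or "Withdrawal". Returns a sorted list of
    (admit_date, withdraw_date, school) tuples.
    """
    open_admits = {}  # school -> admit date still waiting for a withdrawal
    spans = []
    # On the same day, process withdrawals before admits.
    for day, kind, school in sorted(events, key=lambda e: (e[0], e[1] == "Admit")):
        if kind == "Admit":
            # A second admit with no withdrawal in between overwrites the
            # first; in practice that is a data quality exception to flag.
            open_admits[school] = day
        elif school in open_admits:
            spans.append((open_admits.pop(school), day, school))
        # A withdrawal with no matching admit is skipped in this sketch.
    for school, admit_day in open_admits.items():
        spans.append((admit_day, FAR_FUTURE, school))
    return sorted(spans)

def currently_enrolled(span, today):
    """Mirror of the slide's rule: today < withdraw date."""
    return today < span[1]

events = [
    (date(2011, 8, 24), "Admit", 123),
    (date(2012, 6, 24), "Withdrawal", 123),
    (date(2012, 6, 24), "Admit", 456),
    (date(2012, 10, 10), "Withdrawal", 456),
    (date(2012, 10, 11), "Admit", 789),
]
spans = match_enrollments(events)
for span in spans:
    print(span, currently_enrolled(span, date(2012, 11, 1)))
```

Using a real date instead of a null keeps every comparison a plain date comparison, which is the design choice the slide describes.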
  • 44. Reporting Platform Examples - Assessment
    Generally two ways we need to analyze assessments
    • Single view of all assessments for a student – data in columns
      • Each row is a single student for a particular school year
    Student | A1 Q1 | A1 Q2 | A1 Q3 | A1 Q4 | A2 Sem 1 | A2 Sem 2 | SY
    123 | 90 | 80 | 70 | 100 | 76 | 87 | SY1415
    456 | 60 | 70 | 80 | 90 | 65 | 86 | SY1415
    • Comparing one run of an assessment with another – data in rows
      • Each row is a single assessment for a single student for a particular school year
    Student | Assessment | SY | Score
    123 | A1 Q1 | SY1415 | 90
    123 | A1 Q2 | SY1415 | 80
    123 | A1 Q3 | SY1415 | 70
    123 | A1 Q4 | SY1415 | 100
    456 | A2 Sem 1 | SY1415 | 65
    Key is that both are processed from the exact same data sets at the same time, so they contain the same data stored in two different structures
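The relationship between the two structures can be illustrated by deriving the column-oriented view from the row-oriented one in a single pass, so both always hold the same scores. The sketch below is an assumption about shape only (`to_wide` and the tuple layout are hypothetical), using the sample rows shown for student 123 and student 456.

```python
from collections import defaultdict

def to_wide(long_rows):
    """Pivot (student, assessment, school_year, score) rows into one
    record per student per school year, keyed by assessment name."""
    wide = defaultdict(dict)
    for student, assessment, school_year, score in long_rows:
        wide[(student, school_year)][assessment] = score
    return dict(wide)

long_rows = [
    (123, "A1 Q1", "SY1415", 90),
    (123, "A1 Q2", "SY1415", 80),
    (123, "A1 Q3", "SY1415", 70),
    (123, "A1 Q4", "SY1415", 100),
    (456, "A2 Sem 1", "SY1415", 65),
]

wide_rows = to_wide(long_rows)
for key, scores in wide_rows.items():
    print(key, scores)
```

Building both shapes from the same input in the same batch run is what keeps the two views traceable to one source.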
  • 45. Reporting Platform Development
    Biggest challenge was how to develop the system when we had poor data quality
    How could we avoid introducing more errors?
    Solution
    • Prioritize – Start with standard recurring reports (e.g. attendance weekly)
    • Compartmentalize – Run reports using only students with no data quality issues
  • 46. Reporting Platform Development
    Need to ensure that reporting platform is not introducing new errors. How? Use only known good data to validate:
    [Flowchart elements: Reporting Platform; Report query; Sample Report; Standard Report; Compare; No discrepancies: Report validated; Discrepancies: Why?; Fix any issues with Reporting platform; Filter out students with bad data into exceptions table; Good data students; Bad data students]
  • 47. Reporting Platform Development
    1. Create Sample Report and compare to Standard Report (e.g. attendance weekly)
    2. Check for discrepancies
       1. If discrepancy is due to mistake in reporting platform or query, fix it
       2. If discrepancy is due to bad data, store student id in exceptions table
    3. Pull Sample Report again, filtering out exception students so that only “Good Data” is included in report
    4. Continue until no discrepancies
    Example “Bad Data” exceptions:
    1. Overlapping enrollments
    2. Absences outside of enrollment
    3. Missing data
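The steps above can be sketched as a small loop: compare the two reports, move students with known bad data into an exceptions set, and repeat until the only remaining discrepancies are ones that point at the platform or query. Everything named here (`compare`, `validate`, `pull_sample`, `has_bad_data`, the sample values) is a hypothetical illustration, not the actual DCPS tooling.

```python
def compare(sample, standard, exceptions):
    """Return ids whose values differ, ignoring excepted students."""
    ids = (set(sample) | set(standard)) - exceptions
    return {sid for sid in ids if sample.get(sid) != standard.get(sid)}

def validate(pull_sample, standard, has_bad_data):
    """Iterate until no discrepancy can be blamed on bad data.

    pull_sample:  callable returning {student_id: value} from the new platform
    standard:     {student_id: value} from the existing standard report
    has_bad_data: callable(student_id) -> bool, e.g. overlapping enrollments
    Returns (exceptions, remaining); anything left in `remaining` is a
    candidate platform or query bug to fix before validating again.
    """
    exceptions = set()
    while True:
        diffs = compare(pull_sample(), standard, exceptions)
        bad = {sid for sid in diffs if has_bad_data(sid)}
        if not bad:
            return exceptions, diffs
        exceptions |= bad

# Made-up example: student 2 has bad data, student 3 exposes a query bug.
standard_report = {1: 95, 2: 85, 3: 75}
sample_report = {1: 95, 2: 80, 3: 70}

exceptions, remaining = validate(
    lambda: sample_report, standard_report, lambda sid: sid == 2
)
print(exceptions, remaining)
```

The report counts as validated once `remaining` is empty. The exceptions set then doubles as a work list of student records to send back for correction.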
  • 48. Managing Data Requests
  • 49. Survey results – requestors of reports
    Who is asking for data, and most often with what frequency?
    Counts are the number of districts that report that frequency for that requester type
    Who | N/A | Yearly | Monthly | Weekly | Daily | Ad-hoc
    Parents/community | 3 | 4 | 0 | 0 | 1 | 3
    Students | 6 | 1 | 0 | 0 | 2 | 2
    Teachers | 0 | 1 | 1 | 3 | 4 | 2
    School administrators | 0 | 0 | 2 | 4 | 4 | 1
    Central office program staff | 0 | 0 | 3 | 3 | 2 | 3
    District leadership | 0 | 0 | 3 | 5 | 3 | 0
    State/Fed Dept of Ed or other external org | 1 | 3 | 2 | 1 | 2 | 2
  • 50. Capacity vs Demand
    Demand for data is ever increasing; people are hungry for data
    Needed to do more with the same size team
    Two Tracks
    • Increase reporting efficiency
    • Reduce demand on reporting team
  • 51. Increase Efficiency
    Users make requests via online “Data Request Tool” (DRT)
    • Central point of communication with requestors for clarifications
    • Tracks implementation notes and report writer assignments
    • Report files attached to request along with query code
      • One report can be attached to multiple requests to allow for reuse
    • “Student Data Current” report available on front end
      • Updated daily with common student metrics (absences, GPA, grade level, school, etc)
      • User can customize columns/filters to download for themselves
      • Example of some columns available:
    Student_ID | YTD_Unexcused_Absences | Total SBT Suspension_Days
    School_Name | YTD_Excused_Absences | Truant - still be truant?
    ELL_Status | YTD_ISA_Average_Attendance | Truant_>=10_days
    FARM_Status | Membership_days | Current_School_Average_Attendance
    Student_Race | Absences_Towards_Truancy | Current_School_Excused_Absences
    SPED_Status | Suspension_Absences_Days | Current_School_ISA_Average_Attendance
  • 52. Increase Efficiency
    “Data Request Tool” (DRT)
  • 53. Increase Efficiency
    Data Librarian is first point of contact for requests to reporting team
    • Dedicated FTE position
    • Clarifies request requirements
    • Is there an already completed report that can fulfill this request?
    • Acts as gatekeeper to qualify requests before they hit reporting capacity
    [Flowchart steps: Program needs data; Standard Report?; Student Data Current?; Program Enters Data Request; Data Librarian clarifies request; Existing report available?; Report Writer assigned; Report Created; Report Reviewed; Report Delivered]
  • 54. Self Service Reporting
  • 55. Self Service Reporting
    Goal was to provide self-service reporting to analysts while ensuring consistency
    • Giving them raw access to reporting platform is too overwhelming
      • Analysts are not database developers/DBAs
      • Requires SQL skills: e.g. would still need joins, aggregations to get meaningful data
    • Creating dedicated pull of custom data would mean another thing to maintain
    Crawl before we can walk, walk before we can run, run before we can fly
  • 56. Self Service Reporting
    Solution was to rely on already existing standard reports
    • Enrollment Daily, Attendance Weekly, ACGR, Student Data Current
    Weekly snapshot of each report was saved into a dedicated “data mart”
    • Analysts were already used to seeing these reports so no learning curve
    • These were official reports so data was guaranteed to match our official numbers
    • Added benefit of saving historically reported official numbers
    Not quite flying yet, but closer…
  • 57. “Data Mart” example - Enrollment
    Report #1612 is a daily report of the enrollment for every student in DCPS
    • Forms the basis of how enrollment is monitored throughout the year
    • Especially important during pre-enrollment for upcoming school year
    • Also forms basis for denominator when calculating “percent of total students” metrics
    • Data mart data model has exact same columns as DRT with addition of “report date”
    [Diagram elements: Reporting Platform; Enrollment Daily report; Enrollment Data Mart (Enrollment DR 8/24, 8/31, 9/7, 9/14, …, 6/13); Direct SQL; Data Analyst]
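The data mart idea amounts to appending each snapshot of a standard report with one extra column, the report date, so that trends come from a simple group-by. A minimal sketch follows; the column names (`student_id`, `school`, `report_date`), the function names, and the sample rows are assumptions for illustration, and the real data mart is a SQL Server database rather than a Python list.

```python
from datetime import date

def append_snapshot(data_mart, report_rows, report_date):
    """Store a copy of the report with a report_date column added."""
    for row in report_rows:
        data_mart.append({**row, "report_date": report_date})

def enrollment_by_date(data_mart):
    """Count enrolled students per snapshot date."""
    counts = {}
    for row in data_mart:
        counts[row["report_date"]] = counts.get(row["report_date"], 0) + 1
    return counts

data_mart = []
append_snapshot(data_mart,
                [{"student_id": 1, "school": 123}, {"student_id": 2, "school": 456}],
                date(2015, 8, 24))
append_snapshot(data_mart,
                [{"student_id": 1, "school": 123}, {"student_id": 2, "school": 456},
                 {"student_id": 3, "school": 456}],
                date(2015, 8, 31))

print(enrollment_by_date(data_mart))
```

Because each snapshot is stored exactly as it was reported, historical numbers stay reproducible even after the underlying SIS records are corrected.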
  • 58. Report requests hitting report writers
    [Chart: Report Writer Data Requests per Month, Aug through Jul, for SY12-13, SY13-14, SY14-15, SY15-16]
    More self-service reporting and standardized reports
    • Fewer ad-hoc requests for standard data
    • Reporting capacity can be spent on more complex requests
  • 59. Summary
  • 60. Data Effectiveness
    [Diagram: Data Reporting → Effective data → Data Analytics → Effective decisions → Programs / Business → Effective outcomes]
    Data Driven Decision Making starts with Data Effectiveness
  • 61. How to make data more effective
    Consistency first, then Accuracy, then Relevancy
    Stage | Metric A in Report 1 | Report 2 | Report 3
    Starting point | 90 | 81 | 87
    Consistent | 87 | 87 | 87
    Accurate | 85 | 85 | 85
    Relevant: metric aligned with goal
    Improve data quality by seeing School Staff as “data entry team” instead of “users”
  • 62. Take Aways
    Meet your data where it is today and build to where you want to be
    Take some time to do the work today that will help you tomorrow…
    • Data Quality Culture
    • C.A.R. Cycle
    • Data Request Process
    There’s flying and then there’s flying. Good enough is probably good enough.
    • Reporting Platform
    • Data Marts
  • 64. Appendices
  • 65. State of Data at DCPS
  • 66. DCPS Data Systems Evolution
    10+ years ago
    • Critical data tracked poorly or not at all
    • Heavy manual effort in using data to support operations
    • Very difficult to pull ad hoc reports or change existing reports
    • Only way to do trend reporting was via Excel
    5 years ago
    • Systems of record exist but data flow is both one way and not easy
    • All critical data is tracked somewhere but in separate systems
    • Very ad-hoc reporting: every request was its own query, “reuse” is copy-paste
    • Trend reporting done from yearly snapshots
    Today
    • Consolidated view of student, employee, and school data
    • Robust reporting infrastructure increases capacity
    • More self-service reporting available
    • Data quality better
    • All changes tracked means trend reporting much easier
    Milestones: Mainframe replaced by Oracle Forms system (AAL eSIS); SIS upgraded (Follett Aspen); Separate reporting database created
    [Diagram labels: Evolution from poor data capture to; Data Capture; Data Quality; Reporting ROI]
  • 67. Reporting requests – Example 1
    Average of 80 complex data reporting requests per month
    Attendance Weekly
    • Summarizes weekly and YTD attendance for every student at DCPS
    • Absence stats: In Seat Attendance (ISA), Truancy, Unexcused Absences
    • Intervention: 3, 5, 10, 15, 20 day attendance letters/meetings/etc
    • Also includes behavior stats: Suspension, Suspension days
  • 68. Reporting requests – Example 2
    Adjusted Cohort Graduation Rate – calculates individual graduation progress per credit for all high school students, flags degree of being off track
    • Tracks 4 year cohorts across all high schools at DCPS
    • Looks at grades in currently scheduled courses as well as credits received
    Example Columns
    On Grade Level | On-Track 4 Core Subjects | Enrolled in English III
    Grade | On-Track English | Passing English III
    On Potential Grad List | On-Track Math | Enrolled in English IV
    Total # Credits | On-Track Science | Passing English IV
    Credits Needed to Graduate | On-Track Social Studies | Enrolled in Math
    English Cumulative | Math Cumulative | Passing Math
  • 69. Survey Demographics
  • 70. Survey results – District Sizes
    Survey conducted via CGCS CIO Mailing list: 11 Responses Total
  • 71. Data Sophistication
  • 72. Data Sophistication Cycle
    Results oriented incompatible with data driven?
    • In a results-oriented organization the push is to “get things done”, and the velocity of the need often makes it difficult for data systems to keep up
    • As a result, the data driven aspect gets starved of food
  • 73. Data Sophistication Cycle
    Data capture system evolves along with process sophistication
    Reporting sophistication should keep pace with data quality
    Example Data Entry System | Key Data structure
    Notepad | Open entry
    Excel | Data cells
    MS Access | Data records
    Student Information System (SIS) | Normalized data model
    Reporting system separate from SIS | Reporting data model
    [Diagram axes: Process Sophistication; Data Quality; Reporting Sophistication]
    Don’t build a formal data warehouse for Excel “data systems”!
  • 74. Data Effectiveness
    Don’t over-engineer the tracking system; it should lead but not exceed process maturity.
    Example
    • Relative vs Absolute metrics
    • High stakes (IMPACT) vs experimental (RTI)
    Different groups or initiatives in the same organization could be at different points in the cycle
    Make sure you identify what level a particular need is at; there is no need for data tracking or reporting to be more sophisticated than the business process in question