1. Localization+
Thomas Crenshaw
Dir, Technical Product Management
2. Agenda
By the conclusion of this webinar, my hope is that
you will have gained an understanding of the:
• data import process
• data sources used
• new algorithm
• API exploration and testing tools
3. Background
Why update PBS station localization? Our goal was
to provide a technical solution that satisfies the
basic viewer question:
What local PBS stations can I watch on my TV?
We built this web service to be as flexible and
future-proof as possible and to meet the needs of
all PBS stations, despite differing station business
requirements.
Lesson learned: Without exception, every rule has
an exception!
4. Terms and Definitions
Localization
The process a site visitor goes through to determine which PBS stations are available in
his or her geographic area.
MSO
Multiple System Operator. Technically the term applies only to cable system operators, but
here it includes satellite companies as well.
Example: Comcast, University of Wisconsin
Headend
Distribution network for television programming, serving a local community of variable size.
Example: Comcast - Sebastian, UW Madison - Madison
Flagship Call Sign
FCC-licensed owner and/or operator of one or more full-power transmission towers.
Example: WXEL, WPNE, KAID
Call Sign
Full-power transmission tower owned and/or operated by a flagship station.
Example: WXEL, WHA, KUID
Rovi Station Feeds
Rovi representation of a station's channels and sub-channels.
5. The Numbers
The sheer volume of data necessary to provide
localization at the national level is astounding.
• 1,283 MSOs
• 11,322 Headends
• 515,742 Headend/ZIP code pairs
• 180 Flagship call signs
• 873 Call signs (full-power and translators)
• 1,412 Rovi station feeds
• ~42,000 ZIP codes
Data as of 2012-07-23
6. What's Different
The Good
                    Legacy          Localization+
Sources             1 (Nielsen)     Multiple (Rovi, Nielsen)
Data Updates        Yearly          Daily
Troubleshooting     Difficult       Easy
Built for           TV Schedules    Localization
Architecture        Non-standard    RESTful
Technical Changes   Difficult       Easy
User Tools          None            Goliat, Bluebell
Import Logging      None            Customizable
Data Changes        Impossible      Possible
Transparency        None            Open Source
7. What's Different
The New
                    Localization+
Results             More stations
Geolocation         Not always accurate
IP Based Input      Not always accurate
Proxy-based IP      Not always accurate
Static Files        Used for lookups to match Rovi/Nielsen/PBS data
8. Get All Headends
Step 1: Get All Headends
For a given ZIP code, retrieve all available headends from Rovi.
9. Update Confidence
Step 2: Update the “Confidence” Value
“Confidence” is a relative numeric value that represents how likely it is
that a member station is available in a given ZIP code.
All headends are assigned an initial confidence of 100. This value is
lowered based on parameters associated with the headend listings.
The highest headend confidence is assigned to the applicable stations at
the end of the import process.
A static file listing known bad headends and their associated confidence
values is used.
10. Update Confidence
Step 2a: Compare to Blacklist
Headends that contain stations that are obviously out of area.
• Confidence dropped to 80
• Currently this is only Dish Network, which lists WGTV, WTTW, and KRMA in
almost all DMAs
headend_id  headend_name                 dma_code  dma_rank  headend_state  vendor_mso_id  mso_name
320591      Dish Network - Philadelphia  504       4         DE             300264         Echostar Communications Corp.
Headends that have consistently bad channel line-up data.
• Confidence dropped to 0
headend_id  headend_name                            dma_code  dma_rank  headend_state  vendor_mso_id  mso_name
307250      Shaw Direct - Eastern Time Zone         -         -         -              300513         Shaw Direct
320209      C-Band Providers - C-Band Eastern Time  -         -         -              301530         Data Services Product
321898      Eastern Time Zone - Eastern Time Zone   -         -         N/A            301530         Data Services Product
322141      Bell TV - Eastern Time Zone             -         -         -              300818         Data Services Product
11. Update Confidence
Step 2b: Check DMA
Headends that do not have DMA Code or DMA Rank have their
confidence lowered to 40.
headend_id  headend_name                                   dma_code  dma_rank  headend_state  vendor_mso_id  mso_name
322924      Verizon FIOS TV of SE PA - Verizon FiOS-N. DE  -         -         DE             301762         Verizon Service Corp
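The confidence rules from Steps 2, 2a, and 2b can be sketched roughly as follows. This is an illustration only, not the service's import code: the function name, argument names, and dictionary layout are hypothetical, and it assumes that when more than one rule applies the lowest value wins, which the slides do not state.

```python
# Hypothetical sketch of Step 2. All names here are illustrative, and the
# "lowest value wins" rule is an assumption, not documented behavior.
INITIAL_CONFIDENCE = 100
MISSING_DMA_CONFIDENCE = 40

# Blacklist entries taken from the Step 2a examples: headend_id -> confidence.
BLACKLIST = {
    320591: 80,  # Dish Network - Philadelphia (out-of-area stations)
    307250: 0,   # Shaw Direct - Eastern Time Zone (bad channel line-up data)
    320209: 0,   # C-Band Providers - C-Band Eastern Time
    321898: 0,   # Eastern Time Zone - Eastern Time Zone
    322141: 0,   # Bell TV - Eastern Time Zone
}


def headend_confidence(headend_id, dma_code, dma_rank):
    """Start at 100 and only ever lower the value."""
    confidence = INITIAL_CONFIDENCE
    # Step 2a: compare to the blacklist.
    if headend_id in BLACKLIST:
        confidence = min(confidence, BLACKLIST[headend_id])
    # Step 2b: a headend without a DMA code or DMA rank drops to 40.
    if dma_code is None or dma_rank is None:
        confidence = min(confidence, MISSING_DMA_CONFIDENCE)
    return confidence


print(headend_confidence(320591, 504, 4))      # blacklisted headend -> 80
print(headend_confidence(322924, None, None))  # no DMA data -> 40
print(headend_confidence(317727, 504, 4))      # DMA values assumed for illustration -> 100
```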
12. Get Feed(s) per Headend
Step 3: Get Feed(s) per Headend
Get all associated Rovi station feeds for a given headend.
13. Get Feed(s) per Headend
Each entry lists headend_id, headend_name, and headend confidence, followed by the headend's feeds as station_id call_letters.
20547 Starview Cablevision - Forwood - Wilmington (Confidence: 100)
    2406 NJTV, 5964 PBS
307250 Shaw Direct - Eastern Time Zone (Confidence: 0)
    719 KCTS, 959 KSPS, 2273 WNED, 2666 WTVS, 10884 WTVSH
317727 Comcast - New Castle (Confidence: 100)
    247 WHYY, 1564 WNJS, 9856 WHYYHD, 9856 WHYYHD, 11892 NJTV, 15985 Y Arts, 17699 Y Info
320209 C-Band Providers - C-Band Eastern Time (Confidence: 0)
    5964 PBS, 10991 LINK
320478 DirecTV - Philadelphia (Confidence: 100)
    247 WHYY, 304 WLVT, 1564 WNJS, 5964 PBS, 9856 WHYYHD, 10991 LINK, 15866 WLVTD1
320591 Dish Network - Philadelphia (Confidence: 80)
    247 WHYY, 304 WLVT, 548 WTTW, 927 KRMA, 1564 WNJS, 5964 PBS, 8643 WGTV, 10991 LINK
321054 Broadcast TV - Philadelphia (Confidence: 100)
    9856 WHYYHD, 9857 WLVTD4, 11892 NJTV, 11895 WNJT-DT, 13729 WNJSDT, 15866 WLVTD1, 15868 WLVTV, 15985 Y Arts, 17699 Y Info
321898 Eastern Time Zone - Eastern Time Zone (Confidence: 0)
    5964 PBS
322141 Bell TV - Eastern Time Zone (Confidence: 0)
    206 WGBH, 719 KCTS, 719 KCTS, 2666 WTVS, 9873 KCTSD, 11815 WGBHW
322333 Verizon FIOS TV of SE PA - Verizon FIOS S NJ (Confidence: 40)
    247 WHYY, 304 WLVT, 1564 WNJS, 9856 WHYYHD, 9857 WLVTD4, 13729 WNJSDT, 15866 WLVTD1
322924 Verizon FIOS TV of SE PA - Verizon FiOS-N. DE (Confidence: 100)
    247 WHYY, 304 WLVT, 2378 WNJT, 9856 WHYYHD, 9857 WLVTD4, 13729 WNJSDT, 15866 WLVTD1, 15868 WLVTV, 15985 Y Arts, 17699 Y Info
323910 Eastern Time Zone - Eastern Time Zone (Confidence: 0)
    5964 PBS
324883 Shaw Direct Advanced - Eastern Time Zone (Confidence: 0)
    719 KCTS, 959 KSPS, 2273 WNED, 2666 WTVS, 10884 WTVSH
325708 Armstrong Utilities - National/Eastern (Confidence: 0)
    5964 PBS
Key (the original slide marks each station as one of the following):
• Valid station for the headend
• Questionable station for the headend
• Invalid station for the headend
• Station that does not exist for localization purposes
14. Map Feed(s) to Call Sign
Step 4: Map Rovi Station Feed(s) to PBS Call Sign
Use a static CSV file to map each Rovi station ID to a PBS call sign.
rovi_station_id rovi_station_short_name pbs_callsign pbs_station_flagship_callsign confidence
247 WHYY WHYY WHYY 100
9856 WHYYDT WHYY WHYY 100
15985 WHYYDT2 WHYY WHYY 100
17699 WHYYDT3 WHYY WHYY 100
2378 WNJT WNJT WNJT 100
11895 WNJTDT WNJT WNJT 100
304 WLVT WLVT WLVT 100
9857 WLVTDT4 WLVT WLVT 100
15866 WLVTDT WLVT WLVT 100
15868 WLVTDT3 WLVT WLVT 100
1564 WNJS WNJS WNJT 100
13729 WNJSDT WNJS WNJT 100
2406 WNJTV WNJN WNJT 100
8643 WGTV WGTV WGTV 80
927 KRMA KRMA KRMA 80
548 WTTW WTTW WTTW 80
719 KCTS KCTS KCTS 0
206 WGBH WGBH WGBH 0
rovi_stations2callsign.csv is built from stations.txt plus hand-curated entries.
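A minimal sketch of the Step 4 lookup is shown here. The column names and sample rows come from the table on this slide; the comma-separated layout of the file and the function name load_station_map are assumptions made for illustration.

```python
import csv
import io

# Hypothetical excerpt of rovi_stations2callsign.csv. The comma-separated
# layout is an assumption; column names and rows come from the slide table.
SAMPLE_CSV = """rovi_station_id,rovi_station_short_name,pbs_callsign,pbs_station_flagship_callsign,confidence
247,WHYY,WHYY,WHYY,100
9856,WHYYDT,WHYY,WHYY,100
1564,WNJS,WNJS,WNJT,100
548,WTTW,WTTW,WTTW,80
"""


def load_station_map(csv_file):
    """Map each Rovi station id to (PBS call sign, flagship call sign)."""
    station_map = {}
    for row in csv.DictReader(csv_file):
        station_map[int(row["rovi_station_id"])] = (
            row["pbs_callsign"],
            row["pbs_station_flagship_callsign"],
        )
    return station_map


station_map = load_station_map(io.StringIO(SAMPLE_CSV))
print(station_map[9856])      # ('WHYY', 'WHYY')
print(station_map.get(5964))  # None: this id is not in the excerpt
```

Feeds whose station ID is missing from the mapping simply return no call sign in this sketch; how the real import treats them is not covered here.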
15. Apply Headend Confidence
Step 5: Apply Headend Confidence to Call Signs
Assign the highest headend confidence value to each call sign. This
allows the calling application to make a logical determination as to
the validity of the returned data.
rovi_station_id rovi_station_short_name pbs_callsign pbs_station_flagship_callsign confidence
247 WHYY WHYY WHYY 100
9856 WHYYDT WHYY WHYY 100
15985 WHYYDT2 WHYY WHYY 100
17699 WHYYDT3 WHYY WHYY 100
2378 WNJT WNJT WNJT 100
11895 WNJTDT WNJT WNJT 100
304 WLVT WLVT WLVT 100
9857 WLVTDT4 WLVT WLVT 100
15866 WLVTDT WLVT WLVT 100
15868 WLVTDT3 WLVT WLVT 100
1564 WNJS WNJS WNJT 100
13729 WNJSDT WNJS WNJT 100
2406 WNJTV WNJN WNJT 100
8643 WGTV WGTV WGTV 80
927 KRMA KRMA KRMA 80
548 WTTW WTTW WTTW 80
719 KCTS KCTS KCTS 0
206 WGBH WGBH WGBH 0
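The "highest headend confidence wins" rule in Step 5 could look something like this. The data structures and the function name are hypothetical; the headend confidences and station IDs are a small excerpt of the Philadelphia-area examples used in this deck.

```python
# Hypothetical sketch of Step 5; the data structures are assumptions.
headends = [
    {"name": "Broadcast TV - Philadelphia", "confidence": 100, "feeds": [9856, 9857, 13729]},
    {"name": "Dish Network - Philadelphia", "confidence": 80, "feeds": [247, 304, 927, 1564]},
    {"name": "Bell TV - Eastern Time Zone", "confidence": 0, "feeds": [206]},
]

# Rovi station id -> PBS call sign (excerpt of the Step 4 mapping).
station_to_callsign = {
    247: "WHYY", 9856: "WHYY", 304: "WLVT", 9857: "WLVT",
    1564: "WNJS", 13729: "WNJS", 927: "KRMA", 206: "WGBH",
}


def apply_headend_confidence(headends, station_to_callsign):
    """Give each call sign the highest confidence of any headend carrying it."""
    best = {}
    for headend in headends:
        for station_id in headend["feeds"]:
            callsign = station_to_callsign.get(station_id)
            if callsign is None:
                continue  # feed has no PBS call sign in the mapping
            best[callsign] = max(best.get(callsign, 0), headend["confidence"])
    return best


print(apply_headend_confidence(headends, station_to_callsign))
```

With this input, WHYY, WLVT, and WNJS end up at 100 because a confidence-100 headend carries them, KRMA stays at 80 because only the Dish Network headend lists it, and WGBH stays at 0.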
16. Apply Headend Confidence
Headends and their feeds (station_id call_letters):
321054 Broadcast TV - Philadelphia (Confidence: 100)
    9856 WHYYHD, 9857 WLVTD4, 11892 NJTV, 11895 WNJT-DT, 13729 WNJSDT, 15866 WLVTD1, 15868 WLVTV, 15985 Y Arts, 17699 Y Info
320591 Dish Network - Philadelphia (Confidence: 80)
    247 WHYY, 304 WLVT, 548 WTTW, 927 KRMA, 1564 WNJS, 5964 PBS, 8643 WGTV, 10991 LINK
322141 Bell TV - Eastern Time Zone (Confidence: 0)
    206 WGBH, 719 KCTS, 719 KCTS, 2666 WTVS, 9873 KCTSD, 11815 WGBHW
323910 Eastern Time Zone - Eastern Time Zone (Confidence: 0)
    5964 PBS
Resulting call sign confidence:
WHYY  Confidence: 100
WLVT  Confidence: 100
WNJS  Confidence: 100
KRMA  Confidence: 80
WGBH  Confidence: 0
17. Map Call Sign to Flagship
Step 6: Map Call Sign to Flagship
Every call sign is assigned a flagship so that a single entity can be
used for donation links, station logos, etc.
rovi_station_id rovi_station_short_name pbs_callsign pbs_station_flagship_callsign confidence
247 WHYY WHYY WHYY 100
9856 WHYYDT WHYY WHYY 100
15985 WHYYDT2 WHYY WHYY 100
17699 WHYYDT3 WHYY WHYY 100
2378 WNJT WNJT WNJT 100
11895 WNJTDT WNJT WNJT 100
304 WLVT WLVT WNJT 100
9857 WLVTDT4 WLVT WNJT 100
15866 WLVTDT WLVT WNJT 100
15868 WLVTDT3 WLVT WNJT 100
1564 WNJS WNJS WNJT 100
13729 WNJSDT WNJS WNJT 100
2406 WNJTV WNJN WNJT 100
8643 WGTV WGTV WGTV 80
927 KRMA KRMA KRMA 80
548 WTTW WTTW WTTW 80
719 KCTS KCTS KCTS 0
206 WGBH WGBH WGBH 0
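As a rough illustration of Step 6, the call-sign-to-flagship lookup can be reduced to a dictionary built from the pbs_callsign and pbs_station_flagship_callsign columns. The function name and the first-seen ordering are assumptions, not the service's documented behavior.

```python
# Hypothetical sketch of Step 6; the structure and function name are assumptions.
# Call sign -> flagship call sign, excerpted from the mapping table.
callsign_to_flagship = {
    "WHYY": "WHYY",
    "WNJT": "WNJT",
    "WNJS": "WNJT",
    "WNJN": "WNJT",
    "WGTV": "WGTV",
}


def flagships_for(callsigns, callsign_to_flagship):
    """Return the distinct flagships for the call signs, in first-seen order."""
    seen = []
    for callsign in callsigns:
        flagship = callsign_to_flagship.get(callsign)
        if flagship is not None and flagship not in seen:
            seen.append(flagship)
    return seen


print(flagships_for(["WHYY", "WNJS", "WNJN", "WNJT"], callsign_to_flagship))
# ['WHYY', 'WNJT']
```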
18. Assign Sorting Filter
Step 7: Assign Sorting Filter
Currently, Nielsen data is used to sort applicable stations within a given
ZIP code.
The new architecture employed in the updated localization service
allows us to modify the import algorithm as business requirements
change.
An example of this could be the addition of a flagship station's home
DMA as a sorting filter.
Numeric rank is transformed from the absolute value Nielsen provides
to a relative value within the ZIP code.
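The absolute-to-relative rank transformation can be sketched as below. The absolute values are invented purely for illustration, the function name is hypothetical, and the sketch assumes a lower absolute value means a better rank.

```python
# Hypothetical sketch of the rank transformation in Step 7.
# Assumes lower absolute values sort first; the sample values are made up.
def relative_ranks(absolute_ranks):
    """Convert absolute ranks into 1..n ranks within a single ZIP code."""
    ordered = sorted(absolute_ranks.items(), key=lambda item: item[1])
    return {
        flagship: position
        for position, (flagship, _) in enumerate(ordered, start=1)
    }


zip_absolute = {"WHYY": 17, "WNJT": 240, "WLVT": 95}  # invented absolute values
print(relative_ranks(zip_absolute))
# {'WHYY': 1, 'WLVT': 2, 'WNJT': 3}
```

Because the output depends only on ordering within the ZIP code, a different sorting source could replace the absolute values without changing what the calling application sees.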
20. Localization+ Data Flow
ZIP Code Input
-> Get All Headends
-> Update Confidence (Compare to HE 'Blacklist', Check Headend DMA)
-> Get Feed(s) per Headend
-> Map Feed(s) to Call Sign
-> Apply HE Confidence
-> Map Call Sign to Flagship
-> Apply Sorting Filter
-> API Output
21. Let's Look at Some ZIPs
Bluebell (test API client)
http://open.pbs.org/lookup/localize_stations/
Goliat (API Explorer)
http://open.pbs.org/explore/