This document summarizes data on UK government websites:
- Google data shows over 80% of website content is found on around 10% of sites, with huge numbers of very small sites. However, Google may not index all sites fully.
- Over 1,200 servers run Apache and over 1,500 run Microsoft IIS; at a conservative £10,000 per server, this infrastructure alone costs over £29 million.
- Searching government sites for common terms like benefits returns a huge number of results across many sites, with duplication.
- To fully understand costs and contracts, more data is needed on visitor counts, operating costs, and contractual agreements.
- A proposal is made to consolidate over 3,000 government sites to under 600 in the first stage.
1. e-Delivery Team
Too many websites
Too little of interest
30.03.2004
Alan Mather
2. Pages per site
Using Google to spider and count the pages of all 3233 .gov.uk sites …
90% of sites have less than 2,000 pages
Less than 1% of sites have more than 20,000 pages
High count with less than 50 pages – many redirects (where domain has changed or is not active, e.g. direct.gov.uk), also sites that use ASP or frames making it impossible for Google to spider behind first page
And only a few sites have huge page counts (between 20,000 and 100,000) … including ir.gov.uk, dh, scotland, ons, hmso
But, notwithstanding inability to spider some sites, it looks clear that the vast bulk of .gov sites have less than 2000 pages
[Chart: Gov.UK - Web Site Page Counts. Site count (0 to 2000) and %age of total site count (0% to 100%) by page-count bucket, e.g. 0<x<50, 50<x<1000 etc]
Page Count                 <50    <1000    <2000    <3000    <4000    <5000   <10000   <20000  <100000
Site Count                1891      738      251      120       63      113        4       28       25
Cumulative % of total      58%      81%      89%      93%      95%      98%      98%      99%     100%
Cumulative Site Count     1891     2629     2880     3000     3063     3176     3180     3208     3233
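The cumulative rows of the table can be re-derived from the per-bucket site counts. The following is a minimal sketch of that arithmetic; the function and variable names are illustrative and not part of the original deck, and rounding to whole percentages is an assumption about how the slide's figures were produced.

```python
from itertools import accumulate

# Bucket labels and per-bucket site counts, copied from the table above.
buckets = ["<50", "<1000", "<2000", "<3000", "<4000", "<5000", "<10000", "<20000", "<100000"]
site_counts = [1891, 738, 251, 120, 63, 113, 4, 28, 25]


def cumulative_distribution(counts):
    """Return running totals and each running total as a whole percentage of the grand total."""
    running = list(accumulate(counts))
    total = running[-1]
    percentages = [round(100 * value / total) for value in running]
    return running, percentages


running_totals, cumulative_percent = cumulative_distribution(site_counts)

for label, count, pct in zip(buckets, running_totals, cumulative_percent):
    print(f"{label:>8}  {count:>5}  {pct:>3}%")
```

The running totals end at 3233, the number of sites surveyed, and the rounded percentages agree with the figures quoted on the slide.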
3. The Google Data - Raw
The Google data shows:
More than 80% of the content (in pages) is found in around 10% of the total count of sites
There are huge numbers of very small sites (per Google), although that may be because Google is unable to spider, or does not cover, all sites through the entire hierarchy
Still, errors in Google indexing are likely to be consistent across the entire population of .gov sites, so the shape of the graph is likely to be sound
[Chart: Google's site sizes. Site size (0 to 100000 pages) plotted against a percentage scale (0% to 100%)]
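A concentration claim of the kind made on this slide (a small share of sites holding most of the pages) could be measured as sketched below. The deck does not include per-site page counts, so the input list is purely hypothetical and the function name is an assumption made for illustration.

```python
def share_held_by_largest(page_counts, top_fraction=0.10):
    """Fraction of all pages held by the largest `top_fraction` of sites."""
    total = sum(page_counts)
    if not page_counts or total == 0:
        raise ValueError("page_counts must contain at least one page")
    ordered = sorted(page_counts, reverse=True)
    # Always include at least one site, even for very short lists.
    top_n = max(1, round(len(ordered) * top_fraction))
    return sum(ordered[:top_n]) / total


# Hypothetical page counts for ten sites (illustrative only, not survey data).
example_sites = [90_000, 4_000, 2_000, 1_500, 1_000, 500, 400, 300, 200, 100]

share = share_held_by_largest(example_sites)
print(f"Largest 10% of sites hold {share:.0%} of pages")
```

With real per-site counts from the spidering exercise, the same function would give the share behind the "80% of content in 10% of sites" observation.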
4. Counting Servers
Checking on the servers operating behind the websites in .gov.uk
Over 1,200 running Apache
And more than 1,500 running Microsoft IIS
These figures don’t include servers that may be configured but not active for, e.g. resilience. They also don’t include servers further down the infrastructure stack, e.g. running content applications or other code
Naturally, each of these servers is likely to be accompanied by firewall and storage configurations
At a conservative cost of £10,000 per server, the total cost of this infrastructure alone is over £29,000,000
Server                 Count
Apache                  1209
  Apache                 186
  Apache/1.3.26          274
  Apache/1.3.27          282
  Apache/1.3.28           62
  Apache/1.3.29           99
  Apache/2.0.40           25
  Apache/2.0.45            1
  Apache/2.0.46           32
  other Apache           248
Microsoft-IIS           1547
  IIS/4.0                377
  IIS/5.0               1103
  IIS/6.0                 65
  other IIS                2
Lotus-Domino             109
Netscape-Enterprise       74
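The cost estimate on this slide can be checked with a few lines of arithmetic. This sketch assumes the £29 million figure comes from adding all four server families and multiplying by £10,000; the slide does not spell out that derivation, and the variable names are illustrative.

```python
# Server counts by family, copied from the slide's table.
server_families = {
    "Apache": 1209,
    "Microsoft-IIS": 1547,
    "Lotus-Domino": 109,
    "Netscape-Enterprise": 74,
}

# Version breakdowns, used only to confirm the sub-rows add up to the family totals.
apache_breakdown = [186, 274, 282, 62, 99, 25, 1, 32, 248]
iis_breakdown = [377, 1103, 65, 2]

COST_PER_SERVER_GBP = 10_000  # the slide's "conservative" per-server figure

total_servers = sum(server_families.values())
total_cost_gbp = total_servers * COST_PER_SERVER_GBP

print(f"Apache sub-rows sum to {sum(apache_breakdown)}, IIS sub-rows to {sum(iis_breakdown)}")
print(f"{total_servers} servers at GBP {COST_PER_SERVER_GBP:,} each = GBP {total_cost_gbp:,}")
```

Under that assumption the total is 2,939 servers and £29,390,000, which is consistent with the "over £29,000,000" stated on the slide.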
5. Cost of Websites (Benchmarking)
Figures drawn from recent PQ (and, unless stated, include only hosting charges and not development or development support)
[Chart: websites placed along a cost axis running from 250k to 3.0m; the position of each site on the axis is not recoverable from this text]
Sites shown on the chart: ONS, DH, DfT, Worktrain, Worktrain (development), Business.gov?, JC+, JC+ (development), DWP, The Pension Service, OFT, TheRegister.com, HMT, DfES, Large Quasi-Public Sector (fully loaded)
Not on Record: dti, IR, HMCE, Home Office, DEFRA, ODPM
6. Characteristics of .gov.uk sites
Inconsistent - five different look and feels
Unreliable - Poor uptime
Huge - up to 100,000 pages
Complex - Nine levels deep
More than 3200 sites
More than three navigation designs
100s of broken links
Some parts of the site not linked to others ‘orphan content’
More than 2.5 million documents
More than 200 URLs per dept
More than 300 authors
Slow - download time more than one minute
7. Looking For The Right Thing?
Using Internet search engines in an effort to find “the right thing” can be challenging. The search terms below were entered, with the results restricted to the “.gov.uk” domain only
30/03/2004
Search term                   Results
Disability Living Allowance    14,700
Child Tax Credit                5,790
Carers Allowance                  915
Working Family Tax Credit         546
Attendance Allowance           13,000
Council Tax Benefit            42,000
Housing Benefit                77,800
Statutory Sick Pay              6,200
Self Assessment                14,000
There is a huge amount of duplication in government online:
Many local authority sites repeat the description of the rules for claiming certain benefits, where to claim, what to claim for and so on … and doubtless, every year or so, each of these mentions must be updated with the correct rules (but what if they’re not?)
Even “self assessment” only has 4,950 mentions on the Inland Revenue’s own site, but a further 9,000 across the rest of government
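The "self assessment" figures give a rough measure of how much content sits outside the owning department. A minimal sketch of that calculation follows; the variable names are illustrative, and treating search-result counts as "mentions" follows the slide's own wording.

```python
# Figures quoted on the slide for the term "self assessment".
mentions_on_inland_revenue_site = 4_950
mentions_elsewhere_in_government = 9_000

total_mentions = mentions_on_inland_revenue_site + mentions_elsewhere_in_government
share_elsewhere = mentions_elsewhere_in_government / total_mentions

print(f"total mentions: {total_mentions:,}")
print(f"share outside the owning site: {share_elsewhere:.1%}")
```

The two figures add up to 13,950, in line with the roughly 14,000 results listed for Self Assessment, and a little under two thirds of those mentions sit outside the Inland Revenue's own site.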
8. And how does .gov look to the consumer?
The variety of sites shows little in the way of consistency
Navigation varies from site to site, sometimes on the left, sometimes tabbed, sometimes graphic, sometimes text
“Search” is called different things, is often not on the home page and often returns poor results – despite research showing that consumers who can’t see what they want instantly will use search
Accessibility is poor with many sites not attempting to clear even the lowest hurdles
Even sites owned by the same parent are confusing, e.g. pensionservice, pensionguide, agepositive, over50 …
9. The Missing Data
To complete the picture and allow the proposed plan of action to be fine tuned, the following data is needed:
Visitor counts (Hitwise may offer an approximation)
Approximate costs to operate (at an infrastructure level including all servers, network equipment, firewalls, software licences etc) – both price bought at and the price for continued operations projected forwards (to allow for annual licence premiums, renewals etc that may be due in the future)
Contractual agreements around exit arrangements, renewal dates etc along with whether the contract for web hosting is part of a wider technology outsource agreement (that might, therefore, make it harder to exit)
10. Proposal For What Next
Principles
Government is in the business of helping citizens by making information easy to find. The total number of websites needs to be rationalised dramatically – from over 3,000 to under 600 in the first stage (including Local Authorities).
Government is in the business of presenting information in a way that citizens will understand; it is not in the user interface design business. The range of navigational and interface styles needs to be harmonised to a single core style.
Government has already spent significant sums on its online presence, yet government is not a technology leader. The cost of the programme outlined must be absorbed through savings generated in the first year of the programme, making it self-funding.
Government buys in cycles and these are likely to be maintained. This cycle will allow work to be completed at a constant pace as contracts come to their natural end, thus incurring no exit penalties.
A programme of rationalisation this large will require multiple parallel streams of work – the cost of the overlap slightly reduces the inherent savings but increases the odds of success by eliminating bottlenecks and delays.
11. DotP versus Everything Else
Condensing 3,000+ sites to a few hundred is no simple task. It will likely require a variety of approaches and software solutions to ensure that there are no bottlenecks.
DotP’s primary characteristics are:
A managed service model (i.e. hardware, software, network included)
A high end content management engine allowing customised workflow, complex information architectures and large numbers of geographical authors
Highly resilient, scalable and secure infrastructure reducing the risk of failure
A model to allow changes to sites through configuration, not code customisation
A range of features tailored to solve government’s main content problems
Other content engines usually:
Come as a software licence with extensive customisation required
Have a range of features that DotP doesn’t have and that have been developed over several product cycles, primarily for commercial customers. Some of these features will be useful for government
Will develop competitively no matter what government does
But they rarely come as managed services, so hosting and management need to be included
12. Setting Up The Programme
Select a core of important websites based on:
Total size (aiming to isolate 50% of the content in government)
Visitor count (capturing a large chunk of the audience, say 50%)
Transaction generation (targeting the bulk of online transactions for both business and citizen)
Content management status (looking first for unmanaged systems still based on HTML or those that are not well advanced in terms of a content engine)
Outline the information architecture as it is, coupled with the target architecture for how it should be – taking each site and fitting it into an overall architecture and design that is consistent across all of them
It is assumed that these sites – ranking as the most popular and largest in government – will need rearchitecting to make the most of them (including a new layout, new navigation and so on)
This rework will give a good chance to eliminate duplication and inconsistency, as well as remove as much as 30-50% of content as redundant (based on experience with Department of Health).
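The first selection criterion (isolating roughly 50% of government content) could be approached by taking sites largest-first until the target share is reached. The sketch below covers only that size criterion; visitor counts, transactions and content management status would need their own data. The site names and page counts are hypothetical, and the function name is an assumption made for illustration.

```python
def select_core_sites(pages_by_site, target_share=0.5):
    """Pick sites largest-first until they cover at least `target_share` of all pages."""
    total = sum(pages_by_site.values())
    if total == 0:
        return [], 0.0
    selected, covered = [], 0
    ranked = sorted(pages_by_site.items(), key=lambda item: item[1], reverse=True)
    for site, pages in ranked:
        if covered >= target_share * total:
            break
        selected.append(site)
        covered += pages
    return selected, covered / total


# Hypothetical page counts (illustrative only, not survey data).
example_pages = {
    "site-a": 40_000,
    "site-b": 30_000,
    "site-c": 15_000,
    "site-d": 10_000,
    "site-e": 5_000,
}

core_sites, covered_share = select_core_sites(example_pages)
print(core_sites, f"{covered_share:.0%}")
```

In this made-up example two sites are enough to pass the 50% mark, which mirrors the deck's observation that content is concentrated in a small number of large sites.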
13. Establishing The Target Platforms
To identify the target platforms, the following is proposed:
A “bake off” competition is kicked off where a variety of content management vendors are given an environment (with workspace, hardware and network connectivity).
Each vendor is given the same brief – to take an existing, static website – the “challenge site” - with a known information architecture and transfer it to a new target architecture (also provided).
The vendors then set up their systems, using templates and guidelines provided by government, to deliver the challenge site under strict timescales – including defining the architecture, implementing the style guidelines, integrating the search engine and migrating the content
At the end of the competition, a subset of the vendors who have met previously agreed and published criteria is passed through to the next stage
Commercial agreements are then built – using standard templates – with the vendors, allowing for volume discounts on licences to be obtained.
Websites in the core population are then allocated across vendors and the implementation task kicked off. Vendors that perform are given more, vendors that don’t perform are gradually eliminated and their work shared across other, more successful vendors
14. Why a Bake Off?
Migrating some 3000 websites is a fearsome task. Here is why there should be more than one solution in play:
The problem is not one of only technology – the changes required to government editorial processes are enormous. The greater the range of experience thrown at this, the better the result
One single system (or even two or three) would result in bottlenecks that would delay rationalisation. Having several “similar” but independent systems will resolve the bottleneck
One large system would be high risk – a single outage could take down government’s online presence – spreading the systems will, in the end, reduce risk versus cost.
Competition is healthy – a few players working both together (to complete the goal) and against each other (to complete the goal first and therefore win business) will work well
But, we need only a few (5, 6, 7?) – too many will bring too high an overhead and risk quality standards
15. Estimating the Costs
The costs of migration will include:
The initial work to identify candidates
The evaluation of target platforms
The setting up of migration environments
The cost of redesign of some sites to make them consistent with the target standard (e.g. search engine on home page, navigation through tabs, reducing the depth of the site etc)
The cost of redesigning pages to fit the new system – e.g. where the site uses custom techniques that are not easily replicable
The actual migration of data from one format to another (there are tools that claim to do this, with varying success, or manual methods – these too will need to be assessed)
16. Integrate … Marriott.com
One URL
13 brands
Five major redesigns
2,600 locations
142,000 people
17. Rationalise … IRS.gov
235 sites … to one
47% e-filing
25 million regular users
AOL cache data at peaks
80% of e-filers do it again
Accountants starting to charge $35 for those who want to do it on paper