
Too many websites v2




Too many websites ... analysis of how many, how much and with what issues for UK Government on the web. March 2004.


  1. Too many websites: Too little of interest
     e-Delivery Team, Alan Mather, 30.03.2004
  2. Pages per site
     - Using Google to spider and count the pages of all 3,233 sites:
       - Around 90% of sites have fewer than 2,000 pages
       - Fewer than 1% of sites have more than 20,000 pages
     - High count of sites with fewer than 50 pages: many are redirects (where the domain has changed or is not active); others use ASP or frames, making it impossible for Google to spider behind the first page
     - Only a few sites have huge page counts (between 20,000 and 100,000), including dh, scotland, ons and hmso
     - But, notwithstanding the inability to spider some sites, it looks clear that the vast bulk of .gov sites have fewer than 2,000 pages

     [Chart: "Gov.UK - Web Site Page Counts"; site count (0 to 2,000) and %age of total site count (0% to 100%) against page count]

     Page count bands, e.g. 0<x<50, 50<x<1000 etc

     Page Count   Site Count   Cumulative Site Count   % of total (cumulative)
     <50          1891         1891                    58%
     <1000        738          2629                    81%
     <2000        251          2880                    89%
     <3000        120          3000                    93%
     <4000        63           3063                    95%
     <5000        113          3176                    98%
     <10000       4            3180                    98%
     <20000       28           3208                    99%
     <100000      25           3233                    100%
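As a check on the page-count figures in slide 2, the cumulative site counts and percentages can be recomputed from the per-band site counts. The sketch below is illustrative only: the names `BANDS`, `SITE_COUNTS` and `cumulative_share` are assumptions introduced here and are not part of the original analysis; the numbers are copied from the slide.

```python
# Per-band site counts copied from slide 2 ("Pages per site").
BANDS = ["<50", "<1000", "<2000", "<3000", "<4000",
         "<5000", "<10000", "<20000", "<100000"]
SITE_COUNTS = [1891, 738, 251, 120, 63, 113, 4, 28, 25]


def cumulative_share(counts):
    """Return a list of (running_total, percent_of_total) per band.

    percent_of_total is rounded to the nearest whole percent,
    matching the precision used on the slide.
    """
    total = sum(counts)
    rows = []
    running = 0
    for count in counts:
        running += count
        rows.append((running, round(100 * running / total)))
    return rows


if __name__ == "__main__":
    for band, (running, pct) in zip(BANDS, cumulative_share(SITE_COUNTS)):
        print(f"{band:>8}  {running:>5}  {pct:>3}%")
```

The recomputed values agree with the slide: 2,880 of 3,233 sites (89%) fall below 2,000 pages, and the 25 sites in the largest band are under 1% of the total.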
  3. The Google Data - Raw
     - The Google data shows:
       - More than 80% of the content (in pages) is found in around 10% of the total count of sites
       - There are huge numbers of very small sites (per Google), although that may be because Google is unable to spider, or does not cover, all sites through the entire hierarchy
       - Still, errors in Google indexing are likely to be consistent across the entire population of .gov sites, so the shape of the graph is likely to be sound

     [Chart: "Google's site sizes"; site size (0 to 100,000) and percentage (0% to 100%)]
  4. Counting Servers
     - Checking on the servers operating behind the websites in .gov:
       - Over 1,200 running Apache
       - And more than 1,500 running Microsoft IIS
     - These figures don't include servers that may be configured but not active for, e.g., resilience. They also don't include servers further down the infrastructure stack, e.g. running content applications or other code
     - Naturally, each of these servers is likely to be accompanied by firewall and storage configurations
     - At a conservative cost of £10,000 per server, the total cost of this infrastructure alone is over £29,000,000

     Server                   Count
     Apache (total)           1209
       Apache                 186
       Apache/1.3.26          274
       Apache/1.3.27          282
       Apache/1.3.28          62
       Apache/1.3.29          99
       Apache/2.0.40          25
       Apache/2.0.45          1
       Apache/2.0.46          32
       other Apache           248
     Microsoft-IIS (total)    1547
       IIS/4.0                377
       IIS/5.0                1103
       IIS/6.0                65
       other IIS              2
     Lotus-Domino             109
     Netscape-Enterprise      74
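The infrastructure cost quoted in slide 4 follows directly from the server counts. A minimal sketch of that arithmetic is below; the counts and the £10,000 unit cost come from the slide, while the variable and function names are assumptions made for illustration.

```python
# Server counts by family, copied from slide 4 ("Counting Servers").
SERVER_COUNTS = {
    "Apache": 1209,
    "Microsoft-IIS": 1547,
    "Lotus-Domino": 109,
    "Netscape-Enterprise": 74,
}

# Version breakdowns from the same slide, used only as a consistency check.
APACHE_BREAKDOWN = [186, 274, 282, 62, 99, 25, 1, 32, 248]
IIS_BREAKDOWN = [377, 1103, 65, 2]

# The slide's "conservative" cost per server, in pounds.
COST_PER_SERVER_GBP = 10_000


def infrastructure_cost(counts, unit_cost):
    """Total cost: number of servers across all families times unit cost."""
    return sum(counts.values()) * unit_cost


if __name__ == "__main__":
    total_servers = sum(SERVER_COUNTS.values())
    cost = infrastructure_cost(SERVER_COUNTS, COST_PER_SERVER_GBP)
    print(f"{total_servers} servers, estimated cost £{cost:,}")
```

The four families sum to 2,939 servers, giving £29,390,000, which is consistent with the slide's "over £29,000,000". The version breakdowns also sum to their family totals.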
  5. Cost of Websites (Benchmarking)
     [Chart: website cost by organisation on a scale from 250k to 3.0m; the positions of individual points are not recoverable]
     - Organisations plotted: ONS, DH, DfT, Worktrain, Worktrain (development), JC+, JC+ (development), DWP, The Pension Service, OFT, HMT, DfES, Large Quasi-Public Sector (fully loaded)
     - Not on record: dti, IR, HMCE, Home Office, DEFRA, ODPM
     - Figures drawn from recent PQ (and, unless stated, include only hosting charges and not development or development support)
  6. Characteristics of sites
     - Inconsistent - five different look and feels
     - Unreliable - poor uptime
     - Huge - up to 100,000 pages
     - Complex - nine levels deep
     - Slow - download time more than one minute
     - More than 3,200 sites
     - More than three navigation designs
     - 100s of broken links
     - Some parts of the site not linked to others: "orphan content"
     - More than 2.5 million documents
     - More than 200 URLs per dept
     - More than 300 authors
  7. Looking For The Right Thing?
     - Using Internet search engines in an effort to find "the right thing" can be challenging. The search terms listed below were entered, with the results restricted to the "" domain only
     - There is a huge amount of duplication in government online:
       - Many local authority sites repeat the description of the rules for claiming certain benefits, where to claim, what to claim for and so on … and doubtless, every year or so, each of these mentions must be updated with the correct rules (but what if they're not?)
       - Even "self assessment" has only 4,950 mentions on the Inland Revenue's own site, but a further 9,000 across the rest of government

     Search term (30/03/2004)       Results
     Disability Living Allowance    14,700
     Child Tax Credit               5,790
     Carers Allowance               915
     Working Family Tax Credit      546
     Attendance Allowance           13,000
     Council Tax Benefit            42,000
     Housing Benefit                77,800
     Statutory Sick Pay             6,200
     Self Assessment                14,000
  8. And how does .gov look to the consumer?
     - The variety of sites shows little in the way of consistency
     - Navigation varies from site to site: sometimes on the left, sometimes tabbed, sometimes graphic, sometimes text
     - "Search" is called different things, is often not on the home page and often returns poor results, despite research showing that consumers who can't see what they want instantly will use search
     - Accessibility is poor, with many sites not attempting to clear even the lowest hurdles
     - Even sites owned by the same parent are confusing, e.g. pensionservice, pensionguide, agepositive, over50 …
  9. The Missing Data
     - To complete the picture and allow the proposed plan of action to be fine-tuned, the following data is needed:
       - Visitor counts (Hitwise may offer an approximation)
       - Approximate costs to operate (at an infrastructure level, including all servers, network equipment, firewalls, software licences etc): both the price bought at and the price for continued operations projected forwards (to allow for annual licence premiums, renewals etc that may be due in the future)
       - Contractual agreements around exit arrangements, renewal dates etc, along with whether the contract for web hosting is part of a wider technology outsource agreement (that might, therefore, make it harder to exit)
  10. Proposal For What Next
     - Principles
       - Government is in the business of helping citizens by making information easy to find. The total number of websites needs to be rationalised dramatically, from over 3,000 to under 600 in the first stage (including Local Authorities).
       - Government is in the business of presenting information in a way that citizens will understand; it is not in the user interface design business. The range of navigational and interface styles needs to be harmonised to a single core style.
       - Government has already spent significant sums on its online presence, yet government is not a technology leader. The cost of the programme outlined must be absorbed through savings generated in the first year of the programme, making it self-funding.
       - Government buys in cycles and these are likely to be maintained. This cycle will allow work to be completed at a constant pace as contracts come to their natural end, thus incurring no exit penalties.
       - A programme of rationalisation this large will require multiple parallel streams of work. The cost of the overlap reduces the inherent savings slightly but increases the odds of success by eliminating bottlenecks and delays.
  11. DotP versus Everything Else
     - Condensing 3,000+ sites to a few hundred is no simple task. It will likely require a variety of approaches and software solutions to ensure that there are no bottlenecks.
     - DotP's primary characteristics are:
       - A managed service model (i.e. hardware, software, network included)
       - A high-end content management engine allowing customised workflow, complex information architectures and large numbers of geographically dispersed authors
       - Highly resilient, scalable and secure infrastructure, reducing the risk of failure
       - A model to allow changes to sites through configuration, not code customisation
       - A range of features tailored to solve government's main content problems
     - Other content engines usually:
       - Come as a software licence with extensive customisation required
       - Have a range of features that DotP doesn't have and that have been developed over several product cycles, primarily for commercial customers. Some of these features will be useful for government
       - Will develop competitively no matter what government does
       - But they rarely come as managed services, necessitating hosting and management to be included
  12. Setting Up The Programme
     - Select a core of important websites based on:
       - Total size (aiming to isolate 50% of the content in government)
       - Visitor count (capturing a large chunk of the audience, say 50%)
       - Transaction generation (targeting the bulk of online transactions for both business and citizen)
       - Content management status (looking first for unmanaged systems still based on HTML or those that are not well advanced in terms of a content engine)
     - Outline the information architecture as it is, coupled with the target architecture for how it should be, taking each site and fitting it into an overall architecture and design that is consistent across all of them
     - It is assumed that these sites, ranking as the most popular and largest in government, will need rearchitecting to make the most of them (including a new layout, new navigation and so on)
     - This rework will give a good chance to eliminate duplication and inconsistency, as well as to remove as much as 30-50% of content as redundant (based on experience with Department of Health).
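One way to read the "total size" criterion in slide 12 is as a largest-first selection: take sites in descending order of page count until the chosen share of all content is covered. The sketch below shows only that one criterion, under stated assumptions. The function name `select_core_sites`, the site names and the page counts are hypothetical and do not come from the deck; visitor counts, transactions and content management status would need to be weighed separately.

```python
def select_core_sites(page_counts, target_share=0.5):
    """Pick the largest sites until they cover target_share of all pages.

    page_counts maps a site name to its page count. Sites are taken
    largest-first, so the result is the smallest set of sites that
    reaches the target share of total content.
    """
    total = sum(page_counts.values())
    selected = []
    covered = 0
    # Sort by page count, largest first; ties are broken by name
    # so the result is deterministic.
    for site, pages in sorted(page_counts.items(),
                              key=lambda item: (-item[1], item[0])):
        if covered >= target_share * total:
            break
        selected.append(site)
        covered += pages
    return selected


if __name__ == "__main__":
    # Hypothetical page counts, for illustration only.
    example = {"site-a": 50_000, "site-b": 30_000,
               "site-c": 15_000, "site-d": 5_000}
    print(select_core_sites(example))        # half of all pages
    print(select_core_sites(example, 0.9))   # 90% of all pages
```

Because content is so concentrated (slide 3 puts more than 80% of pages in around 10% of sites), a selection like this would be expected to return a short list relative to the 3,000+ sites in scope.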
  13. Establishing The Target Platforms
     - To identify the target platforms, the following is proposed:
       - A "bake off" competition is kicked off, where a variety of content management vendors are given an environment (with workspace, hardware and network connectivity).
       - Each vendor is given the same brief: to take an existing, static website (the "challenge site") with a known information architecture and transfer it to a new target architecture (also provided).
       - The vendors then set up their systems, using templates and guidelines provided by government, to deliver the challenge site under strict timescales, including defining the architecture, implementing the style guidelines, integrating the search engine and migrating the content
       - At the end of the competition, a subset of the vendors who have met previously agreed and published criteria is passed through to the next stage
       - Commercial agreements are then built with the vendors, using standard templates, allowing for volume discounts on licences to be obtained.
       - Websites in the core population are then allocated across vendors and the implementation task kicked off. Vendors that perform are given more; vendors that don't perform are gradually eliminated and their work shared across other, more successful vendors
  14. Why a Bake Off?
     - Migrating some 3,000 websites is a fearsome task. Here is why there should be more than one solution going:
       - The problem is not one of technology alone: the changes required to government editorial processes are enormous. The greater the range of experience thrown at this, the better the result
       - One single system (or even two or three) would result in bottlenecks that would delay rationalisation. Having several "similar" but independent systems will resolve the bottleneck
       - One large system would be high risk, as a single outage could take down government's online presence. Spreading the systems will, in the end, reduce risk versus cost.
       - Competition is healthy: a few players working both together (to complete the goal) and against each other (to complete the goal first and therefore win business) will work well
       - But we need only a few (5, 6, 7?): too many will bring too high an overhead and put quality standards at risk
  15. Estimating the Costs
     - The costs of migration will include:
       - The initial work to identify candidates
       - The evaluation of target platforms
       - The setting up of migration environments
       - The cost of redesign of some sites to make them consistent with the target standard (e.g. search engine on home page, navigation through tabs, reducing the depth of the site etc)
       - The cost of redesigning pages to fit the new system, e.g. where the site uses custom techniques that are not easily replicable
       - The actual migration of data from one format to another (there are tools that claim to do this, with varying success, or manual methods; these too will need to be assessed)
  16. Integrate …
     - One URL
     - 13 brands
     - Five major redesigns
     - 2,600 locations
     - 142,000 people
  17. Rationalise …
     - 235 sites … to one
     - 47% e-filing
     - 25 million regular users
     - AOL cache data at peaks
     - 80% of e-filers do it again
     - Accountants starting to charge $35 for those who want to do it on paper
  18. Unfocused and disorganised
  19. Organised and Focused