Sharing Advisory Board newsletter #8

The newsletter of the Sharing Advisory Board, http://www1.unece.org/stat/platform/display/SAB/Sharing+Advisory+Board

Sharing Advisory Board
Software Sharing Newsletter
Issue 8: April 2013

In this Issue:
- Open data – taming the tiger
- The OECD Open Data Project
- Developing software sharing in the European Statistical System
- Improving data collection by soft computing
- Tools for a Sprint
- Understanding "Plug and Play"

Editorial (Marton Vucsan, SAB Chairman)

Kodak, Sony, Philips, your local newspaper: it happened to all of them. The main product on which they built their existence was no longer relevant. Films, walkmans, light bulbs, encyclopedias – the list is endless. Whole industries are wiped out to make room for new ones. Darwin would have enjoyed this. At the Big Data seminar organized by the UN in New York you could see the first signals that, for us too, the bell will ring in due course. Choices will have to be made.

There is a tremendous opportunity in using the new information that the planet now produces as a by-product of all its processes. It opens the door to all kinds of new statistical products, and to new processes to make them. Let us for the moment focus on the processes. Looking at the way our researchers work with the new data, we see it is fundamentally different from what we are used to. First, the amount of data they work with is far too large to edit; second, the tools they use are alien to what we are familiar with; third, the workflow is intermittent because of the time needed for processing at every step. Aside from setting up rule sets and creating workflow-type actions, no human labour is involved in the actual production. The creation of these workflows is knowledge intensive, but once they are created very little effort is needed to produce the result. What we see is a shift from manual labour in production to manual labour in design, which will supply the multiplier we need for sharing to be effective. In the traditional, artisanal production system there is no multiplier: more statistical output means more people. Sharing and collaborating outside the office seems not very meaningful because of the logistical overhead and the differences in execution and definition. Moreover, there is no multiplier present either; sharing work does not make other work unnecessary in this kind of setup.

In the big data area, where the problems are more uniform, the real work is in the design of the process. The processes are more formally defined, and these processes represent the production knowledge for the statistics they produce. It will be very profitable to share these processes under the "build one, get ten" rule, because a multiplier is present in the form of a formalized, mostly automated process. The knowledge is deployed in the design phase of the process, and sharing parts of the processes means getting executable process parts in return, including the knowledge that created them.
In this setup, sharing indeed means freeing up resources.

The threat, or opportunity, of Big Data will help us to do two things: it will help us shift the balance between human labour and machine labour towards machine labour, and it will help us to become a solution-sharing industry that can do much more with much less money. In the end it might not only be the new products that will be our opportunity, but also the new industrial processes that will underlie these new products.

Are you Linked?
The LinkedIn group "Business Architecture in Statistics" aims to share knowledge and ideas and to promote activities, such as those undertaken by the Sharing Advisory Board, that support standards-based modernisation and greater collaboration between statistical organisations. Join the discussions at: http://www.linkedin.com/groups?home=&gid=4173055
You can also find out more about SAB activities and outputs via the MSIS wiki: www1.unece.org/stat/platform/display/msis
Special Feature: Open Data
The next two articles explore the implications of open data for official statistics. The first presents the view from a national statistical organisation (CSO Ireland). The second gives the perspective of an international organisation (OECD).

Open data – taming the tiger
Eoin McCuirc (Central Statistics Office, Ireland)

The term "open data" means different things to different people, though the goals of making information freely available and easily accessible online are very clear. I'll start by looking at Tim Berners-Lee's classification of five levels of open data:

★ Make your data available on the Web under an open licence
★★ Make it available as structured data (e.g. an Excel sheet instead of an image scan of a table)
★★★ Use a non-proprietary format (e.g. a CSV file instead of an Excel sheet)
★★★★ Use Linked Data formats (URIs to identify things, RDF to represent data)
★★★★★ Link your data to other people's data to provide context

So, there are degrees of "openness" – from simply putting information up on the web to providing linked open data. Yes, both are a form of open data but, though similar in appearance, they are two completely different animals, as different, say, as a cat is from a tiger. In this article I want to talk about the tiger: linked open data, the semantic web and how the CSO is beginning to meet this new challenge.

In managing the dissemination of statistics we are guided by international standards. Principles 14 and 15 of the European Statistics Code of Practice are particularly relevant:

Principle 14, Coherence and Comparability: European Statistics are consistent internally, over time and comparable between regions and countries; it is possible to combine and make joint use of related data from different sources.

Principle 15, Accessibility and Clarity: European Statistics are presented in a clear and understandable form, released in a suitable and convenient manner, available and accessible on an impartial basis with supporting metadata and guidance.

Clearly, the opportunities offered by open data will help statistical offices to deliver outputs which match these two principles.

The data deluge, or the accumulation of data about people, places and things, is changing the world in which statistical offices process and publish statistics – and is another important driver for open data. In general, it is getting more and more difficult to find the information you need – the problem of the needle in the haystack. Sooner rather than later we will need machines to trawl through all available data in order to find the proverbial needle. But for this to become possible, data needs to be structured in a particular way – a challenge to which the semantic web offers a solution.

The semantic web provides a way of making data machine-readable, independent of the variety of technical platforms and software packages in use throughout the web. A key concept is that of linked open data. In many ways, linked open data is similar to open data: an organisation such as a statistical office decides what information it wants to publish on the web and makes the necessary technical choices about hosting, security, domain names, content management, maintenance, etc. These choices apply equally to publishing linked open data.

However, the key difference is one of language. In linked open data, semantic web objects are named to indicate all the attributes needed to make the data machine-readable without human intervention. For statisticians, this is an opportunity to use international classifications (e.g. NACE, ISCO, ISCED, etc.) as de facto standards for linked open data.
Indeed, if we don't take this opportunity, it's possible that other early adopters could set a different standard.

We are starting on a journey and, unfortunately, there is no clear road map yet and few precedents to give guidance. So, how do we acquire the expertise needed to publish statistics as linked open data?

In 2012 the CSO began a pilot project with the Digital Enterprise Research Institute (DERI) at the National University of Ireland, Galway (NUIG), to publish some of the Census 2011 results as linked open data. The project has given valuable experience to the CSO dissemination team, and the following are some of the important lessons so far:

1. For data to be linked across the semantic web, objects need to be named. Uniform Resource Identifiers (URIs) are the codes that identify an object. Official statistics use many standard classifications to define their data and, as noted above, this is very useful when creating URIs.

2. Once the objects have been named, a framework is needed to publish the data on the web. The information is published using the Resource Description Framework (RDF), which views the world in triples of (Resource, Attribute, Value). An example of an RDF statement is (Population of Ireland 2011, Statistic, 4588252).
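To make the idea concrete, here is a minimal sketch, using the Python rdflib library, of how such a triple could be built programmatically. The URIs and property name are illustrative assumptions loosely following the /dataset and /property patterns planned for data.cso.ie; they are not the CSO's published identifiers.

```python
# A minimal sketch of the example RDF statement
# (Population of Ireland 2011, Statistic, 4588252) as a machine-readable triple.
# The URIs below are illustrative assumptions, not the CSO's published scheme.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

DATASET = Namespace("http://data.cso.ie/dataset/")    # assumed dataset base URI
PROP = Namespace("http://data.cso.ie/property#")      # assumed property base URI
QB = Namespace("http://purl.org/linked-data/cube#")   # RDF Data Cube vocabulary

g = Graph()
observation = DATASET["census2011#population-state"]  # hypothetical observation URI

g.add((observation, RDF.type, QB["Observation"]))
g.add((observation, PROP["statistic"], Literal(4588252)))  # the observed value

print(g.serialize(format="turtle"))
```

Serialising the graph (for example as Turtle, as above) or exposing it behind a query service is what makes the statement usable by machines elsewhere on the web.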
3. To publish data on the semantic web, an organisation needs to put it in a place and in a format that a machine will expect. The CSO will publish its Census 2011 open data on data.cso.ie, not on the CSO website www.cso.ie. In this scenario machines should get RDF data and users should get some readable representation of the data, e.g. HTML.

4. Ideally, all the URIs an organisation produces that relate to a single real-world concept – e.g. Population of Ireland 2011 – should be linked together.

5. Ideally, the URIs would be "cool", built to last 2000 years or more: http://www.w3.org/TR/2008/WD-cooluris-20080321/

6. For the Census 2011 pilot it is proposed to produce a SPARQL (SPARQL Protocol and RDF Query Language) service to facilitate access to the data (a sketch of such a query appears at the end of this article).

The following table sets out the framework which is planned for the Census 2011 results as linked open data:

Base URI: http://data.cso.ie/

Entity                          | URI pattern (relative to base)      | RDF class
Classification                  | /classification/{id}                | skos:ConceptScheme
Concept in a classification     | /classification/{id}#{code}         | skos:Concept
Dataset                         | /dataset/{id}                       | qb:DataSet
Data structure definition       | /dataset/{id}#structure             | qb:DataStructureDefinition
Observation                     | /dataset/{id}#{dim1};{dim2};{dim3}  | qb:Observation
Property (dimension, attribute) | /property#{id}                      | qb:DimensionProperty, qb:AttributeProperty

Later in 2013 we will publish the outputs from the project – i.e. the Census 2011 results as linked open data – and, to mark the International Year of Statistics, we will have a competition for the best "mash-up" using those statistics. We hope this will not only be a proof of concept, but also a proof of the value of linked open data.
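As a rough illustration of what querying such a service might look like, the sketch below uses the Python SPARQLWrapper library against an assumed endpoint at data.cso.ie/sparql. The endpoint URL and the property URI are hypothetical; the property pattern simply follows the /property#{id} convention from the table above.

```python
# A hedged sketch of querying a SPARQL endpoint such as the one proposed for the
# Census 2011 pilot. Endpoint URL, graph layout and property names are assumptions.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://data.cso.ie/sparql")  # assumed endpoint
sparql.setQuery("""
    PREFIX qb: <http://purl.org/linked-data/cube#>
    SELECT ?observation ?value
    WHERE {
        ?observation a qb:Observation ;
                     <http://data.cso.ie/property#statistic> ?value .
    }
    LIMIT 10
""")
sparql.setReturnFormat(JSON)

# Each binding maps a query variable to its value in the result set.
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["observation"]["value"], row["value"]["value"])
```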
The OECD Open Data Project

The OECD is currently undertaking an Open Data project with the aim of making its statistical data content machine-readable, retrievable, indexable and re-usable. The project will implement an Application Programming Interface (API) to provide machine-to-machine access to the OECD statistical data warehouse "OECD.Stat" via a number of formats, and will address the challenges involved in standardising the statistical content from the 800+ datasets. In addition, an Open Innovation community will be created to encourage the re-use of OECD data via external innovation. The Open Data project is aligned with the Knowledge and Information Management (KIM) Ontology Management and Semantic Infrastructure project to make data accessible via linked data.

Background to the Open Data Project
Statistics are of strategic importance to the OECD, both as an input for internal analysis and as a product for dissemination to a wider audience in their own right. Following a review of the OECD Publishing Policy in 2011, a number of recommendations were proposed to make OECD statistics "open, accessible and free". The OECD Council welcomed this proposal and, as a result, the DELTA programme was initiated to implement these aims.

DELTA Project – Open Data
Openness is one of the key values that guide the OECD vision for a stronger, cleaner and fairer economy. Making data open is an important part of this and, to this end, a number of open benchmarks have been defined in the project:

- Completeness – content should include data, metadata, sources and methods.
- Primacy – datasets should be primary, not aggregated, and include details on how the data was collected.
- Timeliness – data should be automatically available in trusted third-party repositories upon publication.
- Ease of access – data made available via a simple Application Programming Interface (API).
- Machine readability – data and metadata provided in a machine-readable standard, plus documentation.
- Non-discrimination – no special permissions required to access data.
- Use of common standards – stored data can be accessed without a special software licence.
- Licensing – Creative Commons CC-BY (licensees may copy, distribute, display and perform the work, and make derivative works based on it, provided they credit the author or licensor in the manner specified by the licence).
- Permanence – information made available should remain online, with archiving over time, together with a notification mechanism.
- Usage costs – free.

Open Data Project goals
Today, data can be extracted only via downloads from OECD.Stat. The Open Data Web Services (ODWS) will make it available to other web sites directly, for creating custom data visualisations, live combinations with other data sources, and so on. The goals of the Open Data project are: to make OECD data machine-readable, retrievable, indexable and re-usable; to increase the dissemination and impact of OECD data via open data services for its statistical data; and to encourage the re-use of OECD data by external innovation communities.

The Open Data project has three main deliverables: i) a full set of "open-ready" data and metadata; ii) a set of Open Data Web Services; and iii) an interface for managing the OECD Open Innovation Community.

"Open-ready" data and metadata
For data to be considered "open-ready", the existing data and metadata content of the OECD corporate data warehouse OECD.Stat will be required to meet certain criteria of structure and content necessary for machine-to-machine access. To achieve this, data owners will carry out a self-assessment of all OECD.Stat data content to gauge the state of open-readiness of each dataset. This will involve analysing the metadata content according to the criteria.

Open Data Web Services (ODWS)
In parallel to the data assessment exercise, the Open Data Web Services will be developed. This will involve building a set of web services to provide machine-to-machine access to OECD.Stat data via a number of formats, and defining the technical standards for machine-readable data that meet the needs of both expert and non-expert audiences. Application Programming Interfaces (APIs) will be developed to make the data and metadata in OECD.Stat available to systems outside the organisation via a number of formats. These web services will also be available to other organisations currently sharing the .Stat data warehouse software via the OECD Statistical Information System Collaboration Community (SIS-CC).

Open Data formats
Data and metadata will be made available to external users in as many output formats as possible to maximise data access. The project will start with formats including SDMX-JSON, RESTful API, OData, XLS and CSV. Additional formats will be added as needed over time. These formats have been chosen for the reasons described below.

a) Excel/CSV – Excel and CSV are already widely used exchange standards, so including them as output formats was a fairly obvious decision.

b) SDMX/JSON – JavaScript Object Notation (JSON) is a text-based open standard designed for human-readable data interchange and has become one of the most popular open data formats on web sites today. The Statistical Data and Metadata eXchange standard (SDMX) provides a standard model for statistical data and metadata exchange between national and international agencies, within national statistical systems and within organisations. The OECD is a member of the SDMX Sponsor Group (together with the Bank for International Settlements, European Central Bank, Eurostat, International Monetary Fund, United Nations Statistics Division and World Bank). SDMX data extracts from OECD.Stat are already provided via a web service; this will be adapted as an API using the SDMX compact version.
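As a hedged sketch of what machine-to-machine access via such an API might look like from a client's point of view, the Python snippet below requests an SDMX-JSON message with the requests library. The URL pattern and dataset code are assumptions for illustration only; the actual interface is defined by the OECD project documentation.

```python
# Illustrative client call to an SDMX-JSON style data API.
# The base URL and dataset code are assumptions, not the official OECD endpoint.
import requests

BASE = "http://stats.oecd.org/sdmx-json/data"   # assumed endpoint
DATASET = "QNA"                                 # hypothetical dataset code
url = f"{BASE}/{DATASET}/all/all"

response = requests.get(url, params={"contentType": "json"}, timeout=30)
response.raise_for_status()
message = response.json()

# An SDMX-JSON message typically carries observations plus the structural
# metadata (dimensions, attributes) needed to interpret them.
print(message.get("header", {}))
print(len(message.get("dataSets", [])), "data set(s) in the message")
```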
c) Open Data (OData) – OData is an open protocol for sharing data.

Future formats could include Google Data (a REST-inspired technology), the Google Dataset Publishing Language (DSPL) or Google KML, a geospatial file format.

Linked Data and the OECD KIM project
The OECD Knowledge and Information Management (KIM) project has been established to integrate information and centralise access to all OECD content (corporate content management, records management, authoring, etc.). KIM was launched in parallel to the DELTA project and is concerned with developing semantic enrichment, centralised taxonomies and linked data support. A long-term goal of the project is to create linked data sources with the Resource Description Framework (RDF), using existing vocabularies to map data to related subjects and generating a collection of "triples" (consisting of a subject, a predicate and an object) known as a "triple-store". Each component of the triple has a Uniform Resource Identifier (URI), enabling data to be linked to related sources.
Creating a triple-store from the OECD.Stat data warehouse will be a huge task, and work investigating the possibilities has only recently started (at the time of writing the tools have not yet been selected), but the long-term goal is to conform to Tim Berners-Lee's "5 star" level of open data.

The vision of the Semantic Web is to extend the principles of the Web from documents to data. Data should be accessed using the general Web architecture (e.g. URIs), and data should be related to one another just as documents (or portions of documents) already are. This also means creating a common framework that allows data to be shared and reused across application, enterprise and community boundaries, and to be processed automatically by tools as well as manually, including revealing possible new relationships among data items.

The OECD Open Innovation Community
The Open Innovation Community will consist of an interface for managing Open Innovation Community (OIC) content, and involves designing, building and maintaining this interface to provide the following:
- Information describing the open platform
- Registration services
- Examples of products developed using the open platform
- Open services available, with associated technical documentation
- OIC blog
- FAQ

Software inventory
Over 60 statistical software tools are available for sharing. Find new software, or post information about yours, at: www1.unece.org/stat/platform/display/msis/Software+Inventory
Developing software sharing in the European Statistical System
Denis Grofils (Eurostat)

Software represents an important part of the assets of the European Statistical System (ESS). In statistical institutions, as in many modern businesses, the quality and availability of software is of primordial importance, as it directly affects the way business processes are executed. While not all members of the ESS necessarily develop software, all of them undoubtedly use it. Developing software is usually recognised as costly, both at the development and at the maintenance stages. Even the simple use of software may be costly in different respects: licensing fees, consultancy, training, and so on.

Software may be of different natures and extents; for example, some types of software could be:
- Data collection systems
- Procedures developed in statistical computing languages for different purposes (sampling, imputation, weighting, aggregation, confidentiality protection, etc.)
- Tools for the management of statistical metadata
- Web portals for data dissemination

As the level of standardisation grows in the statistical community, through harmonisation at the international level and through initiatives that promote the industrialisation of official statistical production (see the work of the HLG of the UNECE, or the Joint Strategy and the ESS.VIP programme at ESS level), the sharing of software at a wider level becomes easier.

The move towards service-oriented architecture (SOA) and the development of a so-called "plug and play" architecture for statistical production strongly reinforce the potential for sharing. Platform-independent services allow distributed architecture models that promote a high level of reuse of software components. Services can be developed independently or cooperatively and shared among partners. The functionality of existing software can be offered as a service at limited cost via proper wrapping. All this makes the potential of software sharing higher than ever.

The possibility to share software among institutions of the ESS offers several advantages, notably:
- Increased efficiency and reduced costs, by avoiding multiple developments of virtually the same products by different organisations
- Increased harmonisation and interoperability, through the use of standard software building blocks
- Improved data quality, through the use of widely accepted and validated software building blocks, and improved comparability among data coming from different countries
- An increased level of collaboration and resource sharing between members of the ESS

Several important achievements relating to OSS have been realised at the European level, notably:
- The European Union Public Licence (EUPL): the first European OSS licence. It was created on the initiative of the European Commission and is approved by the European Commission in 22 official languages of the European Union.
- Joinup: a collaborative platform created by the European Commission that offers a set of services to help e-Government professionals share their experience with interoperability solutions, and supports them in finding, choosing, re-using, developing and implementing open source software and semantic interoperability assets.

The ESS IT Directors Group (ITDG) mandated the Statistical Information System Architecture and Integration working group (SISAI) to launch a task force dealing with the development of policy and guidelines supporting ESS software sharing.
The work of this task force started during fall 2012. The following aspects of software sharing are tackled:

- Definition of software of interest: In this context the term "software" is to be understood in its broadest sense, as any set of computer programs, these being defined as any set of instructions for computers. Objective criteria for defining the target of the recommendations are necessary. Software of interest is defined as "software used by members of the ESS to support directly activities of the GSBPM in order to realise the statistical programme of the ESS". It should be noted that this definition is independent of the technological characteristics of the software (web-based, command-line batch, macros, web services, etc.).
- Software catalogue: The way a catalogue of ESS software should be maintained, and which information should be recorded, is defined. A distinction is made between unshared software (used by only one ESS member), for which a minimal set of information is collected, and shared software (used by several ESS members), for which an extensive set of information is collected.
- Sharing scenarios: Several scenarios are identified and the applicability of recommendations per scenario is defined (i.e. not all recommendations apply to all scenarios).
- Sharing software use: The federation of software users through the creation of user communities is organised. This concerns software published under any type of licence, including commercial software.
- Sharing software development: Recommendations are made for each step of the development cycle. As an example, it is recommended to consider several types of constraints when designing software: architectural constraints (consistency with the GSBPM and GSIM, links with Plug and Play constraints), clear documentation of methodological aspects, data protection constraints specific to the ESS, support for multilingualism, and a legal roadmap (particularly intellectual property rights tracking when developing component-based applications).
- Software quality evaluation: A template for software quality assessment is provided.

Evaluations of the elaborated recommendations on real cases were performed, to test the propositions against reality and to incorporate feedback from these experiences. Three illustration cases were used: Blaise, Demetra+ and SDMX-RI.

The set of draft recommendations elaborated by the task force will be submitted in the coming weeks to the Statistical Information System Architecture and Integration working group (SISAI) and then to the ESS IT Directors Group (ITDG).

This newsletter and previous editions are also available on-line at: http://www1.unece.org/stat/platform/display/msis/SAB+Newsletter

Improving data collection by soft computing
Miroslav Hudec and Jana Juriová (Infostat)

The applicability of soft computing (fuzzy logic and neural networks) as a modern means to improve the collection and the quality of data for business and trade statistics is one of the topics of the Blue-ETS project (http://www.blue-ets.istat.it/). The main findings which support this line of development are:
- Large, complex administrative and statistical databases contain valuable information which can be mined using powerful methodologies;
- Statisticians possess knowledge of how to deal with their tasks, but this knowledge cannot always be expressed by precise rules.

In order to estimate missing values, relations between similar respondents are relevant. Mining the Intrastat database with neural networks (NNs) shows that this is a rational option that could provide a solution. NNs find patterns and relations between similar respondents; in this way we are able to estimate items if we have enough data available from other respondents. Fuzzy rules expressed by linguistic terms and quantifiers reveal levels of similarity between imputed and surveyed values.

Similar techniques are also promising for dissemination. People prefer to use expressions of natural language when searching for useful data and information, for example: select regions where most municipalities have a small altitude above sea level. The result is a set of entities ranked according to their degree of match to the query condition.
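The sketch below illustrates, in Python, how such a fuzzy query could be scored. The membership functions, thresholds and figures are illustrative assumptions, not the Blue-ETS models.

```python
# Minimal sketch of a flexible (fuzzy) query:
# rank regions by how strongly "most municipalities have a small altitude" holds.
# Membership functions and data are illustrative assumptions only.

def small_altitude(metres: float) -> float:
    """Fuzzy membership of 'small altitude': 1 below 200 m, 0 above 600 m, linear between."""
    if metres <= 200:
        return 1.0
    if metres >= 600:
        return 0.0
    return (600 - metres) / 400


def most(proportion: float) -> float:
    """Fuzzy quantifier 'most': 0 below 50 %, 1 above 90 %, linear between."""
    return min(1.0, max(0.0, (proportion - 0.5) / 0.4))


def match_degree(altitudes: list[float]) -> float:
    """Degree to which 'most municipalities in the region have a small altitude'."""
    proportion = sum(small_altitude(a) for a in altitudes) / len(altitudes)
    return most(proportion)


regions = {
    "Lowland": [95, 120, 180, 210, 260],   # hypothetical municipality altitudes (m)
    "Upland": [450, 520, 610, 700, 380],
}
for name, score in sorted(((r, match_degree(a)) for r, a in regions.items()),
                          key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.2f}")
```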
Modernisation of the first and the last stages of data collection could create a chain reaction of improvements in data quality. Better data dissemination (through flexible queries) could motivate respondents to provide their own data in a more timely and accurate way, reducing the frequency of missing values and making imputation more efficient (fewer missing values and more powerful neural networks).

Relevant equations, models and experimental tools have been created in order to evaluate the pros and cons. The next step is the creation of fully functional tools and their adaptation to particular needs.

What does Big Data mean for official statistics?
A new paper prepared by leading international experts has recently been released by the High-Level Group for the Modernisation of Statistical Production and Services: http://www1.unece.org/stat/platform/pages/viewpage.action?pageId=77170614
Tools for a Sprint
Carlo Vaccari (a sprinter)

In Ottawa, from 8 to 13 April 2013, we had a Sprint for the Plug & Play Architecture Project. People from Australia, Canada, Eurostat, Italy, Mexico, the Netherlands, New Zealand, Sweden, UNECE and the United Kingdom met to start defining a "common statistical production architecture for the world's official statistical industry", as stated by the High-Level Group for the Modernisation of Statistical Production and Services (see http://www1.unece.org/stat/platform/display/hlgbas).

The objective of the sprint session was to ensure agreement on key principles regarding the architectural blueprint to build an interoperable statistical industry platform. We will discuss the documents produced by this meeting over the next few weeks. Here I just want to show you which tools were used in the Sprint.

Paper, a lot of paper
We wrote a lot on sheets, flip charts and post-its of every colour and shape. Paper was used to explain, show, collect, store, group and debate ideas.

White-boards
We used many white-boards, writing on them with markers of every type. Often we used cameras or mobile phones to take a picture of what was written, so the concepts could be transferred to digital files (we would love smart boards like: www.youtube.com/watch?v=NZNTgglPbUA).

Mixed
And yes, we used paper and boards together – very useful when you want to group concepts and keep track of what was done.
Wiki
We put documents, presentations, images, discussions, a glossary and so on in the UNECE wiki.

Mind maps
We often used mind maps to capture brainstorming and discussions: one of the best ways to avoid losing ideas and to summarise what has been said in lengthy discussions.

Presentations
Presentation software was used not only to prepare slides to present, but also to draw and develop schemas. Using notes and colours, presentation software was then used as a kind of digital dashboard.

Lego bricks
Each participant received three Lego bricks of different colours from our wonderful facilitators. We had to raise them to indicate, respectively: "I want to speak", "Off topic" and "Too much detail". A very simple way to get participants to follow the rules for an efficient exchange of views.

Lollies
Each participant brought sweets ("lollies" in Australian English) from their country to share with partners. Biscuits, chocolates and sweets of all kinds were the fuel that provided energy to tired brains.
Understanding "Plug and Play"
Marton Vucsan (Statistics Netherlands)

I have high hopes for the Common Statistical Production Architecture (CSPA) project, commonly known under the more profane title of Plug and Play. Although it sounds easy, it may prove to be hard, very hard. From what I hear, "plug and play" has many interpretations. Some point in the direction of the feared "mother of all systems" projects that never work. Understanding what is really meant by the current CSPA project is important, because CSPA is something completely different from the big feared projects of the past.

CSPA is about reducing complexity and achieving operating-system independence. It is also about sharing and reducing our efforts while still getting what we need. To achieve this we have to realise that our means of production are composed of different levels of abstraction. There are the methods, the process descriptions and finally the applications (I am deliberately keeping it simple here). Normally, to arrive at a statistical output, we describe a method, create a process and build an application. All three are normally monolithic in nature and custom made: unshareable, un-reusable, expensive, complex. The stuff called "legacy apps", the stuff we should stop making.

CSPA starts with the insight that splitting things up reduces complexity. The GSBPM does this: it splits up the statistical process into easy-to-understand sub-processes. Thinking and building in sub-systems reduces complexity and increases reliability. Many programmers struggle with this, trying to split up a given solution into meaningful parts and often failing. In hindsight, the reason for that failure is obvious: the reduction in complexity has to be done at a much higher level. The complexity is often in the methods and the way the processes were thought up, independent of the IT implementation. If we really want to reduce complexity, that is our point of attack: the level where we specify our statistical recipe.

As a statistical community, we seem to agree that statistical outputs can be produced by processes composed of GSBPM sub-processes. With the right compromises we will be able to use these sub-processes, or components, across a broad range of statistics and agencies – like the engines on a plane or the motor management system in your car. Just as a car and a plane can each be understood as a collection of functional sub-systems, we need to understand that a statistical production system is a collection of functional sub-processes. Once we are able to think of our processes as assemblies of components, we can reuse them or exchange them (a minimal sketch later in this article illustrates the idea). Of course it is not that simple, but there are powerful forces at work to make it happen. Look at what happened in other industries: when a component, say a motor management system, is available, most designs gravitate towards using that component because it is much cheaper than "rolling your own". A component can be manufactured separately from the system it will be used in; Rolls-Royce don't need planes to manufacture engines. The key is to do it at the right conceptual level. Others have done it (look at your phone); we can do it too!

Many statistical organisations are modernising, using Enterprise Architecture to underpin their vision and change strategy. This enables them to develop statistical services in a standard way. Enterprise architecture creates an environment which can change and support business goals. It shows what the business needs are and where the organisation wants to be, and ensures that IT strategy aligns with this. It helps to remove silos, improves collaboration and ensures that technology is aligned to business needs.
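Purely as an illustration of this component-based way of thinking (and not as the CSPA specification, which is still being defined), the sketch below composes a statistical output from two interchangeable, GSBPM-style services behind a common interface. All names and interfaces are hypothetical.

```python
# Illustrative "plug and play" composition of statistical services.
# The service names and interface below are hypothetical, not the CSPA standard.
from typing import Protocol


class StatisticalService(Protocol):
    """Any component exposing the same interface can be swapped in or out."""

    def run(self, records: list[dict]) -> list[dict]: ...


class MeanImputationService:
    """Hypothetical 'edit & impute' component: fills missing turnover with the mean."""

    def run(self, records: list[dict]) -> list[dict]:
        observed = [r["turnover"] for r in records if r["turnover"] is not None]
        mean = sum(observed) / len(observed) if observed else 0.0
        return [dict(r, turnover=r["turnover"] if r["turnover"] is not None else mean)
                for r in records]


class AggregationService:
    """Hypothetical 'aggregate' component: totals turnover per region."""

    def run(self, records: list[dict]) -> list[dict]:
        totals: dict[str, float] = {}
        for r in records:
            totals[r["region"]] = totals.get(r["region"], 0.0) + r["turnover"]
        return [{"region": k, "total_turnover": v} for k, v in totals.items()]


def produce(pipeline: list[StatisticalService], data: list[dict]) -> list[dict]:
    """A statistical output as an assembly of interchangeable components."""
    for service in pipeline:
        data = service.run(data)
    return data


if __name__ == "__main__":
    survey = [{"region": "East", "turnover": 100.0},
              {"region": "East", "turnover": None},
              {"region": "West", "turnover": 250.0}]
    print(produce([MeanImputationService(), AggregationService()], survey))
```

The point of the sketch is only that a component built against a shared interface can be replaced (a different imputation method, a different aggregation) without rebuilding the rest of the pipeline.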
In parallel, the High-Level Group for the Modernisation of Statistical Production and Services (HLG) is developing the CSPA. This will be a generic architecture for statistical production, and will serve as an industry architecture for official statistics. Adopting a common architecture will make it easier for organisations to standardise and combine the components of statistical production, regardless of where the statistical services are built. The CSPA also provides a starting point for concerted development of statistical infrastructure and shared investment across statistical organisations.

Version 0.1 of the CSPA documentation has just been released for public comment at: www1.unece.org/stat/platform/x/_ISwB
Your feedback is welcome!