What's a City Transport System Got to Do With Publishing Data in an Output Database?

Presented at International Marketing and Output Database Conference, Ireland, Cork 2007

Transcript

International Marketing and Output Database Conference
Blarney, Cork, 24th–28th September 2007

What's a City Transport System Got to Do With Publishing Data in an Output Database?

Katja Šnuderl
Statistical Office of the Republic of Slovenia
katja.snuderl@gov.si

Abstract

It is easy to compare a city transport system to the process of publishing statistical data on a statistical website. There are completely unorganized systems, where everyone drives to work in their own cars, takes whatever route is most convenient at the time and expects to park as close as possible to their destinations. This is similar to those systems in which there are no rules as to how, when, where and in what form data are published. There are several reasons why neither such a transport system nor such a statistical output database is preferable.

Conversely, there are completely organized systems, where all of the commuters use a public transportation system designed to their needs. Users adjust to the various schedules and transportation availability in order to reach their goals. This corresponds to a metadata-driven system where a well-organized metadata repository runs data publishing through a pre-defined process based on integrated databases and templates.

This article focuses on work done and lessons learned during a project of upgrading the Slovenian statistical output database from a file server to a macro database.

Context

Following the general trend of making statistical data available on the web, the Statistical Office of the Republic of Slovenia (Statistics Slovenia) decided to build an output database. The first databases (Agriculture Census and Population Census), built in 2003, were based on the PC-Axis file format and tools. As the concept proved to be efficient, Statistics Slovenia decided to migrate all of its dissemination to the output database.
The dilemma of choosing between a file server system and an SQL macro model was always present, until some of the largest tables hit the technical limitations of the file server system. Within a new project in the field of External Trade, a new PC-Axis SQL macro database was built. Having experience with both systems, and with migrating from one to the other, helped in identifying a metaphor that can help "non-IT people" understand the differences between table and database management.

1. Introduction

It is all about people. There is no IT solution run by machines for machines. Each is created by people, maintained by people and used by people. Therefore, when building an output database it is important to understand how the human mind works.

Somehow we seem to believe that everything that looks simple is simple. In reality, making a simple application, where users can understand the features easily and learn just by doing, takes a thorough analysis of users' needs, their behaviour and the technical possibilities, as well as an exacting decision process. It takes less work to make something that looks complicated and is difficult to use.

In terms of a transport system we could say that good transport networks don't just happen. It takes a lot of effort to turn a chaotic situation into a well-run public service. Good route maps and schedules are based on user needs analysis and technical possibilities. They evolve over years.

The basic preconditions for a successful project are sharing information (among all participants and cooperating parties in the project), understanding the project goal, and decision and (management) support. No support is possible without understanding the problems.

The comparison of building an output database with a transport system can sometimes help us explain the basics of standardization and change to someone who sees building a database purely as an IT matter. Management can support our needs even without understanding IT matters, if we know how to explain them in an understandable way. Since transport is something most people know and use, it makes a useful comparison.

2. "Keep it as it is, we're fine"

There is always a problem when a system changes. The new one doesn't always support all the options the old one had.
Many people ask why a system that runs well should be changed at all, but if this view had always been respected we would still be using carriages.

The project on External Trade was set up in order to replace the dissemination of data in the Statistical Databank, an older instance of the output database. The Statistical Databank had a lot of regular users who extracted data monthly. However, only one kind of extraction was possible: one flow (exports or imports) for one time period, either by tariff codes (for one country or total) or by countries (for one tariff code or total). In the new database users can combine flows, several time periods, many tariff codes and many countries. The output table always has a multidimensional structure and also presents empty cells: if a user selects a country with no flows, the country is listed in the table with the appropriate statistical sign.
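
The difference between the old single-slice extraction and the new multidimensional output can be illustrated with a minimal Python sketch. It is not the PC-Axis implementation: the sample observations, the dimension values and the `NO_DATA_SIGN` placeholder are all assumptions made up for the example (the paper does not say which statistical sign is used). The point is only that every selected combination appears in the output, including those with no flows.

```python
from itertools import product

# Hypothetical sample data: (flow, period, tariff_code, country) -> value.
# All keys and values are invented for illustration only.
observations = {
    ("exports", "2007M01", "0101", "AT"): 120.0,
    ("imports", "2007M01", "0101", "AT"): 80.0,
    ("exports", "2007M02", "0101", "AT"): 95.5,
}

NO_DATA_SIGN = "-"  # assumed placeholder; the paper does not name the sign


def build_table(observations, flows, periods, tariff_codes, countries):
    """Return one row per combination of the selected dimension values.

    Combinations without an observation are kept and marked with
    NO_DATA_SIGN instead of being silently dropped.
    """
    rows = []
    for key in product(flows, periods, tariff_codes, countries):
        rows.append(key + (observations.get(key, NO_DATA_SIGN),))
    return rows


table = build_table(
    observations,
    flows=["exports", "imports"],
    periods=["2007M01", "2007M02"],
    tariff_codes=["0101"],
    countries=["AT", "HR"],
)

for row in table:
    print(row)
```

In this sketch the country `HR` has no observations at all, yet it still appears in four rows, each carrying the placeholder sign.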

Regular users, who had been adjusting to the old database for years, had many problems and special requests when the new system was introduced. We had to enlarge the selection size limit in the first week, and we introduced new functions to filter data according to existing data flows. Luckily all users will benefit from the new functions, though not all parameters of the previous output were met.

3. "Don't just replace cars with buses"

We often say that there is no IT solution that could change a process by itself. Changing only the technical part of the process is similar to giving people buses instead of cars. Without changing anything else, people would probably each start driving their own bus to the workplace. A project manager should take care to prevent new tools from being used in old and obsolete ways. At the same time it is essential to know that users have to adjust to new tools at different levels, and not all of them can be the "drivers".

At Statistics Slovenia we chose a step-by-step approach when building the output database. The first stage was building the file server, where procedures and tools are easiest to understand for statisticians who were used to preparing tables in spreadsheets. The first tables were always prepared by the support team in order to meet all the general rules. The first examples also helped statisticians understand the multidimensional table structure.

At the beginning we always took what was available and tried to create a comprehensive multidimensional table from existing tabulations (published tables). In the last year a major step was made when we introduced new tabulation rules based on our experience. The new rules introduce a clear multidimensional structure, where the statistician only defines the content of the table.
The programming unit then prepares a new tabulation with the available tool (from the view of the source or the responsible person), following the general rules of tabulation for the PC-Axis database. The main result of the whole exercise is a better understanding of the multidimensional table structure among the statisticians and in the programming unit. However, when preparing these tables statisticians had to learn and use new tools for table management. They have to update existing tables with new time periods themselves.

When building the new macro database, the next step was taken. Here statisticians only deal with content definition and don't manage the tables in any way. Once the data for the new time period are ready, the support unit pulls the data into the macro database. The statistician can make the final check of whether data and metadata are ready to be published. The procedure of pulling data is manual for now and will be automated once it is stable. At this early stage we prefer to do it manually in order to learn how the automated process should run.

4. Transport logistics is complex

In reality nobody expects a city tram system to cover all the areas of the city. Transport modules (trains, trams, metro, buses, cars, etc.) are differentiated but at the same time integrated, and can be used successively. In the same way a good IT system should be developed in modules that are coherent, integrated and support each other.

When building our new output database, a decision was made that the new applications shouldn't depend on any other system within Statistics Slovenia. It was understood that the dissemination "module" would be integrated with the metadata system, but only at a later stage. Working the other way around could have slowed down the project considerably or even caused it to fail.

For classifications we decided to pull them from the classification server and maybe at a later stage use direct views. But as not all classifications are always prepared in the server, a backup option to import classifications as TXT files was introduced.

A similar solution was introduced for importing data into the output database. We expect all data to be available in micro or macro databases eventually. Currently at Statistics Slovenia we still maintain a variety of data sources. Input tables are created from relational databases, flat files and Excel spreadsheets. Tools for tabulation are varied, from SQL queries and views to Cobol, TPL, SAS and Excel tabulations. We even prepared a simple converter that turns TXT files from TPL into the correct CSV structure. So even though the project was run on data for External Trade (available in an Oracle database), procedures to import data from other SQL databases, CSV files or even existing PC-Axis files were developed.

Having the old output database (file server) and building the new one at the same time brought us the luxury of having an option to keep them both.
Our strategy is to eventually migrate all data to the SQL macro database, but there is no need to do it before the input data sources are consolidated. For now both systems will be supported and integrated.

Another aspect of the coexistence of transport systems is the image of simplicity. When a system runs smoothly and is easy to use, usually a lot of effort has gone into integrating and coordinating different modules. Intuitive tools are based on a lot of experience, selection of needs and testing. On the other hand, a system that looks complicated and is difficult to use is very easy to develop. You simply respect all needs and make no selection.

In the process of preparing the specifications of the output database project, a lot of emphasis was given to the expected outcome, especially the end-user solution (the web interface to view the data), in order to make it intuitive and easy to use. Unfortunately less experience was available when building the database management application, so the tool turned out to be rather complicated to use.
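
The backup option for classifications described in this section (use the classification server where possible, otherwise import a TXT file) can be sketched as follows. This is only an illustration under stated assumptions: the "server" is modelled as a plain dictionary, the TXT layout (`code;label` per line) is invented, and the function names are hypothetical. The paper does not describe the actual interfaces.

```python
import csv
import io


def parse_classification_txt(text, delimiter=";"):
    """Parse an assumed 'code;label' TXT layout into a {code: label} dict."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return {row[0].strip(): row[1].strip() for row in reader if len(row) >= 2}


def load_classification(name, server, txt_fallbacks):
    """Prefer the classification server; fall back to a TXT import.

    `server` and `txt_fallbacks` are stand-ins (plain dicts) for the real
    classification server and for TXT files on disk.
    Returns the classification and the source it came from.
    """
    if name in server:
        return server[name], "server"
    if name in txt_fallbacks:
        return parse_classification_txt(txt_fallbacks[name]), "txt"
    raise KeyError(f"classification {name!r} not found in server or TXT fallbacks")


# Hypothetical contents, for illustration only.
server = {"countries": {"AT": "Austria", "HR": "Croatia"}}
txt_fallbacks = {"flows": "1;Exports\n2;Imports\n"}

print(load_classification("countries", server, txt_fallbacks))
print(load_classification("flows", server, txt_fallbacks))
```

Returning the source alongside the classification makes it easy to see later which tables still rely on the backup route.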

5. "Let the grass grow, please!"

Allowing exceptions to rules is similar to building parking places where people tend to park on the grass. In the end everything is a parking place and the chaos remains. There is no green colour left to calm the nervous drivers down.

Already in our first output database some general rules were introduced. We had a file naming convention (unique file names within the whole system), corporate metadata, common classifications and some standard links (to methodological explanations, the release calendar and questionnaires). But in a file server it is difficult to validate whether each and every file complies with the rules. As this was done manually, not all exceptions were noticed and some were even agreed upon.

On the other hand, when we built the macro database we set some very strict rules. For example, all classifications in use have to be maintained in the classification server. Even though there is an alternative way to import classifications, all tables with exceptions will be maintained in the file server. This decision is based on workload balancing: in the macro database the management of metadata is done by the support unit. If statisticians demand to maintain an exception to the rule, they have to manage the table themselves. They can only do that within the file server. Even in the long run we don't plan to transfer management of the metadata from the support unit to the statisticians.

6. Why bother with anything other than a taxi?

In some big cities around the world people don't use public transportation but the taxi service. There is no worrying about schedules and no need to learn which route to take and which number to catch. In output database management terms, there can be a support unit that manages all the dissemination of statistical data.
Statisticians are only involved in managing the statistical process up to dissemination. They don't have to learn or use any new tools to prepare data for dissemination.

Statistics Slovenia is relatively small. The output database support unit grew to 5 members who work on regular production and development in parallel. Therefore the process of producing files to be published was organized within the subject-matter units from the very beginning. One of the arguments for this decision was knowledge, as only statisticians knew the content of a statistical survey and could define the expected outputs. But through managing the file server, experience and knowledge were also built up within the support unit. While building the new macro database we wondered whether there was any need to put technical burdens on the content managers. We decided not to do so at the start, so all technical matters are handled within the output database unit.
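
Section 5 notes that compliance with the file naming convention was checked manually on the file server, so some exceptions went unnoticed. A small script could take over part of that check. The sketch below is an assumption-laden illustration: the paper only says that file names must be unique within the whole system, so the concrete pattern (seven digits, one capital letter, the `.px` extension) and the sample paths are invented.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical naming pattern; the actual convention is not given in the paper.
NAME_PATTERN = re.compile(r"\d{7}[A-Z]\.px")


def check_file_names(paths):
    """Return a list of human-readable problems found in `paths`.

    Two rules are checked: each file name matches the assumed pattern,
    and each file name is unique across all directories (ignoring case).
    """
    problems = []
    seen = defaultdict(list)
    for path in paths:
        name = PurePosixPath(path).name
        if not NAME_PATTERN.fullmatch(name):
            problems.append(f"{path}: name does not match the convention")
        seen[name.lower()].append(path)
    for name, where in seen.items():
        if len(where) > 1:
            problems.append(f"{name}: used more than once ({', '.join(where)})")
    return problems


# Invented sample paths, for illustration only.
paths = [
    "prices/0400601S.px",
    "trade/2401701S.px",
    "tourism/0400601S.px",   # duplicate name in another directory
    "tourism/arrivals.px",   # does not follow the assumed pattern
]

problems = check_file_names(paths)
for problem in problems:
    print(problem)
```

A check like this only reports problems; whether an exception is then agreed upon or rejected remains a management decision.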

It is always a matter of balancing. If all management is given to the subject-matter units, it is not very probable that the coherence principles will be met. If all management is centralized, subject-matter units could oppose solutions that don't support their special requirements. So it is important to set some clear rules and introduce validation tools that support these rules on the one hand, and to balance management between the content managers and the output database team on the other.

7. "Lost"

Not many people get lost in the Paris Metro network. At every station it is easy to find maps and information on where to exit and where to continue in order to go the right way. But in another country it is fairly easy to miss the train station in The Hague and end up in Rotterdam.

As an output database includes more and more data, it also grows larger and larger. It is important to build a navigation system that helps users find their way around the database easily. This applies both to entering the database to find the data and to finding the way back later.

The first challenge is how to build an efficient way to find the data. The new output database at Statistics Slovenia offers several options. One is browsing through the content tree from the starting page of the database. There all subjects are available, and users have to open the content tree and check whether the table titles seem to match their needs. This option is available without additional maintenance of metadata, just by using the database content definitions.

Besides the entry page, we have introduced an option to open the content tree at any level within a subject area. For this purpose we use content identification numbers, which are unique and standardized across different dissemination products. For example, on our website every theme (e.g. Prices) has an ID number.
Opening the database content tree with the same ID number opens only the items within the same theme (Prices). Identifications go down to a single table. When the content tree opens partially, the current location is read from the database and written in the header section.

In the next step we will add an option to search for tables. We plan to introduce a keyword search, where a pre-defined list of keywords will be prepared and linked to the tables. Users will only be able to select words from the list. The words will be suggested as the user types. The list of keywords will be maintained regularly in order to support users' needs.

When users select data from a table in the database, they are often interested in continuing to work on other tables with the same content. To support such requests, we introduced a "List of tables" command in the menu bar, which opens the content tree for the same content.
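
The two navigation aids described in this section, opening the content tree at a given ID and suggesting keywords from a pre-defined list while typing, can be sketched in a few lines of Python. Everything concrete here is an assumption: the tree contents, the ID values, the `" > "` header format and the function names are invented for illustration and are not taken from the Statistics Slovenia system.

```python
# Hypothetical content tree: each node has an ID, a title and children.
TREE = {
    "id": "0", "title": "All subjects", "children": [
        {"id": "04", "title": "Prices", "children": [
            {"id": "0401", "title": "Consumer prices", "children": []},
            {"id": "0402", "title": "Producer prices", "children": []},
        ]},
        {"id": "24", "title": "External trade", "children": [
            {"id": "2401", "title": "Exports and imports", "children": []},
        ]},
    ],
}


def find_path(node, target_id, trail=()):
    """Return the nodes from the root down to `target_id`, or None."""
    trail = trail + (node,)
    if node["id"] == target_id:
        return list(trail)
    for child in node["children"]:
        found = find_path(child, target_id, trail)
        if found is not None:
            return found
    return None


def open_tree_at(tree, target_id):
    """Open the tree partially: return a header line and the subtree."""
    path = find_path(tree, target_id)
    if path is None:
        raise KeyError(target_id)
    header = " > ".join(node["title"] for node in path)
    return header, path[-1]


def suggest_keywords(prefix, keywords, limit=5):
    """Suggest entries from a fixed keyword list that start with `prefix`."""
    prefix = prefix.casefold()
    ordered = sorted(keywords, key=str.casefold)
    return [k for k in ordered if k.casefold().startswith(prefix)][:limit]


header, subtree = open_tree_at(TREE, "04")
print(header)
print([child["title"] for child in subtree["children"]])
print(suggest_keywords("pr", ["Prices", "Producer prices", "Tourism", "price index"]))
```

Because users can only pick from the maintained list, a simple prefix match like this is enough for the sketch; it does not attempt free-text search.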

8. "Shinkansen or a good old tram"?

We are still far from the Shinkansen and the Japanese transportation system. In Slovenia one can experience that it is not enough to replace old trains with new ones that can reach 200 km/h. In some places they have to slow down to 50 km/h or less, otherwise the tracks would collapse. Or you get stuck at a train station because nobody knows how to unlock a secured carriage, and after half an hour of trying and thinking they have to move the passengers and uncouple the carriage so the train can proceed.

So what we have done for now is limit parking places for cars within the city and introduce many bus routes, one intercity train route ending in the suburbs and one tram route from the suburbs to the centre. The system might not be the most modern, but it has proven to be reliable.

In reality we reduced the number of published Excel spreadsheets in favour of multidimensional tables, introduced standard procedures for tabulation of multidimensional tables, included the classification server in the dissemination process and built a macro database for data on External Trade. The next "tram" routes will be prepared for Earnings and Tourism Statistics.

After deciding to maintain both systems (the file server and the macro database), our main goal was to integrate them without putting a burden on the user when searching for data. The basic principles are:

a) Single entry point
b) Same "look and feel"
c) Same functions + advanced options in the macro database
d) Same support (header menus)
e) Single registration for advanced users (option to save queries)

A lot of effort was put into a coherent design of the two systems, adjusted to the design of the Statistics Slovenia website. The only connecting point of the two databases is the content tree view, the entry page of the database.
From there users are redirected to a table either in the file server database or in the macro database. In the tree view there are also links to related content: First Releases, methodological explanations, statistical questionnaires, special publications, links to external websites (data on websites of other governmental bodies) and links to the Eurostat database.

To view or download data, a user can select any values from the table, switch between text and code presentation of values, pivot the table, view selection-specific footnotes, change the presentation of decimals, display data in a graph or on a map and download data in several formats. Advanced features in the macro database support selection and filtering of hierarchical variables by levels, removal of empty lines, sorting and a better structured presentation of footnotes.

With the new database structure we are also introducing pre-defined tables, where less experienced users can look at data just by clicking the table title. The content of the pre-defined tables was defined by each theme editor.
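
Two of the advanced features mentioned above, filtering a hierarchical variable by level and removing empty lines, are simple to express in code. The following Python sketch shows one possible reading of them. The row layout (a `level` field per row, `None` for an empty cell) and the sample rows are assumptions for illustration and do not reflect how PC-Axis stores tables.

```python
# Hypothetical rows of a hierarchical variable; values are invented.
# None stands for an empty cell.
ROWS = [
    {"code": "01", "level": 1, "label": "Live animals", "values": [10.0, 12.0]},
    {"code": "0101", "level": 2, "label": "Horses", "values": [None, None]},
    {"code": "0102", "level": 2, "label": "Bovine animals", "values": [10.0, 12.0]},
    {"code": "02", "level": 1, "label": "Meat", "values": [None, None]},
]


def filter_by_level(rows, levels):
    """Keep only rows whose hierarchy level is one of `levels`."""
    wanted = set(levels)
    return [row for row in rows if row["level"] in wanted]


def remove_empty_rows(rows):
    """Drop rows in which every cell is empty (None)."""
    return [row for row in rows if any(v is not None for v in row["values"])]


level_two = filter_by_level(ROWS, [2])
non_empty = remove_empty_rows(ROWS)
both = remove_empty_rows(level_two)

print([row["code"] for row in level_two])
print([row["code"] for row in non_empty])
print([row["code"] for row in both])
```

Note that removing empty lines is a display option chosen by the user; by default, as described in section 2, empty cells stay visible in the output table.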

9. Conclusion

Habits differ between societies. Not every country has as many problems with transport systems as Slovenia. Still, there are some basic principles that everyone understands and that can be used when explaining the principles of building a new IT solution to a non-IT person.

During the project of building the new dissemination macro database, our main goal was to build a system that will support different contents, different input data formats and a wide range of users. From the start we have been careful about standardisation, coherence and process management. We are building on our experience with the file server database. At the same time we are trying to meet most users' needs.

Statistics is produced by people for people, and our role in this process is to make it accessible, reliable and understandable.
