Access to data sources is a crucial factor for empirical research and will gain further importance in the future. While financial data for US companies, including historical data, are readily available, the same is not true for German companies. Our goal is to establish a widely accessible database of reliable financial data on companies listed on German stock exchanges, covering a long time period. The annually published “Aktienführer Hoppenstedt” is a widely used source of financial data on German companies, but older volumes are only available as printed books.
Moreover, the books are copyright-protected material. We overcome both limitations by transforming the information into a database open for research. The final result of the project will be a database in which researchers in Germany can query and export the collected data on listed stock corporations for two decades (1979-1999). In the spirit of open science, research findings based on these data can then easily be replicated and validated by other researchers. The project is funded by the German Research Foundation (DFG).
Born print, reborn digital - the Hoppenstedt Data Archive
1. / 24
ECDA Bremen 2014
Session Con1-B 02.07.2014
Dr. Irene Schumm, Sebastian Weindel, Dr. Philipp Zumstein
(Universitätsbibliothek, Universität Mannheim)
2. / 24
Content
• Preface
• Outline & motivation
• Creation of the database
• Status quo
• Outlook
• Questions and discussion
Content 2
3. / 24
University of Mannheim
• Business School: Triple Crown accreditation
• Focus on empirical research in business studies, economics and social sciences
• Strong cooperation with research institutions & consulting
• Library supports this research
• DFG projects: Desbillons, InFoLiS, Linked Open Data, Semantic Web, Collaborative Tagging
0 Preface 3
4. / 24
A Common Problem: Data Availability
• Stock data, financial statements, interest rates, option prices, money-market data and highly specialized shares are easily accessible, but mainly as bare facts and figures
• No easily accessible source of historical data for the German market
• Result: high research costs
• Aktienführer Hoppenstedt contains high-quality data
• Covers a long time period
I Motivation & Outline 4
6. / 24
Goal
• Creation of a research database covering the years 1979-1999
• Free access for research in Germany
• Data export for further computation
7. / 24
Image Digitization
• Migration of the printed pages to an electronic image format, including an online presentation
• Coverage: 21 volumes (1979-1999)
8. / 24
Creation of a Database
• Extraction of the raw data from the image scans
• Data modelling, management and creation of a database
• Data access
10. / 24
Step 1: Requirement Analysis
• Cooperation with researchers specialized in corporate finance
• Incorporation of scientific expertise
• Detailed schemas prior to the actual database setup
• Testing of the planned framework against real research problems
• The data schema directly determines the possible filtering and export options
• Data modelling with an entity-relationship model (ERM) as the optimal blueprint
II Creation of a Database 10
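The ERM step can be illustrated as SQL table definitions. The sketch below is only a hypothetical reading of such a model: the table and column names (`company`, `volume`, `company_year`, `person`, `board_membership`) are assumptions, not the project's actual schema, and SQLite via Python's standard `sqlite3` module is used merely to keep it self-contained.

```python
import sqlite3

# Hypothetical ERM for the archive: entities become tables,
# relationships become foreign keys or link tables.
SCHEMA = """
CREATE TABLE company (
    company_id    INTEGER PRIMARY KEY,
    name          TEXT NOT NULL          -- canonical company name
);
CREATE TABLE volume (
    year          INTEGER PRIMARY KEY    -- one Aktienführer volume per year
);
CREATE TABLE company_year (               -- a company's entry in one volume
    company_id    INTEGER NOT NULL REFERENCES company(company_id),
    year          INTEGER NOT NULL REFERENCES volume(year),
    wkn           TEXT,                  -- securities identification number
    share_capital REAL,
    PRIMARY KEY (company_id, year)
);
CREATE TABLE person (
    person_id     INTEGER PRIMARY KEY,
    name          TEXT NOT NULL
);
CREATE TABLE board_membership (           -- m:n link between person and entry
    person_id     INTEGER NOT NULL REFERENCES person(person_id),
    company_id    INTEGER NOT NULL,
    year          INTEGER NOT NULL,
    role          TEXT NOT NULL,         -- e.g. management or supervisory board
    PRIMARY KEY (person_id, company_id, year, role),
    FOREIGN KEY (company_id, year) REFERENCES company_year(company_id, year)
);
"""


def create_database(path: str = ":memory:") -> sqlite3.Connection:
    """Create the tables; SQLite enforces foreign keys only if enabled."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


if __name__ == "__main__":
    conn = create_database()
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    print([name for (name,) in rows])
```

Because filters and exports can only address what the schema contains, a model like this also fixes which selections the later user interface can offer.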
11. / 24
Step 2: Data Structure & Modelling as an Ongoing Process
• Creating rules for different types of data
• Standardized naming of data fields
• Optimization based on the actual data
• Transfer of the data files into SQL tables
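Step 2 can be sketched as a small loader: raw field labels from a delivered file are mapped to standardized column names, each type of data gets a conversion rule, and the result is inserted into a SQL table. The labels (`Firma`, `Grundkapital`, …), the semicolon-separated layout, the flat table `company_data` and the sample row are all hypothetical; the actual delivered files may look quite different.

```python
import csv
import io
import sqlite3
from typing import Optional

# Hypothetical mapping: raw labels in a delivered data file -> standardized
# column names in the database (one naming convention for all volumes).
FIELD_NAMES = {
    "Firma": "name",
    "WKN": "wkn",
    "Grundkapital": "share_capital",
    "Jahr": "year",
}


def parse_german_number(text: str) -> Optional[float]:
    """Rule for numeric fields: '1.234,5' -> 1234.5, empty cell -> None."""
    text = text.strip()
    if not text:
        return None
    return float(text.replace(".", "").replace(",", "."))


# One conversion rule per type of data, keyed by standardized column name.
CONVERTERS = {
    "name": str.strip,
    "wkn": str.strip,
    "share_capital": parse_german_number,
    "year": int,
}


def load_file(raw_csv: str, conn: sqlite3.Connection) -> int:
    """Rename fields, apply the type rules and insert rows into a SQL table."""
    reader = csv.DictReader(io.StringIO(raw_csv), delimiter=";")
    count = 0
    for raw in reader:
        row = {}
        for label, value in raw.items():
            column = FIELD_NAMES[label]
            row[column] = CONVERTERS[column](value)
        conn.execute(
            "INSERT INTO company_data (name, wkn, share_capital, year) "
            "VALUES (:name, :wkn, :share_capital, :year)",
            row,
        )
        count += 1
    return count


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE company_data "
                 "(name TEXT, wkn TEXT, share_capital REAL, year INTEGER)")
    sample = "Firma;WKN;Grundkapital;Jahr\nBeispiel AG;000000;1.234,5;1979\n"
    print(load_file(sample, conn))
    print(conn.execute("SELECT * FROM company_data").fetchall())
```

Keeping the mapping and the rules in plain dictionaries makes the "ongoing process" cheap: refining the model based on actual data means editing a table entry rather than the loader.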
12. / 24
Step 3: Web-Based User Interface
• Extraction form instead of an SQL interface
• Provision of categories for easy access
• SQL queries run in the background
• Selection of the data via checkboxes, dropdowns, …
• Download as CSV, TXT, … for further computation
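How form input might turn into background SQL and a CSV download can be sketched as follows. The checkbox labels, the whitelist and the flat table `company_data` are hypothetical. The design point is general, though: column names cannot be bound as SQL parameters, so only whitelisted identifiers enter the query text, while user-chosen values such as the year range are bound with `?`.

```python
import csv
import io
import sqlite3
from typing import List, Sequence, Tuple

# Hypothetical whitelist: checkbox label -> column of a flat table company_data.
# Column names cannot be bound as SQL parameters, so only whitelisted
# identifiers ever reach the query text; user values are bound with '?'.
SELECTABLE = {
    "Company": "name",
    "WKN": "wkn",
    "Share capital": "share_capital",
}


def build_query(checked: Sequence[str], year_from: int,
                year_to: int) -> Tuple[str, List[int]]:
    """Translate form input (checkboxes, year dropdowns) into SQL."""
    if not checked:
        raise ValueError("no field selected")
    unknown = [label for label in checked if label not in SELECTABLE]
    if unknown:
        raise ValueError(f"unknown fields: {unknown}")
    columns = ", ".join(["year"] + [SELECTABLE[label] for label in checked])
    sql = (f"SELECT {columns} FROM company_data "
           "WHERE year BETWEEN ? AND ? ORDER BY name, year")
    return sql, [year_from, year_to]


def export_csv(conn: sqlite3.Connection, sql: str, params: List[int]) -> str:
    """Run the generated query and return the result as CSV text."""
    cursor = conn.execute(sql, params)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)
    return out.getvalue()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE company_data "
                 "(name TEXT, wkn TEXT, share_capital REAL, year INTEGER)")
    conn.executemany("INSERT INTO company_data VALUES (?, ?, ?, ?)", [
        ("Beispiel AG", "000000", 1000.0, 1979),  # hypothetical sample rows
        ("Beispiel AG", "000000", 1200.0, 1980),
        ("Muster AG", "000001", 500.0, 1981),
    ])
    sql, params = build_query(["Company", "Share capital"], 1979, 1980)
    print(export_csv(conn, sql, params))
```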
13. / 24
Status quo
• Data capture completed
• Refined modelling of the structure, based on the delivered files, to obtain a common structure
• Approach for reducing redundant data
• Adding volumes one year at a time
• Building a web frontend
• Feedback from researchers is constantly collected
III Status quo 13
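One possible way to reduce redundancy when volumes are added year by year is to store an attribute once per unchanged run of consecutive volumes instead of once per volume. This is a sketch under that assumption, not necessarily the approach the project takes; the attribute (a company's registered office) and its values are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def collapse_years(values_by_year: Dict[int, Any]) -> List[Tuple[int, int, Any]]:
    """Store an attribute once per unchanged run of consecutive volumes.

    Returns (first_year, last_year, value) intervals; a missing year
    (no entry in that volume) always starts a new interval.
    """
    intervals: List[Tuple[int, int, Any]] = []
    for year in sorted(values_by_year):
        value = values_by_year[year]
        if intervals:
            first, last, previous = intervals[-1]
            if previous == value and last == year - 1:
                intervals[-1] = (first, year, value)
                continue
        intervals.append((year, year, value))
    return intervals


if __name__ == "__main__":
    # Hypothetical registered office ("Sitz") of one company per volume.
    seat = {1979: "Mannheim", 1980: "Mannheim", 1981: "Frankfurt", 1983: "Frankfurt"}
    print(collapse_years(seat))
```

For the sample input this yields three intervals instead of four yearly rows; an attribute that never changes across all 21 volumes shrinks to a single row.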
15. / 24
Challenges
• Automated merging of different name variants
– AKA Ausfuhrkreditgesellschaft GmbH
– AKA Ausfuhrkredit-Gesellschaft mbH
– AKA Ausfuhrkredit GmbH
– AKA Ausfuhrkredit-GmbH
• Identification of unique persons
– Throughout the years
– Throughout the companies
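A rule-based normalization is one simple way to merge name variants such as the AKA examples above automatically. The rules below (umlaut transliteration, dropping punctuation and spaces, collapsing "…gesellschaft mbH" and "…gesellschaft GmbH" to "gmbh") are hypothetical heuristics chosen to fit these four variants, not the project's actual matching procedure.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical normalization rules; they are a heuristic, not a legal
# definition, and can also merge names that belong to different firms.
UMLAUTS = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue"})
LEGAL_FORM_RULES = [
    (re.compile(r"gesellschaftmitbeschraenkterhaftung$"), "gmbh"),
    (re.compile(r"gesellschaftg?mbh$"), "gmbh"),  # "...gesellschaft mbH/GmbH"
    (re.compile(r"aktiengesellschaft$"), "ag"),
]


def name_key(name: str) -> str:
    """Map spelling variants of a company name to one comparison key."""
    key = name.casefold().translate(UMLAUTS)  # casefold() turns 'ß' into 'ss'
    key = re.sub(r"[^0-9a-z]", "", key)       # keep only a-z and digits
    for pattern, replacement in LEGAL_FORM_RULES:
        key = pattern.sub(replacement, key)
    return key


def group_names(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group raw names that share the same normalized key."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    variants = [
        "AKA Ausfuhrkreditgesellschaft GmbH",
        "AKA Ausfuhrkredit-Gesellschaft mbH",
        "AKA Ausfuhrkredit GmbH",
        "AKA Ausfuhrkredit-GmbH",
    ]
    print(group_names(variants))
```

All four variants end up under the key `akaausfuhrkreditgmbh`. The same aggressiveness that merges them is also the risk: two genuinely different firms can collapse to one key, which is why conflicting data under one name still needs a check.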
16. / 24
Challenges
[Figure: Aktienführer 1979 vs. Aktienführer 1994]
• Automated merging: same name but different data
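The opposite case, the same name with different data, suggests flagging rather than merging. The sketch below assumes the WKN as the identifying attribute and a hypothetical entry layout of (volume year, printed name, WKN); names that carry more than one distinct WKN are reported for manual review, since they may be different companies or a single company whose WKN changed.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# (volume year, company name as printed, WKN or None): hypothetical layout
Entry = Tuple[int, str, Optional[str]]


def find_conflicts(entries: Iterable[Entry]) -> Dict[str, Dict[str, List[int]]]:
    """Flag names that appear with more than one distinct WKN.

    Such entries are not merged automatically: they may be different
    companies with the same name, or one company whose WKN changed.
    """
    years_by_name: Dict[str, Dict[str, List[int]]] = defaultdict(lambda: defaultdict(list))
    for year, name, wkn in entries:
        key = " ".join(name.casefold().split())  # ignore case and spacing only
        if wkn:
            years_by_name[key][wkn].append(year)
    return {
        name: {wkn: sorted(years) for wkn, years in by_wkn.items()}
        for name, by_wkn in years_by_name.items()
        if len(by_wkn) > 1
    }


if __name__ == "__main__":
    # Hypothetical entries; the WKNs are placeholders, not real identifiers.
    entries = [
        (1979, "Beispiel AG", "000001"),
        (1994, "Beispiel  AG", "000002"),
        (1979, "Muster AG", "000003"),
        (1994, "Muster AG", "000003"),
    ]
    print(find_conflicts(entries))
```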
17. / 24
Outlook
• Extension of the coverage with older and newer volumes
• Adding freely available data such as Wikipedia content or geocodes
• Automated computation and visualisation
• Extension of the license to provide wider access
• Predefined datasets (e.g. DAX 30)
IV Outlook 17
22. / 24
Milestones by April 2015
• Fully functional database covering 1979-1999
• Full search via the frontend
• Free access to digital company profiles
• Free database access for research in Germany
• Free export function for panel data sets
• Side effect: historical record of WKNs (Wertpapierkennnummern, German securities identification numbers)
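A panel data export typically means long format: one row per company and year. The sketch below assumes a balanced panel in which years without an entry stay as empty cells; the input layout, the variable name `share_capital` and the figures are hypothetical, and an unbalanced export would simply skip the missing years instead.

```python
import csv
import io
from typing import Dict, Iterable, Optional, Tuple


def to_panel_csv(records: Dict[Tuple[str, int], Optional[float]],
                 years: Iterable[int], variable: str) -> str:
    """Write a balanced panel in long format: one row per (company, year).

    Years in which a company has no entry are kept as empty cells, so
    every company covers the same range of years.
    """
    year_list = list(years)
    companies = sorted({company for company, _ in records})
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["company", "year", variable])
    for company in companies:
        for year in year_list:
            value = records.get((company, year))
            writer.writerow([company, year, "" if value is None else value])
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical share capital figures, not taken from the Aktienführer.
    capital = {("Beispiel AG", 1979): 1000.0, ("Beispiel AG", 1981): 1200.0,
               ("Muster AG", 1980): 500.0}
    print(to_panel_csv(capital, range(1979, 1982), "share_capital"))
```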