Case Study: UNIDO
11.4.2008, METIS 2008, Luxembourg: Valentin Todorov

Valentin Todorov
UNIDO
METIS 2008 (Luxembourg, 9-11 April 2008)
Outline
• Introduction and Overview
• Statistical Metadata Systems and the Statistical Cycle
• Statistical Metadata in each phase of the Statistical Cycle
• Systems and Design Issues
• Organizational and Cultural Issues
About UNIDO
• UNIDO was set up in 1966
• Became a specialized agency of the UN in 1985
• Promotes industrialization throughout the developing world
• 172 Member States (as of 3 December 2007)
• Headquarters in Vienna
• Represented in 35 developing countries
About Statistics in UNIDO
• Service Module "Industrial Governance and Statistics", helping countries to:
  – monitor, benchmark and analyse their industrial performance and capabilities
  – formulate, implement and monitor strategies, policies and programmes to improve the contribution of industry to productivity growth and the achievement of the UN Millennium Development Goals (MDGs)
• Building capabilities in industrial statistics by providing technical assistance to:
  – introduce best-practice methodologies and software systems
  – enhance the quality and consistency of the industrial statistics databases
About the Organisation
All statistical activities are carried out by the Research and Statistics Branch – PCF/RST
Overall strategy and metadata management principles
• Conceptual development was initiated in 1999
• An integrated framework for data and data documentation (metadata)
• A smooth migration policy: established UNIDO data services must not be disrupted
• Stepwise development in the context of a project migrating the statistical databases from an IBM mainframe to a client/server platform
• Backed by the UNIDO Quality Assurance Framework
Overall strategy (cont.)
• Following the International Recommendations for Industrial Statistics (2008)
• Common formats and nomenclatures for the exchange and sharing of statistical data and metadata (SDMX)
• Availability of the metadata in three languages (English, French and Spanish)
• Based on a formal framework: the proposed information system architecture comprises two cubes, one for the statistical data and another for the metadata, interrelated by a set of shared dimensions (see Froeschl et al. (2002); Froeschl and Yamada (2000))
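The two-cube idea can be illustrated with a small Python sketch. It is not the ISDE implementation: the shared dimensions (country, ISIC code, variable, year), the class and method names, and the convention that a missing dimension (None) means "applies to every value of that dimension" are all assumptions made for illustration. The point is that data and metadata are addressed through the same dimensions, so a data item's notes can be collected from every level at which metadata were attached.

```python
from dataclasses import dataclass, field
from itertools import product

# Hypothetical shared dimensions of both cubes: (country, isic, variable, year).


@dataclass
class TwoCubeStore:
    data: dict = field(default_factory=dict)      # data cube: full key -> value
    metadata: dict = field(default_factory=dict)  # metadata cube: partial key -> notes

    def put_data(self, country, isic, variable, year, value):
        self.data[(country, isic, variable, year)] = value

    def attach(self, note, country=None, isic=None, variable=None, year=None):
        # None in a dimension means "applies to every value of this dimension"
        # (an assumed convention), so one note can describe a whole data set,
        # a country, an ISIC code or a single data item.
        self.metadata.setdefault((country, isic, variable, year), []).append(note)

    def lookup(self, country, isic, variable, year):
        key = (country, isic, variable, year)
        notes = []
        # Visit all 2**4 generalisations of the key (each dimension either kept
        # or replaced by None) and collect the notes stored at each of them.
        for mask in product((True, False), repeat=len(key)):
            partial = tuple(k if keep else None for k, keep in zip(key, mask))
            notes.extend(self.metadata.get(partial, []))
        return self.data.get(key), notes
```

For example, after `store.attach("1511 includes 1512", country="AUT", isic="1511")`, a lookup of any AUT item under ISIC 1511 returns that note together with the value; "AUT" and all values here are placeholders.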
UNIDO Statistical Process
• Initialisation
  – Pre-filling of the outgoing UNIDO General Industrial Statistics Questionnaire with previously reported statistical data and metadata (non-OECD countries)
  – Excel format
  – In the appropriate language: English, French or Spanish
  – Automated using the available data and metadata
• Data Collection
  – NSOs: the questionnaires completed and returned to UNIDO by the NSOs (in Excel format, rarely as hard copy) are entered into the system and are then ready for further validation and processing
  – OECD: data for OECD member countries (Excel format) are ready for further validation and processing
UNIDO Statistical Process (cont.)
• Transformation/Processing
  – The data collected from primary or secondary sources are transformed into ready-to-use data sets
  – The data transformation is done in five stages, which not only constitute an operational framework for UNIDO statisticians but also provide additional description of the statistics (generated metadata attributed to each data item)
  – After the complete processing phase, the incoming and generated data and metadata are stored in the databases
• Dissemination
  – International Yearbook of Industrial Statistics
  – INDSTAT and IDSB CD products
  – Web Country Statistics (Country Brief)
  – Ad hoc requests by internal and external users
Mapping of the UNIDO cycle phases to those developed by the METIS group

METIS phase           UNIDO phase
Need                  Need [optional]
Develop and design    Develop and design [optional]
Build                 Initialisation
Collect               Data Collection
Process               Transformation/Processing
Analyse               Analysis
Disseminate           Dissemination
Archive               - (no corresponding phase)
Evaluate              Evaluation
ISDE Applications
• ADMIN – provides administrative services such as user and authorisation management, logging and auditing of the system, and backup and restore management – outside of the life cycle
• Nomenclature Explorer – maintenance of the core definitional metadata (not related to particular data sets or items) – outside of the life cycle
• Questionnaire – management of the pre-filling and distribution of the questionnaires – used in the Initialisation phase
ISDE Applications
• Data Wizard – the main data and metadata maintenance tool
  – Used in the Data Collection and Transformation phases
  – Provides services for:
    • Reading in the data and metadata from the returned questionnaires (Excel)
    • Initial validation of the read-in data and storing it in the database (at stage 1)
    • Maintenance of the data and metadata
    • Screening
    • Aggregation and further data validations and transformations
ISDE: Publication applications
• Yearbook – a complex set of applications for producing the International Yearbook of Industrial Statistics
  – aggregation, layout
  – PDF file generation according to predefined templates, and other tools
  – The final result is a publication-ready PDF file of about 700 pages
• INDSTAT CD – produces the INDSTAT type of CD products
• IDSB CD – produces the IDSB type of CD products
• WEB – generates the necessary data and metadata for updating the Web dissemination database
  – This database is outside of the ISDE system
  – Managed by the computer section
ISDE Applications
• Presentation Wizard – mainly a visualization tool used in the Dissemination phase for answering ad hoc requests; because of its versatile functionality it is also widely used in the Data Transformation phase
• Other applications – this category includes any other applications used in the process, such as SAS, R, and tools for compiling production index numbers and National Accounts data (which are outside the scope of this document)
Implementation Strategy
• Developed in the context of the migration from a Mainframe to a Client/Server platform
• A stepwise approach was chosen for the following reasons:
  – The project was not urgent
  – Software testing and support of the new system were done in-house
  – Only limited resources/funds were available
  – The staff were very willing to participate in the project
  – The goal was not merely to migrate the system but to develop a completely new one, and the requirements were not yet completely specified
  – A key requirement was that the established UNIDO data services must not be disrupted
Implementation Steps
• Step 1: High-level architecture design, data model, physical C/S database, definitional metadata tool
  – Rigorous analysis of the existing system and development of a data model, as generic as possible in order to accommodate any subsequent changes
  – Based on the data model, a loader application was developed that allowed the data in the Mainframe and in the Sybase database to be synchronized at any moment
  – The development of the new metadata subsystem was initiated by implementing a tool for maintaining the definitional metadata
  – This successfully completed a kind of proof of concept
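The loader's synchronisation can be pictured as computing the inserts, updates and deletes that make the Client/Server table mirror a Mainframe extract. The sketch below is only an illustration of that idea, not the actual loader: the function names are hypothetical, both sides are assumed to be available as key -> row dictionaries, and the Mainframe is assumed to be the master copy during the migration.

```python
def sync_plan(mainframe: dict, cs_db: dict):
    """Compare a Mainframe extract with the Client/Server table (both key -> row).

    Assumption: the Mainframe is treated as the master copy.
    """
    inserts = {k: v for k, v in mainframe.items() if k not in cs_db}
    updates = {k: v for k, v in mainframe.items() if k in cs_db and cs_db[k] != v}
    deletes = sorted(k for k in cs_db if k not in mainframe)
    return inserts, updates, deletes


def apply_plan(cs_db: dict, plan) -> None:
    """Apply the planned changes in place so cs_db mirrors the Mainframe extract."""
    inserts, updates, deletes = plan
    cs_db.update(inserts)
    cs_db.update(updates)
    for k in deletes:
        del cs_db[k]
```

Computing the plan separately from applying it lets the differences be reviewed before the Sybase side is touched.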
Implementation Steps
• Step 2: Reference metadata, dissemination applications
  – A capture/maintenance tool was developed
  – The descriptive/methodological metadata (held in Word and Excel documents) were entered into the system
  – The Mainframe footnote database (data-item level metadata) was imported
  – Thus the complete process of maintaining the available metadata was migrated to the Client/Server platform
  – Data dissemination applications were developed that allowed the recurrent statistical publications/products to be produced from the Mainframe system and from the Client/Server platform in parallel: an ideal acceptance test for the new applications, carried out simply by comparing the results
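Comparing the results of the parallel runs amounts to a cell-by-cell diff of the two outputs. A minimal sketch, assuming each output can be reduced to a mapping from a cell key (for example country, ISIC code, year) to a number; the function name and the tolerance are illustrative choices, not part of the original system.

```python
import math


def compare_outputs(legacy: dict, new: dict, rel_tol: float = 1e-9) -> list:
    """List cells whose values differ between the Mainframe and Client/Server runs.

    Both arguments map a cell key (e.g. (country, isic, year)) to a number.
    A cell present in only one output is reported with None on the other side.
    """
    diffs = []
    for key in sorted(set(legacy) | set(new)):
        a, b = legacy.get(key), new.get(key)
        if a is None or b is None or not math.isclose(a, b, rel_tol=rel_tol):
            diffs.append((key, a, b))
    return diffs
```

A relative tolerance is used so that harmless floating-point differences between the two platforms are not reported; an empty result means the new application reproduced the publication.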
Implementation Steps
• Example: International Yearbook of Industrial Statistics
  – From the Mainframe, it was produced as camera-ready line-printer output, which was glued together with many MS Word and MS Excel documents
  – From the Client/Server system, a page-numbered PDF file of about 700 pages is generated automatically
• Step 3: Pre-filled questionnaire, data capturing and maintenance
  – Pre-filling of the questionnaire, for the second time, from the new Client/Server data and metadata base
  – Development of the data capturing/maintenance tools, now in the phase of final acceptance testing
  – From June 2008, only the Client/Server system will be used
  – Final decoupling of the new system from the Mainframe
Metadata classification
There is no formal metadata classification, but according to their usage and role in the statistical production process we roughly distinguish between:
• Structural or definitional metadata: metadata that act as identifiers and descriptors of the data (and metadata)
• Reference metadata: describe the properties and quality of the statistical data
• System metadata: used to drive automated processing throughout the phases of the lifecycle
Metadata in the lifecycle
• The structural/definitional metadata are used in each phase of the lifecycle
• The structural metadata are created/updated relatively independently of the lifecycle, for example:
  – Adding a new country (e.g. Serbia and Montenegro recently)
  – Currency changes (e.g. Slovenia, Malta and Cyprus recently)
  – Country groupings: two more countries (Bulgaria and Romania) joined the EU
• No metadata are created in the first and last phases (Initialisation and Dissemination), although corrections may be performed
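Changes such as new EU members show why country groupings are best kept as time-dependent structural metadata rather than fixed lists. The sketch below assumes a grouping is stored as country code -> first year of membership; the table is a partial illustrative excerpt (only Austria, 1995, and Bulgaria and Romania, 2007), and the function names are hypothetical.

```python
# Structural metadata: a time-dependent country grouping, mapping a country code
# to its first year of membership. Partial, illustrative excerpt only.
EU_SINCE = {"AUT": 1995, "BGR": 2007, "ROU": 2007}


def members(grouping: dict, year: int) -> set:
    """Countries belonging to the grouping in the given year."""
    return {c for c, since in grouping.items() if since <= year}


def group_total(values: dict, grouping: dict, year: int) -> float:
    """Sum a variable over the members of the grouping for one year.

    values maps country code -> value for that year; members without data are skipped.
    """
    return sum(values[c] for c in members(grouping, year) if c in values)
```

This sketch uses the historical composition of the group for each year; an alternative design aggregates every year with the current composition, which keeps the series comparable over time but differs from what the group actually was.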
Metadata in the life cycle: Initialisation
• Pre-filling of the outgoing UNIDO General Industrial Statistics Questionnaire with previously reported statistical data and metadata
• System metadata: drive the automated processing
  – Template for the questionnaire
  – Language
  – ISIC revision
  – Output format (unit exponent, digits)
• Operational metadata: stage 1 data used for pre-filling
• Descriptive, methodological and implicit metadata used for pre-filling the questionnaire
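A rough idea of how these system metadata could drive pre-filling is given below. This is a sketch under stated assumptions, not the ISDE Questionnaire application: the settings class, the template file naming scheme and the variable labels are hypothetical placeholders; only the kinds of settings (template, language, ISIC revision, unit exponent, digits) come from the slide.

```python
from dataclasses import dataclass

# Hypothetical variable labels; a real questionnaire template would supply these.
LABELS = {
    "en": {"output": "Output", "employees": "Employees"},
    "fr": {"output": "Production", "employees": "Salariés"},
    "es": {"output": "Producción", "employees": "Empleados"},
}


@dataclass
class PrefillSettings:
    language: str       # "en", "fr" or "es"
    isic_revision: int  # selects the questionnaire template
    unit_exponent: int  # values are printed in units of 10**unit_exponent
    digits: int         # number of decimal places printed


def template_name(s: PrefillSettings) -> str:
    # Hypothetical naming scheme for the Excel template files.
    return f"questionnaire_isic_rev{s.isic_revision}_{s.language}.xls"


def format_cell(value: float, s: PrefillSettings) -> str:
    return f"{value / 10 ** s.unit_exponent:.{s.digits}f}"


def prefill_rows(stage1: dict, s: PrefillSettings) -> list:
    """Turn stage 1 values keyed by (isic, variable) into labelled, formatted rows."""
    return [
        (isic, LABELS[s.language][variable], format_cell(value, s))
        for (isic, variable), value in sorted(stage1.items())
    ]
```

With `PrefillSettings("fr", 3, 3, 1)`, a stage 1 output value of 1234567 would be written as "1234.6" (thousands, one decimal) under a French label.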
Metadata in the life cycle: Data Collection
• After the completed questionnaires are received back, they are entered (automatically) into the system for validation and further processing
• The received metadata are entered into the system together with the data
• The provided metadata are sometimes described not from the viewpoint of international comparability but from that of national standards. In such cases, UNIDO statistical staff re-describe/rearrange the provided metadata into explicit information on the deviations from the international standard
Metadata in the life cycle: Data Collection (cont.)
• Metadata can be attached to each data item, for example:
  – “Missing because of confidentiality reasons”, or
  – combinations of ISIC codes such as “1511 includes 1512”
• Data for OECD member countries
  – are collected through a joint OECD/UNIDO questionnaire and
  – transmitted to UNIDO (Excel format)
  – do not contain metadata (these are extracted from other OECD publications)
Metadata in the life cycle: Transformation
• The metadata collected from the NSOs together with the data undergo the same transformation process as the data and are complemented by metadata generated by the transformation process
• The data transformation is done in five stages, each adding further description of the data
• At the same time, Source and Method metadata are maintained for each data item
• Where appropriate, the provided metadata are re-described from the viewpoint of international comparability
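One way to read "metadata undergo the same transformation as the data" is that an aggregation step both inherits the notes of its components and generates new ones. A minimal sketch of that idea for summing ISIC 4-digit items into a 3-digit total; the data layout, the function name and the wording of the generated note are assumptions, and "Provisional" in the usage example is a placeholder note.

```python
def aggregate(items: dict, parent: str):
    """Sum the ISIC 4-digit items under a 3-digit parent and carry their metadata along.

    items maps an ISIC code to (value or None, list of notes). Returns the aggregate
    value and its notes: the components' notes plus a generated note if some were missing.
    """
    total = 0.0
    notes = []
    missing = []
    for code, (value, item_notes) in sorted(items.items()):
        if not code.startswith(parent):
            continue
        notes.extend(f"{code}: {note}" for note in item_notes)  # inherited metadata
        if value is None:
            missing.append(code)
        else:
            total += value
    if missing:
        notes.append("excludes missing " + ", ".join(missing))  # generated metadata
    return total, notes
```

For instance, if 1512 is flagged "Missing because of confidentiality reasons", the 151 total carries that note and an additional generated note stating that 1512 is excluded.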
Metadata in the life cycle: Dissemination
• International Yearbook of Industrial Statistics
  – the main UNIDO statistical product
  – the latest yearbook, released in 2008, covers data for the period from 1995 to the latest year
  – country data were updated for 74 countries and compiled from Stage 1 and Stage 2 data
• CD products, which may include data from all stages described earlier: www.unido.org/statistics
• Country Brief: statistics on selected variables from the different UNIDO databases for each member state, posted on the UNIDO website: http://www.unido.org/statistics
Systems and Design Issues
• Client/Server architecture built on .NET technology
• Centralised database:
  – Sybase ASE 12.5 on Linux
  – Test and production databases
• Client (desktop) applications developed in C# using MS Visual Studio
• Commonality through shareable component libraries (C#)
• Other tools:
  – SAS, R, STATA
• Development tools
Organizational and Cultural Issues
• No specialised metadata roles are necessary
  – processing of metadata and data is tightly coupled
  – responsibilities are organized by country
• No special training of the staff was necessary
  – all statisticians participated actively in the specification and development of the system
  – system testing was performed by parallel runs on the Client/Server and Mainframe platforms
• Nevertheless, a complete set of documentation and training materials is being prepared
  – unifying the terminology and the information about the system
  – induction training of new colleagues
  – operational and maintenance concept documents
Example: DataWizard
View/Edit Questionnaire
Example: DataWizard
View/Edit Metadata
Example: R Graphics
[Figure: R graphics for the Iris data, "Three Varieties of Iris" (setosa, versicolor, virginica): histogram with density of Sepal.Width, boxplots by variety, bagplot of Sepal.Width against Sepal.Length, normal Q-Q plot of Sepal.Width, and a scatter plot matrix of SepalLength, SepalWidth and PetalLength by variety]
Example: Implicit metadata
• Several industry categories can be combined and reported together by a given country for a given indicator and years
• In the questionnaire returned by the NSOs, such a combination is expressed as follows:

…
1511 Processing/preserving of meat                     1234 a/
1512 Processing/preserving of fish                        … a/
1513 Processing/preserving of fruit & vegetables          … a/
…
REMARKS: a/ 1511 includes 1512 and 1513

• 'Exclude' for other country-specific classification discrepancies
• 'Substitute' for synonyms
• Aggregations
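Turning such a remark into explicit, item-level metadata could look roughly like the sketch below. It only handles the "includes" form shown above and assumes 4-digit ISIC codes; the regular expression, the function names and the wording of the derived notes are illustrative, and the 'Exclude' and 'Substitute' cases are not covered.

```python
import re

# Matches remarks such as "a/ 1511 includes 1512 and 1513" (4-digit ISIC codes assumed).
REMARK = re.compile(r"^(?P<mark>\w+)/\s*(?P<target>\d{4})\s+includes\s+(?P<rest>.+)$")


def parse_remark(remark: str):
    """Return (footnote mark, combined code, included codes), or None if no match."""
    m = REMARK.match(remark.strip())
    if m is None:
        return None
    return m.group("mark"), m.group("target"), re.findall(r"\d{4}", m.group("rest"))


def implicit_metadata(combination) -> dict:
    """Derive one note per ISIC code from a parsed combination."""
    _, target, included = combination
    notes = {target: "includes " + ", ".join(included)}
    for code in included:
        notes[code] = f"included in {target}"
    return notes
```

Applied to the remark above, 1511 receives "includes 1512, 1513", while 1512 and 1513 each receive "included in 1511", so their empty cells are no longer indistinguishable from genuinely missing data.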
Example: System metadata in the Initialisation phase - I
Example: System metadata in the Initialisation phase - II
Example: Descriptive and methodological metadata used in the Initialisation/Data Collection phase
Example: Metadata attached to each data item, used or created in the Initialisation and Data Collection phases
Operational Framework: Stages
• Stage 1 – responses to national questionnaires. Detection and, where possible, correction of obvious reporting errors
  – Used for pre-filling the following edition of the questionnaire
  – Data are considered official
• Stage 2 – incorporation of published national data. Inconsistent data are corrected using supplementary information from national publications
  – Published in the International Yearbook of Industrial Statistics
  – Data are considered official
Operational Framework: Stages (cont.)
• Stage 3 – disaggregation of data. Data are adjusted to eliminate departures from the ISIC level of aggregation
  – using national and international sources
  – using supplementary data
• Stage 4 – automatic disaggregation and interpolation. Missing data are estimated by applying related proportions or interpolation whenever applicable
  – For ISIC 3-digit only
• Stage 5 – estimation of provisional data for the latest years
  – Selected variables only
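The two Stage 4 techniques can be sketched as follows. This is an illustration, not UNIDO's estimation procedure: linear interpolation is assumed for filling gaps between reported years, and "related proportion" is read here as splitting a combined value in proportion to reference values of its components (for example from another year); the function names are hypothetical.

```python
def interpolate(series: dict) -> dict:
    """Fill interior gaps in a year -> value series by linear interpolation.

    Years before the first or after the last reported value stay missing.
    """
    known = sorted(y for y, v in series.items() if v is not None)
    filled = dict(series)
    for a, b in zip(known, known[1:]):
        for y in range(a + 1, b):
            filled[y] = series[a] + (series[b] - series[a]) * (y - a) / (b - a)
    return filled


def disaggregate(total: float, reference: dict) -> dict:
    """Split a combined value in proportion to reference values of its components."""
    base = sum(reference.values())
    return {code: total * v / base for code, v in reference.items()}
```

For example, values of 100 in 2000 and 130 in 2003 give estimates of 110 and 120 for 2001 and 2002, while a combined 90 reported under 1511 for 1512 and 1513, with reference values 10 and 20, splits into 30 and 60. Any value produced this way would carry the stage 4 indicator.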
Reference metadata
• Implicit metadata – a special class of metadata arising through the specific usage of other metadata. A typical example is the ISIC combinations
• Operational metadata – generated by the data transformation process and attributed to the respective data items
  – a stage indicator reflecting the data item's credibility
  – "Source" and "Methods" metadata, describing the source of the data item and the methods applied to generate it
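Because every data item carries its stage, source and method, a product can select the most credible available value up to a chosen stage. The sketch below assumes that a lower stage number means higher credibility (stages 1 and 2 are official data, stages 3 to 5 are estimates); the class and function names are hypothetical, and "UNIDO estimate" and "as published" in the usage example are placeholder values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    value: float
    stage: int   # 1 = national questionnaire ... 5 = provisional estimate
    source: str  # "Source" metadata
    method: str  # "Methods" metadata


def most_credible(candidates: list, max_stage: int = 5) -> Optional[Candidate]:
    """Pick the candidate with the lowest stage not exceeding max_stage.

    Assumes a lower stage number means a more credible value.
    """
    eligible = [c for c in candidates if c.stage <= max_stage]
    return min(eligible, key=lambda c: c.stage) if eligible else None
```

A yearbook compiled from Stage 1 and Stage 2 data would call `most_credible(candidates, max_stage=2)`, while a CD product that may include all stages would use the default.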
Reference metadata (cont.)
• Descriptive and methodological metadata – received from the primary data reporters and then further processed together with the data
  – Additional metadata can be added during this processing
  – Can be attached at all levels, from the complete data set down to individual data items
System metadata
• Used to drive automated processing throughout the phases of the life cycle, for example:
  – layout definitions for the yearbook (for each country and each edition of the yearbook)
  – country lists, used in the automatic generation of the PDF
  – installation and packaging lists, directories, templates, etc. for creating the CD products
• Specific to the application in which they are used
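To show how system metadata rather than program code decide what gets produced, here is a minimal sketch of layout-driven table generation for a yearbook edition. The layout structure, the default layout, the country code and the function names are all hypothetical; only the idea of per-country, per-edition layout definitions and a country list comes from the slide.

```python
# Hypothetical layout definitions keyed by (yearbook edition, country code).
LAYOUTS = {
    (2008, "AUT"): {"variables": ["establishments", "employees", "output"], "years": 3},
}
DEFAULT_LAYOUT = {"variables": ["employees", "output"], "years": 2}


def build_table(edition: int, country: str, data: dict) -> list:
    """Build one country table; data maps (variable, year) -> value."""
    layout = LAYOUTS.get((edition, country), DEFAULT_LAYOUT)
    years = sorted({year for _, year in data})[-layout["years"]:]
    rows = [["variable"] + [str(y) for y in years]]
    for variable in layout["variables"]:
        rows.append([variable] + [data.get((variable, y)) for y in years])
    return rows


def build_edition(edition: int, country_list: list, data_by_country: dict) -> dict:
    """The country list (system metadata) decides which tables are produced, in order."""
    return {c: build_table(edition, c, data_by_country.get(c, {})) for c in country_list}
```

Changing a layout or the country list then changes the publication without touching the code; cells without data come out as None and would be rendered as blanks or footnoted by the PDF step.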