This page describes the metadata standard that the GEODE project requires in order to curate occupational information resources. The metadata is arranged in xml format, following the widely applied Michigan Data Documentation Initiative standard (DDI, version 2.1). Not all of the DDI fields are essential to occupational data curation, so we use a subset of the DDI, which we refer to as the 'GEODE-M' metadata standard.



Introduction: GEODE-M data curation

The GEODE project is concerned with organising the distribution and supply of occupational information resources within the social sciences. Information scientists widely agree that the most important aid to organising data resources is the consistent use of metadata (data which describes the underlying data resource). Much of the work of GEODE therefore involves 'curating' occupational information resources (that is, adding clearly defined metadata), so that they can be readily indexed within the GEODE data access services.

In GEODE we use a metadata standard known as 'GEODE-M' (GEODE-Metadata), which comprises a subset of the widely used DDI metadata system (version 2.1). The motivations behind this standard, and its structure, are described in several working papers of the GEODE project. See our publications page, especially the paper to the Digital Curation Conference 2006.

Metadata is information which tells us about the nature of a particular data resource. Metadata on an occupational information file might tell us, for instance, who the author of the file is, when it was released, and what country and time period it applies to. Crucially, in the case of the GEODE occupational information resources, it should also tell us the occupational index unit (typically the Occupational Unit Group) to which the data is relevant (see the GEODE page on occupational unit groups).



Structure of GEODE-M

GEODE-M defines an xml format data file which arranges a selection of xml tags that convey relevant information on the occupational information resources. GEODE-M involves only a subset of the available xml tags from the DDI version 2.1 scheme (for a full list, see the DDI XML-tag library).

These tags fall within the same 5-category structure of the DDI scheme, illustrated below:

    <docDscr>    Release date
    <stdyDscr>   Country; Time period; Author
    <fileDscr>   File sizes; File formats
    <dataDscr>   <varGrp> OUG variable; Other identifier variables; Output variables
                 <var> OUG details; Output details
    <otherMat>   Missing data; Data extensions


The five structures may contain a number of details within their component xml tags. Broadly they involve the following data:

  • <docDscr> - Gives information about the actual metadata, e.g., the contact details of whoever made the data resource available to GEODE
  • <stdyDscr> - Gives information about the underlying occupational data resources, such as which country or countries it is relevant to
  • <fileDscr> - Gives the names and structural details of the electronic file (or files) which comprise the occupational information resource
  • <dataDscr> - Gives information about the content of the electronic file or files (such as the location of variables from a relevant occupational unit group)
  • <otherMat> - Gives further optional information on the data resource
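
As a rough illustration of this five-part layout, the Python sketch below lists which of the five sections are missing from a record. It assumes the sections sit directly under the DDI 2.x root element `<codeBook>`; the function names and the skeleton record are hypothetical and are not part of the GEODE tools.

```python
import xml.etree.ElementTree as ET

# The five top-level DDI sections that GEODE-M draws on.
GEODE_M_SECTIONS = ("docDscr", "stdyDscr", "fileDscr", "dataDscr", "otherMat")


def local_name(tag):
    """Drop an XML namespace prefix of the form '{uri}name', if present."""
    return tag.rsplit("}", 1)[-1]


def missing_sections(xml_text):
    """Return the names of the five sections absent from the record."""
    root = ET.fromstring(xml_text)
    present = {local_name(child.tag) for child in root}
    return [name for name in GEODE_M_SECTIONS if name not in present]


# Hypothetical skeleton record (assumption: root element is <codeBook>).
skeleton = """<codeBook>
  <docDscr/>
  <stdyDscr/>
  <fileDscr/>
  <dataDscr/>
</codeBook>"""

print(missing_sections(skeleton))  # prints ['otherMat']
```

Since `<otherMat>` holds optional information, a record that lacks it may still be acceptable; the check only reports what is present.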

When curating data for the GEODE service, we usually apply a two-stage process:

  • Stage 1: Initial supply of only the most crucial data necessary to define an occupational information resource
  • Stage 2: Subsequent updates to the xml files with any further available metadata

The data required at stage 1 is quite limited, as illustrated by the excerpt below (from the Digital Curation Conference 2006 paper):

    <docDscr> …
      <distStmt> <contact email="">Paul Lambert</contact> </distStmt>
      <prodDate date="2006-07-19">July 19, 2006</prodDate>
    … </docDscr>
    <stdyDscr> …
      <titl>CAMSIS scales for the UK using SOC2000</titl>
      <IDNo agency="GEODE">131</IDNo>
      <distrbtr URI="">Cambridge Social Interaction and Stratification Scales website</distrbtr>
      <stdyInfo> <!-- information about the data context -->
        <sumDscr>
          <timePrd event="start">2000</timePrd>
          <nation abbr="GB">United Kingdom</nation>
        </sumDscr>
    … </stdyDscr>
    <fileDscr id="gb91soc2000.sav"> …
      <fileName id="gb91soc2000.sav">gb91soc2000.sav</fileName>
    … </fileDscr>
    <dataDscr> …
      <varGrp name="indexs" var="soc2000s ukempsts stdempsts">
        <concept>Index term</concept> ... </varGrp>
      <varGrp name="outcomes" var="MCAMSISs FCAMSISs">
        <concept>Occupational information</concept> </varGrp>
      <var ID="soc2000s" file="gb91soc2000.sav">
        <stdCatgry uri="">Standard Occupational Classification 2000</stdCatgry></var>
    … </dataDscr>
    <otherMat> … … </otherMat>
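
To show how an indexing service might read these stage 1 fields, here is a minimal Python sketch. The record it parses is a completed version of the excerpt above: as assumptions for illustration only, the elided parts are dropped, `<stdyInfo>` is closed, and everything is wrapped in the DDI 2.x root element `<codeBook>` without a namespace. The `summarise` function and its key names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Completed version of the stage 1 excerpt (elisions dropped; an assumption).
STAGE1 = """<codeBook>
  <docDscr>
    <distStmt><contact email="">Paul Lambert</contact></distStmt>
    <prodDate date="2006-07-19">July 19, 2006</prodDate>
  </docDscr>
  <stdyDscr>
    <titl>CAMSIS scales for the UK using SOC2000</titl>
    <IDNo agency="GEODE">131</IDNo>
    <stdyInfo>
      <sumDscr>
        <timePrd event="start">2000</timePrd>
        <nation abbr="GB">United Kingdom</nation>
      </sumDscr>
    </stdyInfo>
  </stdyDscr>
  <fileDscr id="gb91soc2000.sav">
    <fileName id="gb91soc2000.sav">gb91soc2000.sav</fileName>
  </fileDscr>
  <dataDscr>
    <varGrp name="indexs" var="soc2000s ukempsts stdempsts">
      <concept>Index term</concept>
    </varGrp>
    <varGrp name="outcomes" var="MCAMSISs FCAMSISs">
      <concept>Occupational information</concept>
    </varGrp>
    <var ID="soc2000s" file="gb91soc2000.sav">
      <stdCatgry uri="">Standard Occupational Classification 2000</stdCatgry>
    </var>
  </dataDscr>
</codeBook>"""


def summarise(xml_text):
    """Collect the key stage 1 fields of a record into a dictionary."""
    root = ET.fromstring(xml_text)

    def text(path):
        # Text content of the first element matching path, or None.
        node = root.find(path)
        return node.text.strip() if node is not None and node.text else None

    def attr(path, name):
        # Attribute value of the first element matching path, or None.
        node = root.find(path)
        return node.get(name) if node is not None else None

    # Each <varGrp> lists its member variables as a space-separated string.
    groups = {g.get("name"): g.get("var", "").split() for g in root.iter("varGrp")}
    return {
        "title": text(".//titl"),
        "release_date": attr(".//prodDate", "date"),
        "country": attr(".//nation", "abbr"),
        "start_year": text(".//timePrd"),
        "file_name": text(".//fileName"),
        "index_variables": groups.get("indexs", []),
        "output_variables": groups.get("outcomes", []),
        "oug_scheme": text(".//var/stdCatgry"),
    }


for key, value in summarise(STAGE1).items():
    print(key, "=", value)
```

The searches use `.//` so that the result does not depend on the intermediate elements that the excerpt elides.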



At stage 2, a large volume of additional metadata can be added to the resource. Members of the GEODE project will often add such metadata manually, although the data depositor might also wish to do so. The xml file examples below illustrate the end result after four different occupational information resources have been fully curated.

  • CAMSIS scale scores for Denmark, downloadable from camsis_dk_example.xml
  • CAMSIS scale scores for UK (2000), downloadable from camsis_gb2000_example.xml
  • ISCO-88 to ISEI SPSS translation index (Ganzeboom), downloadable from isei_example.xml
  • UK SOC-90 gender segregation statistics, published in Hakim (1998) Social Change and Innovation in the Labour Market, Oxford University Press, downloadable from hakim1998_example.xml
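
In practice, a stage 2 update of the kind described above means adding further elements to an existing stage 1 file. The Python sketch below is one hedged illustration: it attaches a DDI `<labl>` (label) element to a chosen `<var>`. Whether GEODE-M uses `<labl>` is an assumption, as are the function name, the label text and the cut-down input record; the result is not validated against the DDI schema.

```python
import xml.etree.ElementTree as ET


def add_variable_label(xml_text, var_id, label):
    """Return the record with a <labl> added to the <var> whose ID is var_id."""
    root = ET.fromstring(xml_text)
    for var in root.iter("var"):
        if var.get("ID") == var_id:
            labl = ET.Element("labl")
            labl.text = label
            # Placed first among the children; this ordering is an
            # assumption and is not checked against the DDI schema.
            var.insert(0, labl)
            break
    else:
        raise KeyError(f"no <var> with ID {var_id!r}")
    return ET.tostring(root, encoding="unicode")


# Cut-down stage 1 record (hypothetical: only the <var> entry is kept).
stage1 = (
    '<codeBook><dataDscr>'
    '<var ID="soc2000s" file="gb91soc2000.sav">'
    '<stdCatgry uri="">Standard Occupational Classification 2000</stdCatgry>'
    '</var></dataDscr></codeBook>'
)

# The label text is an illustrative placeholder, not taken from a real file.
stage2 = add_variable_label(stage1, "soc2000s", "SOC2000 unit group")
print(stage2)
```

Working on a parsed tree rather than editing the text directly keeps the existing stage 1 content intact while the extra detail is added.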



See the following for further details on metadata standards in the social science and e-Social Science contexts:


Further issues



