*** Translation between Stata format data and the plain text files used by the GEODE processor. 


*********************************************.
*****  File locations :.


* The name of the user's original data file :.
global file1 "c:\geode\workshops\stir07\demos\data\lfs_2002extract.dta" 

* The name of the plain text file used as input to the GEODE processor :.
global file2 "c:\temp\lfs_input.dat" 

* The name of the plain text file produced by the GEODE processor :.
global file3 "c:\temp\lfs_output.dat" 

* The name given to the final Stata file produced by this exercise :.
global file4 "c:\geode\workshops\stir07\demos\data\lfs_2002extract_v2.dta"  
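
* Optional check (a minimal sketch, not part of the original procedure): stop
*   early with an error if the original data file is not found at the location
*   declared in file1 above.
confirm file "$file1"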


********************************************.


*****************.
** Step (1) Convert the original Stata file into plain text 
**             (tab-delimited, with variable names in the first row and numeric codes rather than value labels). 
*****************.

use $file1, clear
outsheet using $file2, nolabel replace


*****************.
** Step (2) {Run the GEODE matching procedure on the plain text file}. 
*****************.

* {no Stata contribution}. 
* {The GEODE portal reads in file2 and produces file3}. 
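
* Optional check before Step (3) (a sketch, not part of the original procedure):
*   confirm that the GEODE output has been saved at the location declared in
*   file3; Stata stops with an error if it has not.
confirm file "$file3"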

*****************.
** Step (3) Read the derived plain text file and convert it into Stata format. 
*****************.

insheet using $file3, clear  
save $file4, replace
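
* Optional check (a sketch): list the variables read back from file3 to see which
*   derived variables GEODE has added.
describe
* Note: in Stata 13 and later, outsheet and insheet are superseded by export
*   delimited and import delimited. Assumed equivalents of the commands used in
*   Steps (1) and (3) would be:
*   export delimited using "$file2", nolabel delimiter(tab) replace
*   import delimited using "$file3", clear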






********************************************.

** Extension :. 

** Note that translating out to plain text format and then reading back in will have the 
*   effect of losing 'data dictionary' information from the original Stata data file
*   (e.g. variable labels, value labels, missing-value codes, file notes).
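
* A small self-contained illustration (a sketch using Stata's bundled auto dataset,
*   not the LFS data): after a plain-text round trip, the value label on foreign
*   and the variable label on make are both gone. The tempfile name is local to
*   this illustration.
sysuse auto, clear
tempfile roundtrip
outsheet using "`roundtrip'", nolabel replace
insheet using "`roundtrip'", clear
describe make foreign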

** There are several ways to prevent this, though all involve taking 
*    additional steps in the analysis process.  
** We recommend the following approach: extract a subset containing only a case
*   identifier and the linking variables, run the GEODE procedure on that subset,
*   and then merge the GEODE output back onto the full original file by the case identifier. 
** (This example involves a file with two key identifier variables, called
*    soc2km and ukempst; in other examples other names would be used, and only 
*    one identifier variable may be needed). 


** Further file location declaration:. 
* A temporary file name :.
global file5 "c:\temp\part1.dta" 

** Define linking variables :.
global var1 "soc2km"
global var2 "ukempst"


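use $file1, clear  
* Create a unique case identifier so that the GEODE output can be merged back later.
gen caseid= _n 
sort caseid
* Save the full original file (with its labels and notes) for the later merge.
save $file5, replace
* Generic copies of the linking variables, to be sent to GEODE.
gen occ1= $var1  
gen occ2= $var2  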

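* Stage (1): export only the case identifier and the linking variables.
keep caseid occ1 occ2
outsheet using $file2, nolabel replace
* {Stage (2): run GEODE on file2 to produce file3}.
* Stage (3): read the GEODE output and merge it back onto the full original data.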
insheet using $file3, clear  
sort caseid
merge 1:1 caseid using $file5
* Check how many cases matched (_merge==3) before dropping the indicator.
tabulate _merge
drop _merge
save $file4, replace
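
* Optional check (a sketch): if GEODE returned every case, the merged file should
*   contain the same number of observations as the original data saved in file5.
quietly describe using "$file5"
local n_orig = r(N)
assert _N == `n_orig'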


**************************************************************.

** EOF.