06/16/2000: TDT Documentation in MS Word

Please read Transmission_disequilibrium_tests.doc, which is in MS Word format. It was created by Dr. Chris Amos.

9/29/1998: Updates

Alleles no longer need to be down coded. compute7.sas has been modified today to take real allele symbols (numeric, please!) into the TDT calculation. The final output will show this change. Allele symbols no longer need to be sequential numbers.

1. Introduction to TDT:

This is a description of the one-marker TDT calculation. The model programs are tdt6.sas and compute7.sas as of today (June 12, 1996). The simulation part has a separate explanation, which will be added soon. In this version, single parent structures are dealt with and an error checking scheme is introduced. To run this program successfully, we recommend that you have SAGE available, or another program that produces nuclear family structures as SAGE/FSP does.

a. Single parent structure:

In a single parent case, suppose the father's type is 12, the mother's is 00 (missing) and a kid has a type of 12. Then, first, 1 is assumed transmitted and 2 not transmitted, with a weight of 0.5; second, 2 is assumed transmitted and 1 not transmitted, also with a weight of 0.5. In other cases, where a definitive transmission status is obtainable, the transmission pattern is marked accordingly and assigned a weight of 1. For example, if the father has 12, the mother 00 (missing) and the kid 13, then 1 is definitely considered transmitted and 2 not transmitted (allele 3 cannot have come from the father), with a weight of 1, and that's the end of the story.

b. The error checker:

It picks up all the "left-overs" after genotype matching is exhausted, meaning that one of a kid's alleles has failed to find a match among any of the four (4) parental alleles. In this case, the user needs to go back to the data and fix the error before any meaningful calculation can go forward. For the time being, SAS will stop operating and post an error message in both of its output files (.log and .lst).

c. Data:

The user should prepare the data in a standard pedigree structure. In a typical 2-dimensional table, the columns (variables) could be as listed below:

    famid   (family id)
    id      (person id within each family)
    faid    (father's id)
    moid    (mother's id)
    sex     (individual's gender)
    aff     (affection code, 0=missing, 1=non-affected, 2=affected)
    m11     (marker 1 allele 1)
    m12     (marker 1 allele 2)
    m21     (marker 2 allele 1)
    m22     (marker 2 allele 2)

Each line represents one person. For now, TDT only works with nuclear family structures. If the original data are organized in extended family fashion (grandparents+parents+siblings), then the data should be preprocessed into nuclear family structures first. Currently, SAGE provides a program, FSP, to cut large families into nuclear pieces.

d. The program:

The core file is compute7.sas. There have been at least 6 major updates since the beginning of the project. It is called by a master program, tdt6.sas, which is nothing but a calling engine. More detailed comments can be found within the programs themselves.

2. Example:

We figured that the best way to use this program is to follow an example and observe every step along the way. We borrowed some data from England to illustrate this. The original data were shipped to us in Microsoft Excel format and have been converted into a standard ASCII file called raw.dat. Let's start with raw.dat.

a. run sas t.sas
This produces a sas dataset file called t.ssd01. It also produces a by-product called marker.names, for an executable program to pick up. The executable program t is in turn created from a FORTRAN program, t.f. The most important output file is an ASCII file, fsp.dat. This can be linked to fort.11 for fsp to run. Remember to change -9 into two spaces using your favorite editor.

NOTE: Modify marker.names immediately to delete any lines that do not contain allele names.
A valid marker.names file should look like this:

    a1 a2 c1 c2 dr1 dr2

The above suggests that you have three (3) markers, a, c and dr, each represented by two alleles (next to each other).

b. Compose your fsp.par file and link it to fort.1, then run fsp to get fort.23, which has the nuclear family structure.

NOTE: Always check fort.21 for signs of trouble. Multiple mating is O.K., though.

c. run read23
This converts fort.23 into tdt.dat, an ASCII file that retains the nuclear family structure but adds a sequential family id to the file.

d. run sas nuc.sas
This converts the tdt.dat file into a sas dataset file called t.ssd01.

e. run t, the executable from t.f
This produces a file called reorder.sas, to be included in t1.sas. This program also prepares variable labels for the marker alleles.

f. run sas t1.sas
This reorders the allele sequences into a bunch of sequential numbers, so that tdt can process them later. A permanent sas data file called t1.ssd01 is produced afterwards. The by-product t1.lst is needed to show the mapping scheme.

g. run tdt1.sas, tdt6.sas and tdt8.sas, using t1.ssd01 as the input file, to conduct the different tdt tests.

h. A detailed explanation of tdt8.sas and sim8.sas will be provided later. For the time being, contact Dr. Amos at camos@request.mdacc.tmc.edu.

README file prepared 10/01/1997
Last Updated: 11/06/1997
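
Appendix (illustration only): the single parent weighting rule from section 1a can be sketched in a few lines of Python. This is not the code in compute7.sas; the function name single_parent_transmissions and the tuple layout are made up for this note. Two points are assumptions rather than statements from this README: the non-transmitted allele is taken to be the typed parent's other allele, and a homozygous parent is treated as uninformative (no record produced). A kid sharing no allele with the typed parent is flagged as an error, loosely mirroring the error checker in section 1b.

```python
def single_parent_transmissions(parent, child):
    """Score one typed parent against one kid when the other parent is 00.

    parent, child: 2-tuples of numeric allele symbols, e.g. (1, 2).
    Returns a list of (transmitted, not_transmitted, weight) records.
    Hypothetical helper for illustration; not taken from compute7.sas.
    """
    a, b = parent
    # Parental alleles that also appear in the kid's genotype.
    shared = [x for x in (a, b) if x in child]

    if not shared:
        # The kid shares nothing with the typed parent: a data error.
        raise ValueError(f"kid {child} does not match parent {parent}")

    if a == b:
        # Assumption: a homozygous parent carries no TDT information.
        return []

    if len(shared) == 2:
        # Kid has the same heterozygous type as the parent (12 and 12):
        # either allele may have been transmitted, so split the weight.
        return [(a, b, 0.5), (b, a, 0.5)]

    # Exactly one parental allele is found in the kid: definitive case.
    transmitted = shared[0]
    not_transmitted = b if transmitted == a else a
    return [(transmitted, not_transmitted, 1.0)]


if __name__ == "__main__":
    print(single_parent_transmissions((1, 2), (1, 2)))  # ambiguous case
    print(single_parent_transmissions((1, 2), (1, 3)))  # definitive case
```

The first call corresponds to the father 12 / kid 12 example in section 1a, and the second to the father 12 / kid 13 example.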