[R] filter a tab delimited text file
    Duke 
    duke.lists at gmx.com
       
    Fri Sep 10 19:24:46 CEST 2010
    
    
  
  Hi all,
I have to filter a tab-delimited text file like the one below:
"GeneNames"    "value1"    "value2"    "log2(Fold_change)"    "log2(Fold_change) normalized"    "Signature(abs(log2(Fold_change) normalized) > 4)"
ENSG00000209350    4    35    -3.81131293562629    -4.14357714689656    TRUE
ENSG00000177133    142    2    5.46771720082336    5.13545298955309    FALSE
ENSG00000116285    115    1669    -4.54130810709955    -4.87357231836982    TRUE
ENSG00000009724    10    162    -4.69995182667858    -5.03221603794886    FALSE
ENSG00000162460    3    31    -4.05126372834704    -4.38352793961731    TRUE
I want to keep only the rows where the last column is TRUE and then write 
them to a new text file, meaning I should get something like this:
"GeneNames"    "value1"    "value2"    "log2(Fold_change)"    "log2(Fold_change) normalized"    "Signature(abs(log2(Fold_change) normalized) > 4)"
ENSG00000209350    4    35    -3.81131293562629    -4.14357714689656    TRUE
ENSG00000116285    115    1669    -4.54130810709955    -4.87357231836982    TRUE
ENSG00000162460    3    31    -4.05126372834704    -4.38352793961731    TRUE
I used read.table and write.table but I am still not very satisfied with 
the results. Here is what I did:
expFC <- read.table("test.txt", header = TRUE, sep = "\t")
expFC.TRUE <- expFC[expFC[[ncol(expFC)]] == "TRUE", ]
write.table(expFC.TRUE, file = "test_TRUE.txt", row.names = FALSE, sep = "\t")
Result:
"GeneNames"    "value1"    "value2"    "log2.Fold_change."    "log2.Fold_change..normalized"    "Signature.abs.log2.Fold_change..normalized....4."
"ENSG00000209350"    4    35    -3.81131293562629    -4.14357714689656    TRUE
"ENSG00000116285"    115    1669    -4.54130810709955    -4.87357231836982    TRUE
"ENSG00000162460"    3    31    -4.05126372834704    -4.38352793961731    TRUE
As you can see, there are two problems:
1. The headers were altered. All the special characters were converted 
to dots (.).
2. The gene names (first column) were quoted (they were not quoted in the 
original file).
The second point is not very annoying, but the first one is. How do I 
get exactly the same headers as in the original file?
Thanks,
D.
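
For illustration, here is one possible way to address both points. It is a 
sketch under assumptions, not a confirmed answer from the list. It relies on 
two documented arguments: check.names=FALSE in read.table (which skips the 
make.names() conversion that turns special characters into dots) and 
quote=FALSE in write.table (which turns off quoting of character columns). 
Because quote=FALSE would also leave the header unquoted, the header line is 
written separately with writeLines. The temporary file names and the 
shortened sample values are made up for the example.

```r
# Hypothetical sample input: a shortened version of the file above,
# written to a temporary path so the sketch is self-contained.
infile  <- tempfile(fileext = ".txt")
outfile <- tempfile(fileext = ".txt")
header <- c("GeneNames", "value1", "value2", "log2(Fold_change)",
            "log2(Fold_change) normalized",
            "Signature(abs(log2(Fold_change) normalized) > 4)")
writeLines(c(
  paste0('"', header, '"', collapse = "\t"),
  "ENSG00000209350\t4\t35\t-3.81\t-4.14\tTRUE",
  "ENSG00000177133\t142\t2\t5.47\t5.14\tFALSE",
  "ENSG00000162460\t3\t31\t-4.05\t-4.38\tTRUE"
), infile)

# check.names = FALSE keeps the column names exactly as they appear in the file
# instead of passing them through make.names().
expFC <- read.table(infile, header = TRUE, sep = "\t",
                    check.names = FALSE, stringsAsFactors = FALSE)

# The last column is read as logical; which() also drops any NA entries.
keep <- which(expFC[[ncol(expFC)]])
expFC.TRUE <- expFC[keep, ]

# Write the quoted header by hand, then append the unquoted data rows.
writeLines(paste0('"', names(expFC), '"', collapse = "\t"), outfile)
write.table(expFC.TRUE, file = outfile, sep = "\t", append = TRUE,
            row.names = FALSE, col.names = FALSE, quote = FALSE)

# Inspect the result.
cat(readLines(outfile), sep = "\n")
```

If quoted gene names are acceptable, adding check.names=FALSE to the original 
read.table call should be enough to keep the headers, and write.table can stay 
as it was.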
    
    