[R] Problems with reading data by readWorksheetFromFile of XLConnect Package
David Winsemius
dwinsemius at comcast.net
Fri May 3 17:54:51 CEST 2013
On May 2, 2013, at 11:00 PM, jpm miao wrote:
> Hi Anthony,
>
> Thank you very much. It works very well. However, after this line
>
>> temp <- sapply( temp , as.numeric )
>
> the data becomes a series of numbers instead of a matrix. Is there any
> way to keep it a matrix?
Perhaps (assuming this were a data.frame to be coerced:
temp <- matrix( sapply( temp , as.numeric ), dim(temp)[1])
But the persistence of the "-"'s is puzzling. You should (as always) have posted the output from dput(temp).
Thanks,
>
> Miao
>
>
>
>
>> temp<-readWorksheetFromFile("130502temp.xlsx", sheet=1, header=FALSE,
> startRow=2, endRow= 11, startCol=2, endCol=5)
>> temp <- sapply( temp , function( x ) gsub( ',' , '' , x ) )
>> temp
> Col1 Col2 Col3 Col4
> [1,] "647853" "1413" "57662" "27897"
> [2,] "491400" "1365" "40919" "20411"
> [3,] "38604" "-" "5505" "985"
> [4,] "576" "-" "20" "54"
> [5,] "80845" "21" "10211" "4494"
> [6,] "36428" "27" "1007" "1953"
> [7,] "269915" "587" "32988" "12779"
> [8,] "224494" "-" "30554" "9184"
> [9,] "11858" "587" "-" "686"
> [10,] "3742" "-" "81" "415"
>> temp <- sapply( temp , as.numeric )
> Warning messages:
> 1: In lapply(X = X, FUN = FUN, ...) : NAs introduced by coercion
> 2: In lapply(X = X, FUN = FUN, ...) : NAs introduced by coercion
> 3: In lapply(X = X, FUN = FUN, ...) : NAs introduced by coercion
> 4: In lapply(X = X, FUN = FUN, ...) : NAs introduced by coercion
> 5: In lapply(X = X, FUN = FUN, ...) : NAs introduced by coercion
>> temp
> 647853 491400 38604 576 80845 36428 269915
> 647853 491400 38604 576 80845 36428 269915
> 224494 11858 3742 1413 1365 - -
> 224494 11858 3742 1413 1365 NA NA
> 21 27 587 - 587 - 57662
> 21 27 587 NA 587 NA 57662
> 40919 5505 20 10211 1007 32988 30554
> 40919 5505 20 10211 1007 32988 30554
> - 81 27897 20411 985 54 4494
> NA 81 27897 20411 985 54 4494
> 1953 12779 9184 686 415
> 1953 12779 9184 686 415
>> temp[ is.na( temp ) ] <- 0
>> temp
> 647853 491400 38604 576 80845 36428 269915
> 647853 491400 38604 576 80845 36428 269915
> 224494 11858 3742 1413 1365 - -
> 224494 11858 3742 1413 1365 0 0
> 21 27 587 - 587 - 57662
> 21 27 587 0 587 0 57662
> 40919 5505 20 10211 1007 32988 30554
> 40919 5505 20 10211 1007 32988 30554
> - 81 27897 20411 985 54 4494
> 0 81 27897 20411 985 54 4494
> 1953 12779 9184 686 415
> 1953 12779 9184 686 415
>
>
> 2013/5/2 Anthony Damico <ajdamico at gmail.com>
>
>> try adding colTypes = 'numeric' to your readWorkSheetFromFile() call
>>
>>
>>
>> if that doesn't work, try a few other steps
>>
>>
>> # view what data types your file is being read in as
>> sapply( temp , class )
>>
>>
>> # convert all fields to character if they're factor variables.. but i
>> don't think you need this, readWorksheet defaults to `character`
>> temp <- sapply( temp , as.character )
>>
>>
>> # you can also convert a subset like this
>> temp[ , c( 1 , 3:4 ) ] <- sapply( temp[ , c( 1 , 3:4 ) ] , as.character )
>>
>>
>>
>> # remove commas from character strings
>> temp <- sapply( temp , function( x ) gsub( ',' , '' , x ) )
>>
>> # convert all fields to numeric
>> temp <- sapply( temp , as.numeric )
>>
>> # convert all NA fields to zeroes if you prefer
>> temp[ is.na( temp ) ] <- 0
>>
>>
>>
>>
>>
>> On Wed, May 1, 2013 at 11:55 PM, jpm miao <miaojpm at gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Attached are two datasheet to be read.
>>> My raw data "130502temp.xlsx" contains numbers with ' symbols, and they
>>> can't be read as numbers. Even if I copy and paste as numbers to form a
>>> new
>>> file "130502temp_number1.xlsx", they could not be read smoothly.
>>>
>>> 1. How can I read the datasheet as numbers?
>>> 2. How can I treat the notation "-" as (1) "NA" or (2) zero?
>>>
>>> Thanks,
>>>
>>> Miao
>>>
>>>
>>>
>>>
>>>> temp<-readWorksheetFromFile("130502temp.xlsx", sheet=1, header=FALSE,
>>> startRow=2, endRow= 11, startCol=2, endCol=5)
>>>
>>>> temp
>>>
>>> Col1 Col2 Col3 Col4
>>>
>>> 1 647,853 1,413 57,662 27,897
>>>
>>> 2 491,400 1,365 40,919 20,411
>>>
>>> 3 38,604 - 5,505 985
>>>
>>> 4 576 - 20 54
>>>
>>> 5 80,845 21 10,211 4,494
>>>
>>> 6 36,428 27 1,007 1,953
>>>
>>> 7 269,915 587 32,988 12,779
>>>
>>> 8 224,494 - 30,554 9,184
>>>
>>> 9 11,858 587 - 686
>>>
>>> 10 3,742 - 81 415
>>>
>>>> temp[2,2]
>>>
>>> [1] "1,365"
>>>
>>>> temp[2,2]+3
>>>
>>> Error in temp[2, 2] + 3 : non-numeric argument to binary operator
>>>
>>>> temp_num<-readWorksheetFromFile("130502temp_number1.xlsx", sheet=1,
>>> header=FALSE, startRow=2, endRow= 11, startCol=2, endCol=5)
>>>
>>>> temp_num[2,2]
>>>
>>> [1] "1,365"
>>>
>>>> temp_num[2,2]+3
>>>
>>> Error in temp_num[2, 2] + 3 : non-numeric argument to binary operator
>>>
>>>> as.numeric(temp_num[2,2])+3
>>>
>>> [1] NA
>>>
>>> Warning message:
>>>
>>> NAs introduced by coercion
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius
Alameda, CA, USA
More information about the R-help
mailing list