[R] Calculating distance matrix for large dataset

David Carlson dcarlson at tamu.edu
Fri May 3 15:36:23 CEST 2013


Here's the result on R 3.0.0 64 bit under Windows 8:

> A<-matrix(1:365000*144,nrow=365000,ncol=144)
> dim(A)
[1] 365000    144
> d <- dist(mydata_nor, method = "euclidean")
Error in as.matrix(x) : object 'mydata_nor' not found
> d <- dist(A, method = "euclidean")
Error: cannot allocate vector of size 496.3 Gb
In addition: Warning messages:
1: In dist(A, method = "euclidean") :
  Reached total allocation of 8078Mb: see help(memory.size)
2: In dist(A, method = "euclidean") :
  Reached total allocation of 8078Mb: see help(memory.size)
3: In dist(A, method = "euclidean") :
  Reached total allocation of 8078Mb: see help(memory.size)
4: In dist(A, method = "euclidean") :
  Reached total allocation of 8078Mb: see help(memory.size)
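
For reference, the 496.3 Gb figure in the error can be reproduced by hand.
dist() stores one double (8 bytes) for each pair of rows, that is n(n-1)/2
values. The short sketch below works this out for n = 365000; the variable
names are mine, for illustration only. It also shows that the number of pairs
is larger than the biggest 32-bit integer, which is a plausible (but
unconfirmed) reason an older R build would report a negative length instead
of a size.

```r
# Rough size of the object dist() has to allocate for n rows,
# assuming 8 bytes per stored distance.
n <- 365000                      # rows in the data matrix
n_pairs <- n * (n - 1) / 2       # one distance per pair of rows
gib <- 8 * n_pairs / 1024^3      # bytes converted to binary gigabytes

round(gib, 1)                    # 496.3
n_pairs > .Machine$integer.max   # TRUE: too many pairs for a 32-bit integer
```

The number of columns (144) does not enter the calculation; only the number
of rows determines the size of the result.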

The error you received suggests that your system could not even compute the
memory requirement correctly. Unless you have access to a computer with 500
gigabytes of memory, you need to consider alternative approaches, such as
aggregating the data into longer time blocks or using kmeans.
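
A minimal sketch of both alternatives, on a small simulated matrix rather
than the real data. The row layout (one row per sample-day, ordered by sample
and then by day), the weekly grouping, and the choice of 4 clusters are all
assumptions made for illustration and would need to be adapted.

```r
set.seed(1)

# Hypothetical stand-in for the real data: 20 samples x 28 days, one row per
# sample-day, 144 ten-minute readings per row.
n_samples <- 20
n_days <- 28
x <- matrix(rnorm(n_samples * n_days * 144), ncol = 144)
sample_id <- rep(seq_len(n_samples), each = n_days)
week_id <- rep(rep(1:4, each = 7), times = n_samples)

# Option 1: average the daily profiles within each sample-week. This cuts the
# number of rows by 7, so the distance matrix shrinks by roughly 7^2.
grp <- paste(sample_id, week_id, sep = "_")
x_weekly <- rowsum(x, group = grp) / 7    # every group has exactly 7 rows
d_weekly <- dist(x_weekly, method = "euclidean")

# Option 2: kmeans clusters the rows directly and never builds the full
# n x n distance matrix; it only keeps the data and the cluster centers.
km <- kmeans(x, centers = 4, nstart = 5, iter.max = 50)
table(km$cluster)
```

With the real data, aggregating 365 days into 12 monthly profiles per sample
would leave 12000 rows, for which dist() needs about 0.5 Gb instead of 496.3
Gb.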

-------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of HJ YAN
Sent: Thursday, May 2, 2013 6:02 PM
To: r-help at r-project.org
Subject: [R] Calculating distance matrix for large dataset

Dear R users


I wonder whether any of you have tried to calculate a distance matrix for a
very large data set, and whether anyone can confirm that the error message I
got actually means that my data set is too large for this task:

negative length vectors are not allowed


My data size and code used

> dim(mydata_nor)
[1] 365000    144
> d <- dist(mydata_nor, method = "euclidean")



My data set has 1000 samples, each with one year of daily observations at
10-minute intervals, so the size is (365 * 1000) * 144.


I checked the help page for the function 'dist' but could not find an upper
limit on the size allowed. I expect there is one, so any hints are appreciated.


I would also be grateful for advice on any other method of calculating a
distance matrix for a large dataset.



I appreciate that reproducible code should be provided when asking for advice,
so try the code below if needed:

> A<-matrix(1:365000*144,nrow=365000,ncol=144)
> dim(A)
[1] 365000    144
> d1<-dist(A,method="euclidean")
Error in dist(A, method = "euclidean") :
  negative length vectors are not allowed




Many thanks in advance!

HJ




