[R-SIG-Finance] Pattern Recognition / Classification in R for Financial Time Series

stefano iacus stefano.iacus at unimi.it
Sun Nov 2 23:55:45 CET 2008


Yes, Ozkan is right.

As I answered privately to Ian, our proposal is not really the best
choice for his original problem (I read his email too quickly,
apologies).

stefano
P.S. In any event, I've updated the sde package to include MOdist (sde
version 2.0.3).


On 02/nov/08, at 23:08, I. Ozkan wrote:

> This is a typical search problem that needs a well-defined similarity
> measure (as Stefano quietly pointed out). For 5-day Open-High-Low-Close
> series, similarity measures based on statistics or probability may not
> work most of the time. There are several distance measures (L1, L2,
> Minkowski, cosine, edit distance, statistical distances, etc.) one can
> use to find similar patterns. Similarity is context dependent, and you
> should first select the proper measure.
>
> As an example, assume that 3 derived values are obtained by simply
> dividing High, Low and Close by Open: HtoO, LtoO, CtoO (5 observations
> for each). Then, for each of these 3 series, a simple Euclidean
> distance to the corresponding series of every other stock can be
> calculated. If these 3 characteristics are assumed to be equally
> important, just average the distances obtained. If not, try to find
> some weights. Finally, the nearest neighbour(s) is (are) selected.
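
The recipe above could be sketched in base R roughly as follows. This is only a minimal illustration under stated assumptions: the price data are invented, the helper names (ohlc, ratios, pattern_distance) are hypothetical, and equal weights of 1/3 stand in for "just average the distances".

```r
# Build a 5 x 4 OHLC matrix (hypothetical helper for this sketch).
ohlc <- function(o, h, l, cl) cbind(Open = o, High = h, Low = l, Close = cl)

# Divide High, Low and Close by Open: a 5 x 3 matrix (HtoO, LtoO, CtoO).
ratios <- function(x) {
  r <- x[, c("High", "Low", "Close")] / x[, "Open"]
  colnames(r) <- c("HtoO", "LtoO", "CtoO")
  r
}

# One Euclidean distance per derived series, then a weighted sum.
# With the default equal weights this is the plain average.
pattern_distance <- function(a, b, w = rep(1 / 3, 3)) {
  d <- sqrt(colSums((ratios(a) - ratios(b))^2))
  sum(w * d)
}

# Invented data: a target window for stock A and three candidate windows.
o <- c(100, 101, 102, 101, 100)
target <- ohlc(o, o + 2, o - 2, o + 1)
universe <- list(
  B = ohlc(o / 2, (o + 2) / 2, (o - 2) / 2, (o + 1) / 2),  # same shape, half the price
  C = ohlc(o, o + 5, o - 5, o - 3),                        # much wider range
  D = ohlc(o, o + 2.5, o - 2, o + 1)                       # slightly higher highs
)

# Distance from the target to every candidate, nearest first.
d <- sapply(universe, pattern_distance, a = target)
nearest <- sort(d)
print(head(nearest, 2))  # the two nearest neighbours
```

Because every series is divided by Open, candidate B, which is the target at half the price level, comes out at distance zero. Unequal weights can be passed through the w argument.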
>
> In this example, we simply treat the time series as multivariate
> observations. This means that we assume the sequence itself does not
> carry important information, though identical sequences give perfect
> similarity. But an increase-then-decrease pattern has the same distance
> to a flat pattern as a decrease-then-increase pattern does, although
> the two are highly dissimilar. If these patterns are clustered, they
> will certainly be assigned to different clusters.
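
A tiny numeric check of that point, using three made-up 5-day ratio series (the values and the euclid helper are assumptions for illustration only):

```r
# Three invented 5-day series of Close-to-Open style ratios.
flat    <- c(1, 1,    1,    1,    1)
up_down <- c(1, 1.02, 1.04, 1.02, 1)   # increase then decrease
down_up <- c(1, 0.98, 0.96, 0.98, 1)   # decrease then increase

# Plain Euclidean distance between two equal-length series.
euclid <- function(x, y) sqrt(sum((x - y)^2))

d_up   <- euclid(up_down, flat)     # distance of up/down to flat
d_down <- euclid(down_up, flat)     # same value, up to floating point
d_both <- euclid(up_down, down_up)  # twice as far from each other

print(c(up_vs_flat = d_up, down_vs_flat = d_down, up_vs_down = d_both))
```

Both shaped patterns sit equally far from the flat one, while being twice that distance from each other, so a ranking by distance to a flat query cannot tell them apart.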
>
> For longer sequences I might consider using a Longest Common
> Subsequence type metric. For quite long series, other similarity
> measures, such as Mutual Information, ARIMA, Markov Operators (as
> Stefano proposed), coefficient, the best 5-10 FFT coefficients, or
> others like Kullback-Leibler, Kolmogorov-Smirnov, Histogram
> Intersection, etc., are found to be useful for identifying similar
> processes.
>
> As far as I know, most of these similarity measures are implemented in
> R (try the machine learning and clustering packages, or Bioconductor's
> bioDist) and they are ready to use.
>
> As a last suggestion, try something simple first, then identify the
> problems (if any) with this approach, then try another, and so on
> (Occam's razor is your guide when selecting the approach).
>
> Good Luck!
>
> Ozkan...
>
> -----------
> Hi, I was wondering if there are any good packages in R that would be
> useful in Time Series Pattern Recognition (3rd party software
> suggestions are also welcome!).
>
> My search problem description is this: Given a specific 5 day OHLC
> sequence in a particular stock A, I want to scan through a list of
> stocks B, C, etc... and return another 5 day OHLC sequence which
> closely 'matches' my given sequence.
>
> The basic brute force algorithm which I'm working on currently is to
> normalize all 5 day sequences in my search universe and to calculate
> the differential in HL and return the top N patterns with the lowest
> differential value. If there are any elegant / intelligent ways to
> solve my problem, I would love to hear them! Thanks...
>
> Rgds
> Ian


