Library for shapelet extraction from time series using a genetic algorithm.
GENDIS extracts shapelets, i.e. a collection of subsequences, from a time series dataset that are highly informative for classifying each time series into a category. GENDIS searches for this set of shapelets through evolutionary computation, a paradigm mostly unexplored within the domain of time series classification, which offers several benefits: (a) evolutionary algorithms are gradient-free, so the optimization objective is easy to configure and does not need to be differentiable, and local optima are escaped more easily; (b) no brute-force search is required, making the algorithm scalable and allowing easy control over its run-time; (c) the total number of shapelets and the length of each shapelet are evolved jointly with the shapelets themselves, alleviating the need to specify these beforehand; (d) entire sets are evaluated at once as opposed to single shapelets, which yields smaller final sets that contain fewer similar shapelets while achieving similar predictive performance; and (e) the discovered shapelets do not need to be subsequences of the input time series. Moreover, the proposed technique has a computational complexity that is multiple orders of magnitude smaller than that of the current state-of-the-art, while outperforming it in terms of predictive performance and producing much smaller shapelet sets.
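To make the notion of a shapelet more concrete, the sketch below shows how a set of shapelets is commonly turned into classification features: each time series is described by its minimum distance to every shapelet. This is a minimal illustration, assuming the usual sliding-window Euclidean distance; the function names `shapelet_distance` and `shapelet_transform` are hypothetical and are not part of the GENDIS API, and the distance measure GENDIS uses internally may differ.

```python
import math


def shapelet_distance(series, shapelet):
    """Smallest Euclidean distance between `shapelet` and any
    equally long window of `series` (illustrative helper, not GENDIS code)."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    # Slide the shapelet over the series and keep the best match.
    return min(
        math.dist(series[start:start + m], shapelet)
        for start in range(len(series) - m + 1)
    )


def shapelet_transform(dataset, shapelets):
    """Map each series to a feature vector of distances, one per shapelet.
    Shapelets may have different lengths."""
    return [[shapelet_distance(ts, s) for s in shapelets] for ts in dataset]


# Toy data: the first series contains the bump [1, 2, 1], the second does not.
X = [[0, 0, 1, 2, 1, 0], [0, 0, 0, 0, 0, 0]]
shapelets = [[1, 2, 1], [0, 0]]
features = shapelet_transform(X, shapelets)
```

The resulting feature matrix has one row per time series and one column per shapelet, so it can be passed to any standard classifier. A distance close to zero means the shapelet occurs almost exactly somewhere in the series, which is what makes a well-chosen shapelet discriminative.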