Data scientists now have a wide range of tools at their disposal
for manipulating time series data and extracting complex information from it.
However, given the current tools, data exploration and pattern search may still require an extensive
amount of time spent developing methods that match the data scientist's reasoning about a query.
New methods that are tightly coupled to the reasoning and visual analysis of time series data are therefore highly relevant:
they can reduce the complexity of pattern and query search tasks and improve productivity.
In this work we propose a novel tool for exploring time series data in pattern and query search tasks through a set of three symbolic steps:
Pre-Processing, Symbolic Connotation and Search.
The framework is called SSTS (Symbolic Search in Time Series) and uses regular expression queries to search for the desired patterns
in a symbolic representation of the signal. By relying on a symbolic set of methods, this approach aims to increase expressiveness
in standard pattern and query tasks, enabling queries that are more closely related to the reasoning and visual analysis of the signal.
Scroll down to find some examples to guide you in exploring this novel approach in time series analysis and
give it a try!
The main purpose of this tool is to search for desired patterns in the signal. The approach takes advantage of
existing search mechanisms for text, namely regular expressions. Regular expressions are patterns
that can match specific sequences of characters; to use them on time series, the numerical
signal must first be transformed into a symbolic sequence, which can then be examined with regular expressions.
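As a rough illustration of the idea, the sketch below runs an ordinary regular expression over a symbolic sequence. The string itself is made up for this example, and reading p, z and n as rising, flat and falling samples is an assumption, not something the tool's documentation states here.

```python
import re

# Hypothetical symbolic sequence, one character per sample.
# Assumed alphabet: 'p' = rising, 'z' = flat, 'n' = falling.
symbols = "zzpppnnzzzppnnnz"

# A peak is read here as a run of rising samples followed by a run of falling ones.
peak = re.compile(r"p+n+")

# Each span gives sample indices (start inclusive, end exclusive),
# so a match maps directly back to a segment of the original signal.
matches = [m.span() for m in peak.finditer(symbols)]
print(matches)  # [(2, 7), (10, 15)]
```

Because every character corresponds to one sample, the match positions can be used directly as indices into the numerical signal.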
The process behind the SSTS tool involves three steps:
Pre-processing
Symbolic Connotation
Search
Each of these steps can be used to retrieve the desired information from the signal, but how?
In this section you can thoroughly explore how each of the SSTS steps works.
PRE PROCESSING
Typically, in signal processing tasks, a pipeline of linear filters,
moving-window average filters and statistical de-noising or re-sampling
techniques is among the most used procedures to prepare the signal for
further tasks. The current approach uses a symbolic representation
of these techniques, in which each is represented by its corresponding token, which can
be a symbol or a function name. To manage the pre-processing tasks, the user writes a string
containing a set of tokens and their corresponding arguments, in which
each token precedes its argument(s) and elements are separated
by a white-space character.
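The token-then-arguments convention described above could be parsed along the following lines. This is only a minimal sketch: the token names S, M and N, the edge handling of the smoothing window, and normalization by the maximum absolute value are all assumptions made for illustration, not the tokens or definitions used by SSTS.

```python
def parse_pipeline(spec):
    """Split a pre-processing string into (token, [args]) pairs."""
    steps = []
    for element in spec.split():
        try:
            value = float(element)
        except ValueError:
            steps.append((element, []))  # a non-numeric element starts a new step
        else:
            if not steps:
                raise ValueError("argument given before any token")
            steps[-1][1].append(value)   # numeric elements belong to the last token
    return steps

def smooth(signal, window):
    # Centered moving average; the window shrinks at the edges (assumed behaviour).
    half = int(window) // 2
    out = []
    for i in range(len(signal)):
        chunk = signal[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def modulus(signal):
    return [abs(x) for x in signal]

def normalize(signal):
    # Scale by the maximum absolute value (one of several possible definitions).
    peak = max(abs(x) for x in signal)
    return [x / peak for x in signal] if peak else list(signal)

# Hypothetical token table; the real tool may use different tokens.
OPERATIONS = {"S": smooth, "M": modulus, "N": normalize}

def run_pipeline(signal, spec):
    # Apply each step in the order it appears in the string.
    for token, args in parse_pipeline(spec):
        signal = OPERATIONS[token](signal, *args)
    return signal

steps = parse_pipeline("M S 3 N")
result = run_pipeline([-2.0, 4.0, -6.0, 4.0], "M N")
smoothed = run_pipeline([0.0, 3.0, 0.0], "S 3")
```

Here "M S 3 N" reads as: take the modulus, smooth with a window of 3 samples, then normalize. Keeping the parser separate from the operation table means new tokens can be added without touching the parsing logic.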
Here is a list of all the methods available and the associated characters:
High-Pass filter
Low-Pass filter
Band-Pass filter
Smooth
Normalization
Modulus
SYMBOLIC CONNOTATION
In semiotics, connotation is associated with the second meaning of a sentence, symbol or image.
With the process of symbolic connotation we apply the concept of connotation to time series, in the sense
that the transformation of a sequence of numbers into symbols is driven by the visual interpretation
that the user makes when looking at the flow of the signal in terms of its multiple attributes.
In this step, the user chooses the attributes that best characterize the signal over time. Based on each attribute,
each sample of the signal is translated into a character. If multiple attributes are chosen, each sample is translated
into a distinct character for each of these attributes.
Here is a list of all the methods available and the associated characters:
Amplitude Threshold: 1, 0
Amplitude Difference: 1, 0
First Derivative: p, z, n
Second Derivative: p, z, n
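To make the translation concrete, here is a minimal sketch of two of the attributes listed above. Several details are assumptions: that 1 marks samples above the threshold, that p, z and n stand for positive, zero and negative slope, that the first sample reuses the symbol of the second so the output has one character per sample, and that multiple attributes are combined by concatenating their characters sample by sample.

```python
def amplitude_threshold(signal, threshold):
    # '1' where the sample is above the threshold, '0' otherwise (assumed meaning).
    return ["1" if x > threshold else "0" for x in signal]

def first_derivative(signal, tolerance=0.0):
    # Classify the difference between consecutive samples:
    # 'p' positive, 'n' negative, 'z' within the tolerance of zero.
    if len(signal) < 2:
        return ["z"] * len(signal)
    symbols = []
    for prev, cur in zip(signal, signal[1:]):
        diff = cur - prev
        if diff > tolerance:
            symbols.append("p")
        elif diff < -tolerance:
            symbols.append("n")
        else:
            symbols.append("z")
    # Repeat the first symbol so there is one character per sample.
    return symbols[:1] + symbols

signal = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0]

amplitude = "".join(amplitude_threshold(signal, 0.5))
derivative = "".join(first_derivative(signal))

# With several attributes, each sample contributes one character per attribute.
combined = "".join(
    a + d for a, d in zip(amplitude_threshold(signal, 0.5), first_derivative(signal))
)

print(amplitude)   # 0011100
print(derivative)  # zzppnnz
print(combined)    # 0z0z1p1p1n0n0z
```

The resulting strings are what the Search step would then query with regular expressions.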