SSTS Syntactic Search on Time Series

Motivation

With the diversity of tools now at their disposal, data scientists can manipulate time series data and extract complex information from it. However, with the current tools, data exploration and pattern search may require an extensive amount of time to develop methods that match the data scientist's reasoning about a query. New methods that are closely tied to the reasoning and visual analysis of time series data are therefore highly relevant for reducing the complexity and improving the productivity of pattern and query search tasks.

In this work we propose a novel tool for exploring time series data in pattern and query search tasks through a set of three symbolic steps: Pre-Processing, Symbolic Connotation and Search. The framework is called SSTS (Symbolic Search in Time Series) and uses regular expression queries to search for the desired patterns in a symbolic representation of the signal. By relying on a symbolic set of methods, this approach aims to increase expressiveness in solving standard pattern and query tasks, enabling queries that are more closely related to the reasoning and visual analysis of the signal.

Scroll down to find some examples to guide you in exploring this novel approach in time series analysis and give it a try!

How does SSTS work?

The main purpose of this tool is to search for desired patterns in the signal. The approach takes advantage of existing search mechanisms for text, namely regular expressions. Regular expressions are patterns that match specific sequences of characters; to use them on time series, the numerical signal must first be transformed into a symbolic sequence, which can then be examined with regular expressions. The process behind the SSTS tool involves three steps:

Pre-processing
Symbolic Connotation
Search

Each of these steps can be used to retrieve the desired information from the signal, but how? In this section you can thoroughly explore how each of the SSTS steps works.

PRE PROCESSING

Typically, in signal processing tasks, a pipeline of linear filters, moving-window average filters and statistical de-noising or re-sampling techniques is among the most common ways to prepare the signal for further tasks. The current approach uses a symbolic representation of these techniques, in which each one is represented by its corresponding token, which can be a symbol or a function name. To manage the pre-processing tasks, the user writes a string containing a set of tokens and their corresponding arguments, where each token precedes its argument(s) and all elements are separated by a white-space character.
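The token-then-arguments convention described above can be sketched as a small parser. This is only an illustration: the function name `parse_pipeline` and the returned structure are hypothetical, and the sketch assumes that every argument is numeric and that only the tag names (HP, LP, BP, Smt, Nrm, Abs) are used as tokens.

```python
# Hypothetical parser for a pre-processing string such as "Nrm Abs Smt 500".
# The tag names come from the list of methods; everything else is an
# assumption made for illustration, not the SSTS implementation.
KNOWN_TOKENS = {"HP", "LP", "BP", "Smt", "Nrm", "Abs"}


def parse_pipeline(spec):
    """Split a whitespace-separated spec into (token, [arguments]) steps."""
    steps = []
    for element in spec.split():
        if element in KNOWN_TOKENS:
            # A token opens a new step.
            steps.append((element, []))
        elif steps:
            # Anything else is read as a numeric argument of the latest token.
            steps[-1][1].append(float(element))
        else:
            raise ValueError(f"argument {element!r} appears before any token")
    return steps


print(parse_pipeline("Nrm Abs Smt 500"))
# [('Nrm', []), ('Abs', []), ('Smt', [500.0])]
```

Keeping the steps as an ordered list preserves the order in which the methods were written, which is the order in which they would be applied to the signal.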

Here is a list of all the methods available and the associated characters:

Method              Tag    Token
High-Pass filter    HP     ☱
Low-Pass filter     LP     ☴
Band-Pass filter    BP     ☲
Smooth              Smt    ∼
Normalization       Nrm    ⊚
Modulus             Abs    ∥

SYMBOLIC CONNOTATION

In semiotics, connotation is associated with the second meaning of a sentence, symbol or image. With the symbolic connotation process we bring the concept of connotation to time series: the transformation of a sequence of numbers into symbols follows the visual interpretation that the user makes when looking at the flow of the signal in terms of its multiple attributes. In this step, the user chooses the attributes that best characterize the signal over time. Based on each attribute, each sample of the signal is translated into a character. If multiple attributes are chosen, each sample is translated into a distinct character for each of these attributes.

Here is a list of all the methods available and the associated characters:

Method                  Tag     Token    Conversion
Amplitude Threshold     A       ⇞        1, 0
Amplitude Difference    ADif    ↕        1, 0
First Derivative        D1      †        p, z, n
Second Derivative       D2      ‡        p, z, n

SEARCH

The search procedure uses a regular expression to match the pattern inside the string generated in the symbolic connotation step. For more information on regular expressions, you can visit: https://www.w3schools.com/jsref/jsref_obj_regexp.asp
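As a minimal sketch of the search step, the snippet below runs a regular expression over a made-up symbolic string with Python's `re` module and maps each match back to sample indices. It assumes one character per sample; the helper name `find_matches` and the example string are hypothetical.

```python
import re


def find_matches(symbols, pattern):
    """Return (start, end) sample indices of every match, end exclusive."""
    return [(m.start(), m.end()) for m in re.finditer(pattern, symbols)]


# Hypothetical first-derivative string: flat, rise, fall, flat, rise, fall.
symbols = "zzpppnnnzzzppnnz"

# A peak is sketched here as one or more rising samples followed by one or
# more falling samples.
print(find_matches(symbols, r"p+n+"))
# [(2, 8), (11, 15)]
```

Because each character stands for one sample in this sketch, the match positions can be used directly as indices into the original signal.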

Examples

Here you can find specific examples on how to use SSTS for time series analysis.

ECG peak detector
Straight Line Tracking
Stable Lifting Detection

Try SSTS

PRE PROCESSING

In this step you will be able to pre-process the loaded signal with standard methods, such as:

High Pass filter (HP or ☱)
Linear high pass filter

Band Pass filter (BP or ☲)
Linear band pass filter

Low Pass filter (LP or ☴)
Linear low pass filter

Smooth (Smt or ∼)
Smooths the signal with a moving average window of the desired size

Modulus (Abs or ∥)
Modulus of the signal. No entry parameters are needed

Normalization (Nrm or ⊚)
Gaussian normalization of the signal. No entry parameters are needed

Example of usage: Envelope of a signal: (⊚ ∥ ∼ 500)
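The envelope example can be sketched in plain Python as three functions applied in the order they are written in the string. Several details here are assumptions rather than the SSTS implementation: "Gaussian normalization" is read as subtracting the mean and dividing by the standard deviation, the moving average is centered with `window // 2` samples on each side, and the window shrinks at the edges of the signal.

```python
from statistics import mean, pstdev


def normalize(signal):
    """Assumed meaning of 'Gaussian normalization': zero mean, unit std."""
    mu, sigma = mean(signal), pstdev(signal)
    # Note: a constant signal has sigma == 0 and is not handled here.
    return [(x - mu) / sigma for x in signal]


def modulus(signal):
    """Absolute value of every sample."""
    return [abs(x) for x in signal]


def smooth(signal, window):
    """Centered moving average; the window shrinks near the edges."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        chunk = signal[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def envelope(signal, window):
    # Same order as the string: normalize, take the modulus, then smooth.
    return smooth(modulus(normalize(signal)), window)


# Small made-up oscillation; the example in the text uses a window of 500.
print(envelope([0.0, 1.0, 0.0, -1.0] * 4, 4))
```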

SYMBOLIC CONNOTATION

With the sequence of connotation methods you can translate the numerical signal into a sequence of characters based on specific properties of the signal:

Amplitude threshold (A or ⇞)
Define the amplitude threshold (as a fraction of the maximum amplitude) that divides the signal. Samples higher than the threshold turn into 1, while the rest become 0
Example of usage: Amplitude superior to 50% of the maximum: (⇞ 0.5)

Amplitude difference (ADif or ↕)
Similar to the previous connotation method, but the threshold applies to the difference of amplitudes. Samples that belong to a part of the signal where the amplitude differences are higher than the specified threshold become 1, while the rest become 0
Example of usage: Amplitude difference superior to 50% of the maximum: (↕ 0.5)

First derivative (D1 or †)
Method that translates the signal into a sequence of characters based on its derivative, that is:
Rising: "p"
Falling: "n"
Stationary: "z"
The threshold in this case represents the value at which the sample can be approximated to zero (stationary).
Example of usage: Derivative of the signal: († 0.05)
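A minimal sketch of the first-derivative connotation follows. It assumes the derivative is approximated by the difference between consecutive samples and that the threshold is an absolute value; how SSTS scales the derivative and aligns the resulting string with the signal is not stated here, so both are assumptions. The function name is hypothetical.

```python
def first_derivative_connotation(signal, threshold):
    """Map each sample-to-sample difference to 'p', 'n' or 'z'."""
    symbols = []
    for previous, current in zip(signal, signal[1:]):
        diff = current - previous
        if diff > threshold:
            symbols.append("p")   # rising
        elif diff < -threshold:
            symbols.append("n")   # falling
        else:
            symbols.append("z")   # close enough to zero: stationary
    return "".join(symbols)


print(first_derivative_connotation([0.0, 0.01, 0.5, 1.0, 0.98, 0.4, 0.0], 0.05))
# zppznn
```

In this sketch the output has one character fewer than the input, because each character describes the step between two neighbouring samples.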

Second derivative (D2 or ‡)
Uses the same approach as the first derivative, but with the second derivative, for curvature analysis
Example of usage: Second derivative of the signal: (‡ 0.05)


SEARCH

The search step uses a regular expression to find the events or areas of the signal where a sequence of characters matches the pattern of the regular expression. Typical regular expression commands can be used, for instance:

* - the preceding item will be matched zero or more times
+ - the preceding item will be matched one or more times
? - the preceding item is optional and will be matched, at most, once
. - matches any character
|, & - boolean operators "or", "and"
?<= - positive lookbehind
?<! - negative lookbehind
?= - positive lookahead
?! - negative lookahead
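To illustrate how lookarounds behave on a symbolic string, the sketch below uses Python's `re` module, where a positive lookbehind is written `(?<=...)` and a positive lookahead `(?=...)`. The string is made up and assumes one first-derivative character per sample; the helper name `spans` is hypothetical.

```python
import re


def spans(pattern, symbols):
    """Return (start, end) indices of every match, end exclusive."""
    return [(m.start(), m.end()) for m in re.finditer(pattern, symbols)]


# Hypothetical string: flat, rise, fall, rise, fall, flat, flat, rise.
symbols = "zppnppnzzpp"

# Positive lookbehind: rising runs that start right after a stationary sample.
print(spans(r"(?<=z)p+", symbols))
# [(1, 3), (9, 11)]

# Positive lookahead: rising runs that are immediately followed by a fall.
print(spans(r"p+(?=n)", symbols))
# [(1, 3), (4, 6)]
```

The lookaround itself is not part of the match, so the reported spans cover only the rising samples and not the neighbouring character that was checked.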

Example of usage:
Connotation: ⇞ 0.5
Search: 1+ (searches for all the areas with sequences of 1, that is, based on the connotation, the areas of the signal with an amplitude superior to 50% of the maximum of the signal)
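The example above can be sketched end to end in a few lines. This is an illustration under stated assumptions: the threshold is taken as a fraction of the maximum sample value, the comparison is strict, and the function name `amplitude_connotation` and the signal values are made up.

```python
import re


def amplitude_connotation(signal, fraction):
    """Emit '1' for samples above fraction * max(signal), else '0'."""
    limit = fraction * max(signal)
    return "".join("1" if x > limit else "0" for x in signal)


signal = [0.1, 0.2, 0.9, 1.0, 0.7, 0.3, 0.1, 0.6, 0.8, 0.2]

# Connotation step: one character per sample.
symbols = amplitude_connotation(signal, 0.5)
print(symbols)
# 0011100110

# Search step: every run of 1s is an area above 50% of the maximum.
regions = [(m.start(), m.end()) for m in re.finditer(r"1+", symbols)]
print(regions)
# [(2, 5), (7, 9)]
```

Each pair gives the first sample of an area and the index just past its last sample.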

About the Team

Hugo Gamboa

Principal Investigator

João Rodrigues

Associated Researcher

Duarte Folgado

Associated Researcher

David Belo

Associated Researcher