Software libraries such as esig facilitate analysing stream data using rough path theory. The following code examples illustrate how to apply signature-based techniques to real-world problems.
Path Signatures: An introduction by example
This notebook introduces path signatures to data scientists and machine learning practitioners, with a focus on learning the key concepts by example. Using simple illustrative examples, it introduces paths and streams, then defines the path signature and explores its behaviour on example paths. Finally, it covers stream transformations as a means of modifying the signature's properties.
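As a rough illustration of the object the notebook studies, the sketch below computes the first two levels of the signature of a piecewise-linear path in plain Python, adding one linear segment at a time via Chen's identity. The function name `signature_depth2` and the example path are assumptions made for this sketch and are not part of esig; in practice a library such as esig would compute signatures to higher depth.

```python
def signature_depth2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path.

    path: list of points, each a list of d floats (hypothetical input format).
    Returns (level1, level2), where level1 has d entries and level2 is d x d.
    """
    d = len(path[0])
    level1 = [0.0] * d
    level2 = [[0.0] * d for _ in range(d)]
    for prev, curr in zip(path, path[1:]):
        dx = [c - p for p, c in zip(prev, curr)]
        # Chen's identity for appending a linear segment with increment dx:
        # level2 gains (level1 tensor dx) + (dx tensor dx) / 2.
        for i in range(d):
            for j in range(d):
                level2[i][j] += level1[i] * dx[j] + 0.5 * dx[i] * dx[j]
        # Level 1 is simply the total increment so far.
        for i in range(d):
            level1[i] += dx[i]
    return level1, level2


# An L-shaped path: one unit right, then one unit up.
level1, level2 = signature_depth2([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
# The antisymmetric part of level 2 is the signed (Levy) area.
levy_area = 0.5 * (level2[0][1] - level2[1][0])
```

Level 1 records only the total displacement, whereas level 2 depends on the order in which the coordinates move, which is what makes the signature sensitive to the shape of the path.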
Early sepsis detection
Accurately predicting whether a patient will develop sepsis could substantially improve patient outcomes. With this aim, the notebook showcases the use of path signatures as features for training a classifier on electronic health data. The model is trained on the sequences of physiological and laboratory measurements contained in the MIMIC-III dataset. These data include, for example, patients' heart rates, temperatures, and oxygen saturation levels, recorded repeatedly over time for each patient. The task is to predict whether a given patient will go on to develop sepsis, based on the measurement sequences recorded up to the time of prediction.
Landmark-based human action recognition
In landmark-based human action recognition we are given sequences of points (the landmarks) representing the positions of some of the major human body parts over time. Each sequence shows a person performing an action, as can be seen in the animation. The task is to train a classifier to label the sequences with their action class. Our methodology for this task combines various path transformations with the path signature to create a robust feature set, on which we are able to learn a linear classifier that is competitive with the state of the art. For ease of presentation, this demo notebook uses a simplified version, which can be trained to a good performance within a few minutes on a laptop CPU.
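To give a flavour of what a path transformation looks like, the sketch below implements two transformations that are common in the signature literature: time augmentation and the lead-lag transformation. Whether the notebook uses exactly these two is an assumption; the function names `add_time` and `lead_lag` and the toy stream are hypothetical and chosen only for illustration.

```python
def add_time(path):
    """Prepend a time channel running linearly from 0 to 1.

    Requires at least two points. Without a time channel, the signature
    ignores the speed at which the path is traversed.
    """
    n = len(path)
    if n < 2:
        raise ValueError("need at least two points")
    return [[i / (n - 1)] + list(p) for i, p in enumerate(path)]


def lead_lag(path):
    """Pair each point with a delayed copy of the stream.

    Output points are (lead, lag): the lead channels move to the next
    value first, then the lag channels catch up.
    """
    out = [list(path[0]) + list(path[0])]
    for prev, curr in zip(path, path[1:]):
        out.append(list(curr) + list(prev))  # lead moves first
        out.append(list(curr) + list(curr))  # lag catches up
    return out


# Toy one-dimensional stream (hypothetical values).
stream = [[1.0], [2.0], [4.0]]
timed = add_time(stream)
paired = lead_lag(stream)
```

Both transformations change which information the signature of the transformed path captures, without altering the signature computation itself.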
Handwritten digit classification
This notebook demonstrates the use of path signatures for handwritten digit classification. Given sequences of pen strokes contained in the MNIST dataset, we compute path signatures and incorporate them as features into a linear classifier. Our approach uses the cumulative sum transformation to represent information about the individual pen strokes in each digit. As an alternative, the notebook also explores combining the signature features with unsupervised classification techniques.
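The cumulative sum transformation can be sketched in a few lines: each channel of the stream is replaced by its running total, so a stream of pen offsets becomes a stream of pen positions. The input layout (a list of per-step offsets) and the function name `cumulative_sum` are assumptions for this example; the notebook's own data format may differ.

```python
from itertools import accumulate


def cumulative_sum(stream):
    """Replace every channel of a stream by its running sum.

    stream: list of points, each a list of per-step values
    (hypothetical layout, e.g. pen offsets [dx, dy]).
    """
    channels = zip(*stream)                       # one tuple per channel
    summed = [list(accumulate(ch)) for ch in channels]
    return [list(point) for point in zip(*summed)]


# Three hypothetical pen offsets turned into pen positions.
offsets = [[1, 0], [0, 2], [-1, 1]]
positions = cumulative_sum(offsets)
```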
Drone identification
In this notebook we construct a classifier for identifying drones. Our assumption is that when we reflect a radio pulse off a drone, the signal received back by the observer is a combination of the reflection caused by the drone's body and the reflection caused by the drone's propeller. We compute path signatures for several thousand simulated radio pulses reflected off drone objects with varying propeller locations, and then average these signatures. Using expected path signatures in this way aims to characterise the random behaviour of the reflected signals. Taking estimates of expected path signatures as our feature vectors, we consider the task of distinguishing between drone and non-drone objects, as well as predicting the revolutions per minute (RPM) of the drone's propeller.
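The averaging step can be sketched as follows: compute a truncated signature for each sample path, then take the component-wise mean as an estimate of the expected signature. This is a minimal illustration under stated assumptions, not the notebook's implementation: the helper names `sig2_flat` and `expected_signature` are hypothetical, the signature is truncated at depth 2, and the two toy paths stand in for simulated reflected pulses.

```python
def sig2_flat(path):
    """Depth-2 signature of a piecewise-linear path as one flat list.

    Ordering: the d level-1 terms, then the d*d level-2 terms row by row.
    """
    d = len(path[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for prev, curr in zip(path, path[1:]):
        dx = [c - p for p, c in zip(prev, curr)]
        for i in range(d):
            for j in range(d):
                # Chen's identity for appending one linear segment.
                s2[i][j] += s1[i] * dx[j] + 0.5 * dx[i] * dx[j]
        s1 = [a + b for a, b in zip(s1, dx)]
    return s1 + [v for row in s2 for v in row]


def expected_signature(paths):
    """Empirical mean of the signatures of a collection of sample paths."""
    sigs = [sig2_flat(p) for p in paths]
    return [sum(component) / len(sigs) for component in zip(*sigs)]


# Two toy paths with the same endpoints but opposite orientation.
samples = [
    [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]],  # right, then up
    [[0.0, 0.0], [0.0, 1.0], [1.0, 1.0]],  # up, then right
]
features = expected_signature(samples)
```

In this toy case the two paths enclose opposite signed areas, so the antisymmetric level-2 terms cancel in the average, while the level-1 terms (the common displacement) are unchanged.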
Neural controlled differential equations
Neural controlled differential equations are a continuous-time extension of recurrent neural networks which achieve state-of-the-art performance for modelling functions of irregular time series. This notebook demonstrates how to use the torchcde package for time series classification, including the log-ODE method.
Natural language processing
Written text is an example of a high-dimensional data stream. Transformers, a family of neural network architectures, are among the most successful and popular machine learning approaches to natural language. In this notebook, we showcase how to combine the capabilities of path signatures and transformers. Specifically, we consider the challenge of determining, within a corpus of unlabelled texts, whether a text was written by an author that we have not yet encountered.