arxiv:2408.07292

LiPCoT: Linear Predictive Coding based Tokenizer for Self-supervised Learning of Time Series Data via Language Models

Published on Aug 14, 2024
AI-generated summary

LiPCoT, a novel tokenizer for time series data based on linear predictive coding, enables self-supervised learning using BERT and demonstrates superior performance in Parkinson's disease classification compared to CNN-based models.

Abstract

Language models have achieved remarkable success in various natural language processing tasks. However, their application to time series data, a crucial component in many domains, remains limited. This paper proposes LiPCoT (Linear Predictive Coding based Tokenizer for time series), a novel tokenizer that encodes time series data into a sequence of tokens, enabling self-supervised learning of time series using existing language model architectures such as BERT. Unlike traditional time series tokenizers that rely heavily on CNN encoders for feature generation, LiPCoT employs stochastic modeling through linear predictive coding to create a latent space for time series, providing a compact yet rich representation of the inherent stochastic nature of the data. Furthermore, LiPCoT is computationally efficient and can effectively handle time series data with varying sampling rates and lengths, overcoming common limitations of existing time series tokenizers. In this proof-of-concept work, we demonstrate the effectiveness of LiPCoT in classifying Parkinson's disease (PD) using an EEG dataset from 46 participants. In particular, we use LiPCoT to encode EEG data into a small vocabulary of tokens and then use BERT for self-supervised learning and the downstream task of PD classification. We benchmark our approach against several state-of-the-art CNN-based deep learning architectures for PD detection. Our results reveal that BERT models utilizing self-supervised learning outperformed the best-performing existing method by 7.1% in precision, 2.3% in recall, 5.5% in accuracy, 4% in AUC, and 5% in F1-score, highlighting the potential of self-supervised learning even on small datasets. Our work will inform future foundational models for time series, particularly for self-supervised learning.
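
The abstract describes the tokenizer only at a high level. The following is a minimal sketch of one way an LPC-based time series tokenizer could be structured, assuming non-overlapping windows, librosa.lpc for coefficient estimation, and k-means quantization of the coefficient vectors into a small vocabulary. The window length, LPC order, and vocabulary size are illustrative assumptions, not the paper's settings or its exact pipeline.

```python
# Illustrative sketch (not the paper's exact method): convert a 1-D time series
# into a sequence of integer token IDs by fitting LPC coefficients per window
# and quantizing the coefficient vectors with k-means.
import numpy as np
import librosa                      # librosa.lpc implements Burg/Levinson-style LPC fitting
from sklearn.cluster import KMeans

def lpc_features(signal, win_len=256, order=8):
    """Split the signal into non-overlapping windows and fit LPC per window."""
    n_win = len(signal) // win_len
    feats = []
    for i in range(n_win):
        seg = np.asarray(signal[i * win_len:(i + 1) * win_len], dtype=float)
        coeffs = librosa.lpc(seg, order=order)   # returns [1, a_1, ..., a_p]
        feats.append(coeffs[1:])                 # drop the leading 1
    return np.array(feats)

def fit_tokenizer(signals, vocab_size=64, **kwargs):
    """Learn a token vocabulary by clustering LPC coefficient vectors."""
    feats = np.concatenate([lpc_features(s, **kwargs) for s in signals])
    return KMeans(n_clusters=vocab_size, n_init=10, random_state=0).fit(feats)

def tokenize(signal, tokenizer, **kwargs):
    """Map a new signal to token IDs suitable as input to a language model."""
    return tokenizer.predict(lpc_features(signal, **kwargs))

# Example with synthetic data: learn a 64-token vocabulary, then tokenize.
rng = np.random.default_rng(0)
train = [rng.standard_normal(4096) for _ in range(10)]
tok = fit_tokenizer(train)
ids = tokenize(rng.standard_normal(4096), tok)
print(ids[:16])  # sequence of token IDs that a BERT-style model could consume
```

In this sketch the quantization step stands in for the latent-space construction described in the abstract; the resulting token sequences would then feed a masked language modeling objective (e.g., BERT) for self-supervised pretraining before the downstream classification task.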
