Preprocessing of Datasets¶
After installation, a dataset needs a few preprocessing steps before it can be fed to downstream pipelines. This tutorial shows how to set up a preprocessing pipeline for a built dataset. In general the pipeline consists of transformations at 3 consecutive levels:
- File-level preprocessing: data selection, label modification, signal resampling & truncation etc.
- Data-level preprocessing: feature extraction, windowed view, data augmentation etc.
- Model-level preprocessing: modification to meet the specification of a machine learning model.
At the end of the pipeline, the dataset is transformed into a form suitable for use with ML models.
We use the dataset CWRU as an example in this tutorial, but the same principles apply to other datasets.
import tensorflow as tf
import tensorflow_datasets as tfds
import librosa
import librosa.display
import numpy as np
from matplotlib import pyplot as plt
from pathlib import Path
from IPython.display import Audio
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1" # disable GPU devices
os.environ["TFDS_DATA_DIR"] = os.path.expanduser("~/tensorflow_datasets") # default location of tfds database
import dpmhm
from dpmhm.datasets import transformer, preprocessing, feature, utils
Load a built dataset¶
Suppose the dataset CWRU has been correctly installed; we first load the built dataset into memory. Note that the original dataset doesn't specify any split, so all data are contained in the field train of the built dataset.
dataset_name = 'CWRU'
ds_all, ds_info = tfds.load(
dataset_name,
# data_dir='/home/han/Database/tensorflow_datasets/',
# split=['train[:75%]', 'train[75%:]'],
with_info=True,
)
print(ds_all.keys()) # show the split of the raw dataset
ds0 = ds_all['train']
dict_keys(['train'])
The property .element_spec shows the specification of data elements. Recall that a built dataset of dpmhm always contains the fields {'signal', 'sampling_rate', 'metadata'}.
ds0.element_spec
{'metadata': {'Dataset': TensorSpec(shape=(), dtype=tf.string, name=None), 'FaultComponent': TensorSpec(shape=(), dtype=tf.string, name=None), 'FaultLocation': TensorSpec(shape=(), dtype=tf.string, name=None), 'FaultSize': TensorSpec(shape=(), dtype=tf.float32, name=None), 'FileName': TensorSpec(shape=(), dtype=tf.string, name=None), 'LoadForce': TensorSpec(shape=(), dtype=tf.uint32, name=None), 'NominalRPM': TensorSpec(shape=(), dtype=tf.uint32, name=None), 'RPM': TensorSpec(shape=(), dtype=tf.uint32, name=None)}, 'sampling_rate': TensorSpec(shape=(), dtype=tf.uint32, name=None), 'signal': {'BA': TensorSpec(shape=(None,), dtype=tf.float32, name=None), 'DE': TensorSpec(shape=(None,), dtype=tf.float32, name=None), 'FE': TensorSpec(shape=(None,), dtype=tf.float32, name=None)}}
We note for CWRU that (this is not a general rule for other datasets of dpmhm):
- metadata has a subfield FaultLocation taking values in {'DriveEnd', 'FanEnd', 'None'}, which is the main label of a record.
- The normal data can be equivalently identified by any of the following clauses: FaultLocation=='None', FaultComponent=='None', FaultSize==0 (see the sketch below).
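For illustration, here is a minimal sketch (not part of the pipeline built below) that uses one of these clauses to separate the normal records from the faulty ones with a tf.data filter:
# Separate normal and faulty records using the clause FaultSize == 0.
ds_normal = ds0.filter(lambda x: x['metadata']['FaultSize'] == 0)
ds_faulty = ds0.filter(lambda x: x['metadata']['FaultSize'] > 0)
# Counting requires a full pass over the dataset and may take a while.
print("Normal records:", sum(1 for _ in ds_normal))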
Let's plot and play a sample from the dataset:
eles = list(ds0.take(10).as_numpy_iterator())  # materialize a few elements
fn = eles[0]['metadata']['FileName'].decode()
sr = eles[0]['sampling_rate']
x = eles[0]['signal']['FE']  # fan-end channel of the first element
plt.figure(figsize=(20,5))
plt.plot(np.arange(len(x))/sr, x)
plt.xlabel('Time (s)')
plt.title(fn)
Audio(x, rate=sr)  # play back the signal
File-level preprocessing¶
These are preprocessing steps applied at the level of the original records (or data files).
Filter on the channels¶
CWRU contains 3 channels ['DE', 'FE', 'BA']; however, some channels may be absent in certain files (e.g. the file '148.mat' contains only the channels 'DE' and 'FE', not 'BA'). Given a list of desired channels, e.g. ['DE', 'FE'], a first preprocessing step consists in selecting only the files in which all desired channels are simultaneously present. In this way the number of channels is fixed.
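The following sketch illustrates the idea behind this filter; it assumes (an assumption for illustration only, the DatasetCompactor below handles this for you) that an absent channel is stored as an empty tensor in the built dataset:
# Keep a record only if every desired channel contains samples.
desired_channels = ['DE', 'FE']

def has_all_channels(x):
    return tf.reduce_all([tf.size(x['signal'][ch]) > 0 for ch in desired_channels])

ds_filtered = ds0.filter(has_all_channels)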
Ramification of labels¶
A dataset may come with pre-defined labels. For example, the original labels of CWRU are ['None', 'DriveEnd', 'FanEnd'] (or the numeric values [0,1,2], contained in the field metadata['FaultLocation']), meaning that
- there is no fault
- the fault is located at the drive-end
- the fault is located at the fan-end
We may want to incorporate more information and make new labels at a finer level, for example by taking into account the subfields FaultComponent and FaultSize of the field metadata:
- FaultComponent: taking values in {'InnerRace', 'Ball', 'OuterRace3', 'OuterRace6', 'OuterRace12', 'None'}
- FaultSize: taking values in {0.007, 0.014, 0.021, 0.028, 0} (in inches)
Together with the original label, the new label becomes a triplet (FaultLocation, FaultComponent, FaultSize). For example, the triplet (1, 'InnerRace', 0.007) means that the fault is located at the drive-end, on the inner-race component, and has size 0.007.
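As a rough illustration (the DatasetCompactor below does the equivalent internally via its keys argument), such a composite label could be assembled from the metadata fields like this:
# Join selected metadata fields into a single composite label string.
def make_label(x):
    parts = [x['metadata']['FaultLocation'],
             x['metadata']['FaultComponent'],
             tf.strings.as_string(x['metadata']['FaultSize'])]
    return tf.strings.join(parts, separator='-')

for x in ds0.take(3):
    print(make_label(x).numpy().decode())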
Resampling & sliding window view¶
The records in CWRU have varying durations (~10 or ~2.5 seconds) and sampling rates (12000 or 48000 Hz), which may complicate the subsequent analysis. We therefore resample all signals at the fixed rate of 12000 Hz and cut them into windows of 1 second with a hop size of 0.5 second.
Selection of domain¶
We define the domain as the operating load force of a record. Note that in CWRU the field LoadForce (taking values in [0,1,2,3]) is mapped one-to-one to the nominal RPM NominalRPM (taking values in [1797, 1772, 1750, 1730]). We want to test whether a detection algorithm trained on one domain also works on another despite the discrepancy between domains (domain shift).
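As a sketch of such a split (the filters argument of DatasetCompactor expresses the same idea declaratively), one could filter the records by their load force:
# Split the dataset into a source and a target domain by the load force.
source_loads = tf.constant([0, 1, 2], dtype=tf.uint32)
target_loads = tf.constant([3], dtype=tf.uint32)
ds_source = ds0.filter(lambda x: tf.reduce_any(x['metadata']['LoadForce'] == source_loads))
ds_target = ds0.filter(lambda x: tf.reduce_any(x['metadata']['LoadForce'] == target_loads))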
Here is a template of variables for the file-level preprocessing.
# Filter on the channels
channels = ['DE', 'FE', 'BA'] # containing all channels
# channels = ['DE'] # containing only the channel 'DE'
# Label ramification
keys = ['FaultLocation', 'FaultComponent', 'FaultSize'] # finest label
# keys = [] # original label
# Selection of domain
# filters = {'LoadForce': [0,1,2]} # source domain
filters = {} # no selection
# Resampling rate in Hz
resampling_rate = 12000
# resampling_rate = None # no resampling
# Size of the sliding window (after resampling)
window_size = resampling_rate # 1 second
# Number of samples to skip between successive windows
hop_size = window_size//2
DatasetCompactor class¶
The class dpmhm.datasets.transformer.DatasetCompactor provides the functionalities for file-level preprocessing. Let's define the preprocessing parameters and construct a compactor object.
from dpmhm.datasets import transformer
compactor = transformer.DatasetCompactor(ds0, channels=channels, keys=keys, filters=filters,
resampling_rate=resampling_rate,
window_size=window_size, hop_size=hop_size)
The constructed object has a property dataset, which is the transformed dataset. Let's compare the signature of the new dataset to the original one:
ds1 = compactor.dataset
print("Original:", ds0.element_spec['signal'])
print("Compacted:", ds1.element_spec['signal'])
Original: {'BA': TensorSpec(shape=(None,), dtype=tf.float32, name=None), 'DE': TensorSpec(shape=(None,), dtype=tf.float32, name=None), 'FE': TensorSpec(shape=(None,), dtype=tf.float32, name=None)} Compacted: TensorSpec(shape=(3, None), dtype=tf.float32, name=None)
And take some elements from the new dataset:
eles1 = list(ds1.take(10).as_numpy_iterator())
eles0 = list(ds0.take(10).as_numpy_iterator())
As expected, the new field 'signal' is now a 2D array of fixed shape (3 channels and 12000 points):
for n, X in enumerate(eles1):
print(n, f"label={X['label'].decode()}, shape={X['signal'].shape}")
0 label=1c80dbfc87966d6e, shape=(3, 12000) 1 label=1c80dbfc87966d6e, shape=(3, 12000) 2 label=1c80dbfc87966d6e, shape=(3, 12000) 3 label=1c80dbfc87966d6e, shape=(3, 12000) 4 label=1c80dbfc87966d6e, shape=(3, 12000) 5 label=1c80dbfc87966d6e, shape=(3, 12000) 6 label=1c80dbfc87966d6e, shape=(3, 12000) 7 label=1c80dbfc87966d6e, shape=(3, 12000) 8 label=1c80dbfc87966d6e, shape=(3, 12000) 9 label=1c80dbfc87966d6e, shape=(3, 12000)
The new label is an apparently random string. The property compactor.full_label_dict allows looking up the meaning of the new labels. It can be seen from the output below that, for example, the label dc8bcb86c369e78b corresponds to the category FaultLocation='FanEnd', FaultComponent='InnerRace' and FaultSize=0.021. This property actually contains all possible labels that can be found in the dataset.
compactor.full_label_dict
{'7f67e45381c3652c': ['FanEnd', 'OuterRace3', '0.014'], '1c80dbfc87966d6e': ['FanEnd', 'InnerRace', '0.014'], 'd8957867a1fc0519': ['DriveEnd', 'OuterRace3', '0.007'], '60836667e7ee1dec': ['FanEnd', 'OuterRace3', '0.007'], 'd45bbeb3b8a72222': ['DriveEnd', 'OuterRace6', '0.007'], 'b0a92d9d7379d8ce': ['DriveEnd', 'InnerRace', '0.007'], 'e27e22f1f5037a20': ['DriveEnd', 'InnerRace', '0.014'], '605222dceca4b27e': ['FanEnd', 'Ball', '0.021'], 'a760eef52ceaa6f9': ['FanEnd', 'OuterRace3', '0.021'], '35929ddfb4abb54d': ['DriveEnd', 'OuterRace12', '0.007'], '8af14bb8ad669337': ['FanEnd', 'OuterRace12', '0.007'], 'dc8bcb86c369e78b': ['FanEnd', 'InnerRace', '0.021'], '29b26ba8d01e2e21': ['DriveEnd', 'InnerRace', '0.021'], '6c2ba36f712d55e4': ['DriveEnd', 'OuterRace6', '0.014'], '44bf66c5d8cd30e6': ['DriveEnd', 'OuterRace12', '0.021'], '2533c59036dfe8c8': ['FanEnd', 'InnerRace', '0.007'], '9c54396620a4b6a3': ['DriveEnd', 'OuterRace6', '0.021'], '8b9e80c02e1fca5b': ['FanEnd', 'OuterRace6', '0.007'], 'a07ca3c82afe8ae3': ['FanEnd', 'OuterRace6', '0.021'], 'c3e7af63b24b636c': ['DriveEnd', 'OuterRace3', '0.021'], 'f6ab3549af9ee45a': ['DriveEnd', 'Ball', '0.014'], 'd766ecd2592ce5ec': ['DriveEnd', 'Ball', '0.021'], '55503c950ed81973': ['FanEnd', 'Ball', '0.014'], 'd6765bfdf1aca38f': ['DriveEnd', 'Ball', '0.007'], '5feef6a8996ff730': ['FanEnd', 'OuterRace6', '0.014'], 'd6de9a13f405da29': ['FanEnd', 'Ball', '0.007'], 'ba953077ae4270a9': ['DriveEnd', 'InnerRace', '0.028'], '0881fa248109963a': ['None', 'None', '0.0'], '0a6027a81f7b3b19': ['DriveEnd', 'Ball', '0.028']}
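For instance, the label carried by the elements taken above can be looked up directly:
# Decode the label of the first compacted element and look up its meaning.
lbl = eles1[0]['label'].decode()
print(lbl, '->', compactor.full_label_dict[lbl])
# 1c80dbfc87966d6e -> ['FanEnd', 'InnerRace', '0.014']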
We also notice that the first several elements of the new dataset all have the same metadata: this is of course due to the sliding window view on the same data file.
for X in eles1:
print(X['metadata'])
{'Dataset': b'CWRU', 'FaultComponent': b'InnerRace', 'FaultLocation': b'FanEnd', 'FaultSize': 0.014, 'FileName': b'276.mat', 'LoadForce': 2, 'NominalRPM': 1750, 'RPM': 1755} {'Dataset': b'CWRU', 'FaultComponent': b'InnerRace', 'FaultLocation': b'FanEnd', 'FaultSize': 0.014, 'FileName': b'276.mat', 'LoadForce': 2, 'NominalRPM': 1750, 'RPM': 1755} {'Dataset': b'CWRU', 'FaultComponent': b'InnerRace', 'FaultLocation': b'FanEnd', 'FaultSize': 0.014, 'FileName': b'276.mat', 'LoadForce': 2, 'NominalRPM': 1750, 'RPM': 1755} {'Dataset': b'CWRU', 'FaultComponent': b'InnerRace', 'FaultLocation': b'FanEnd', 'FaultSize': 0.014, 'FileName': b'276.mat', 'LoadForce': 2, 'NominalRPM': 1750, 'RPM': 1755} {'Dataset': b'CWRU', 'FaultComponent': b'InnerRace', 'FaultLocation': b'FanEnd', 'FaultSize': 0.014, 'FileName': b'276.mat', 'LoadForce': 2, 'NominalRPM': 1750, 'RPM': 1755} {'Dataset': b'CWRU', 'FaultComponent': b'InnerRace', 'FaultLocation': b'FanEnd', 'FaultSize': 0.014, 'FileName': b'276.mat', 'LoadForce': 2, 'NominalRPM': 1750, 'RPM': 1755} {'Dataset': b'CWRU', 'FaultComponent': b'InnerRace', 'FaultLocation': b'FanEnd', 'FaultSize': 0.014, 'FileName': b'276.mat', 'LoadForce': 2, 'NominalRPM': 1750, 'RPM': 1755} {'Dataset': b'CWRU', 'FaultComponent': b'InnerRace', 'FaultLocation': b'FanEnd', 'FaultSize': 0.014, 'FileName': b'276.mat', 'LoadForce': 2, 'NominalRPM': 1750, 'RPM': 1755} {'Dataset': b'CWRU', 'FaultComponent': b'InnerRace', 'FaultLocation': b'FanEnd', 'FaultSize': 0.014, 'FileName': b'276.mat', 'LoadForce': 2, 'NominalRPM': 1750, 'RPM': 1755} {'Dataset': b'CWRU', 'FaultComponent': b'InnerRace', 'FaultLocation': b'FanEnd', 'FaultSize': 0.014, 'FileName': b'276.mat', 'LoadForce': 2, 'NominalRPM': 1750, 'RPM': 1755}
As expected, the first two records of the new dataset have a common part (plotted in red) of length window_size - hop_size, here equal to hop_size:
x0, x1 = eles1[0]['signal'][0], eles1[1]['signal'][0]  # first channel of two successive windows
fig, axes = plt.subplots(1,2, figsize=(20,5))
# The second half of the first window and the first half of the second window (in red) coincide.
axes[0].plot(np.arange(len(x0)//2), x0[:len(x0)//2])
axes[0].plot(np.arange(len(x0)//2, len(x0)), x0[len(x0)//2:], 'r')
axes[1].plot(np.arange(len(x1)//2), x1[:len(x1)//2], 'r')
axes[1].plot(np.arange(len(x1)//2, len(x1)), x1[len(x1)//2:])
# axes[1].plot(x1); axes[1].plot(x1[:len(x1)//2], 'b')
assert np.allclose(x0[hop_size:], x1[:hop_size])  # verify the overlap numerically
Data-level preprocessing¶
Data-level preprocessing aims at transforming the original waveform data into a more advanced representation (e.g. a spectrogram), and is applied after the file-level preprocessing. Below are some examples of data-level preprocessing.
Waveform augmentation¶
Currently not supported by dpmhm, see: https://github.com/facebookresearch/WavAugment
Feature extraction¶
The following feature transformations are implemented in dpmhm:
- Time-Frequency representation: spectrogram, MFCC etc.
- Cyclostationary representation
- Time-Scale representation: scalogram, EMD etc.
The output is a 2D representation of the original waveform. Implemented by the FeatureExtractor class.
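To get a feeling of what such a 2D representation looks like, here is a standalone sketch with librosa (independent of dpmhm's FeatureExtractor; the STFT parameters are arbitrary choices for illustration):
# Log-amplitude spectrogram of one raw channel, computed directly with librosa.
x = eles[0]['signal']['FE']
sr = int(eles[0]['sampling_rate'])
S_db = librosa.amplitude_to_db(np.abs(librosa.stft(x, n_fft=512, hop_length=256)), ref=np.max)
plt.figure(figsize=(10, 4))
librosa.display.specshow(S_db, sr=sr, hop_length=256, x_axis='time', y_axis='hz')
plt.colorbar(format='%+2.0f dB')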
Sliding window view¶
Take a sliding window view of the 2D feature representation. The window has a fixed shape and moves with a regular step across the 2D representation. Implemented by the WindowSlider class.
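The underlying idea can be sketched in plain numpy (WindowSlider does the equivalent inside the tf.data pipeline); here a random array stands in for one channel of a spectrogram:
# Extract 64x64 patches with a hop of 32 bins in both axes.
from numpy.lib.stride_tricks import sliding_window_view
S = np.random.randn(257, 816).astype(np.float32)  # stand-in for one spectrogram channel
patches = sliding_window_view(S, (64, 64))[::32, ::32]
print(patches.shape)  # (n_freq_windows, n_time_windows, 64, 64)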
Spectrogram augmentation¶
Perform random augmentations of a 2D feature representation. As with the sliding window view, the output is a 2D patch of fixed dimensions. Implemented by the SpecAugment class.
Next we show how to apply data-level preprocessing. We start with some file-level preprocessing, including channel selection, label ramification and resampling:
# File-level parameters
channels = ['DE', 'FE', 'BA']
keys = ['FaultLocation', 'FaultComponent', 'FaultSize'] # finest label
compactor = transformer.DatasetCompactor(ds0, channels=channels, keys=keys, resampling_rate=12000)
FeatureExtractor class¶
This class transforms the original 1D waveform into a 2D representation. It takes as input the file-level preprocessed dataset and a feature-extractor callable, e.g. a spectrogram or MFCC.
from dpmhm.datasets import feature
# Feature extractor
_func = lambda x, sr: feature.spectral_features(x, sr, 'spectrogram',
time_window=0.025, hop_step=0.0125, normalize=False,
to_db=True)[0]
# _func = lambda x, sr: feature.spectral_features(x, sr, 'melspectrogram',
# time_window=0.025, hop_step=0.0125, normalize=False,
# feature_kwargs={'n_mels':128})[0]
extractor = transformer.FeatureExtractor(compactor.dataset, _func)
The feature-transformed dataset is again contained in the property .dataset. Let's take some elements from it.
ds2 = extractor.dataset
eles2 = list(ds2.take(10).as_numpy_iterator())
The feature-transformed dataset shares the same label and metadata as the compacted dataset:
ds1 = compactor.dataset
eles1 = list(ds1.take(10).as_numpy_iterator())
assert eles2[0]['metadata'] == eles1[0]['metadata']
assert eles2[0]['label'] == eles1[0]['label']
But the field signal is now replaced by feature, which is the spectrogram of the original signal. It is a fixed-size 3D array with axes (channel, frequency, time).
eles2[2]['feature'].shape
(3, 257, 816)
Let's show a sample from the feature-transformed dataset.
X = eles2[9]
plt.matshow(X['feature'][0], origin='lower')
plt.xlabel('Time bin')
plt.ylabel('Frequency bin')
Text(0, 0.5, 'Frequency bin')
WindowSlider class¶
This class produces a sliding window view of a 2D representation. It takes as input the dataset of a feature extractor and applies a rectangular window with a regular step. The output is the data within the window and hence has fixed dimensions.
window = transformer.WindowSlider(extractor.dataset, window_size=(64,64), hop_size=(32,32))
ds3 = window.dataset
eles3 = list(ds3.take(10).as_numpy_iterator())
X = eles3[9]
plt.matshow(X['feature'][0], origin='lower')
plt.xlabel('Time bin')
plt.ylabel('Frequency bin')
Text(0, 0.5, 'Frequency bin')
SpecAugment class¶
This class performs random augmentations on a feature dataset:
- crop a random rectangle patch of the spectrogram
- randomly flip the time axis
- randomly blur the spectrogram
- randomly fade along the time axis
These augmentations are applied in order; steps 2, 3 and 4 are each applied independently with some probability, while step 1 is always applied. Note that the random patch of step 1 actually resizes a random rectangular area, satisfying some prescribed area & aspect-ratio constraints, to the desired shape; it is therefore different from the sliding window view of the WindowSlider class, which is a simple view (without resizing).
specaug = transformer.SpecAugment(extractor.dataset, output_shape=(64, 64))
ds4 = specaug.dataset
eles4 = list(ds4.take(20).as_numpy_iterator())
X = eles4[1]
plt.matshow(X['feature'][0], origin='lower')
plt.xlabel('Time bin')
plt.ylabel('Frequency bin')
Text(0, 0.5, 'Frequency bin')
Let's single out the random crop of step 1. The following code shows its action under the hood. The output is a random rectangular area (in red) resized to the desired shape.
X = eles2[1]['feature']
import matplotlib.patches as patches
Y, hh, ww = dpmhm.datasets.augment.random_crop(X, (64,64), area_ratio=(0.01,1.), aspect_ratio=(1/2,2), channel_axis=0)
print(hh, ww)
plt.figure(); ax=plt.gca()
ax.imshow(X[0])
rect = patches.Rectangle((ww[0], hh[0]), ww[1], hh[1], linewidth=1, edgecolor='r', facecolor='none')
ax.add_patch(rect)
plt.figure()
plt.imshow(Y[0])
(22, 109) (547, 205)
<matplotlib.image.AxesImage at 0x7fcd5049af50>
Model-level preprocessing¶
After the data-level preprocessing, the original data are transformed into an appropriate format and dimension and can be used for training ML models. The model-level preprocessing is the final adaptation of the data to the specification of an ML model, and may include (a sketch follows the list)
- conversion of labels: from string to integer
- conversion of channels: from channel-first to channel-last format
- creation of paired views
- loading data in mini-batches & normalization
- ...
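As a minimal sketch of such an adaptation (assuming the dataset ds3 obtained from the WindowSlider step above), one could map the string labels to integers, move the channel axis to the last position and batch the data:
# Map string labels to integer indices using the vocabulary of the compactor.
label_list = sorted(compactor.full_label_dict.keys())
lookup = tf.keras.layers.StringLookup(vocabulary=label_list, num_oov_indices=0)

def to_model_input(x):
    feature = tf.transpose(x['feature'], [1, 2, 0])  # (channel, H, W) -> (H, W, channel)
    return feature, lookup(x['label'])

ds_train = ds3.map(to_model_input).shuffle(1000).batch(32).prefetch(tf.data.AUTOTUNE)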
More about model-level preprocessing can be found in notebooks of ML topics.