Feature Extraction

Dehua Liang edited this page May 15, 2017 · 9 revisions

The clinical members of our group determined the relevant features for Atrial Fibrillation: R Peaks and P Waves. The rest of the features can be derived from this information.

[Figure: signal with features]

You'll see that we were also able to derive the Q and S valleys and the T wave, but our current model ignores them. The old feature extraction that found them is in misc/waveOld.py.

Signal Object

We defined a Signal class with a single init function that creates signal objects. This way we don't need to keep passing the relevant attributes into all our feature extraction functions; we can just pass around a signal object (Hooray OOP!).

In this simple initialization, we filter the signal (using biosppy), extract the R peaks, and find the RR interval bins based on a hardcoded interval.

def __init__(self,
             name,
             data,
             rr_bin_range=(234.85163198115271, 276.41687146297062)
            ):
    """
    Return a Signal object whose record name is *name*,
    signal data is *data*,
    RRInterval bin range is *rr_bin_range*
    """
    self.name = name
    self.sampling_rate = 300.  # sampling rate in Hz

    self.data = wave.filterSignalBios(data)

    self.RPeaks = wave.getRPeaks(self.data, sampling_rate=self.sampling_rate)
    if np.mean([self.data[i] for i in self.RPeaks]) < 0: # signal is inverted
        self.data = -self.data

    self.baseline = wave.getBaseline(self)

    self.RRintervals = wave.interval(self.RPeaks)
    self.RRbinsN = wave.interval_bin(self.RRintervals, rr_bin_range)

Features

RR Intervals

The RR intervals are defined by the R Peaks, which we grab using biosppy. We then grab the intervals between each peak, and put them into bins based on length.

The bin edges are derived from the partitioned training data, specifically the normal records.
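The interval and binning steps can be sketched roughly as follows. This is only an illustration under assumptions: `rr_intervals` and `bin_fractions` are hypothetical stand-ins for `wave.interval` and `wave.interval_bin`, the intervals are assumed to be measured in samples, and the output is assumed to be the fraction of intervals falling below, inside, and above the bin range.

```python
import numpy as np

def rr_intervals(r_peaks):
    """Distance between consecutive R peaks (assumed to be in samples)."""
    return np.diff(np.asarray(r_peaks))

def bin_fractions(intervals, bin_range):
    """Fraction of intervals below, inside, and above bin_range.

    Hypothetical helper: the three-bin, normalized layout is an assumption.
    """
    intervals = np.asarray(intervals, dtype=float)
    if intervals.size == 0:
        return (0.0, 0.0, 0.0)
    low, high = bin_range
    n_short = int(np.sum(intervals < low))
    n_long = int(np.sum(intervals > high))
    n_mid = intervals.size - n_short - n_long
    return tuple(n / intervals.size for n in (n_short, n_mid, n_long))

# Made-up R peak positions, just to exercise the helpers
r_peaks = [100, 320, 580, 830, 1130]
intervals = rr_intervals(r_peaks)
fractions = bin_fractions(intervals, (234.85, 276.42))
```

A record with a steady rhythm should put most of its intervals in the middle bin, while an irregular rhythm spreads them across the short and long bins.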

Residuals

The residuals are calculated by taking the difference between the original signal and the signal rebuilt from the wavelet coefficients with D1 removed. D1 is the finest detail level, where most of the high-frequency noise is contained.

The residual feature helps identify noisy records, which usually contain a lot of high-frequency noise.
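Here is a minimal sketch of that residual, assuming the PyWavelets package (`pywt`) and the same 'sym5' wavelet and 5 levels used for the wavelet features. The function name `d1_residual` is hypothetical, and summarizing the residual by its energy is our own choice for the demo, not necessarily what the project does.

```python
import numpy as np
import pywt

def d1_residual(data, wavelet="sym5", level=5):
    """Original signal minus the signal rebuilt without the D1 coefficients."""
    data = np.asarray(data, dtype=float)
    coeffs = pywt.wavedec(data, wavelet, level=level)
    # wavedec returns [A5, D5, D4, D3, D2, D1], so D1 is the last entry
    coeffs[-1] = np.zeros_like(coeffs[-1])
    # waverec can return one extra sample for odd-length input, so trim it
    rebuilt = pywt.waverec(coeffs, wavelet)[:len(data)]
    return data - rebuilt

# Synthetic demo: a clean 1 Hz sine versus the same sine with added noise
rng = np.random.default_rng(0)
t = np.arange(3000) / 300.0
clean = np.sin(2 * np.pi * 1.0 * t)
noisy = clean + 0.2 * rng.standard_normal(t.size)
energy_clean = float(np.sum(d1_residual(clean) ** 2))
energy_noisy = float(np.sum(d1_residual(noisy) ** 2))
```

The noisy record should come out with a much larger residual energy than the clean one, which is what makes the residual a handy noise indicator.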

Summary Statistics of the Wavelet Coefficients

The wavelet coefficients are extracted by running a 5-level wavelet decomposition on the signal using the 'Sym5' wavelet. The summary statistics are then calculated for every level of the wavelet coefficients. These summary statistics are the maximum, minimum, mean, standard deviation, variance, mean of the squares, and mean of the absolute values.
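As a rough sketch, the statistics could be collected like this. It assumes PyWavelets (`pywt`), which spells the wavelet name as `sym5`. The function name `wavelet_summary` is hypothetical, and including the approximation coefficients alongside the five detail levels is an assumption.

```python
import numpy as np
import pywt

def wavelet_summary(data, wavelet="sym5", level=5):
    """Seven summary statistics for each coefficient array of the decomposition."""
    coeffs = pywt.wavedec(np.asarray(data, dtype=float), wavelet, level=level)
    features = []
    for c in coeffs:  # [A5, D5, D4, D3, D2, D1]
        features.extend([
            c.max(),
            c.min(),
            c.mean(),
            c.std(),
            c.var(),
            np.mean(c ** 2),      # mean of the squares
            np.mean(np.abs(c)),   # mean of the absolute values
        ])
    return np.asarray(features)

# Synthetic demo signal: 10 seconds of a 1 Hz sine sampled at 300 Hz
t = np.arange(3000) / 300.0
stats = wavelet_summary(np.sin(2 * np.pi * 1.0 * t))
```

With six coefficient arrays and seven statistics each, this layout yields 42 values per record.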

Baseline

Baseline detection is pretty straightforward. We look in every RR window and find the flattest section. A section counts as flat if its voltage never varies by more than 0.04 from the mean of the section.

We then find the average of all the baseline sections and add it as an attribute to our signal object.
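One way to picture the search is the sketch below. It is not the project's `wave.getBaseline`: the name `estimate_baseline`, the fixed `section_len` of 30 samples, and the fallback of 0.0 when no flat section is found are all assumptions made for illustration.

```python
import numpy as np

def estimate_baseline(data, r_peaks, section_len=30, tolerance=0.04):
    """Average the flattest section of every RR window (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    section_means = []
    for left, right in zip(r_peaks[:-1], r_peaks[1:]):
        best_spread, best_mean = None, None
        # Slide a fixed-length section across the RR window
        for start in range(left, right - section_len + 1):
            section = data[start:start + section_len]
            mean = section.mean()
            spread = np.max(np.abs(section - mean))
            if best_spread is None or spread < best_spread:
                best_spread, best_mean = spread, mean
        # Keep the flattest section only if it stays within the tolerance
        if best_spread is not None and best_spread <= tolerance:
            section_means.append(best_mean)
    # Assumed fallback: 0.0 when no window has a flat enough section
    return float(np.mean(section_means)) if section_means else 0.0

# Synthetic demo: a flat line at 0.1 with a spike at each R peak
demo = np.full(600, 0.1)
demo_r_peaks = [100, 300, 500]
demo[demo_r_peaks] = 1.0
baseline = estimate_baseline(demo, demo_r_peaks)
```

On the synthetic record the estimate lands on the flat level of 0.1, since the spikes at the R peaks never make it into the flattest section.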

P waves

To find the P wave we implemented an original windowing algorithm using the R peaks. Going from one RR window to the next, we isolate the final third of each window (the part just before the next R peak) and look for the P wave there.

def getPWaves(signal):
    
    maxesP = []
    
    for i in range(0, len(signal.RPeaks) - 1):
        left_limit = signal.RPeaks[i]
        right_limit = signal.RPeaks[i+1]
        left_limit = right_limit - (right_limit-left_limit)//3
        
        plotData = signal.data[left_limit:right_limit]
        peaks = detect_peaks(plotData, mpd=160) # super high mpd so it only gets the best peak
                        
        if peaks.size != 0:
            maxesP.append(left_limit + peaks[0]) # need to convert to original signal coordinates
        else:
            maxesP.append(left_limit) # if we can't find a p wave peak,
                                      # just grab the leftmost point in the window
        
    return np.asarray(maxesP)

We can then bin the P Heights as well, right? Not so fast! Some of the P waves are underneath our baseline, so their heights are negative. We shift every P height up by the magnitude of the minimum P wave height to make sure they're all positive.

    minPHeight = 1.17975561806
    self.PHeights = np.add(self.PHeights, minPHeight)

Now we can bin the P Heights and PP/PR intervals.
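For reference, here is a sketch of how these three quantities might be computed before binning. It assumes P heights are measured relative to the baseline, that PP intervals are the gaps between consecutive P peaks, and that each PR interval runs from a P peak to the R peak that closes its window. The function name `p_wave_features` is hypothetical.

```python
import numpy as np

def p_wave_features(data, r_peaks, p_peaks, baseline, min_p_height=1.17975561806):
    """Shifted P heights plus PP and PR intervals (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    r_peaks = np.asarray(r_peaks)
    p_peaks = np.asarray(p_peaks)
    # Height above the baseline, shifted up so every value is positive
    p_heights = data[p_peaks] - baseline + min_p_height
    # Gap between consecutive P peaks
    pp_intervals = np.diff(p_peaks)
    # P peak i sits in the window that ends at R peak i + 1
    pr_intervals = r_peaks[1:] - p_peaks
    return p_heights, pp_intervals, pr_intervals

# Made-up record: two small P waves ahead of the second and third R peaks
demo = np.zeros(1000)
demo[[250, 550]] = 0.2
heights, pp, pr = p_wave_features(demo, [100, 300, 600], [250, 550], baseline=0.0)
```

Because `getPWaves` returns one P peak per RR window, `p_peaks` is always one element shorter than `r_peaks`, which is what lets the PR subtraction line up.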