Feature Extraction
The clinical members of our group determined the relevant features for Atrial Fibrillation: R Peaks and P Waves. The rest of the features can be derived from this information.
We were also able to derive the Q and S valleys and the T wave, but our current model ignores them. The old feature extraction code that finds them is in misc/waveOld.py.
We defined a signal class with a single init function that creates signal objects. This way we don't need to keep passing the relevant attributes into all our feature extraction functions; we can just pass around a signal object (Hooray OOP!).
In this simple initialization, we filter the signal (using biosppy), extract the R peaks, flip the signal if it is inverted, compute the baseline, and bin the RR intervals using hardcoded bin edges.
def __init__(self,
             name,
             data,
             rr_bin_range=(234.85163198115271, 276.41687146297062)
             ):
    """
    Return a Signal object whose record name is *name*,
    signal data is *data*,
    RRInterval bin range is *rr_bin_range*
    """
    self.name = name
    self.sampling_rate = 300.  # 300 Hz
    self.data = wave.filterSignalBios(data)  # filter the raw signal with biosppy
    self.RPeaks = wave.getRPeaks(self.data, sampling_rate=self.sampling_rate)
    # R peaks should point upward; a negative mean R amplitude means the signal is inverted
    if np.mean([self.data[i] for i in self.RPeaks]) < 0:
        self.data = -self.data
    self.baseline = wave.getBaseline(self)
    self.RRintervals = wave.interval(self.RPeaks)  # gaps between consecutive R peaks
    self.RRbinsN = wave.interval_bin(self.RRintervals, rr_bin_range)
The RR intervals are defined by the R peaks, which we detect using biosppy. We then compute the interval between each pair of consecutive peaks and put the intervals into bins based on length.
The bin edges are derived from partitioned training data, specifically the normal records.
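The RR binning described above can be sketched roughly as follows. This is not the code behind `wave.interval` or `wave.interval_bin`, which isn't shown here; the function names `rr_intervals` and `bin_fractions` are hypothetical, and the sketch assumes intervals are measured in samples and that the result is the fraction of intervals falling below, inside, and above the hardcoded range.

```python
import numpy as np


def rr_intervals(r_peaks):
    """Distances (in samples) between consecutive R peaks."""
    return np.diff(np.asarray(r_peaks))


def bin_fractions(intervals, mid_bin_range):
    """Fraction of intervals below, inside, and above mid_bin_range.

    Assumption: three bins (short, mid, long) split by the two edges.
    """
    intervals = np.asarray(intervals, dtype=float)
    if intervals.size == 0:
        return (0.0, 0.0, 0.0)
    low, high = mid_bin_range
    n_below = int(np.sum(intervals < low))
    n_above = int(np.sum(intervals > high))
    n_mid = intervals.size - n_below - n_above
    return tuple(n / float(intervals.size) for n in (n_below, n_mid, n_above))
```

A record whose intervals mostly land in the middle bin has a steady rhythm relative to the normal records, while a wide spread across the outer bins suggests irregular beats.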
The residuals are calculated by taking the difference between the original signal and the signal rebuilt from the wavelet coefficients with D1 removed, since D1 (the finest detail level) contains most of the high-frequency noise.
The residual feature is useful for identifying noisy signals, which usually contain a lot of high-frequency noise.
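A minimal sketch of the residual calculation, assuming the PyWavelets library (`pywt`) and the same 'sym5' wavelet and 5 levels used for the wavelet coefficient features. The function name `d1_residual` is hypothetical and this is only one way to drop D1, not necessarily how our code does it.

```python
import numpy as np
import pywt


def d1_residual(data, wavelet="sym5", level=5):
    """Original signal minus the signal rebuilt without the D1 coefficients."""
    data = np.asarray(data, dtype=float)
    # wavedec returns [cA_n, cD_n, ..., cD_2, cD_1]; the last entry is D1
    coeffs = pywt.wavedec(data, wavelet, level=level)
    coeffs[-1] = np.zeros_like(coeffs[-1])
    # waverec can return one extra sample for odd-length input, so trim it
    rebuilt = pywt.waverec(coeffs, wavelet)[:len(data)]
    return data - rebuilt
```

At a 300 Hz sampling rate, D1 covers roughly the top half of the spectrum (about 75 to 150 Hz), so a clean ECG should leave a small residual and a noisy one a large residual.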
The wavelet coefficients are extracted by applying 5 levels of wavelet decomposition to the signal using the 'Sym5' wavelet. Summary statistics are then calculated for every level of wavelet coefficients: maximum, minimum, mean, standard deviation, variance, mean of the squares, and mean of the absolute values.
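The seven summary statistics could be computed along these lines. The function names `coefficient_stats` and `wavelet_stats` and the dictionary keys are hypothetical, chosen for illustration; the sketch takes the coefficient arrays as input so it does not depend on a particular wavelet library.

```python
import numpy as np


def coefficient_stats(coeffs):
    """Summary statistics for one level of wavelet coefficients."""
    c = np.asarray(coeffs, dtype=float)
    return {
        "max": c.max(),
        "min": c.min(),
        "mean": c.mean(),
        "std": c.std(),
        "var": c.var(),
        "mean_square": np.mean(c ** 2),
        "mean_abs": np.mean(np.abs(c)),
    }


def wavelet_stats(levels):
    """Apply coefficient_stats to every level of a decomposition."""
    return [coefficient_stats(c) for c in levels]
```

With PyWavelets, for example, `pywt.wavedec(data, 'sym5', level=5)` returns one approximation array and five detail arrays, which could be passed directly as `levels`.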
Baseline detection is pretty straightforward. We look in every RR window and find the flattest section, meaning one in which the voltage never deviates by more than 0.04 from the section's mean.
We then average all the baseline sections and store the result as an attribute of our signal object.
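One way to sketch this baseline search is shown below. It is not the code in `wave.getBaseline`, which isn't reproduced here: the function names, the fixed section width of 30 samples, and returning `None` when no section is flat enough are all assumptions made for illustration. Only the 0.04 tolerance comes from the description above.

```python
import numpy as np


def flattest_section(window, width):
    """Start index and max deviation from its mean of the flattest slice."""
    best_start, best_dev = None, np.inf
    for start in range(0, len(window) - width + 1):
        section = window[start:start + width]
        dev = np.max(np.abs(section - section.mean()))
        if dev < best_dev:
            best_start, best_dev = start, dev
    return best_start, best_dev


def estimate_baseline(data, r_peaks, width=30, tolerance=0.04):
    """Average the flattest section of every RR window.

    width is an assumed section length in samples; tolerance is the
    0.04 deviation limit described in the text.
    """
    data = np.asarray(data, dtype=float)
    section_means = []
    for left, right in zip(r_peaks[:-1], r_peaks[1:]):
        window = data[left:right]
        start, dev = flattest_section(window, width)
        if start is not None and dev <= tolerance:
            section_means.append(window[start:start + width].mean())
    # No sufficiently flat section anywhere: report that explicitly
    return float(np.mean(section_means)) if section_means else None
```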
To find the P wave we implemented an original windowing algorithm based on the R peaks. Going from one RR window to the next, we isolate the last third of each window, just before the next R peak, and look for the P wave there.
def getPWaves(signal):
    maxesP = []
    for i in range(0, len(signal.RPeaks) - 1):
        left_limit = signal.RPeaks[i]
        right_limit = signal.RPeaks[i+1]
        left_limit = right_limit - (right_limit-left_limit)//3
        plotData = signal.data[left_limit:right_limit]
        peaks = detect_peaks(plotData, mpd=160)  # super high mpd so it only gets the best peak
        if peaks.size != 0:
            maxesP.append(left_limit + peaks[0])  # need to convert to original signal coordinates
        else:
            maxesP.append(left_limit)  # if we can't find a p wave peak,
                                       # just grab the leftmost point in the window
    return np.asarray(maxesP)
We can then bin the P heights as well, right? Not so fast! Some of the P waves fall below our baseline, so their heights are negative. To make sure they're all positive, we add the magnitude of the minimum P wave height to every height.
minPHeight = 1.17975561806
self.PHeights = np.add(self.PHeights, minPHeight)
Now we can bin the P Heights and PP/PR intervals.
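For the PP and PR intervals, a rough sketch built on the output of `getPWaves` might look like the following. The function names are hypothetical, and measuring PR from the P peak to the next R peak is an assumption made here for simplicity, not necessarily the definition our code uses.

```python
import numpy as np


def pp_intervals(p_peaks):
    """Distances (in samples) between consecutive P wave peaks."""
    return np.diff(np.asarray(p_peaks))


def pr_intervals(p_peaks, r_peaks):
    """Distance from each P peak to the R peak that follows it.

    Assumes p_peaks[i] lies between r_peaks[i] and r_peaks[i + 1],
    which is how getPWaves lays out its result.
    """
    p = np.asarray(p_peaks)
    r = np.asarray(r_peaks)
    return r[1:] - p
```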