Following a hiatus of a couple of years I have rejoined the competitors on kaggle. The UPenn and Mayo Clinic Seizure Detection Challenge had 8 days to run when I decided to participate. For the time I had available I'm quite pleased with my final score. I finished in 27th place with 0.93558. The metric used was area under the ROC curve, 1.0 is perfect and 0.5 being no better than random.
Prompted by a post from Zac Stewart I decided to give pipelines in scikit-learn a try. The data from the challenge consisted of electroencephalogram recordings from several patients and dogs. These subjects had different numbers of channels in their recordings, so manually implementing the feature extraction would have been very slow and repetitive. Using pipelines made the process incredibly easy and allowed me to make changes quickly.
The features I used were incredibly simple. All the code is in transformers.py - I used variance, median, and the FFT which I pooled into 6 bins. No optimization of hyperparameters was attempted before I ran out of time.
Next time, I'll be looking for a competition with longer to run.