Following a hiatus of a couple of years I have rejoined the competitors on kaggle. The UPenn and Mayo Clinic Seizure Detection Challenge had 8 days to run when I decided to participate. For the time I had available I'm quite pleased with my final score. I finished in 27th place with 0.93558. The metric used was area under the ROC curve, 1.0 is perfect and 0.5 being no better than random.
Prompted by a post from Zac Stewart I decided to give pipelines in scikit-learn a try. The data from the challenge consisted of electroencephalogram recordings from several patients and dogs. These subjects had different numbers of channels in their recordings, so manually implementing the feature extraction would have been very slow and repetitive. Using pipelines made the process incredibly easy and allowed me to make changes quickly.
The features I used were incredibly simple. All the code is in transformers.py - I used variance, median, and the FFT which I pooled into 6 bins. No optimization of hyperparameters was attempted before I ran out of time.
Next time, I'll be looking for a competition with longer to run.
This Saturday the DC Python group ran a coding meetup. As part of the event I ran an introduction to scientific computing for about 7 people.
After a quick introduction to numpy, matplotlib, pandas and scikit-learn we decided to pick a dataset and apply some machine learning. The dataset we decided to use was from a Kaggle competition looking at the Titanic disaster. This competition had been posted to help the community get started with machine learning so it seemed perfect.