NIH Pi Day 2017

At the NIH Pi Day celebration I gave a lightning talk on applying deep learning to histology images. A video of the event is now available at NIH Videocast.

The one-hour event featured presentations on ten different projects. I was the second speaker; my talk starts at 8:48 in the video.

I am using deep learning to identify glomeruli in kidney biopsies. When we are unsure which type of kidney disease a patient has, we take a small biopsy to look at the kidney. Differences in the glomeruli often define the type of disease, so pathologists study the biopsy to identify it. These skilled pathologists spend significant time simply locating the glomeruli. A machine can do this simple step, leaving the pathologist free to focus on the harder task of identifying the disease.

Transportation Techies: Capital Bikeshare TSP

The theme for this month's Transportation Techies event was Capital Bikeshare, the bike-sharing service in Washington DC. Data is available on every trip and every station, which makes a wide range of analyses possible. This was the seventh event on this theme.

I had not worked with geographical or transportation data before, so I learned a lot. I treated the stations as the cities in the traveling salesperson problem and calculated the shortest route that visits every station.
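The post does not say which TSP solver was used, so the following is only a minimal sketch of one common heuristic approach, assuming a precomputed distance matrix `dist` where `dist[i][j]` is the cycling distance from station `i` to station `j`. It builds a greedy nearest-neighbour tour and then improves it with 2-opt moves (reversing a segment whenever that shortens the tour). The function names are hypothetical, and a heuristic like this finds a good tour, not a guaranteed optimum.

```python
import math


def tour_length(tour, dist):
    """Total length of a closed tour that returns to its starting station."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def nearest_neighbour_tour(dist, start=0):
    """Greedy construction: always ride to the closest unvisited station."""
    tour = [start]
    unvisited = set(range(len(dist))) - {start}
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda j: dist[last][j])
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def two_opt(tour, dist):
    """Repeatedly reverse a segment of the tour while doing so shortens it."""
    best = list(tour)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for k in range(i + 1, len(best)):
                candidate = best[:i] + best[i:k + 1][::-1] + best[k + 1:]
                # Recompute the whole length so asymmetric distances stay correct.
                if tour_length(candidate, dist) < tour_length(best, dist) - 1e-12:
                    best = candidate
                    improved = True
    return best


if __name__ == "__main__":
    # Four made-up points on a unit square, listed in a crossing order.
    points = [(0, 0), (1, 1), (1, 0), (0, 1)]
    dist = [[math.dist(p, q) for q in points] for p in points]
    tour = two_opt(nearest_neighbour_tour(dist), dist)
    print(tour, tour_length(tour, dist))
```

Because `two_opt` recomputes the full tour length for every candidate, it stays correct when cycling distances are asymmetric (for example, because of one-way streets), at the cost of speed; a dedicated solver is the practical choice for the full set of stations.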

I did all of this with open data and open-source software, including customizing how distances are calculated so that they reflect cycling routes.
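Road-network cycling distances come from the routing software itself. As a crude stand-in for quick experiments, the sketch below builds a distance matrix from straight-line great-circle (haversine) distances between station coordinates given as `(lat, lon)` pairs in degrees. This is an illustrative substitute, not the customized cycling calculation described above: it ignores roads, hills, and one-way streets, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_matrix(stations):
    """Pairwise straight-line distances for a list of (lat, lon) tuples."""
    return [[haversine_km(*a, *b) for b in stations] for a in stations]
```

A matrix built this way is symmetric, whereas distances from a routing engine generally are not.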

The slides I presented include links to all the data and software used. The code I wrote is available on GitHub, along with a Dockerfile for running the routing software with data for the Washington DC region.

Lightning talk slides on deep learning with keras

At the DCPython Office Hours event this month, I gave a lightning talk on convolutional neural networks implemented with the Keras library. The notebook is now up on GitHub.

Deep neural networks are usually trained on GPUs because training them on CPUs is typically too slow. The example in the notebook uses a relatively small network, so it should run on any hardware.
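To give a feel for the arithmetic involved, here is a minimal NumPy sketch of what a single convolutional filter computes: a 2-D cross-correlation with no padding and stride 1, which matches the defaults of Keras's `Conv2D` layer (`padding="valid"`, `strides=(1, 1)`) before the bias and activation are applied. The function name is hypothetical. Each output pixel is an independent sum of products, which is the kind of work GPUs parallelize well.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """2-D cross-correlation with no padding ('valid') and stride 1."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the kernel with the patch under it and sum the result.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


if __name__ == "__main__":
    # A made-up 4x4 image: dark left half, bright right half.
    image = np.array([[0, 0, 1, 1]] * 4, dtype=float)
    # A hand-written vertical-edge kernel; in a CNN these weights are learned.
    kernel = np.array([[-1, 1], [-1, 1]], dtype=float)
    print(conv2d_valid(image, kernel))
```

The filter responds only where the window straddles the dark-to-bright boundary.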

Lightning talk slides on web server log analysis with pandas

At the DCPython Office Hours event in May, I gave a lightning talk on using pandas to analyse nginx access logs. The notebook is now up on GitHub.
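A minimal sketch of the parsing step, assuming the logs use nginx's default `combined` format (a custom `log_format` would need a different regular expression). It uses pandas' `Series.str.extract` with named groups so that each field becomes a column. The regex and function name are illustrative, not taken from the notebook.

```python
import pandas as pd

# nginx "combined" format:
# $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED = (
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)


def parse_access_log(lines):
    """Parse combined-format log lines into a DataFrame, one column per field."""
    df = pd.Series(lines).str.extract(COMBINED)
    df = df.dropna(how="all")  # drop lines that did not match the format
    df["time_local"] = pd.to_datetime(df["time_local"], format="%d/%b/%Y:%H:%M:%S %z")
    df["status"] = df["status"].astype(int)
    df["body_bytes_sent"] = df["body_bytes_sent"].astype(int)
    return df
```

Usage might look like `with open("access.log") as f: df = parse_access_log(f.read().splitlines())` (the path is hypothetical), after which questions such as `df["status"].value_counts()` or `df.set_index("time_local").resample("1h").size()` become one-liners.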

Seizure detection challenge on kaggle

After a break of a couple of years, I have rejoined the competitors on Kaggle. The UPenn and Mayo Clinic Seizure Detection Challenge had 8 days left to run when I decided to take part. Given the time I had available, I'm quite pleased with my final score: I finished in 27th place with 0.93558. The metric was the area under the ROC curve, where 1.0 is perfect and 0.5 is no better than random.
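For readers unfamiliar with the metric: the area under the ROC curve equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counting half. The short sketch below computes it directly from that definition. The function name is hypothetical, and the pairwise loop is quadratic in the number of examples, so it is for illustration only; scikit-learn's `roc_auc_score` is the practical choice.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # pair ranked correctly
            elif p == n:
                wins += 0.5  # ties count half
    return wins / (len(pos) * len(neg))
```

A model that gives every example the same score ties on every pair and gets exactly 0.5, which is why that value marks random performance.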

My code for the competition is now on GitHub.

Prompted by a post from Zac Stewart, I decided to give scikit-learn pipelines a try. The challenge data consisted of electroencephalogram (EEG) recordings from several patients and dogs. These subjects had different numbers of channels in their recordings, so implementing the feature extraction by hand would have been slow and repetitive. Pipelines made the process much easier and let me make changes quickly.
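A minimal sketch of the idea, assuming each subject's data is an array of shape `(n_clips, n_channels, n_samples)` and that a separate model is fitted per subject. The custom transformer computes per-channel statistics for however many channels it receives, so the same pipeline definition works for every subject. `ChannelStats`, `make_subject_pipeline`, and the choice of logistic regression are hypothetical, not the author's code.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class ChannelStats(TransformerMixin, BaseEstimator):
    """Map (n_clips, n_channels, n_samples) arrays to per-channel variance and median."""

    def fit(self, X, y=None):
        return self  # stateless: nothing to learn

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        var = X.var(axis=2)
        med = np.median(X, axis=2)
        return np.hstack([var, med])  # shape (n_clips, 2 * n_channels)


def make_subject_pipeline():
    """One pipeline definition, reused for every subject regardless of channel count."""
    return Pipeline([
        ("features", ChannelStats()),
        ("scale", StandardScaler()),
        ("clf", LogisticRegression()),
    ])
```

Calling `make_subject_pipeline().fit(X, y)` on subjects with different channel counts needs no code changes; only the width of the intermediate feature matrix differs, and changing a feature means editing one transformer.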

The features I used were very simple: the variance, the median, and the FFT, which I pooled into 6 bins. All the code is in the GitHub repository. I ran out of time before attempting any hyperparameter optimization.
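The feature set described above can be sketched as a single NumPy function per clip. How the FFT was pooled is not stated, so this assumes the one-sided magnitude spectrum of each channel is split into 6 contiguous, roughly equal-width frequency bands and averaged within each band. The function names are hypothetical.

```python
import numpy as np


def fft_bins(clip, n_bins=6):
    """Mean FFT magnitude per channel, pooled into n_bins contiguous frequency bands.

    clip has shape (n_channels, n_samples); the result has shape (n_channels, n_bins).
    """
    spectrum = np.abs(np.fft.rfft(clip, axis=1))  # one-sided magnitude spectrum
    bands = np.array_split(spectrum, n_bins, axis=1)  # equal-width bands (assumption)
    return np.stack([band.mean(axis=1) for band in bands], axis=1)


def clip_features(clip, n_bins=6):
    """Variance, median, and pooled FFT for every channel, flattened into one vector."""
    clip = np.asarray(clip, dtype=float)
    return np.concatenate([
        clip.var(axis=1),
        np.median(clip, axis=1),
        fft_bins(clip, n_bins).ravel(),
    ])
```

Each clip yields `(2 + n_bins) * n_channels` features, so the vector length grows with the channel count, which is one reason per-subject models are convenient here.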

Next time, I'll look for a competition with more time left to run.