This was the third BarcampNortheast event I have attended. Each has been slightly different, but all have been a weekend well spent. This year felt a little smaller than previous years, though that may partly have been because we were in a bigger space.
I have been attending the Python Edinburgh meetups for a while. They have always been interesting, and the Northwest meetup this Thursday was the first since I moved back to the Northwest. The format, alternating talks and coding sessions, is different from Edinburgh's regular pub meetups with irregular talks, coding sessions and mini-conferences. It was an interesting crowd, and the other talks, on Apache Thrift and on teaching programming to GCSE students (15-16 year olds), gave a really good variety of subjects to discuss afterwards.
Last weekend the Python Edinburgh users group hosted a mini-conference. Saturday morning kicked off with a series of talks, followed by sessions introducing and then focusing on contributing to Django, before the sprints, which really got going on the Sunday.
The slides for my talk, "Images and Vision in Python", are now available in PDF format here.
The slide deck I used is relatively lightweight, with my focus being on demonstrating the different packages available. The code I went through is below.
from PIL import Image

# Open an image and show it
pil1 = Image.open('filename')
pil1.show()

# Get its size
pil1.size

# Resize
pil1s = pil1.resize((100, 100))
# or - thumbnail
pil1.thumbnail((100, 100), Image.ANTIALIAS)

# New image
bg = Image.new('RGB', (500, 500), '#ffffff')

# Two ways of accessing the pixels:
# getpixel/putpixel and load
# load is faster
pix = bg.load()
for a in range(100, 200):
    for b in range(100, 110):
        pix[a, b] = (0, 0, 255)
bg.show()

# Drawing shapes is slightly more involved
from PIL import ImageDraw
draw = ImageDraw.Draw(bg)
draw.ellipse((300, 300, 320, 320), fill='#ff0000')
bg.show()

from PIL import ImageFont
font = ImageFont.truetype("/usr/share/fonts/truetype/freefont/FreeSerif.ttf", 72)
draw.text((10, 10), "Hello", font=font, fill='#00ff00')
bg.show()

# Demos for vision
from scipy import ndimage
import mahotas
# These demos were run in an interactive pylab session; the imports
# below make np and imshow available in a plain script
import numpy as np
import pylab
from pylab import imshow

# Create a sample image
v1 = np.zeros((10, 10), bool)
v1[1:4, 1:4] = True
v1[4:7, 2:6] = True
imshow(v1, interpolation="nearest")
imshow(mahotas.dilate(v1), interpolation="nearest")
imshow(mahotas.erode(v1), interpolation="nearest")
imshow(mahotas.thin(v1), interpolation="nearest")

# Opening, closing and top-hat as combinations of dilate and erode

# Labeling
# Latest version of mahotas has a label func
v1[8:, 8:] = True
imshow(v1)
labeled, nr_obj = ndimage.label(v1)
nr_obj
imshow(labeled, interpolation="nearest")
pylab.jet()

# Thresholding
# Convert a grayscale image to a binary image
v2 = mahotas.imread("/home/jonathan/openplaques/blueness_images/1.jpg")
T = mahotas.otsu(v2)
imshow(v2)
imshow(v2 > T)

# Distance transforms
dist = mahotas.distance(v2 > T)
imshow(dist)
I'm writing this on the train back from Newcastle after attending this year's Maker Faire. I've attended each Newcastle Maker Faire and it's been fantastic witnessing it grow each year. Many of the groups displaying their projects are veterans of previous Faires, and it's inspiring to see their projects develop from one year to the next. A growing faire means new groups, and although some themes are repeated, many groups have truly unique projects.
As last year, I've put together a short video capturing some of the activity at this Maker Faire. It's difficult to capture more than a sliver of what makes this event great, so I encourage you to click through to the project websites. I'll link to as many as I can below the video over the next few days, but for the moment I'd encourage you to visit the official website.
I can't find a link for the roving wheelie bins but more information on the fire breathing dragon is available here.
I can't find a link for the first robot. The second robot was from mbed. The third robot was part of a very large exhibit, but again I'm struggling to find a link. The fourth group of robots was from robochallenge. The final ground-based bot was from robosavvy.
The underwater bot was from underwater rov. The aerial photography using a model plane was done by Simon Clark. The rocketry was from Black Streak.
It was difficult keeping track of all the 3D printers so I'll just highlight two focusing specifically on 3D printers - bodgeitquick and emakershop.
The standing wave flame tube was from Steve Mould. The wind-up music disc was from the North of England Arduino Group, organised by Mike Cook. I'm not sure who was responsible for the heartbeat light sculpture. The interactive light table was built by Oli, the digital graffiti wall was built by the Jam Jar Collective, and the musical tesla coils were from Brightarcs.
I've been using MDP and matplotlib a lot recently, and although overall I've been very pleased with the documentation for both projects, I have run into a few problems for which the solutions were not immediately obvious. This post gives the solution for each, in the expectation that it will be useful to me in the future and the hope that it may also be useful to others.
Principal Component Analysis with MDP
The tutorial for the Modular Toolkit for Data Processing (MDP) starts with a quick example of using the toolkit for a PCA analysis, and yet I still ran into a couple of problems. The first issue I had was how the pca function expects to receive data. I suspect this is simply due to unfamiliarity with the field and the language used within it. For future reference, the data is expected to be in the following format.
Each row is one observation, with the rows for Experimental Condition 1 followed by the rows for Experimental Condition 2, and each column is one measured variable.
Variance Accounted For in PC1, 2, etc
The previously mentioned quick start tutorial was great for getting results out quickly, but I couldn't find a way to get a value for how much of the variance present in the data was accounted for in the principal components. To get that, as far as I've been able to determine, you need to interact with the PCANode directly rather than using the convenience function. The code is still relatively straightforward.
import mdp
import numpy as np
import matplotlib.pyplot as plt

# Create sample data
var1 = np.random.normal(loc=0., scale=0.5, size=(10, 5))
var2 = np.random.normal(loc=4., scale=1., size=(10, 5))
var = np.concatenate((var1, var2), axis=0)

# Create the PCA node and train it
pcan = mdp.nodes.PCANode(output_dim=3)
pcar = pcan.execute(var)

# Graph the results
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(pcar[:10, 0], pcar[:10, 1], 'bo')
ax.plot(pcar[10:, 0], pcar[10:, 1], 'ro')

# Show variance accounted for (pcan.d holds the variance
# along each principal component)
ax.set_xlabel('PC1 (%.3f%%)' % (pcan.d[0]))
ax.set_ylabel('PC2 (%.3f%%)' % (pcan.d[1]))
plt.show()
Running this code produces an image similar to the one below.
Growing neural gas with MDP
The growing neural gas implementation was another sample application highlighted in the tutorial for MDP. It held my interest for a while as a technique which could potentially be applied to the transcription of plaques for the openplaques project. It wasn't immediately obvious how to get the position of a node from the trained network's graph. As the tutorial leaves the details of visualisation up to the user, I'll present the solution for getting the node locations in the form of the code needed to visualise the node training. The end result will look something like the following.
I've been using Matplotlib to plot data exclusively for a while now. The defaults produce reasonable quality graphs, and any differences in opinion can be quickly fixed either by altering options in matplotlib or, as the graphs can be saved in SVG format, in a vector image manipulation program such as Inkscape. Although most options can be changed in matplotlib, it can sometimes be difficult to find the correct one. Most of the time the naming of variables is, to my mind, logical, but sometimes I just can't find the right way to describe what I want to do.
I wanted to have a grid of graphs (six in my case, four in the simplified example below) but didn't want to display the axes on all of them, as I felt this looked cluttered.
Fixing the axis range
If I was going to display the axes on only some of the graphs then the values for the axes needed to be the same on all of them.
import numpy as np
import matplotlib.pyplot as plt

# Generate sample data
var = np.random.random_sample((40, 2))

fig = plt.figure()
for i in range(4):
    ax = fig.add_subplot(220 + i + 1)
    start = i * 10
    ax.plot(var[start:start+10, 0], var[start:start+10, 1], 'bo')
    # Hide the x axis on the top row of charts
    if i in [0, 1]:
        ax.set_xticklabels(ax.get_xticklabels(), visible=False)
    # Hide the y axis on the right column of charts
    if i in [1, 3]:
        ax.set_yticklabels(ax.get_yticklabels(), visible=False)
    # Set the axis range
    ax.axis([0, 1, 0, 1])
plt.show()
Running this code should produce an image similar to the one below.
Removing second point in plot legend
The legend assumes that values are connected, so two points and the connecting line are shown by default. If the points on the graph aren't connected then this looks strange. Removing the duplicate symbol is straightforward.
import numpy as np
import matplotlib.pyplot as plt

# Generate sample data
var = np.random.random_sample((10, 2))

# Plot data with labels
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(var[0:5, 0], var[0:5, 1], 'bo', label="First half")
ax.plot(var[5:10, 0], var[5:10, 1], 'r^', label="Second half")
ax.legend(numpoints=1)
plt.show()
A little over two months ago I wrote about the first round of the AI cookbook competition. Since then there have been two further rounds and a considerable amount of further progress. For the latest round I was able to get the error score down to 10.867 using an additional image pre-processing step and then a variety of text clean-up improvements.
Ian, who writes the AI Cookbook, had a theory that the curved text present at the top of many of the plaques in the test set was causing Tesseract, our OCR software of choice, significant problems in transcribing the main text. If we could automatically recognise the curved text and block it out, the transcription should improve significantly. In the diagram below, the text we want transcribed is in green and the text we don't want is in red.
I couldn't think of a good method to actually recognise the curved text at the top, so I decided on a 'dumb' approach: the curved text is in the same place on all the plaques, so I built a system to apply the same mask to every image. To do this I went back to what I could still remember from high school math lessons, and to the probable delight of my old math teachers, I quickly had some working code. The code cycles through all the pixels in the image and converts each to a distance and angle relative to the centre of the image. This process is hopefully easier to visualise in the image below. The distance is simple enough to calculate as we're dealing with a right-angled triangle: we square the x and y values, add them together and take the square root. The angle is a little trickier. The y-value represents the opposite side of the triangle and the x-value the adjacent side, so from the mnemonic SOH CAH TOA we know the angle will be tan⁻¹(O/A). Knowing that, we can then apply our rules for distance and angle to decide which pixels to mask.
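As a rough sketch of that idea (a reconstruction, not the actual competition code): the helper below converts each pixel to a distance and angle from the centre and masks a wedge at the top of the image. It uses math.atan2 rather than a raw tan⁻¹(O/A) division, since atan2 handles the quadrants, and the angle and distance thresholds are illustrative guesses.

```python
import math

def polar_mask(width, height, max_angle_deg=45, min_dist_frac=0.6):
    """Return a 2D list where True marks pixels to block out:
    those within max_angle_deg of straight up from the centre,
    and far enough out towards the edge of the plaque."""
    cx, cy = width / 2.0, height / 2.0
    max_dist = min(cx, cy)
    mask = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = x - cx, y - cy
            # Distance: square x and y, add them, take the square root
            dist = math.sqrt(dx * dx + dy * dy)
            # Angle relative to straight up; atan2 does the O/A
            # division and quadrant handling in one call
            angle = math.degrees(math.atan2(dx, -dy))
            if abs(angle) <= max_angle_deg and dist >= min_dist_frac * max_dist:
                mask[y][x] = True
    return mask

# The top-centre of the image is masked; the centre and bottom are not
mask = polar_mask(100, 100)
```

The same mask can then be applied to every plaque image before handing it to the OCR step.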
The text clean-up was lots of little steps. Briefly, I've:
Made various improvements to the regexes for cleaning up the years
Converted any instances of 'vv' (two v's) to 'w' (one w)
Switched 0 (zero) to o (letter o) in words
Removed any one/two character tokens from the end of the string
Improved the selection of suggestions from the spell checker
Broken up long words to see if a valid word can be found in the two halves
Changed "s to 's
Improved correction for endings where the ending is lived|worked|died here and the spelling checker returns bad results
Removed any words containing three of the four character classes: lowercase, uppercase, digits and punctuation
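A few of the steps above can be sketched as follows. This is a simplified reconstruction of just three of them (the 'vv' fix, the zero-for-o fix, and dropping short trailing tokens), not the full clean-up pipeline:

```python
import re

def clean_transcription(text):
    # Convert any instances of 'vv' (two v's) to 'w' (one w)
    text = text.replace('vv', 'w')
    # Switch 0 (zero) to o (letter o) when it touches letters,
    # leaving genuine numbers like years alone
    text = re.sub(r"(?<=[A-Za-z])0|0(?=[A-Za-z])", 'o', text)
    # Remove any one/two character tokens from the end of the string
    tokens = text.split()
    while tokens and len(tokens[-1]) <= 2:
        tokens.pop()
    return ' '.join(tokens)

clean_transcription("In 1901 Sir J0hn vvorked here xy")
# → "In 1901 Sir John worked here"
```

The real pipeline layers many more of these fixes, but each is as small and mechanical as this.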
The regex for that last item is something of a monstrosity, and as I'm far from an expert it wouldn't surprise me if it doesn't entirely do what I think it does. I've used whitespace to make it slightly easier to follow. Each line represents a sub-expression; if any sub-expression matches the string then the expression as a whole is considered to match. Each line matches a different combination of three of: digits, lowercase, uppercase and punctuation. The .+ at the end matches one or more of any character. The expressions in brackets starting with a question mark are look-ahead assertions: the .+ still matches any character, but the look-ahead assertions state that at least one of the characters matched must be, for instance, a digit. It doesn't matter in what order the characters appear as long as they are all present. If you suspect there is a flaw in the pattern or know some way to simplify it then I would really appreciate a quick note in the comments field below.
import re

re.compile(r"""
    # matching a combination of digits, lowercase, uppercase and punctuation
    ((?=.*\d)(?=.*[a-z])(?=.*['"-,\.]).+|     # d,l,p
    (?=.*[A-Z])(?=.*[a-z])(?=.*['"-,\.]).+|   # u,l,p
    (?=.*\d)(?=.*[A-Z])(?=.*[a-z]).+|         # d,u,l
    (?=.*\d)(?=.*[A-Z])(?=.*['"-,\.]).+       # u,p,d
    )""", re.VERBOSE)
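To make the behaviour concrete, here's the pattern exercised against a few sample tokens, using re.search as a hypothetical driver since the original calling code isn't shown. (One thing worth noting: the hyphen in ['"-,\.] acts as a range from '"' to ',', so a literal hyphen isn't actually in the punctuation class.)

```python
import re

pattern = re.compile(r"""
    # matching a combination of digits, lowercase, uppercase and punctuation
    ((?=.*\d)(?=.*[a-z])(?=.*['"-,\.]).+|     # d,l,p
    (?=.*[A-Z])(?=.*[a-z])(?=.*['"-,\.]).+|   # u,l,p
    (?=.*\d)(?=.*[A-Z])(?=.*[a-z]).+|         # d,u,l
    (?=.*\d)(?=.*[A-Z])(?=.*['"-,\.]).+       # u,p,d
    )""", re.VERBOSE)

# Tokens mixing three of the four character classes match...
assert pattern.search("Ab3")    # digit, uppercase, lowercase
assert pattern.search("It's")   # uppercase, lowercase, punctuation
assert pattern.search("a1.")    # digit, lowercase, punctuation
# ...while cleaner tokens do not
assert pattern.search("hello") is None
assert pattern.search("Hello") is None
assert pattern.search("1901") is None
```

Any token that trips one of the first three assertions is treated as OCR garbage and dropped.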
That's all for now. I believe Ian is planning to run the competition for a further month and there are still considerable improvements to be made so it would be great to see more people taking part.