Machine Learning Workshop, Day 1.
Today I went back to my old university department to reprise a workshop I gave back in '05 at the end of my PhD. It's part of a 3-day series of talks and workshops on 'Machine Learning' (so, statistics with computers) and I'm giving two 1.5hr workshops, one on Monday afternoon and one on Tuesday afternoon. It's great to be able to split it into 2 parts, as the workshop was originally given in a single 1.5hr session and things got a little rushed.
The day started with 3 speakers from around the world (okay, technically the day started with me getting up at 6am and catching a train to Cambridge, but I digress...) presenting lectures on Fuzzy Clustering; Support Vector Machines and Kernel Methods; and Ant Colonies.
All three were genuinely interesting, though sadly I've been cursed with the impatience gene and can't concentrate for longer than 30mins. Short versions:
-Fuzzy clustering. 'Fuzzy' lets you say a data point belongs to more than one set, with different degrees of 'belonging' to each. Very handy with clustering when you have 2 distinct clusters and a single point in between them. Regular clustering assigns the point to one of the clusters, botching up your analysis. Fuzzy clustering says it's 50:50 as to which cluster it belongs to, leaving the rest of your analysis untouched.
-Support Vector Machines. Separate data into two groups using a hyperplane. The hyperplane is positioned so as to maximise the distance (the margin) between it and the nearest data points from each group (the support vectors). Can use Kernel methods to fit non-linear partitions: increase the dimensionality of the data, fit a hyperplane in this increased space, and then drop back down to the original data space. The boundary in the original space can now be non-linear. V. cool.
-Ant Colonies. Hadn't come across this before. Based on the way real ants leave a pheromone trail when walking to and from food sources. Other ants follow, building up the trail. If you have two paths, one long and one short, ants take both paths but the short path ants get to the food first. They turn to head back and see their original pheromone trail on the short path, and nothing on the long path, so take the short path, again leaving pheromones. This doubles the strength of the short path, so that when the long path ants arrive, and turn to go back, they see a weak trail on the long path and a strong trail on the short path, so take the short path back. All future ants now only take the short path. Now apply to search algorithms, e.g. the travelling salesman problem. Go on. Off you go.
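To make the fuzzy clustering point concrete, here is a rough Python sketch of the 'degree of belonging' idea. It was not shown at the talk. It assumes the standard fuzzy c-means membership formula with fuzzifier m = 2, fixed cluster centres and 1-D data, and the function name fuzzy_memberships is made up for illustration.

```python
def fuzzy_memberships(point, centres, m=2.0):
    """Fuzzy c-means style membership of a 1-D point in each cluster centre.

    Assumption: centres are already known; a full algorithm would also
    re-estimate them from the memberships on each iteration.
    """
    dists = [abs(point - c) for c in centres]
    # A point sitting exactly on a centre belongs wholly to that cluster.
    if 0.0 in dists:
        return [1.0 if d == 0.0 else 0.0 for d in dists]
    power = 2.0 / (m - 1.0)
    # u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)); the memberships sum to 1.
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


centres = [0.0, 10.0]  # two well-separated clusters (illustrative values)
for x in (1.0, 5.0, 9.0):
    print(x, [round(u, 3) for u in fuzzy_memberships(x, centres)])
```

The point at 5.0 sits exactly halfway between the two centres and gets a membership of 0.5 in each, whereas a hard assignment would have to push it into one cluster or the other.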
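The 'increase the dimensionality' trick from the SVM talk can be sketched on a toy example too. This is only an illustration under my own assumptions: the data are invented, the feature map x -> (x, x^2) is chosen by hand, and the hyperplane is placed by eye midway between the two classes rather than fitted by an SVM solver.

```python
def lift(x):
    """Map a 1-D point into a 2-D feature space: x -> (x, x^2)."""
    return (x, x * x)


def separable_by_threshold(neg, pos):
    """True if a single cut point on a line separates the two groups."""
    return max(neg) < min(pos) or max(pos) < min(neg)


inner = [-1.0, 0.0, 1.0]        # class -1, sits in the middle
outer = [-3.0, -2.0, 2.0, 3.0]  # class +1, sits on both sides

# Hand-picked hyperplane in the lifted space: 0 * x + 1 * x^2 - 2.5 = 0.
# It lies midway between x^2 = 1 (nearest inner points) and x^2 = 4
# (nearest outer points), so those points play the role of the supports.
W = (0.0, 1.0)
B = -2.5


def classify(x):
    z = lift(x)
    score = W[0] * z[0] + W[1] * z[1] + B
    return 1 if score > 0 else -1


# No single cut on the original axis works...
print(separable_by_threshold(inner, outer))
# ...but one cut on the x^2 axis does.
print(separable_by_threshold([lift(x)[1] for x in inner],
                             [lift(x)[1] for x in outer]))
print([classify(x) for x in inner], [classify(x) for x in outer])
```

Dropping back down to the original space, the flat boundary x^2 = 2.5 becomes two cut points at roughly x = -1.58 and x = 1.58, which is the non-linear partition.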
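And since I said "off you go", here is a minimal Python sketch of the two-path ant story. It is a simplification and not the speaker's algorithm. It tracks the expected share of ants on each path rather than individual ants, assumes each path receives pheromone in proportion to the ants using it divided by its length (short-path ants complete more round trips), and adds an evaporation rate, which the talk summary above does not mention. All names and parameter values are invented for illustration.

```python
def double_bridge(lengths=(1.0, 2.0), ants_per_step=10,
                  evaporation=0.1, steps=100):
    """Return the share of ants choosing the first path at each step.

    Expected-value model: ants pick a path with probability proportional
    to its pheromone level, then reinforce the path they used.
    """
    pheromone = [1.0 for _ in lengths]  # both paths start out equal
    share_on_first = []
    for _ in range(steps):
        total = sum(pheromone)
        probs = [p / total for p in pheromone]
        share_on_first.append(probs[0])
        for i, length in enumerate(lengths):
            # Shorter paths are reinforced faster per ant.
            deposit = ants_per_step * probs[i] / length
            pheromone[i] = (1.0 - evaporation) * pheromone[i] + deposit
    return share_on_first


shares = double_bridge()
for step in (0, 10, 50, 99):
    print(step, round(shares[step], 3))
```

The share on the short path starts at 0.5 and climbs towards 1 as the trail reinforces itself. For something like the travelling salesman problem the same loop would run over edges of a graph instead of two fixed paths, with each ant building a whole tour.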
Then came a workshop on Kernel techniques which I skipped as a) the room was packed, b) I wanted to email my wife and c) it gave me a chance to write this entry before I forgot it all. After that I gave part one of my workshop which covered:
-What R is (a statistical programming language)
-How to load, write and manipulate data (read.table, write.table, name <- dataset[i,j])
-Some simple statistics (dim, summary, cor, cov)
-Some simple plots (pairs, plot, hist, boxplot)
-Kernel Density Estimation (density)
Seemed to go well. A couple of people came up to me afterwards and said they enjoyed the talk and found it useful, which was nice.
Then it was off to the hotel to check in, a quick catch-up with Phil and Fiona at 'The Snug', and finally dinner at Peterhouse where I got to meet a man who had been shot at in Sweden for threatening their princess (due to an unfortunate series of misunderstandings).