The coming computer vision revolution

Neural network with three hidden layers (7, 5, and 3 neurons) created in R using the neuralnet package.

 

Nothing of him that doth fade
But doth suffer a sea-change
Into something rich and strange.

– Shakespeare, The Tempest 1.2.396-401

I’m halfway through auditing Stanford’s CS231n course – Convolutional Neural Networks for Visual Recognition.

Wow. Just wow. A sea change is happening NOW – and we probably have not fully realized it yet.

We are all tangentially aware of computer vision (CV) applications in our daily lives – Facebook’s ability to find us in photos, optical character recognition (OCR) of the address on our postal mail, that sort of thing. But these algorithms were rule-based expert systems grounded in supervised learning methods. Applications were largely one-offs built for a single, specific task. They were expensive, complicated, and somewhat error-prone.

So what changed? First, a little history. In the early 1980s a good friend of mine, then working on an MS in computer science, was all atwitter about “Neural Networks”. Back then they went nowhere: too much processing, memory, and storage required, too difficult to tune, too slow to compute. Fail.

Then:

1999 – Models beginning with SIFT and ending with SVM (support vector machine) deformable parts. Best model only 74% accurate.

2006 – Restricted Boltzmann Machines are used to pretrain networks one layer at a time before fine-tuning with backpropagation, making deep neural networks trainable.

2012 – AlexNet: deep learning applied to the ImageNet classification competition cuts the error rate nearly 2X relative to earlier SVM methods.

2015 – ResNet: a deep learning system cuts the error rate 4.5X relative to AlexNet and 8X relative to the old SVM models.

In practical terms, what does this mean? On a data set with 1000 different categories (ImageNet), ResNet’s single best guess matches the human-assigned label about 80% of the time, and the correct label appears in its list of the 5 most probable items 96.4% of the time. People are typically believed to be about 95% accurate at identifying the correct image. Clearly the computer is not far off.
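For the curious, here is a minimal sketch of how those two numbers (usually called top-1 and top-5 accuracy) can be computed. This is my own illustration, not code from the course or from the ImageNet competition: the function name `top_k_accuracy` is hypothetical, and the scores are made up for 4 images and 6 classes rather than taken from a real model.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of examples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k classes with the highest scores for this image.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels)


# Made-up class scores for 4 images over 6 classes (illustrative only).
scores = [
    [0.50, 0.20, 0.10, 0.10, 0.05, 0.05],  # true class 0: best guess is right
    [0.10, 0.40, 0.30, 0.10, 0.05, 0.05],  # true class 2: only the second guess
    [0.30, 0.25, 0.20, 0.15, 0.06, 0.04],  # true class 5: lowest score of all
    [0.05, 0.05, 0.10, 0.60, 0.10, 0.10],  # true class 3: best guess is right
]
labels = [0, 2, 5, 3]

top1 = top_k_accuracy(scores, labels, k=1)
top5 = top_k_accuracy(scores, labels, k=5)
print(f"top-1: {top1:.2f}, top-5: {top5:.2f}")  # prints "top-1: 0.50, top-5: 0.75"
```

The second image shows why the top-5 figure is always at least as high as the top-1 figure: the model's best guess is wrong, but the right answer is still on its short list.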

2012 was the watershed year: a CNN was applied to the dataset for the first time and won, and the improvement was significant enough that it sparked additional refinements and development. That is still going on – the ResNet example was just released in December 2015! This is clearly an area of active research, and further improvements are expected.

The convolutional neural network is a game-changer and will likely approach, and perhaps exceed, human accuracy in computer vision and classification in the near future. That’s a big deal. As this is a medical blog, the applications to healthcare are obvious – radiology, pathology, dermatology, ophthalmology for starters. But the CNN may also be useful for the complicated process problems I’ve developed here on the blog – the flows themselves naturally resemble networks. So why not model them as such? Why is it a game-changer? Because the model is probably universally adaptable to visual classification problems and, once trained, potentially cheap.

 

I’ll write more on this in the coming weeks. I’ve been inching towards deep learning models (while lagging on blogging about them), but there is no reason to wait any more. The era of the deep learning neural network is here.