Memory requirements for Convolutional Neural Network analysis of brain MRI.

I’m auditing the wonderful Stanford CS 231n class on Convolutional Neural Networks in Computer Vision.

A discussion the other day covered the amount of memory required to analyze one image as it goes through the Convolutional Neural Network (CNN). This was interesting: how practical is it for application to radiology imaging? (To review some related concepts, see my earlier post: What Big Data Visualization Analytics can learn from Radiology)

Take your standard non-contrast MRI of the brain. There are 5 sequences (T1, T2, FLAIR, DWI, ADC). For the purposes of this analysis, all are axial. Assume a 320×320 viewing matrix for each slice. Therefore, one image will be a 320×320×5 matrix, which flattens into a 512,000-byte vector at one byte per value. Applying this to VGGNet configuration D (1) yields the following:


Each image is 320×320 pixels, each pixel holds a greyscale value, and there are 5 different sequences. Each axial slice takes up 512KB, the first convolutional layers hold most of the memory at 6.4MB each, and summing all layers uses 30.5MB. Remember that you have to double the memory for the forward/backward pass through the network, giving you 61MB per image. Finally, the images do not exist in a void, but are part of about 15 axial slices of the head, giving you a memory requirement of 916.5MB, or about a gigabyte.
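
The arithmetic above can be sketched in a few lines of Python. This is a rough estimate, not a measurement: it assumes the standard VGG configuration D layer list (3×3 convolutions that preserve image size, 2×2 pooling that halves it), one byte per stored value as in the 512,000-byte figure above, and it counts only the layer outputs, not the network weights. The names `CONFIG_D`, `activation_counts` and `BYTES_PER_VALUE` are made up for this illustration.

```python
# VGG configuration D (VGG-16): integers are conv output channels, "M" is a 2x2 max-pool.
CONFIG_D = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
            512, 512, 512, "M", 512, 512, 512, "M"]
# Fully connected layers; the final 1000 is the ImageNet class count (an assumption here).
FC_LAYERS = [4096, 4096, 1000]


def activation_counts(height, width, channels):
    """Return a list of (layer name, number of stored values) for one input image."""
    layers = [("input", height * width * channels)]
    for item in CONFIG_D:
        if item == "M":
            # Pooling halves height and width, channel count is unchanged.
            height, width = height // 2, width // 2
            layers.append(("pool", height * width * channels))
        else:
            # A same-padded 3x3 conv keeps height and width, changes the channel count.
            channels = item
            layers.append((f"conv-{item}", height * width * channels))
    layers += [(f"fc-{n}", n) for n in FC_LAYERS]
    return layers


BYTES_PER_VALUE = 1  # assumption: one byte per value, as in the 512,000-byte input
layers = activation_counts(320, 320, 5)
per_image = sum(count for _, count in layers) * BYTES_PER_VALUE
per_image_fwd_bwd = 2 * per_image      # doubled for the forward/backward pass
per_study = 15 * per_image_fwd_bwd     # about 15 axial slices per head

for name, count in layers:
    print(f"{name:>10}: {count * BYTES_PER_VALUE:>10,} bytes")
print(f"per image: {per_image:,} bytes")
print(f"per study: {per_study:,} bytes")
```

Under these assumptions the total comes to roughly 31 million bytes per image and a little under a billion bytes per study, in line with the figures quoted above (the exact MB value depends on how a megabyte is counted). Storing values as 32-bit floats, which is common in practice, would multiply everything by four.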

Of course, that’s just for feeding an image through the algorithm.

This is simplistic because:

  1. VGG is not going to get you to nearly enough accuracy for diagnosis! (50% accurate, I’m guessing)
  2. The MRI data is only put into slices for people to interpret – the data itself exists in K-space. What that would do to machine learning interpretation is another discussion.
  3. We haven’t even discussed speed of training the network.
  4. This is for older MRI protocols. Newer MRIs have larger matrices (512×512) and thinner slices (3mm) available, which will increase the necessary memory to approximately 4GB.
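
The 4GB figure in the last point can be sanity-checked with a quick scaling estimate. This is only a sketch: it starts from the 61MB per image quoted earlier, assumes memory grows in proportion to the number of pixels (true for the convolutional and pooling layers, which dominate), and assumes the original 15 slices were about 5mm thick, so that 3mm slices give about 25 slices. That 5mm thickness is my assumption and is not stated above.

```python
# Quoted above: 61MB per 320x320 image for the forward/backward pass.
MB_PER_IMAGE_320 = 61.0

# Memory in the conv/pool layers scales with pixel count.
area_ratio = (512 / 320) ** 2          # 512x512 matrix vs 320x320

# Assumption (not from the post): the original 15 slices were ~5mm thick.
old_slices, old_thickness_mm, new_thickness_mm = 15, 5.0, 3.0
new_slices = old_slices * old_thickness_mm / new_thickness_mm

estimate_mb = MB_PER_IMAGE_320 * area_ratio * new_slices
print(f"{new_slices:.0f} slices, about {estimate_mb / 1000:.1f}GB")
```

With these inputs the estimate lands just under 4GB, consistent with the figure in the list.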

Nevertheless, it is interesting to note that the amount of memory required to train a neural network on brain MRIs is within reach of a home network enthusiast.

(1) Karen Simonyan & Andrew Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition, ICLR 2015

The coming computer vision revolution

3 layer (7,5,3 hidden layers) neural network created in R using the neuralnet package.


Nothing of him that doth fade
But doth suffer a sea-change
Into something rich and strange.

– Shakespeare, The Tempest 1.2.396-401

I’m halfway through auditing Stanford’s CS231n course – Convolutional Neural Networks for Visual Recognition.

Wow. Just Wow. There is a sea-changing paradigm shift that is happening NOW –  we probably have not fully realized it yet.

We are all tangentially aware of CV applications in our daily lives – Facebook’s ability to find us in photos, optical character recognition (OCR) of our address on postal mail, that sort of thing. But these algorithms were rule-based expert systems grounded in supervised learning methods. Applications were largely one-off for a specific, single task. They were expensive, complicated, and somewhat error prone.

So what changed? First, a little history. In the early 1980s I had a good friend, working toward an MS in computer science, who was all atwitter about “Neural Networks”. Back then they went nowhere. Too much processing/memory/storage required, too difficult to tune, computationally slow. Fail.


1999 –  Models beginning with SIFT & ending with SVM (support vector machine) deformable parts. Best model only 74% accurate.

2006 – Restricted Boltzmann Machines are used to pretrain networks layer by layer, so that backpropagation can train deep neural networks.

2012 – AlexNet: deep learning applied to the ImageNet classification competition achieves a nearly 2X increase in accuracy over earlier SVM methods.

2015 – ResNet: deep learning system achieves a 4.5X increase in accuracy compared to AlexNet and an 8X increase in accuracy over the old SVM models.

In practical terms, what does this mean? On a data set with 1000 different items (ImageNet), ResNet’s single best guess matches the human-assigned label about 80% of the time, and the correct label appears in its list of the 5 most probable items 96.4% of the time. People are typically believed to have about 95% accuracy at identifying the correct image. It’s clear that the computer is not far off.
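
The two numbers above are what is usually called top-1 and top-5 accuracy. Here is a minimal sketch of how such a score is computed, using a made-up 3-class example rather than real ImageNet output. The function name `top_k_accuracy` and the toy scores are mine, for illustration only.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of examples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score, truncated to the top k.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Toy data: 3 images, 3 classes; each row holds the model's score per class.
scores = [
    [0.1, 0.7, 0.2],   # best guess: class 1
    [0.5, 0.3, 0.2],   # best guess: class 0
    [0.2, 0.3, 0.5],   # best guess: class 2
]
labels = [1, 1, 0]     # the "correct" class for each image

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print(f"top-1: {top1:.2f}, top-2: {top2:.2f}")
```

With 1000 classes, k=1 gives the "best guess" figure and k=5 gives the "one of the 5 most probable items" figure.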

2012 was the watershed year, with the first application of the CNN to the dataset and its first win, and the improvement was significant enough that it sparked additional refinements and development. That is still going on: the ResNet example was just released in December 2015! It’s clear that this is an area of active research and that further improvements are expected.

The convolutional neural network is a game-changer and will likely approach and perhaps exceed human accuracy in computer vision and classification in the near future. That’s a big deal.  As this is a medical blog, the applications to healthcare are obvious – radiology, pathology, dermatology, ophthalmology for starters.  But the CNN may also be useful for the complicated process problems I’ve developed here on the blog – the flows themselves resemble networks naturally.  So why not model them as such?  Why is it a game changer?  Because the model is probably universally adaptable to visual classification problems and once trained, potentially cheap.


I’ll write more on this in the coming weeks. I’ve been inching towards deep learning models (but lagging in blogging about them), and there is no reason to wait any more. The era of the deep learning neural network is here.