Machine Intelligence in Medical Imaging Conference – Report

I heard about the Society for Imaging Informatics in Medicine’s (SIIM) Scientific Conference on Machine Intelligence in Medical Imaging (C-MIMI) on Twitter.  It was attractively priced and easy to get to, I’m interested in Machine Learning, and it was the first radiology conference I’d seen on this subject, so I went.  It was organized on short notice, so I was expecting a smaller conference.


I almost didn’t get a seat.  It was packed.

The conference had real nuts-and-bolts presentations and discussions on machine learning (ML) for healthcare imaging.  Typically, these were Convolutional Neural Networks (CNNs/ConvNets), but a few Random Forests (RF) and Support Vector Machines (SVM) sneaked in, particularly in hybrid models alongside a CNN (cf. Microsoft).  The following comments assume some facility in understanding and working with ConvNets.

Some consistent threads throughout the conference:

  • Most CNNs were pre-trained on Imagenet with the final fully connected (FC) layer removed, then re-trained on radiology data with a new classifier FC layer placed at the end.
  • Most CNNs used the Imagenet-standard three-channel RGB input even though radiology images are greyscale.  This is of uncertain significance and importance.
  • Limiting input matrices to grids smaller than the image size is inherited from the Imagenet competitions (and legacy computational power).  Decreased resolution is a limiting factor in medical imaging applications, potentially worked around by multi-scale CNNs.
  • There is no central data repository for a good “Ground Truth” to develop improved machine imaging models.
  • Data augmentation methods are commonly used because relatively few cases are available.
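
Two of the threads above, feeding greyscale images to a three-channel network and stretching a small dataset with augmentation, can be illustrated with a minimal NumPy sketch. The function names, the tiny 4x4 array, and the choice of mirror flips as the augmentation are assumptions for illustration only; none of the presenters' actual pipelines is reproduced here.

```python
import numpy as np


def grey_to_rgb(image):
    """Replicate a 2-D greyscale image into three identical channels (H, W, 3)."""
    if image.ndim != 2:
        raise ValueError("expected a 2-D greyscale image")
    return np.repeat(image[:, :, np.newaxis], 3, axis=2)


def flip_augment(image):
    """Return the original image plus its left-right and up-down mirror images."""
    return [image, np.fliplr(image), np.flipud(image)]


# Hypothetical 4x4 greyscale "slice" standing in for a radiology image.
slice_2d = np.arange(16, dtype=np.float32).reshape(4, 4)

rgb = grey_to_rgb(slice_2d)                               # shape (4, 4, 3)
augmented = [grey_to_rgb(img) for img in flip_augment(slice_2d)]
```

Copying one channel into three adds no information, which is one reason the significance of the RGB input is uncertain. Whether mirror flips are a sensible augmentation depends on the anatomy, since a flip changes laterality.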

Keith Dreyer, DO, PhD gave an excellent lecture about the trajectory of machine imaging and how it will be an incremental process, with AI growth narrower in scope than projected and chiefly limited by applications.  At this time, CNN creation and investigation is principally an artisanal product with limited scalability.  A recurring theme was “What is ground truth?”, which means different things in different instances (pathology-proven, followed through time, pathognomonic imaging appearance).

There was an excellent educational session from the FDA’s Berkman Sahiner.  The difference between certifying a Class II and a Class III device may keep radiologists working longer than expected!  A Class II device, like CAD, identifies a potential abnormality but does not make a treatment recommendation and therefore only requires a 510(k) application.  A Class III device, such as an automated interpretation program that creates diagnoses and treatment recommendations, will require a more extensive application including clinical trials, and a new validation for any material changes.  One important insight (there were many) was that the FDA requires training and test data to be kept separate.   I believe this means that simple cross-validation is neither acceptable nor sufficient for FDA approval or certification.  Adaptive systems may be a particularly challenging area for regulation because, as with the ONC, significant changes to the software of the algorithm will require a new certification/approval process.
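
One way to read "training and test data kept separate" is a test set that is carved off once and never touched during model development. The sketch below is only an illustration of that idea, not a procedure prescribed by the FDA. Splitting at the patient level (so that no patient contributes studies to both sets) is an added assumption, and `split_by_patient` and the patient IDs are hypothetical.

```python
import random


def split_by_patient(case_patient_ids, test_fraction=0.2, seed=0):
    """Split case indices into train/test so no patient appears in both sets."""
    patients = sorted(set(case_patient_ids))
    rng = random.Random(seed)          # fixed seed: the split is reproducible
    rng.shuffle(patients)
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])
    train = [i for i, p in enumerate(case_patient_ids) if p not in test_patients]
    test = [i for i, p in enumerate(case_patient_ids) if p in test_patients]
    return train, test


# Hypothetical patient IDs, one per imaging study (some patients have several).
patient_ids = ["p1", "p1", "p2", "p3", "p3", "p3", "p4", "p5", "p5", "p6"]
train_idx, test_idx = split_by_patient(patient_ids)
```

Cross-validation could still be used inside the training indices for model selection; the point of the sketch is that the test indices play no part in it.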

Industry papers were presented by HK Lau of Arterys, Xiang Zhou of Siemens, Xia Li of GE, and Eldad Elnekave of Zebra Medical.  The Zebra Medical presentation was impressive, citing their use of the Google Inception V3 model and a false-color contrast limited adaptive histogram equalization algorithm, which not only provides high image contrast with low noise, but also gets around the 3-channel RGB issue.  The statistics given for their CAD program were impressive: 94% accuracy, compared to 89% for a radiologist.

Scientific papers were presented by Matthew Chen, Stanford; Synho Do, Harvard; Curtis Langlotz, Stanford; David Golan, Stanford; Paras Lakhani, Thomas Jefferson; Panagiotis Korfiatis, Mayo Clinic; Zeynettin Akkus, Mayo Clinic; Etka Bullar, U Saskatchewan; Mahmudur Rahman, Morgan State U; Kent Ogden, SUNY Upstate.

Ronald Summers, MD PhD from the NIH gave a presentation on the work from his lab in conjunction with Holger Roth, detailing the specific CNN approaches to Lymph Node detection, Anatomic level detection, Vertebral body segmentation, Pancreas Segmentation, and colon polyp screening with CT-colonography, which had high False Positives.  In his experience, deeper models performed better.  His lab also changes unstructured radiology reporting into structured reporting through ML techniques.

Abdul Halabi of NVIDIA gave an impressive presentation on the supercomputer-like DGX-1 GPU cluster (5 deliveries to date, the fifth of which was to Mass. General, a steal at over $100K), and the new Pascal architecture in the P4 & P40 GPUs.  He showed 60X performance on AlexNet vs the original version/GPU configuration in 2012.  Very impressive.

Sayan Pathak of Microsoft Research and the Inner Eye team gave a good presentation where he demonstrated that an RF is really just a 2 layer DNN, i.e. a sparse 2 layer perceptron.   Combined with a CNN (dNDF.NET), it beat GoogLeNet’s latest version in the Imagenet arms race.  However, as one needs to solve for both structures simultaneously, it is an expensive (long, intense) computation.

Closing points were the following:

  • Most devs are currently using Python with TensorFlow +/- Keras, with fewer using Caffe off of the Model Zoo.
  • De-identification of data is a problem, even more so when considering longitudinal followup.
  • Matching the radiologist’s report may not be as important as matching the actual outcomes.
  • There was a lot of interest in organizing a competition to advance medical imaging, cf. Kaggle.
  • Radiologists aren’t obsolete just yet.

It was a great conference.  An unexpected delight.  Food for your head!




Memory requirements for Convolutional Neural Network analysis of brain MRI.

I’m auditing the wonderful Stanford CS 231n class on Convolutional Neural Networks in Computer Vision.

A discussion the other day was on the amount of memory required to analyze one image as it goes through the Convolutional Neural Network (CNN). This was interesting – how practical is it for application to radiology imaging?  (To review some related concepts, see my earlier post: What Big Data Visualization Analytics can learn from Radiology)

Take your standard non-contrast MRI of the brain. There are 5 sequences (T1, T2, FLAIR, DWI, ADC), all axial for the purposes of this analysis. Assume a 320×320 viewing matrix for each slice. One image is therefore a 320x320x5 matrix, which at one byte per value flattens into a 512,000 byte vector. Applying this to VGGNet configuration D (1) yields the following:


Each image has 320 pixels in x and in y, each pixel holds a greyscale value, and there are 5 different sequences. Each axial slice takes up 512KB, the first convolutional layers hold the most memory at 6.4MB each, and summing all layers gives 30.5MB. Remember that you have to double the memory for the forward/backward pass through the network, giving you 61MB per image. Finally, the images do not exist in a void but are part of about 15 axial slices of the head, giving a memory requirement of 916.5MB, or about a gigabyte.
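
The arithmetic above can be sketched in a few lines of Python. This is a rough estimate under stated assumptions: the layer list is VGG configuration D (VGG-16) from the cited paper, every stored value is counted as one byte (which is what the 512,000 byte input implies), only layer outputs are counted and not the network weights, and the names `VGG_D`, `activation_counts` and `memory_estimate` are made up for the illustration.

```python
# VGG configuration D (VGG-16): integers are 3x3 conv output channels,
# "M" marks a 2x2 max-pool layer.
VGG_D = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
         512, 512, 512, "M", 512, 512, 512, "M"]
FC_SIZES = [4096, 4096, 1000]


def activation_counts(height, width, channels):
    """List (layer name, number of stored values) for one input image."""
    layers = [("input", height * width * channels)]
    h, w, c = height, width, channels
    for item in VGG_D:
        if item == "M":
            h, w = h // 2, w // 2      # pooling halves each spatial dimension
            layers.append(("pool", h * w * c))
        else:
            c = item                   # padded 3x3 conv keeps h and w unchanged
            layers.append((f"conv-{c}", h * w * c))
    layers.extend((f"fc-{n}", n) for n in FC_SIZES)
    return layers


def memory_estimate(height=320, width=320, channels=5, slices=15, bytes_per_value=1):
    """Rough byte counts for one image and for a whole multi-slice study."""
    counts = [n for _, n in activation_counts(height, width, channels)]
    forward = sum(counts) * bytes_per_value
    return {
        "input": counts[0] * bytes_per_value,
        "largest_layer": max(counts) * bytes_per_value,
        "forward": forward,
        "forward_backward": 2 * forward,           # doubled for the backward pass
        "whole_study": 2 * forward * slices,
    }


for name, nbytes in memory_estimate().items():
    print(f"{name:>16}: {nbytes / 1e6:7.1f} million bytes")
```

The totals land within a few percent of the figures quoted above; the small differences most likely come down to how kilobytes and megabytes are counted. Setting `bytes_per_value=4` gives the estimate for 32-bit floating point values instead.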

Of course, that’s just for feeding an image through the algorithm.

This is simplistic because:

  1. VGG is not going to get you to nearly enough accuracy for diagnosis! (50% accurate, I’m guessing)
  2. The MRI data is only put into slices for people to interpret – the data itself exists in k-space. What that would do to machine learning interpretation is another discussion.
  3. We haven’t even discussed speed of training the network.
  4. This is for older MRI protocols.  Newer MRIs have larger matrices (512×512) and thinner slices (3mm) available, which will increase the necessary memory to approximately 4GB.

Nevertheless, it is interesting to note that the amount of memory required to train a neural network on brain MRIs is within reach of a home neural network enthusiast.

(1). Karen Simonyan & Andrew Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition, ICLR 2015

The coming computer vision revolution

3 layer (7,5,3 hidden layers) neural network created in R using the neuralnet package.


Nothing of him that doth fade
But doth suffer a sea-change
Into something rich and strange.

– Shakespeare, The Tempest 1.2.396-401

I’m halfway through auditing Stanford’s CS231n course – Convolutional Neural Networks for Visual Recognition.

Wow. Just Wow. There is a sea-changing paradigm shift that is happening NOW –  we probably have not fully realized it yet.

We are all tangentially aware of CV applications in our daily lives – Facebook’s ability to find us in photos, optical character recognition (OCR) of our address on postal mail, that sort of thing. But these algorithms were rule-based expert systems grounded in supervised learning methods. Applications were largely one-off for a specific, single task. They were expensive, complicated, and somewhat error prone.

So what changed?   First, a little history. In the early 1980’s I had a good friend obtaining a MS in comp sci all atwitter about “Neural Networks”. Back then they went nowhere. Too much processing/memory/storage required, too difficult to tune, computationally slow. Fail.


1999 –  Models beginning with SIFT & ending with SVM (support vector machine) deformable parts. Best model only 74% accurate.

2006 – Restricted Boltzmann Machines are used to pre-train deep neural networks layer by layer, making backpropagation through deep networks practical.

2012 – AlexNet, a deep learning model, is applied to the Imagenet classification competition and cuts the error rate nearly 2X compared to earlier SVM methods.

2015 – ResNet, a deep learning system, achieves a 4.5X reduction in error rate compared to AlexNet and an 8X reduction compared to the old SVM models.

In practical terms, what does this mean? On a data set with 1000 different items (ImageNet), ResNet’s top choice matches the human-assigned label about 80% of the time, and the correct label appears in its list of the 5 most probable items 96.4% of the time. People are typically believed to have 95% accuracy identifying the correct image. Clearly the computer is not far off.
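
The two figures above are what is usually called top-1 and top-5 accuracy. A minimal sketch of how they are computed follows; the scores and labels are invented toy values over 6 classes rather than ImageNet's 1000, and `top_k_accuracy` is a hypothetical helper name.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of examples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Hypothetical scores for 4 images over 6 classes.
scores = [
    [0.60, 0.10, 0.10, 0.10, 0.05, 0.05],  # true class 0: the top choice
    [0.30, 0.40, 0.10, 0.10, 0.05, 0.05],  # true class 0: the second choice
    [0.05, 0.05, 0.10, 0.10, 0.30, 0.40],  # true class 5: the top choice
    [0.40, 0.30, 0.15, 0.10, 0.04, 0.01],  # true class 5: the last choice
]
labels = [0, 0, 5, 5]

top1 = top_k_accuracy(scores, labels, 1)
top5 = top_k_accuracy(scores, labels, 5)
```

Top-5 accuracy is always at least as high as top-1 accuracy, because any image counted as a top-1 hit is also a top-5 hit.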

2012 was the watershed year, with the first application of a CNN to the dataset and its win, and the improvement was significant enough that it sparked additional refinements and development. That is still going on – the ResNet example was just released in December 2015! It’s clear that this is an area of active research, and further improvements are expected.

The convolutional neural network is a game-changer and will likely approach and perhaps exceed human accuracy in computer vision and classification in the near future. That’s a big deal.  As this is a medical blog, the applications to healthcare are obvious – radiology, pathology, dermatology, ophthalmology for starters.  But the CNN may also be useful for the complicated process problems I’ve developed here on the blog – the flows themselves resemble networks naturally.  So why not model them as such?  Why is it a game changer?  Because the model is probably universally adaptable to visual classification problems and once trained, potentially cheap.


I’ll  write more on this in the coming weeks – I’ve been inching towards deep learning models (but lagging blogging about them) but there is no reason to wait any more. The era of the deep learning neural network is here.