{"id":13649,"date":"2018-02-15T15:51:28","date_gmt":"2018-02-15T20:51:28","guid":{"rendered":"http:\/\/n2value.com\/blog\/?p=13649"},"modified":"2018-02-15T15:58:39","modified_gmt":"2018-02-15T20:58:39","slug":"are-computers-better-than-doctors-will-the-computer-see-you-now-what-we-learnt-from-the-chexnet-paper-for-pneumonia-diagnosis","status":"publish","type":"post","link":"https:\/\/n2value.com\/blog\/are-computers-better-than-doctors-will-the-computer-see-you-now-what-we-learnt-from-the-chexnet-paper-for-pneumonia-diagnosis\/","title":{"rendered":"Are computers better than doctors? Will the computer see you now? What we learnt from the ChexNet paper for pneumonia diagnosis \u2026"},"content":{"rendered":"<h5>Author&#8217;s Note: This was a fun side-project for the American College of Radiology&#8217;s Residents and Fellows Section.\u00a0 Judy Gichoya and I co-wrote the article. \u00a0 <a href=\"https:\/\/hackernoon.com\/are-computers-better-than-doctors-2e07a05ae7ea\" target=\"_blank\" rel=\"noopener\">The original article was posted by Judy to Medium and appeared on HackerNoon<\/a>.\u00a0 It was really an enlightening gathering of experts in the field.\u00a0 There is a small, but hopefully growing number of radiologists who are also deep learning practitioners.<\/h5>\n<p>&nbsp;<\/p>\n<p id=\"af37\" class=\"graf graf--p graf-after--h3\"><em class=\"markup--em markup--p-em\">Written by <\/em><a class=\"markup--user markup--p-user\" href=\"https:\/\/medium.com\/@judywawira\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/medium.com\/@judywawira\" data-anchor-type=\"2\" data-user-id=\"53ba2bb84284\" data-action-value=\"53ba2bb84284\" data-action=\"show-user-card\" data-action-type=\"hover\"><em class=\"markup--em markup--p-em\">Judy Gichoya<\/em><\/a><em class=\"markup--em markup--p-em\"> &amp; <\/em><a class=\"markup--user markup--p-user\" href=\"https:\/\/medium.com\/@sborstelmannmd\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/medium.com\/@sborstelmannmd\" data-anchor-type=\"2\" data-user-id=\"7f53283275d3\" data-action-value=\"7f53283275d3\" data-action=\"show-user-card\" data-action-type=\"hover\"><em class=\"markup--em markup--p-em\">Stephen Borstelmann MD<\/em><\/a><\/p>\n<p>&nbsp;<\/p>\n<p id=\"134b\" class=\"graf graf--p graf-after--p\">In December 2017, we (radiologists in training, staff radiologists, and AI practitioners) discussed our role as knowledge experts in a world of AI, summarized here <a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/becominghuman.ai\/radiologists-as-knowledge-experts-in-a-world-of-artificial-intelligence-summary-of-radiology-ec63a7002329\" target=\"_blank\" rel=\"nofollow noopener\" data-href=\"https:\/\/becominghuman.ai\/radiologists-as-knowledge-experts-in-a-world-of-artificial-intelligence-summary-of-radiology-ec63a7002329\">https:\/\/becominghuman.ai\/radiologists-as-knowledge-experts-in-a-world-of-artificial-intelligence-summary-of-radiology-ec63a7002329<\/a>. For the month of January, we addressed the performance of deep learning algorithms for disease diagnosis, specifically focusing on the paper by the Stanford group\u200a\u2014\u200a<strong class=\"markup--strong markup--p-strong\">CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning.
<\/strong> Interest in the journal club continues to grow, with 347 people registered, 150 of whom signed in on January 24th, 2018 to participate in the discussion.<\/p>\n<p id=\"31a0\" class=\"graf graf--p graf-after--p\">The paper has had 3 revisions and is available here <a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/arxiv.org\/abs\/1711.05225\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/arxiv.org\/abs\/1711.05225\">https:\/\/arxiv.org\/abs\/1711.05225<\/a>. Like many deep learning papers that claim superhuman performance, the paper was widely circulated in the news media, in several blog posts, and on <a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/www.reddit.com\/r\/Radiology\/comments\/7d8f5k\/chexnet_radiologistlevel_pneumonia_detection_on\/\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/www.reddit.com\/r\/Radiology\/comments\/7d8f5k\/chexnet_radiologistlevel_pneumonia_detection_on\/\">reddit<\/a> and Twitter.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-13650\" src=\"http:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/ngtwitter.png\" alt=\"ngtwitter\" width=\"800\" height=\"525\" srcset=\"https:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/ngtwitter.png 800w, https:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/ngtwitter-300x197.png 300w, https:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/ngtwitter-768x504.png 768w\" sizes=\"auto, (max-width: 800px) 100vw, 800px\" \/><\/p>\n<p>Please note that findings of superhuman performance are increasingly being reported in medical AI papers. For example, <a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/futurism.com\/medical-ai-may-better-spotting-eye-disease-real-doctors\/\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/futurism.com\/medical-ai-may-better-spotting-eye-disease-real-doctors\/\">this article<\/a> notes that \u201cMedical AI May Be Better at Spotting Eye Disease Than Real Doctors\u201d.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-13651\" src=\"http:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/CVDretina.png\" alt=\"CVDretina\" width=\"1176\" height=\"846\" srcset=\"https:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/CVDretina.png 1176w, https:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/CVDretina-300x216.png 300w, https:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/CVDretina-768x552.png 768w, https:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/CVDretina-1024x737.png 1024w\" sizes=\"auto, (max-width: 1176px) 100vw, 1176px\" \/><\/p>\n<p>To help critique the ChexNet paper, we constituted a panel composed of the <strong class=\"markup--strong markup--p-strong\">author<\/strong> team (most of the authors listed on the paper were kind enough to be in attendance\u200a\u2014\u200athank you!), Dr. Luke (<a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/lukeoakdenrayner.wordpress.com\/2018\/01\/24\/chexnet-an-in-depth-review\/\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/lukeoakdenrayner.wordpress.com\/2018\/01\/24\/chexnet-an-in-depth-review\/\">blog<\/a>) and Dr.
Paras (<a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/medium.com\/@paras42\/dear-mythical-editor-radiologist-level-pneumonia-in-chexnet-c91041223526\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/medium.com\/@paras42\/dear-mythical-editor-radiologist-level-pneumonia-in-chexnet-c91041223526\">blog<\/a>) who had critiqued the <strong class=\"markup--strong markup--p-strong\">data<\/strong> used, and Jeremy Howard (past president and chief scientist of <a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/www.kaggle.com\/\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/www.kaggle.com\/\">Kaggle<\/a>, a data analytics competition site, ex-CEO of <a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/www.enlitic.com\/news.html\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/www.enlitic.com\/news.html\">Enlitic<\/a>, a healthcare imaging company, and the current CEO of <a class=\"markup--anchor markup--p-anchor\" href=\"http:\/\/www.fast.ai\/\" target=\"_blank\" rel=\"noopener\" data-href=\"http:\/\/www.fast.ai\/\">Fast.ai<\/a>, a deep learning educational site) to provide insight into deep learning <strong class=\"markup--strong markup--p-strong\">methodology<\/strong>.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-13652\" src=\"http:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/chexnet.png\" alt=\"chexnet\" width=\"800\" height=\"448\" srcset=\"https:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/chexnet.png 800w, https:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/chexnet-300x168.png 300w, https:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/chexnet-768x430.png 768w\" sizes=\"auto, (max-width: 800px) 100vw, 800px\" \/><\/p>\n<p id=\"0dca\" class=\"graf graf--p graf-after--figure\">In this blog we summarise our methodology for reviewing medical AI papers.<\/p>\n<h4 id=\"710b\" class=\"graf graf--h4 graf-after--p\">Radiology 101<\/h4>\n<p id=\"e20d\" class=\"graf graf--p graf-after--h4\">The ChexNet paper compares the performance of AI against 4 trained radiologists in diagnosing pneumonia. Pneumonia is a clinical diagnosis\u200a\u2014\u200aa patient will present with fever and cough, and may get a chest X-ray (CXR) to identify complications of pneumonia. Patients will usually get blood cultures to supplement the diagnosis. Pneumonia on a CXR is not easily distinguishable from other findings that fill the alveolar spaces (specifically pus, blood, or fluid) or from collapsed lung, called atelectasis. The radiologists interpreting these studies may therefore use terms like infiltrate, consolidation and atelectasis interchangeably.<\/p>\n<h4 id=\"7c39\" class=\"graf graf--h4 graf-after--p\">Show me the\u00a0data<\/h4>\n<p id=\"1a9b\" class=\"graf graf--p graf-after--h4\">The data used for this study is ChestX-ray14, the largest publicly available chest X-ray dataset, consisting of 112,120 frontal chest X-ray radiographs of 30,805 unique patients; it expands the earlier ChestX-ray8 dataset described <a class=\"markup--anchor markup--p-anchor\" href=\"http:\/\/openaccess.thecvf.com\/content_cvpr_2017\/papers\/Wang_ChestX-ray8_Hospital-Scale_Chest_CVPR_2017_paper.pdf\" target=\"_blank\" rel=\"noopener\" data-href=\"http:\/\/openaccess.thecvf.com\/content_cvpr_2017\/papers\/Wang_ChestX-ray8_Hospital-Scale_Chest_CVPR_2017_paper.pdf\">by Wang et al<\/a>.
Each radiograph is labeled with one or more of 14 different pathology labels, or a \u2018no finding\u2019 label.<\/p>\n<p id=\"f3a9\" class=\"graf graf--p graf-after--p\">Labeling of the radiographs was performed using Natural Language Processing (NLP) to mine the text of the radiology reports. Individual case labels were not assigned by humans.<\/p>\n<blockquote id=\"d726\" class=\"graf graf--blockquote graf-after--p\"><p><strong class=\"markup--strong markup--blockquote-strong\">Critique<\/strong>: Labeling medical data remains a big challenge, especially because the radiology report is a tool for communicating with ordering doctors, not a description of the images. For example, in an ICU film, a central line, tracheostomy tube and chest tube may be reported as \u201cstable lines and tubes\u201d without a detailed description of every individual finding on the CXR. Such a study can be misclassified by NLP as one without findings. This image-report discordance occurs at a high rate in this dataset.<\/p><\/blockquote>\n<blockquote id=\"d16f\" class=\"graf graf--blockquote graf-after--blockquote\"><p>Moreover, reportable findings could be ignored by the NLP technique and\/or labeling schema, either through error or because the pathology falls outside the 14 labels. The paper\u2019s claims of 90%+ NLP mining accuracy do not appear to hold up (SMB, LOR, JH). One of the panelists, Luke, reviewed several hundred examples and found the NLP labeling about 50% accurate overall compared to the image, with the pneumonia labeling worse\u200a\u2014\u200a30\u201340%.<\/p><\/blockquote>\n<blockquote id=\"97b9\" class=\"graf graf--blockquote graf-after--blockquote\"><p>Jeremy Howard notes that the use of an old NLP tool contributes to the inaccuracy, with the preponderance of \u2018No Findings\u2019 cases in the dataset skewing the data\u200a\u2014\u200ahe doesn\u2019t think the precision of the normal label in this dataset is likely improved over random. Looking at the pneumonia label, it is only 60% accurate. A lot of the discrepancy can be traced back to the core NLP method, which he characterized as \u201cmassively out of date and known to be inaccurate\u201d. He feels a re-characterization of the labels with a more up-to-date NLP system is appropriate.<\/p><\/blockquote>
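\n<p>To make this failure mode concrete, here is a toy sketch of a naive keyword-based labeler. It is purely illustrative (our own construction, not the actual NLP pipeline used to build ChestX-ray14), but it shows how a terse ICU-style report, or a negated sentence, can produce wrong labels:<\/p>\n<pre><code># Toy keyword labeler illustrating how terse reports can be mislabeled.\n# This is NOT the NLP pipeline that was used to build ChestX-ray14.\nKEYWORDS = {\n    'Pneumonia': ['pneumonia'],\n    'Infiltration': ['infiltrate', 'infiltration'],\n    'Edema': ['edema'],\n    'Pneumothorax': ['pneumothorax'],\n}\n\ndef label_report(report_text):\n    text = report_text.lower()\n    found = [label for label, words in KEYWORDS.items()\n             if any(w in text for w in words)]\n    return found or ['No Finding']\n\n# A terse ICU-style report: lines and tubes summarized, findings not spelled out.\nprint(label_report('Stable lines and tubes. No interval change.'))\n# -> ['No Finding'], even if the film actually shows an infiltrate\n\n# Negation is another trap for naive keyword matching:\nprint(label_report('No evidence of pneumonia or pneumothorax.'))\n# -> ['Pneumonia', 'Pneumothorax'], false positive labels\n<\/code><\/pre>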
\n<figure id=\"attachment_13654\" aria-describedby=\"caption-attachment-13654\" style=\"width: 994px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-13654\" src=\"http:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/trach.png\" alt=\"Chest X Ray, CXR, Deep Learning, CheXNet, n2value, tracheostomy, infiltrates, pulmonary edema\" width=\"994\" height=\"1056\" srcset=\"https:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/trach.png 994w, https:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/trach-282x300.png 282w, https:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/trach-768x816.png 768w, https:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/trach-964x1024.png 964w\" sizes=\"auto, (max-width: 994px) 100vw, 994px\" \/><figcaption id=\"caption-attachment-13654\" class=\"wp-caption-text\">Chest X-ray showing a tracheostomy tube, right internal jugular dialysis line and diffuse infiltrates, likely pulmonary edema. The lines and tubes for an ICU patient are easily reported as \u201cstable.\u201d<\/figcaption><\/figure>\n<p>The Stanford group tackled the labeling challenge by having 4 radiologists (one specializing in thoracic imaging and 3 non-thoracic radiologists) assign labels to a test subset of the data created through stratified random sampling, with a minimum of 50 positive cases of each label and a final N = 420.<\/p>\n<blockquote id=\"24c2\" class=\"graf graf--blockquote graf-after--p\"><p><strong class=\"markup--strong markup--blockquote-strong\">Critique<\/strong>: The ChestX-ray14 dataset contains many patients with only one radiograph, but those who had multiple studies tended to have many. While the text-mined reports may match the clinical information, any mismatch between the assigned label and the radiographic appearance hurts the predictive power of the dataset.<\/p><\/blockquote>\n<blockquote id=\"9d20\" class=\"graf graf--blockquote graf-after--blockquote\"><p>Moreover, what do the labels actually mean? Dr. Oakden-Rayner questions whether they denote a radiologic pneumonia or a clinical pneumonia. In an immunocompromised patient, radiography of a pneumonia might be negative, largely because the patient cannot mount an immune response to the pathogen. This does not mean that the clinical diagnosis of pneumonia is inaccurate. The imaging appearance and the clinical appearance\/diagnosis therefore would not match.<\/p><\/blockquote>\n<blockquote id=\"ee5d\" class=\"graf graf--blockquote graf-after--blockquote\"><p>The closeness of four of the labels (Pneumonia, Consolidation, Infiltration, and Atelectasis) introduces another level of complexity. Pneumonia is a subset of consolidation, and infiltration is a superset of consolidation. While the dataset labels these as 4 separate entities, to the radiologic practitioner they may not be separate at all. It is important to have experts look at the images when doing an image classification task.<\/p><\/blockquote>\n<p id=\"5497\" class=\"graf graf--p graf-after--blockquote\">See a great summary of the data problems in this <a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/lukeoakdenrayner.wordpress.com\/2018\/01\/24\/chexnet-an-in-depth-review\/\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/lukeoakdenrayner.wordpress.com\/2018\/01\/24\/chexnet-an-in-depth-review\/\">blog<\/a> post from Luke, one of the panelists, <a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/lukeoakdenrayner.wordpress.com\/2018\/01\/24\/chexnet-an-in-depth-review\/\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/lukeoakdenrayner.wordpress.com\/2018\/01\/24\/chexnet-an-in-depth-review\/\">here<\/a>.<\/p>\n<h4 id=\"9b72\" class=\"graf graf--h4 graf-after--p\">Model<\/h4>\n<p id=\"cbe8\" class=\"graf graf--p graf-after--h4\">The CheXNet algorithm is a 121-layer deep 2D Convolutional Neural Network; a <strong class=\"markup--strong markup--p-strong\">Densenet<\/strong> after <a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/arxiv.org\/abs\/1608.06993\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/arxiv.org\/abs\/1608.06993\">Huang &amp; Liu<\/a>. The Densenet\u2019s dense skip connections, which feed each layer the feature maps of all preceding layers, reduce parameters and training time, allowing a deeper, more powerful model.
The model accepts a two-dimensional input image resized to 224 by 224 pixels.<\/p>\n<figure id=\"attachment_13653\" aria-describedby=\"caption-attachment-13653\" style=\"width: 418px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-13653\" src=\"http:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/Densenet.png\" alt=\"DenseNet, Convolutional Neural Network, CNN, AI, machine learning, deep learning\" width=\"418\" height=\"327\" srcset=\"https:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/Densenet.png 418w, https:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/Densenet-300x235.png 300w\" sizes=\"auto, (max-width: 418px) 100vw, 418px\" \/><figcaption id=\"caption-attachment-13653\" class=\"wp-caption-text\">Densenet<\/figcaption><\/figure>\n<p id=\"68b9\" class=\"graf graf--p graf-after--figure\">To improve trust in CheXNet\u2019s output, a Class Activation Mapping (CAM) heatmap was used, after <a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/people.csail.mit.edu\/bzhou\/publication\/Zhou_Learning_Deep_Features_CVPR_2016_paper.pdf\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/people.csail.mit.edu\/bzhou\/publication\/Zhou_Learning_Deep_Features_CVPR_2016_paper.pdf\">Zhou et al<\/a>. This allows the human user to \u201csee\u201d which areas of the radiograph provide the strongest activation of the Densenet for the highest-probability label.<\/p>
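\n<p>As a sketch of the CAM idea: the final convolutional feature maps are weighted by the classifier weights for the class of interest, summed, and upsampled back to the input size. The snippet below is our own illustration against a stock torchvision DenseNet-121, not the authors\u2019 implementation:<\/p>\n<pre><code># Illustrative class activation map (CAM) for a torchvision DenseNet-121.\n# A sketch of the Zhou et al. idea, not the CheXNet authors' code.\nimport torch\nimport torch.nn.functional as F\nfrom torchvision import models\n\nmodel = models.densenet121(pretrained=True).eval()\n\ndef class_activation_map(image, class_idx):\n    # image: a preprocessed tensor of shape (1, 3, 224, 224)\n    feats = F.relu(model.features(image))              # (1, 1024, 7, 7) feature maps\n    weights = model.classifier.weight[class_idx]       # (1024,) weights for this class\n    cam = torch.einsum('c,nchw->nhw', weights, feats)  # weighted sum over channels\n    cam = F.relu(cam).unsqueeze(1)                     # keep positive evidence only\n    cam = F.interpolate(cam, size=image.shape[-2:], mode='bilinear', align_corners=False)\n    cam = cam - cam.min()\n    return torch.div(cam, cam.max().clamp_min(1e-8))   # scale to [0, 1] for overlay\n<\/code><\/pre>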
\n<blockquote id=\"2c16\" class=\"graf graf--blockquote graf-after--p\"><p><strong class=\"markup--strong markup--blockquote-strong\">Critique<\/strong>: Jeremy notes that preprocessing by resizing images to 224&#215;224 pixels and adding random horizontal flips is fairly standard, but leaves room for potential improvement, as effective data augmentation is one of the best ways to improve a model. Downsizing images to 224&#215;224 is a known issue\u200a\u2014\u200aboth research and practical experience at Enlitic show that larger images perform better in medical imaging (SMB: multiple top-5 winners of the 2017 RSNA Bone Age challenge used image sizes near 512&#215;512). Mr. Howard feels there is no reason to leave <a class=\"markup--anchor markup--blockquote-anchor\" href=\"http:\/\/www.image-net.org\/\" target=\"_blank\" rel=\"noopener\" data-href=\"http:\/\/www.image-net.org\/\">Imagenet<\/a>-trained models at this size any longer. Regarding the model choice, the Densenet model is adequate, but <a class=\"markup--anchor markup--blockquote-anchor\" href=\"https:\/\/arxiv.org\/abs\/1707.07012\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/arxiv.org\/abs\/1707.07012\">NasNets<\/a> in the last 12 months have shown significant improvement (50%) over older models.<\/p><\/blockquote>\n<blockquote id=\"fe89\" class=\"graf graf--blockquote graf-after--blockquote\"><p>Pre-trained Imagenet weights were used, which is fine and a standard approach, but Jeremy felt it would be nice if we had a medical Imagenet for some semi-supervised training of an <a class=\"markup--anchor markup--blockquote-anchor\" href=\"http:\/\/automl.info\/\" target=\"_blank\" rel=\"noopener\" data-href=\"http:\/\/automl.info\/\">AutoML<\/a> encoder or a <a class=\"markup--anchor markup--blockquote-anchor\" href=\"https:\/\/blog.lunit.io\/2017\/05\/18\/siamese-one-shot-learners-and-feed-forward-one-shot-learners\/\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/blog.lunit.io\/2017\/05\/18\/siamese-one-shot-learners-and-feed-forward-one-shot-learners\/\">siamese network<\/a> to <a class=\"markup--anchor markup--blockquote-anchor\" href=\"https:\/\/en.wikipedia.org\/wiki\/Cross-validation_%28statistics%29\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/en.wikipedia.org\/wiki\/Cross-validation_(statistics)\">cross-validate<\/a> patients\u200a\u2014\u200aleaving room for improvement. Consider that Imagenet consists of color images of dogs, cats, planes and trains\u200a\u2014\u200aand yet we are getting great results on X-rays. While better than nothing, <strong class=\"markup--strong markup--blockquote-strong\">ANY<\/strong> pretrained network trained on medical images in any modality would probably perform better.<\/p><\/blockquote>\n<p id=\"debf\" class=\"graf graf--p graf-after--blockquote\">The Stanford team\u2019s <strong class=\"markup--strong markup--p-strong\">best idea<\/strong> was to train on multiple labels at the same time: building a single model that predicts multiple classes is counterintuitive, but it bears out in deep learning models and is likely responsible for their model yielding better results than prior studies. <strong class=\"markup--strong markup--p-strong\">The more classes you train the model on properly, the better results you can expect<\/strong>.<\/p>
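\n<p>To make the setup concrete, here is a minimal PyTorch sketch of this kind of model: an ImageNet-pretrained torchvision DenseNet-121 with its classifier replaced by a 14-way output, trained as a multi-label problem with a per-class sigmoid and binary cross-entropy loss on 224&#215;224 images with random horizontal flips. This is our illustrative reconstruction of the approach described above, not the Stanford group\u2019s actual code, and the hyperparameters shown are assumptions:<\/p>\n<pre><code># Minimal multi-label, CheXNet-style setup (illustrative; not the authors' code).\nimport torch\nimport torch.nn as nn\nfrom torchvision import models, transforms\n\nN_CLASSES = 14  # the 14 ChestX-ray14 pathology labels\n\n# ImageNet-pretrained DenseNet-121, classifier swapped for 14 outputs\nmodel = models.densenet121(pretrained=True)\nmodel.classifier = nn.Linear(model.classifier.in_features, N_CLASSES)\n\n# Standard preprocessing: resize to 224x224, random horizontal flip, ImageNet stats\ntrain_tfms = transforms.Compose([\n    transforms.Resize((224, 224)),\n    transforms.RandomHorizontalFlip(),\n    transforms.ToTensor(),\n    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),\n])\n\n# Multi-label loss: one sigmoid and BCE term per pathology, trained jointly\ncriterion = nn.BCEWithLogitsLoss()\noptimizer = torch.optim.Adam(model.parameters(), lr=1e-4)\n\ndef training_step(images, labels):\n    # images: (batch, 3, 224, 224); labels: (batch, 14) multi-hot floats\n    logits = model(images)\n    loss = criterion(logits, labels)\n    optimizer.zero_grad()\n    loss.backward()\n    optimizer.step()\n    return loss.item()\n<\/code><\/pre>\n<p>Training one network on all 14 labels jointly lets the shared features benefit every class, which is the multi-label point made above.<\/p>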
\n<h4 id=\"6a33\" class=\"graf graf--h4 graf-after--p\">Results<\/h4>\n<p id=\"b5ca\" class=\"graf graf--p graf-after--h4\"><a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/en.wikipedia.org\/wiki\/F1_score\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/en.wikipedia.org\/wiki\/F1_score\">F1 scores<\/a> were used to evaluate both the CheXNet model and the Stanford radiologists.<\/p>\n<figure id=\"attachment_13655\" aria-describedby=\"caption-attachment-13655\" style=\"width: 810px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-13655\" src=\"http:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/Evaluation-metrics.png\" alt=\"Precision, Recall, F1 Score, ROC, AUC, AUCROC, metrics, measure, n2value\" width=\"810\" height=\"506\" srcset=\"https:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/Evaluation-metrics.png 810w, https:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/Evaluation-metrics-300x187.png 300w, https:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/Evaluation-metrics-768x480.png 768w\" sizes=\"auto, (max-width: 810px) 100vw, 810px\" \/><figcaption id=\"caption-attachment-13655\" class=\"wp-caption-text\">Calculating F1 score<\/figcaption><\/figure>\n<p id=\"f1b0\" class=\"graf graf--p graf-after--figure\">Each radiologist\u2019s F1 score was calculated by considering the other three radiologists as \u201cground truth.\u201d CheXNet\u2019s F1 score was calculated against all 4 radiologists. A <a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/en.wikipedia.org\/wiki\/Bootstrapping_%28statistics%29\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/en.wikipedia.org\/wiki\/Bootstrapping_(statistics)\">bootstrap calculation<\/a> was added to yield 95% confidence intervals.<\/p>\n<p id=\"ac3b\" class=\"graf graf--p graf-after--p\">CheXNet\u2019s results are as follows:<img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-13656\" src=\"http:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/Evaluation-results.png\" alt=\"Evaluation-results\" width=\"798\" height=\"495\" srcset=\"https:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/Evaluation-results.png 798w, https:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/Evaluation-results-300x186.png 300w, https:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/Evaluation-results-768x476.png 768w\" sizes=\"auto, (max-width: 798px) 100vw, 798px\" \/><\/p>\n<p id=\"aec1\" class=\"graf graf--p graf-after--figure\">From the results, CheXNet outperforms the human radiologists. The varying F1 scores can be interpreted to imply that, for each study, the 4 radiologists do not fully agree with each other on findings. However, there is an outlier (radiologist 4, with an F1 score of 0.442), the thoracic-trained radiologist, who performs better than CheXNet.<\/p>
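\n<p>As a rough sketch of this style of evaluation (our illustration only, with toy data and a simple majority-vote ground truth as assumptions, not the paper\u2019s released evaluation code), an F1 score and a bootstrap 95% confidence interval can be computed like this:<\/p>\n<pre><code># Illustrative F1 score plus bootstrap 95% CI; not the paper's evaluation code.\nimport numpy as np\nfrom sklearn.metrics import f1_score\n\nrng = np.random.default_rng(0)\n\n# Hypothetical binary pneumonia labels for 420 studies:\n# rows = 4 radiologists, columns = studies (toy random data as a stand-in)\nrad_labels = rng.integers(0, 2, size=(4, 420))\nmodel_preds = rng.integers(0, 2, size=420)\n\ndef majority_vote(label_matrix):\n    # Ground truth here = label chosen by the majority of annotators\n    return (label_matrix.mean(axis=0) >= 0.5).astype(int)\n\ndef bootstrap_f1(y_true, y_pred, n_boot=1000):\n    n = len(y_true)\n    scores = []\n    for _ in range(n_boot):\n        idx = rng.integers(0, n, size=n)  # resample studies with replacement\n        scores.append(f1_score(y_true[idx], y_pred[idx]))\n    ci_low, ci_high = np.percentile(scores, [2.5, 97.5])\n    return f1_score(y_true, y_pred), (ci_low, ci_high)\n\ntruth = majority_vote(rad_labels)\nf1, ci = bootstrap_f1(truth, model_preds)\nprint(f1, ci)\n<\/code><\/pre>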
\n<p id=\"3544\" class=\"graf graf--p graf-after--p\">Moreover, CheXNet has state-of-the-art (SOTA) performance on all 14 pathologies compared to prior publications.<img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-13657\" src=\"http:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/eval-prior-benchmarks.png\" alt=\"eval - prior benchmarks\" width=\"922\" height=\"630\" srcset=\"https:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/eval-prior-benchmarks.png 922w, https:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/eval-prior-benchmarks-300x205.png 300w, https:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/eval-prior-benchmarks-768x525.png 768w\" sizes=\"auto, (max-width: 922px) 100vw, 922px\" \/><\/p>\n<p>In my (JG) search, the Machine Intelligence Lab at the Institute of Computer Science &amp; Technology, Peking University, directed by Prof. Yadong Mu, reports performance superior to that of the Stanford group. The code is open source and available here\u200a\u2014\u200a<a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/github.com\/arnoweng\/CheXNet\" target=\"_blank\" rel=\"nofollow noopener\" data-href=\"https:\/\/github.com\/arnoweng\/CheXNet\">https:\/\/github.com\/arnoweng\/CheXNet<\/a>.<\/p>\n<figure id=\"attachment_13658\" aria-describedby=\"caption-attachment-13658\" style=\"width: 1800px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-13658\" src=\"http:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/Peking-chexnet.png\" alt=\"CheXNet, AUROC, ROC, n2value\" width=\"1800\" height=\"1319\" srcset=\"https:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/Peking-chexnet.png 1800w, https:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/Peking-chexnet-300x220.png 300w, https:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/Peking-chexnet-768x563.png 768w, https:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/Peking-chexnet-1024x750.png 1024w\" sizes=\"auto, (max-width: 1800px) 100vw, 1800px\" \/><figcaption id=\"caption-attachment-13658\" class=\"wp-caption-text\">Results from various implementations of ChexNet<\/figcaption><\/figure>\n<blockquote id=\"04da\" class=\"graf graf--blockquote graf-after--figure\"><p><strong class=\"markup--strong markup--blockquote-strong\">Critique<\/strong>: Various studies that assess cognitive fit show that human performance can be affected by a lack of clinical information or prior comparisons.
Moreover, before the most recent version of the paper, human performance was unfairly scored against the machine.<\/p><\/blockquote>\n<h4 id=\"e8d2\" class=\"graf graf--h4 graf-after--blockquote\">Clinical significance<\/h4>\n<p id=\"5ab6\" class=\"graf graf--p graf-after--h4\">With the majority of CXRs labelled with pneumothorax also having chest tubes present, the question must be raised: \u201care we training the Densenet to recognize pneumothoraces, or chest tubes?\u201d<\/p>\n<h4 id=\"1e1c\" class=\"graf graf--h4 graf-after--p\"><strong class=\"markup--strong markup--h4-strong\">Peer review<\/strong><\/h4>\n<p id=\"94ef\" class=\"graf graf--p graf-after--h4\">Luke Oakden-Rayner MD, a radiologist in Australia with expertise in AI and deep learning who was on our panel, independently <a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/lukeoakdenrayner.wordpress.com\/2017\/12\/18\/the-chestxray14-dataset-problems\/\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/lukeoakdenrayner.wordpress.com\/2017\/12\/18\/the-chestxray14-dataset-problems\/\">evaluated the ChestX-ray14 dataset<\/a> and <a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/lukeoakdenrayner.wordpress.com\/2018\/01\/24\/chexnet-an-in-depth-review\/\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/lukeoakdenrayner.wordpress.com\/2018\/01\/24\/chexnet-an-in-depth-review\/\">CheXNet<\/a>. He praises the <strong class=\"markup--strong markup--p-strong\">Stanford team for their openness and patience in discussing the paper\u2019s methodology<\/strong>, and their willingness to modify the paper to correct a methodologic flaw which biased the evaluation against the radiologists.<\/p>\n<h4 id=\"774d\" class=\"graf graf--h4 graf-after--p\">Summary<\/h4>\n<p id=\"8180\" class=\"graf graf--p graf-after--h4\">For the second AI journal club we analysed the pipeline of AI papers in medicine. Make sure you are asking the right clinical question, rather than building an algorithm for the sake of doing something. Then determine whether your data can actually answer that question, looking into the details of how the data was collected and labeled.<\/p>\n<p id=\"e3bc\" class=\"graf graf--p graf-after--p\">To claim human-level or superhuman performance, ensure the baseline metrics are adequate and not biased against one group.<\/p>\n<figure id=\"attachment_13659\" aria-describedby=\"caption-attachment-13659\" style=\"width: 800px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-13659\" src=\"http:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/flowchart.png\" alt=\"Flowchart, AI, Deep Learning, Medicine, n2value\" width=\"800\" height=\"973\" srcset=\"https:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/flowchart.png 800w, https:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/flowchart-247x300.png 247w, https:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/flowchart-768x934.png 768w\" sizes=\"auto, (max-width: 800px) 100vw, 800px\" \/><figcaption id=\"caption-attachment-13659\" class=\"wp-caption-text\">Pipeline for AI in medicine<\/figcaption><\/figure>\n<p id=\"f155\" class=\"graf graf--p graf-after--figure\">The model appears to give human-level performance compared with expert readers, and better-than-human performance compared with less-trained practitioners. This is in line with research findings and Enlitic\u2019s experience.
We should not be surprised by that; research in convolutional neural networks has consistently reported near-human or super-human performance.<\/p>\n<h4 id=\"c0db\" class=\"graf graf--h4 graf-after--p\">Takeaways<\/h4>\n<ol class=\"postList\">\n<li id=\"cc59\" class=\"graf graf--li graf-after--h4\">There exists a critical gap in the labeling of medical data.<\/li>\n<li id=\"542d\" class=\"graf graf--li graf-after--li\">Do not forget the clinical significance of your results.<\/li>\n<li id=\"c9ae\" class=\"graf graf--li graf-after--li\">Embrace peer review, especially in medicine and AI.<\/li>\n<\/ol>\n<p id=\"b5cb\" class=\"graf graf--p graf-after--li\">These were the best tweets regarding the problem of labeling medical data\u200a\u2014\u200ain other words, do not get discouraged from attempting deep learning for medicine.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-13660\" src=\"http:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/twitterJHKL.png\" alt=\"twitterJHKL\" width=\"1266\" height=\"1300\" srcset=\"https:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/twitterJHKL.png 1266w, https:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/twitterJHKL-292x300.png 292w, https:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/twitterJHKL-768x789.png 768w, https:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/twitterJHKL-997x1024.png 997w\" sizes=\"auto, (max-width: 1266px) 100vw, 1266px\" \/><\/p>\n<p id=\"e89f\" class=\"graf graf--p graf-after--figure\">The journal club was a success, so if you are a doctor or an AI scientist, join us at <a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/tribe.radai.club\" target=\"_blank\" rel=\"nofollow noopener\" data-href=\"https:\/\/tribe.radai.club\">https:\/\/tribe.radai.club<\/a> to continue the conversation on AI and medicine. You can listen to the recording of this journal club here: <a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/youtu.be\/xoUpKjxbeC0\" target=\"_blank\" rel=\"nofollow noopener\" data-href=\"https:\/\/youtu.be\/xoUpKjxbeC0\">https:\/\/youtu.be\/xoUpKjxbeC0<\/a>. Our next guest, on 22nd February 2018, is <a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/tigebru\/\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/tigebru\/\">Timnit Gebru<\/a>, who worked on US demographic household prediction using Google Street View images. She will be talking on <strong class=\"markup--strong markup--p-strong\">Using deep learning and Google Street View to estimate the demographic makeup of neighborhoods across the United States<\/strong> (<a class=\"markup--anchor markup--p-anchor\" href=\"http:\/\/www.pnas.org\/content\/114\/50\/13108\" target=\"_blank\" rel=\"noopener\" data-href=\"http:\/\/www.pnas.org\/content\/114\/50\/13108\">http:\/\/www.pnas.org\/content\/114\/50\/13108<\/a>).<\/p>\n<h4 id=\"f9ff\" class=\"graf graf--h4 graf-after--p\">Coming soon<\/h4>\n<p id=\"64f1\" class=\"graf graf--p graf-after--h4\">For the journal club we developed a human-versus-AI competition for interpreting the CXRs in the dataset, hosted at <a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/radai.club\" target=\"_blank\" rel=\"nofollow noopener\" data-href=\"https:\/\/radai.club\">https:\/\/radai.club<\/a>.
We will be publishing the outcome of our crowdsourced labels soon, with a detailed analysis to check whether the model performance improves.<\/p>\n<h4 id=\"32b5\" class=\"graf graf--h4 graf-after--p\"><strong class=\"markup--strong markup--h4-strong\">Say thanks<\/strong><\/h4>\n<p id=\"45ed\" class=\"graf graf--p graf-after--h4\">I would like to thank the panelists, including <a class=\"markup--user markup--p-user\" href=\"https:\/\/medium.com\/@jeremyphoward\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/medium.com\/@jeremyphoward\" data-anchor-type=\"2\" data-user-id=\"34ab754f8c5e\" data-action-value=\"34ab754f8c5e\" data-action=\"show-user-card\" data-action-type=\"hover\">Jeremy Howard<\/a>, <a class=\"markup--user markup--p-user\" href=\"https:\/\/medium.com\/@paras42\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/medium.com\/@paras42\" data-anchor-type=\"2\" data-user-id=\"7d86917f81d4\" data-action-value=\"7d86917f81d4\" data-action=\"show-user-card\" data-action-type=\"hover\">Paras Lakhani<\/a>, <a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/lukeoakdenrayner.wordpress.com\/\" target=\"_blank\" rel=\"home noopener\" data-href=\"https:\/\/lukeoakdenrayner.wordpress.com\/\">Luke Oakden-Rayner<\/a>, and the <a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/stanfordmlgroup.github.io\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/stanfordmlgroup.github.io\">Stanford ML team<\/a>. Thanks to the ACR RFS AI advisory council members, including <a class=\"markup--user markup--p-user\" href=\"https:\/\/medium.com\/@kevinsealsmd\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/medium.com\/@kevinsealsmd\" data-anchor-type=\"2\" data-user-id=\"c5e52d8f6dbb\" data-action-value=\"c5e52d8f6dbb\" data-action=\"show-user-card\" data-action-type=\"hover\">Kevin Seals<\/a>.<\/p>\n<h4 id=\"9bfb\" class=\"graf graf--h4 graf-after--p\">Article corrections made<\/h4>\n<ol class=\"postList\">\n<li id=\"8bc3\" class=\"graf graf--li graf-after--h4\">This article referred to Jeremy Howard (Ex-CEO of <a class=\"markup--anchor markup--li-anchor\" href=\"https:\/\/www.kaggle.com\/\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/www.kaggle.com\/\">Kaggle<\/a>)\u200a\u2014\u200aupdated to \u201cpresident and chief scientist of Kaggle\u201d.<\/li>\n<li id=\"4ab7\" class=\"graf graf--li graf-after--li\">The article stated that <em class=\"markup--em markup--li-em\">NLP performance on that dataset is not likely improved over random.<\/em> Jeremy clarified that the <em class=\"markup--em markup--li-em\">precision of the normal finding<\/em> was what was not likely improved over random.<\/li>\n<\/ol>\n<div id=\"01a2\" class=\"graf graf--mixtapeEmbed graf-after--li\"><\/div>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Author&#8217;s Note: This was a fun side-project for the American College of Radiology&#8217;s Residents and Fellows Section.\u00a0 Judy Gichoya and I co-wrote the article. 
\u00a0 The original article was posted by Judy to Medium and appeared on HackerNoon.\u00a0 It was really an enlightening gathering of experts in the field.\u00a0 There is a small, but hopefully [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":13659,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"Very pleased to have been able to co-author with @judywawira the blogpost for the #radaijc with panelists @pranavrajpurkar @DrLukeOR @jeremyphoward @ParasLakhaniMD & others. Was just a fantastic discussion","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","enabled":false},"version":2}},"categories":[22,24],"tags":[28],"class_list":["post-13649","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-computer-vision","category-radiology","tag-ai"],"jetpack_publicize_connections":[],"aioseo_notices":[],"jetpack_featured_media_url":"https:\/\/n2value.com\/blog\/wp-content\/uploads\/2018\/02\/flowchart.png","jetpack_shortlink":"https:\/\/wp.me\/p4mtfP-3y9","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/n2value.com\/blog\/wp-json\/wp\/v2\/posts\/13649","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/n2value.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/n2value.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/n2value.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/n2value.com\/blog\/wp-json\/wp\/v2\/comments?post=13649"}],"version-history":[{"count":3,"href":"https:\/\/n2value.com\/blog\/wp-json\/wp\/v2\/posts\/13649\/revisions"}],"predecessor-version":[{"id":13663,"href":"https:\/\/n2value.com\/blog\/wp-json\/wp\/v2\/posts\/13649\/revisions\/13663"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/n2value.com\/blog\/wp-json\/wp\/v2\/media\/13659"}],"wp:attachment":[{"href":"https:\/\/n2value.com\/blog\/wp-json\/wp\/v2\/media?parent=13649"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/n2value.com\/blog\/wp-json\/wp\/v2\/categories?post=13649"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/n2value.com\/blog\/wp-json\/wp\/v2\/tags?post=13649"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}