OODA loop revisited – medical errors, heuristics, and AI.

My OODA loop post is actually one of the most popular on this site. I blame Venkatesh Rao of Ribbonfarm and his Tempo book and John Robb’s Brave New War for introducing me to Boyd’s methodology. Venkatesh focuses on philosophy and management consulting, and Robb focuses on COIN and human social networks. Both are removed from healthcare, but Boyd’s principles apply to medicine too: our enemy is disease, perhaps even ourselves.

Consider aerial dogfighting.  The human OODA loop is – Observe, Orient, Decide, Act.   You want to “get inside your opponent’s OODA loop” and out-think them, knowing their actions before they do, assuring victory.  If you know your opponent’s next move, you can anticipate where to shoot and end the conflict decisively.  Quoting Sun Tzu in The Art of War:

Sun Tzu Art of War OODA loops and AI

If you know the enemy and know yourself, you need not fear the result of a hundred battles. If you know yourself but not the enemy, for every victory gained you will also suffer a defeat. If you know neither the enemy nor yourself, you will succumb in every battle.

Focused, directed, lengthy and perhaps exhausting training for a fighter pilot enables them to “know their enemy” and anticipate action in a high-pressure, high-stakes aerial battle.  The penalty for failure is severe – loss of the pilot’s life.   Physicians prepare similarly – a lengthy and arduous training process in often adverse circumstances.  The penalty for failure is also severe – a patient’s death.  Given adequate intelligence and innate skill, successful pilots and physicians internalize their decision trees – transforming the OODA loop to a simpler OA loop – Observe and Act.  Focused practice allows the Orient and Decide portions of the loop to become automatic and intuitive, almost Zen-like.  This is what some people refer to as ‘Flow’ – an effortlessly hyperproductive state where total focus and immersion in a task suspends the perception of the passage of time.

For a radiologist, ‘flow’ is when you sit down at your PACS at 8am, continuously reading cases, making one great diagnosis after another, smiling as the words appear on Powerscribe. You’re killing the cases and you know it.  Then your stomach rumbles – probably time for lunch – you look up at the clock and it is 4pm.  That’s flow.

Flow is one of the reasons why experienced professionals are highly productive – and a smart manager will try to keep a star employee ‘in the zone’ as much as possible, removing extraneous interruptions, unnecessary low-value tasks, and distractions.

Kahneman defines this as fast type 1 thinking, intuitive and heuristic: quick, easy, and with sufficient experience/training, usually accurate. But type 1 thinking can fail: a complex process masquerades as a simple one, additional important data is undiscovered or ignored, or a novel agent is introduced. In these circumstances type 2 critical thinking is needed: slow, methodical, deductive and logical. But humans err, substituting heuristic thinking for analytical thinking, and we get it wrong.

For the enemy fighter pilot, it’s the scene in Top Gun where Tom Cruise hits the air brakes to drop behind an attacking MiG to deliver a kill shot with his last missile. For a physician, it is an uncommon or rare disease presenting like a common one, resulting in a missed diagnosis and lawsuit.

For those experimenting in deep learning and artificial intelligence, the time to train or teach the network far exceeds the time needed to process an unknown through the trained network. Training can take hours to days; evaluation takes seconds.

Narrow AI’s like Convolutional Neural Networks take advantage of their speed to go through the OODA loop quickly, in a process called inference.  I suggest a deep learning algorithm functions as an OA loop on the specific type of data it has been trained on.  Inference is quick.

I believe that OODA loops are Kahneman’s Type 2 slow thinking.  OA loops are Kahneman’s Type 1 fast thinking.  Narrow AI inference is a type 1 OA loop.   An AI version of type 2 slow thinking doesn’t yet exist.*

And like humans, Narrow AI can be fooled.

Can your classifier tell the difference between a Chihuahua and a blueberry muffin?

If you haven’t seen the Chihuahua vs. blueberry muffin clickbait picture, consider yourself sheltered. Claims that narrow AI can’t tell the difference are largely, but not entirely, bogus. While Narrow AI is generally faster than people, and potentially more accurate, it can still make errors. But so can people. In general, classification errors can be reduced by creating a more powerful, or ‘deeper’ network. I think collectively we have yet to decide how much error to tolerate in our AI’s. If we are willing to tolerate an error rate of 5% in humans (95% accuracy), are we willing to tolerate the same in our AI’s, or do we expect 97.5% accuracy? Or 99%? Or 99.9%?

The single pixel attack is a bit more interesting. Similar-looking images such as the ones above probably won’t fool careful human scrutiny, and frankly adversarial images unrecognizable to humans can be misinterpreted by a classifier:

Convolutional Neural Networks can be fooled by adversarial images

Selecting and perturbing a single pixel is much more subtle, and probably could escape human scrutiny. Jiawei Su et al address this in their “One Pixel Attack” paper, where the modification of one pixel in an image had a 66% to 73% chance of changing the classification of that image. By changing more than one pixel, success rates rose further. The paper used older, less deep Narrow AI’s like VGG-16 and Network in Network. Newer models such as DenseNets and ResNets might be harder to fool. This type of “attack” represents a real-world situation where the OA loop fails to account for unexpected new (or perturbed) information, and is incorrect.
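
To make the idea concrete, here is a minimal sketch of a one-pixel attack against a toy linear classifier. Everything in it is an assumption for illustration: the `predict` and `one_pixel_attack` helpers, the hand-built weights and the 4x4 image are hypothetical, and the exhaustive search is a stand-in for the differential evolution search used in the paper.

```python
import numpy as np


def predict(weights, image):
    """Toy linear classifier: return the index of the highest-scoring class."""
    scores = weights @ image.ravel()
    return int(np.argmax(scores))


def one_pixel_attack(weights, image, candidate_values=(0.0, 1.0)):
    """Exhaustively try single-pixel changes until the predicted class flips.

    Returns (row, col, new_value) for the first successful change, or None.
    """
    original_class = predict(weights, image)
    for row, col in np.ndindex(image.shape):
        for value in candidate_values:
            perturbed = image.copy()
            perturbed[row, col] = value
            if predict(weights, perturbed) != original_class:
                return row, col, value
    return None


# Hypothetical two-class model on a 4x4 image (16 pixels).
weights = np.zeros((2, 16))
weights[0, :] = 0.1          # class 0 looks weakly at every pixel
weights[1, 2 * 4 + 3] = 5.0  # class 1 depends heavily on pixel (2, 3)

image = np.full((4, 4), 0.5)
image[2, 3] = 0.0            # the sensitive pixel starts dark

attack = one_pixel_attack(weights, image)
print(predict(weights, image), attack)
```

A real CNN is nowhere near this easy to read, but the failure has the same shape: a classifier that leans too hard on a few inputs can be tipped over by changing one of them.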

Contemporaneous update: Google has developed images that use an adversarial attack to uniformly defeat classification attempts by standard CNN models.  By making “stickers” out of these processed images, the presence of such an image, even at less than 20% of the image size, is sufficient to change the classification to what the ensemble dictates, rather than the primary object in an image.  They look like this:

adversarial images capable of overriding CNN classifier


I am not aware of defined solutions to these problems – the obvious images that fool the classifier can probably be dealt with by ensembling other, more traditional forms of computer vision image analysis such as HOG or SVM’s.  For a one-pixel attack, perhaps widening the network and increasing the number of training samples by either data augmentation or adversarially generated features might make the network more robust.  This probably falls into the “too soon to tell” category.

There has been a great deal of interest and emphasis placed lately on understanding black-box models. I’ve written about some of these techniques in other posts. Some investigators feel this is less relevant. However, by understanding how the models fail, they can be strengthened. I’ve also written about this, but from a management standpoint. There is a trade-off between accuracy at speed, robustness, and serendipity. I think the same principle applies to our AI’s as well. By understanding the frailty of speedy accuracy vs. redundancies that come at the expense of cost, speed, and sometimes accuracy, we can build systems and processes that not only work but are less likely to fail in unexpected & spectacular ways.

Let’s acknowledge the likelihood of failure of narrow AI where it is most likely to fail, and design our healthcare systems and processes around that, as we begin to incorporate AI into our practice and management. If we do that, we will truly get inside the OODA loop of our opponent – disease – and eradicate it before it even has a chance. What a world to live in where the only thing disease can say is, “I never saw it coming.”


*I believe OODA loops have mathematical analogues. The OODA loop is inherently Bayesian – next actions iteratively decided by prior probabilities. Iterative deep learning constructs include LSTMs and RNNs (Recurrent Neural Networks) and of course, Generative Adversarial Networks (GANs). There have been attempts not only to use Bayesian learning for hyperparameter optimization but also to combine it with RL (Reinforcement Learning) & GANs. Only time will tell if this brings us closer to the vaunted AGI (Artificial General Intelligence)**.
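
As a rough illustration of the Bayesian reading of the loop, the sketch below updates a belief about disease after each new test result. The `bayes_update` helper, the 1% prevalence and the 90% sensitivity and specificity are all hypothetical values chosen for the example, not figures from any study.

```python
def bayes_update(prior, sensitivity, specificity, test_positive):
    """Return P(disease | test result) from P(disease) and the test's characteristics."""
    if test_positive:
        likelihood_disease = sensitivity
        likelihood_healthy = 1.0 - specificity
    else:
        likelihood_disease = 1.0 - sensitivity
        likelihood_healthy = specificity
    numerator = likelihood_disease * prior
    evidence = numerator + likelihood_healthy * (1.0 - prior)
    return numerator / evidence


# Hypothetical numbers: 1% prevalence, a test with 90% sensitivity and specificity.
belief = 0.01
history = []
for observation in [True, True, False]:                   # Observe
    belief = bayes_update(belief, 0.9, 0.9, observation)  # Orient
    history.append(belief)
treat = belief >= 0.5                                     # Decide (Act would follow)
print([round(b, 3) for b in history], treat)
```

Each pass uses the previous posterior as the new prior, which is the iterative part of the analogy.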

**While I don’t think we will soon solve the AGI question, I wouldn’t be surprised if complex combinations of these methods, along with ones not yet invented, bring us close to top human expert performance in a Narrow AI. But I also suspect that once we start coding creativity and resilience into these algorithms, we will take a hit in accuracy as we approach less narrow forms of AI.  We will ultimately solve for the best performance of these systems, and while it may even eventually exceed human ability, there will likely always be an error present.  And in that area of error is where future medicine will advance.

© 2018

CheXNet – a brief evaluation

Chest X-Ray deep dreamed - our AI & deep learning future
Chest Radiograph from ChestX-ray14 dataset processed with the deep dream algorithm trained on ImageNet

NOTE: Controversy over the report and dataset continues.  I have updated the post since first written as new information has become available.  I recommend you read through the post and its addendum.


Andrew Ng released CheXNet yesterday on ArXiv (citation) and promoted it with a tweet which caused a bit of a stir on the internet and related radiology social media sites like Aunt Minnie.  Before Radiologists throw away their board certifications and look for jobs as Uber drivers, a few comments on what this does and does not do.

First off, from the Machine Learning perspective, methodologies check out. It uses a 121-layer DenseNet, which is a powerful convolutional neural network. While code has not yet been provided, the DenseNet seems similar to code repositories online where 121 layers are a pre-made format. An 80/20 split for Training/Validation seems pretty reasonable (from my friend, Kirk Borne). Random initialization, minibatches of 16 w/oversampling positive classes, and a learning rate that decays as the validation loss plateaus are utilized, all of which are pretty standard. Class activation mappings are used to visualize areas in the image most indicative of the activated class (in this case, pneumonia). This is an interesting technique that can be used to provide some human-interpretable insights into the potentially opaque DenseNet.

The last Fully Connected (FC) layer is replaced by a single output (only one class is being tested for – pneumonia) coupled to a sigmoid function (an activation function – see here) to give a probability between 0 and 1. Again, pretty standard for a binary classification. The multiclass portion of the study was performed separately/later.
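
For readers who have not met a sigmoid output before, here is a minimal sketch of a single-output head. It is not the CheXNet code (which had not been released at the time of writing); `pneumonia_probability`, the three-element feature vector, the weights and the 0.5 cut-off are all hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function mapping any real z into the range 0 to 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def pneumonia_probability(features, weights, bias):
    """Single-output head: weighted sum of features passed through a sigmoid."""
    logit = sum(f * w for f, w in zip(features, weights)) + bias
    return sigmoid(logit)


# Hypothetical three-feature input standing in for the network's last feature vector.
features = [0.2, 1.5, -0.7]
weights = [0.8, 0.4, 1.1]
bias = -0.1
probability = pneumonia_probability(features, weights, bias)
label = "pneumonia" if probability >= 0.5 else "no pneumonia"
print(round(probability, 3), label)
```

The cut-off is a separate choice from the network itself: moving it up or down trades sensitivity against specificity.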

The test portion of the study was 420 Chest X-rays read by four radiologists, one of whom was a thoracic specialist.  They could choose between the 14 pathologies in the ChestX-ray14 dataset, read blind without any clinical data.

So, a ROC curve was created, showing three radiologists similar to each other, and one outlier. The radiologists lie slightly under the ROC curve of the CheXNet classifier. But, a miss is as good as a mile, so the claims of at or above radiologist performance are accurate, because math. As Luke Oakden-Rayner points out, this would probably not pass statistical muster.
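
The comparison works roughly like this: a classifier that outputs scores traces out a whole ROC curve as its threshold is swept, while a radiologist who makes yes/no calls contributes a single point. The sketch below uses six made-up cases; `operating_point`, `roc_points` and all of the data are hypothetical and only show the mechanics.

```python
def operating_point(calls, labels):
    """Return (false positive rate, true positive rate) for binary calls."""
    positives = sum(labels)
    negatives = len(labels) - positives
    true_pos = sum(1 for c, y in zip(calls, labels) if c == 1 and y == 1)
    false_pos = sum(1 for c, y in zip(calls, labels) if c == 1 and y == 0)
    return false_pos / negatives, true_pos / positives


def roc_points(scores, labels):
    """Sweep every distinct score as a threshold and collect the ROC points."""
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        calls = [1 if s >= threshold else 0 for s in scores]
        points.append(operating_point(calls, labels))
    return points


# Hypothetical data: 1 = pneumonia, 0 = no pneumonia.
labels = [1, 1, 1, 0, 0, 0]
model_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
reader_calls = [1, 1, 0, 1, 0, 0]

reader_fpr, reader_tpr = operating_point(reader_calls, labels)
curve = roc_points(model_scores, labels)
# Best model sensitivity at a false positive rate no worse than the reader's.
model_tpr = max(tpr for fpr, tpr in curve if fpr <= reader_fpr)
reader_below_curve = reader_tpr < model_tpr
print((reader_fpr, reader_tpr), model_tpr, reader_below_curve)
```

A point sitting under the curve is only a geometric fact. Whether the gap is larger than chance needs confidence intervals, and that is the statistical objection.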

So that’s the study.  Now, I will pick some bones with the study.

First, only including one thoracic radiologist is relevant, if you are going to make ground truth agreement of three out of four radiologists. General radiologists will be less specific than specialist radiologists, and that is one of the reasons why we have moved to specialty-specific reads over the last 20 years. If the three general rads disagreed with the thoracic rad, the thoracic rad’s ground truth would be discarded. Think about this – you would take the word of the generalist over the specialist, despite the specialist’s greater training. Even Google didn’t do this in their retinal machine learning paper. Instead, Google used their three retinal specialists as ground truth and then looked at how the non-specialty ophthalmologists were able to evaluate that data and what it meant to the training dataset. (Thanks, Melody!) Nevertheless, all rads lie reasonably along the same ROC curve, so methodologically it checks out.

Second, the Wang ChestXray14 dataset is a dataset that was data-mined from NIH radiology reports.  This means that for the dataset, ground truth was whatever the radiologists said it was.  I’m not casting aspersions on the NIH radiologists, as I am sure they are pretty good.  I’m simply saying that the dataset’s ground truth is what it says it is, not necessarily what the patient’s clinical condition was.  As proof of that, here are a few cells from the findings field on this dataset.

Findings field from the ChestX-ray14 dataset (representative)

In any case, the NIH radiologists more than a few times perhaps couldn’t tell either, or identified one finding as the cause of the other (Infiltrate & Pneumonia mentioned side by side) and at the top you have the three fields “atelectasis” “consolidation” & “Pneumonia” – is this concurrent pneumonia with consolidation with some atelectasis elsewhere, or is it “atelectasis vs consolidation cannot r/o pneumonia” (as radiologists we say these things). While the text miner purports to use several advanced NLP tools to avoid these kinds of problems, in practice it does not seem to do so. (See addendum below)  Dr. Ng, if you read this, I have the utmost respect for you and your team, and I have learned from you.  But I would love to know your rebuttal, and I would urge you to publish those results.  Or perhaps someone should do it for reproducibility purposes.

Finally, I’m bringing up these points not to be a killjoy, but to be balanced.  I think it is important to see this and prevent someone from making a really boneheaded decision of firing their radiologists to put in a computer diagnostic system (not in the US, but elsewhere) and realizing it doesn’t work after spending a vast sum of money on it.  Startups competing in the field who do not have deep healthcare experience need to be aware of potential pitfalls in their product.  I’m saying this because real people could be really hurt and impacted if we don’t manage this transition into AI well.  Maybe all parties involved in medical image analysis should join us in taking the Hippocratic Oath, CEO’s and developers included.

Thanks for reading, and feel free to comment here or on Twitter, or connect with me on LinkedIn: @drsxr

Addendum: ChestX-ray14 is based on the ChestX-ray8 database which is described in a paper released on ArXiv by Xiaosong Wang et al. The text mining is based upon a hand-crafted rule-based parser using weak labeling designed to account for “negation & uncertainty”, not merely application of regular expressions. Relationships between multiple labels are expressed, and while labels can stand alone, for the label ‘pneumonia’, the most common associated label is ‘infiltrate’. A graph showing relationships between the different labels in the dataset is here (from Wang et al.)

Label map from the ChestX-ray14 dataset by Wang et al.

Pneumonia is purple with 2062 cases, and one can see the largest association is with infiltration, then edema and effusion.  A few associations with atelectasis also exist (thinner line).

The dataset methodology claims to account for these issues at up to 90% precision reported in ChestX-ray8, with similar precision inferred in ChestX-ray14.

No Findings (!) from NIH CXR14 dataset (two images, each labeled “No Findings”)

However, expert review of the dataset (ChestX-ray14) does not support this.  In fact, there are significant concerns that the labeling of the dataset is a good deal weaker.  I’ll just pick out two examples above that show a patient likely post R lobectomy with attendant findings classified as “No Findings” and the lateral chest X-ray which doesn’t even belong in the study database of all PA and AP films.  These sorts of findings aren’t isolated – Dr. Luke Oakden-Rayner addresses this extensively in this post, from which his own observations are garnered below:

Sampled PPV for ChestX-Ray14 dataset vs reported
Dr. Luke Oakden-Rayner’s own Positive Predictive Value on visual inspection of 130 images vs reported

His final judgment is that the ChestX-ray14 dataset is not fit for training medical AI systems to do diagnostic work. He makes a compelling argument, but I think it is primarily a labelling problem, where the proposed 90% accuracy on the NLP data mining techniques of Wang et al does not hold up. ChestX-ray14 is a useful dataset for the images alone, but the labels are suspect. I would call upon the NIH group to address this and learn from this experience. In that light, I am surprised that the system did not do a great deal better than the human radiologists involved in Dr. Ng’s group’s study, and I don’t really have a good explanation for it.
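
For anyone unfamiliar with the metric behind that comparison: positive predictive value is the fraction of images carrying a label that really show the finding. The sketch below is only the arithmetic. The sample size of 130 comes from the caption above, but the count of 78 confirmed images is a made-up placeholder, not one of Dr. Oakden-Rayner’s results.

```python
def positive_predictive_value(true_positives, false_positives):
    """PPV = TP / (TP + FP): the fraction of positive labels that are actually correct."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no positive labels to evaluate")
    return true_positives / flagged


# Hypothetical audit: 130 labeled images reviewed, 78 confirmed on visual inspection.
reviewed = 130
confirmed = 78  # placeholder count, not a reported figure
ppv = positive_predictive_value(confirmed, reviewed - confirmed)
print(f"sampled PPV: {ppv:.0%}")
```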

Copyright © 2017

What’s up with N2Value – tying up loose ends

Dora Mitsonia - CC license

It’s been almost a year since my last long-form article. Of course, ‘busyness’ in real life and blog writing are inversely proportional! I’ve been focused on real-life advances; namely neural networks, machine learning, and machine intelligence which fall loosely under the colloquial misnomer of “A.I.”

After a deep dive into machine learning, I find it simultaneously unexpectedly simple and deceptively difficult. The technical hurdles are significant, but improving – math skills ease the conceptual framework, but without the programming chops, practical application is tougher. Worse, the IT task of getting multiple languages, packages, and pieces of hardware to work together well is daunting. Getting the venerable MNIST to work on your computer with your GPU might be a weekend project – or worse. I’m not a ‘gamer’, so for the last decade it has been hard for me to get excited about increasing CPU clock speeds, faster DRAM, and faster GPU flops. Like many, I’ve been happy to use OSX on increasingly venerable Mac products – works fine for my purposes.

But since AlexNet’s publication in 2012, the explosion in both theory and application in machine learning has made me sit up and take notice. The ImageNet Large Scale Visual Recognition Challenge top-5 classification error rate was only 2.7% in the latest competition held a few days ago in July 2017. That’s down from 30%+ error rates only four years ago. And my current hardware isn’t up to that task.

So, count me in. Certainly AI will be used in healthcare, but in what manner and to what extent remains to be worked out. Pioneer firms like Arterys and Zebra Medical Vision brave uncharted regulatory waters, watched closely by AI startups with similar dreams.

So, while I’d like to talk more about AI, I’m not sure that N2Value is the right place to do it. N2Value is primarily a healthcare thought leadership blog, promoting an evolution from Six Sigma methodology into more robust management practices which incorporate systems theory, focus on appropriately chosen metrics, model patient populations and likely outcomes and thereby successfully implement profitable value-based care. Caveat: with current US politics, it is very difficult to predict healthcare policy’s direction.

So, in the near future, I will decide what the scope of N2Value is to be going forward. I thank my loyal readers & subscribers who have given me 5 digit page views over the short life of the blog – far more than I ever expected! The blog has been a labor of love, but I’m pretty sure that AI algorithms have a place in healthcare management. However, I am not sure if you want to hear me opine on which version of convolutional neural network works better with or without LSTM added here, so stay tuned!

I have a few topics I have alluded to which I would like to mention quickly as stubs – they may or may not be expanded in the future.

STUB: What Healthcare can learn from Wall Street.

The main point of this series was to document the chronological implications of advances in computing technology on a leading industry (finance), to describe the likely similar path of a lagging industry (healthcare). I never was able to find the statistics on Wall Street employment I was seeking, which would document a declining number of workers, while documenting higher productivity and profitability per employee as IT advances allowed for super-empowerment of individuals.

Additionally, it raised issues regarding technology in B2B relationships that are adversarial. Much like Insurer-Hospital or Hospital-Doctor. If I have time, I’d like to rewrite this series. It was when I first began blogging and it is a bit rough.

STUB: The Measure is the Metric

One of my favorite articles (with its siblings), this subject was addressed much more eloquently on the Ribbonfarm blog by David Manheim in Goodhart’s Law and why measurement is Hard. If anything, after reading that essay, you will have sympathy for the metrics-oriented manager and be convinced that nothing they can do is right. I firmly believe that metrics should be designed to the task at hand, and then once achieved, monitored for a while but not dogmatically so. Better to target new and improved metrics than enforce institutional petrification ‘by the numbers.’

STUB: Value as Risk Series

I perceive the only way for value based care to be long-term profitable/successful is for large-scale vertical integration by a large Enterprise Health Institution (EHI) across the care spectrum. Hospital acquires Clinics, Practices, and Doctors, quantifies its covered lives, and then with better analytics than the insurers, capitates, ultimately contracting directly with employers & individuals. The insurers become redundant – and the Vertically Integrated Enterprise saves on economies of scale. It provides care in the most cost-effective manner possible & closes beds, relying instead on telehealth, mHealth apps & predictive algorithms, and innovative care delivery.

When the Hospital’s profitability model resembles the insurer’s, and it is beholden only to itself (capitated payments are all there is), something fascinating happens. No longer does it matter if there is an ICD-10/HOPPS/CPT/DRG code for a procedure. The entity is no longer beholden to the rules of payment, and can internally innovate. A successful vertically integrated enterprise will – and quickly. While there will have to be appropriate regulatory oversight to prevent patient abuse, profiteering, or attempts to financialize the model, adjusting capitation with incentive payments for real measures of quality (not proxies) will prompt compliance and improved care.

Writing as a physician, this arrangement may or may not commoditize care further. Concerns about standardization of care are probably overstated, as the first CDS tool more accurate than a physician will standardize care to that model anyway! From an administrator’s perspective, it is a no-brainer to deliver care in an innovative manner that circumvents existing stumbling blocks. From a patient’s perspective, while I prefer easy access to a physician, maintaining that access is becoming unaffordable, let alone then utilizing health care! At some point, the economic pain will be so high that patients will want alternatives they can afford. Whether that means mid-levels or AI algorithms only time will tell.

STUB: Data Science and Radiology

I really like the concept I began here with data visualization in five dimensions. Could this be a helpful additional tool to AI research like Tensorboard? I’m thinking about eventually writing a paper on this one.

STUB: Developing the Care Model

The concept of treating a care model like an equation is what got me started on all this – describing a system as a mathematical model seemed like such a good idea – but required learning on my part. That, and the effects thereof, are still ongoing. At the time of the writing, the solution appeared daunting & I “put the project on the back burner (i.e. abandoned it)” as I couldn’t make it work. Of course, with advancing tools and algorithms well suited to evaluation of this task, I might reexamine this soon.

Where does risk create value for a hospital? (Value as Risk series post #3)

Let’s turn to the hospital side.

For where I develop the concept of value as risk management go here 1st, and where I discuss the value in risk management from an insurer’s perspective click here 2nd.

The hospital is an anxious place – old fat fee-for-service margins are shrinking, and major rule set changes keep coming. To manage revenue cycles requires committing staff resources (overhead) to compliance related functions, further shrinking margin. More importantly, resource commitment postpones other potential initiatives. Maintaining compliance with Meaningful Use (MU) 3 cum MACRA, PQRS, ICD-10 (11?) and other mandated initiatives while dealing with ongoing reviews …

Some reflections on the ongoing shift from volume to value

As an intuitive and inductive thinker, I often use facts to prove or disprove my biases. This may make me a poor researcher, though I believe I would have been popular in circa 1200 academic circles. Serendipity plays a role; yes I’m a big Nassim Taleb fan – sometimes in the seeking, unexpected answers appear. Luckily, I’m correct more often than not. But honestly – in predicting widely you miss more widely.

One of my early mentors from Wall St. addressed this with me in the infancy of my career – take Babe Ruth’s batting average of .342. This meant that roughly two out of three times at bat, Babe Ruth failed to get a hit. However, he was trying to hit home runs. There is a big difference between being a base hit player and a home run hitter. What stakes are you playing for?

With that said, this Blog is for exploring topics I find of interest pertaining mostly to healthcare and technology. The blog has been less active lately, not only due to my own busy personal life (!) but also because I have sought more up-to-date information about advancing trends in both the healthcare payment sector and the IT/Tech sector as it applies to medicine. I’m also diving deeper into Radiology and Imaging. As I’ve gone through my data science growth phase, I’ll probably blog less on that topic except as it pertains to machine learning.

The evolution of the volume to value transition is ongoing as many providers are beginning to be subject to at least a degree of ‘at-risk’ payment. Stages of ‘at-risk’ payment have been well characterized – this slide by Jacque Sokolov MD at SSB solutions is representative:

Sokolov - SSB Solutions slide 1

In 2015, approximately 20% of Medicare spending was value-based, with CMS’s goal being 50% by 2020. Currently providers are ‘testing the waters’ with <20% of providers accepting over 40% risk-based payments (c.f. Kimberly White MBA, Numerof & Associates). Obviously the more successful of these will be larger, more data-rich and data-utilizing providers.

However, all is not well in the value-based-payment world. In fact, this year UnitedHealthcare announced it is pulling its insurance products out of most of the ACA exchange marketplaces. While UHC products were a small share of the exchanges, it sends a powerful message when a major insurer declines to participate. Recall most ACO’s (~75%) did not produce cost savings in 2014, although more recent data was more encouraging (c.f. Sokolov). Notably, out of the 32 Pioneer ACO’s that started, only 9 are left (28%) (ref. CMS). The road to value is not a certain path at all.

So, with these things in mind, how do we negotiate the waters? Specifically, as radiologists, how do we manage the shift from volume to value, and what does it mean for us? How is value defined for Radiology? What is it not? Value is NOT what most people think it is. I define value as: the cost savings arising from the assumption and management of risk. We’ll explore this in my next post.

Catching up with the “What medicine can learn from Wall St. ” Series

The “What medicine can learn from Wall Street” series is getting a bit voluminous, so here’s a quick recap of where we are up to so far:

Part 1 – History of analytics – a broad overview which reviews the lagged growth of analytics driven by increasing computational power.

Part 2 – Evolution of data analysis – correlates specific computing developments with analytic methods and discusses pitfalls.

Part 3 – The dynamics of time – compares and contrasts the opposite roles and effects of time in medicine and trading.

Part 4 – Portfolio management and complex systems – lessons learned from complex systems management that apply to healthcare.

Part 5 – RCM, predictive analytics, and competing algorithms – develops the concept of competing algorithms.

Part 6 – Systems are algorithms – discusses ensembling in analytics and relates operations to software.


What are the main themes of the series?

1.  That healthcare lags behind Wall Street in computation, efficiency, and productivity; and that we can learn where healthcare is going by studying Wall Street.

2.  That increasing computational power allows for more accurate analytics, with a lag.  This shows up first in descriptive analytics, then allows for predictive analytics.

3.  That overfitting data and faulty analysis can be dangerous and lead to unwanted effects.

4.  That time is a friend in medicine, and an enemy on Wall Street.

5.  That complex systems behave complexly, and modifying a sub-process without considering its effect upon other processes may have “unintended consequences.”

6.  That we compete through systems and processes – and ignore that at our peril as the better algorithm wins.

7.  That systems are algorithms – whether soft or hard coded – and we can ensemble our algorithms to make them better.


Where are we going from here?

– A look at employment trends on Wall Street over the last 40 years and what it means for healthcare.

– More emphasis on the evolution from descriptive analytics to predictive analytics to prescriptive analytics.

– A discussion for management on how analytics and operations can interface with finance and care delivery to increase competitiveness of a hospital system.

– Finally, tying it all together and looking towards the future.


All the best to you and yours and great wishes for 2016!



Black Swans, Antifragility, Six Sigma and Healthcare Operations – What medicine can learn from Wall St Part 7


I am an admirer of Nassim Nicholas Taleb – a mercurial options trader who has evolved into a philosopher-mathematician. The focus of his work is on the effects of randomness, how we sometimes mistake randomness for predictable change, and how we fail to prepare for randomness by excluding outliers in statistics and decision making. These “black swans” arise unpredictably and cause great harm, amplified by the ‘fragile’ systems we have put into place.

Perhaps the best example of a black swan event is the period of financial uncertainty we have lived through during the last decade. A quick recap: the 2008 global financial crisis was caused by a bubble in US real estate assets. This in turn arose from legislation mandating lower lending standards and facilitating securitization of these loans, combined with the lower lending standards (subprime, Alt-A) that the proverbial passing of the ‘hot potato’ allowed. These mortgages were packaged into derivatives named collateralized debt obligations (CDO’s), using statistical models to gauge default risks in these loans. Loans more likely to default were blended with loans less likely to default, yielding an overall package that was statistically unlikely to default. However, as owners of these securities found out, the statistical models that made them unlikely to default were based on a small sample period in which there were low defaults. The models indicated that the financial crisis was a 25-sigma (standard deviations) event that should only happen once in:

[Image: a number with lots of zeroes] years. (c.f. Wolfram Alpha)

Of course, the default events happened in the first five years of their existence, proving that calculation woefully inadequate.
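To get a feel for the size of that number, here is a minimal sketch of the arithmetic in Python. It assumes that daily outcomes follow a normal distribution (the very assumption Taleb criticizes) and that there are 252 independent trading days per year. Both are my own illustrative assumptions, not parameters taken from the actual CDO models.

```python
import math


def upper_tail_probability(sigmas: float) -> float:
    """One-sided probability of exceeding `sigmas` standard deviations
    under a normal (Gaussian) model."""
    return 0.5 * math.erfc(sigmas / math.sqrt(2))


def expected_years_between_events(sigmas: float, observations_per_year: int = 252) -> float:
    """Mean waiting time in years, assuming one independent draw per observation.
    The 252 trading days per year is an illustrative assumption."""
    return 1.0 / (upper_tail_probability(sigmas) * observations_per_year)


p_25 = upper_tail_probability(25)
years_25 = expected_years_between_events(25)
print(f"P(25-sigma day) ~ {p_25:.2e}")
print(f"Expected wait   ~ {years_25:.2e} years")
```

Under these assumptions the waiting time has well over a hundred zeroes, while the universe is only about 14 billion years old. The point is not the exact figure but how badly a thin-tailed model can understate real-world risk.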

The problem with major black swans is that they are sufficiently rare and impactful that it is difficult to plan for them: global pandemics, the Fukushima reactor accident, and the like.  By designing robust systems that expect perturbations, you can mitigate the effects of major black swans when they occur and shake off the more frequent minor black (grey) swans – system perturbations that occur occasionally (but more often than you expect); 5-10 sigma events that are not devastating but disruptive (like local disease outbreaks or power outages).

Taleb classifies how things react to randomness into three categories: Fragile, Robust, and Anti-Fragile.  While the interested would benefit from reading the original work, here is a brief summary:

1.     The Fragile consists of things that hate, or break, from randomness.  Think about tightly controlled processes, just-in-time delivery, tightly scheduled areas like the OR when cases are delayed or extended, etc…
2.     The Robust consists of things that resist randomness and try not to change.  Think about warehousing inventories, overstaffing to mitigate surges in demand, checklists and standard order sets, etc…
3.     The Anti-Fragile consists of things that love randomness and improve with serendipity.  Think about cross-trained floater employees, serendipitous CEO-employee hallway meetings, lunchroom physician-physician interactions where the patient benefits.

In thinking about Fragile / Robust / Anti-Fragile, be cautious about injecting bias into meaning.  After all, we tend to avoid breakable objects, preferring things that are hardy or robust.  So, there is a natural tendency to consider fragility ‘bad’, robustness ‘good’, and to conclude that anti-fragility must therefore be ‘great!’  Not true – at least not when we approach these categories from an operational or administrative viewpoint.

Fragile processes and systems are those prone to breaking. They hate variation and randomness and respond well to six-sigma analyses and productivity/quality improvement.  I believe that fragile systems and processes are those that will benefit the most from automation & technology.  Removing human input & interference decreases cycle time and defects.  While the fragile may be prone to breaking, that is not necessarily bad.  Think of the new entrepreneur’s mantra – ‘fail fast’.  Agile/Scrum development, most common in software (but perhaps useful in healthcare?), relies on rapid iteration to adapt to a moving target.  Fragile systems and processes cannot be avoided – instead they should be highly optimized with the least human involvement.  These need careful monitoring (daily? hourly?) to detect failure, at which point a ready team can swoop in, fix whatever has caused the breakage, re-optimize if necessary, and restore the system to functionality.  If a fragile process breaks too frequently and causes significant resultant disruption, it probably should be made into a Robust one.
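As a rough illustration of that monitor-repair-reclassify idea, here is a minimal Python sketch. The class name, the 30-day window, and the failure threshold are hypothetical choices of mine for illustration, not an established standard; a real operation would pick them from its own data.

```python
from collections import deque
from datetime import datetime, timedelta


class FragileProcessMonitor:
    """Tracks failures of one fragile process inside a rolling time window.

    All names and thresholds here are illustrative assumptions.
    """

    def __init__(self, window: timedelta, max_failures: int):
        self.window = window
        self.max_failures = max_failures
        self.failures = deque()  # timestamps of recent failures, oldest first

    def record_failure(self, when: datetime) -> str:
        """Log a failure and return the suggested response."""
        self.failures.append(when)
        # Forget failures that have aged out of the rolling window.
        while self.failures and when - self.failures[0] > self.window:
            self.failures.popleft()
        if len(self.failures) > self.max_failures:
            return "redesign as robust"
        return "dispatch repair team"


monitor = FragileProcessMonitor(window=timedelta(days=30), max_failures=3)
start = datetime(2016, 1, 4)
# Four failures, two days apart: the fourth exceeds the threshold of three.
actions = [monitor.record_failure(start + timedelta(days=2 * i)) for i in range(4)]
print(actions)
```

The first three failures just send in the repair team; only when failures pile up inside the window does the sketch suggest adding redundancy and treating the process as a Robust one.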

Robust systems and processes are those that resist failure due to redundancy and relative waste.  These probably are your ‘mission critical’ ones where some variation in the input is expected, but there is a need to produce a standardized output.  From time to time your ER is overcome by more patients than available beds, so you create a second holding area for less-acute cases or patients who are waiting transfers/tests.  This keeps your ER from shutting down.  While it can be wasteful to run this area when the ER is at half-capacity, the waste is tolerable vs. the lost revenue and reputation of patients leaving your ER for your competitor’s ER or the litigation cost of a patient expiring in the ER after waiting 8 hours.    The redundant patient histories of physicians, nurses & medical students serve a similar purpose – increasing diagnostic accuracy.  Only when additional critical information is volunteered to one but not the other is it a useful practice.  Attempting to tightly manage robust processes may either be a waste of time, or turn a robust process into a fragile one by depriving it of sufficient resilience – essentially creating a bottleneck.  I suspect that robust processes can be optimized to the first or second sigma – but no more.

Anti-fragile processes and systems benefit from randomness, serendipity, and variability.  I believe that many of these are human-centric.  The automated process that breaks is fragile, but the team that swoops in to repair it – they’re anti-fragile.  The CEO wandering the halls to speak to his or her front-line employees four or five levels down the organizational tree for information – anti-fragile.  Clinicians who practice ‘high-touch’ medicine generate good feelings towards the hospital, and the unexpected high-upside multi-million dollar bequest of a grateful donor 20 years later – that’s very anti-fragile.  It is important to consider that while anti-fragile elements can exist at any level, I suspect that more of them are present at higher-level executive and professional roles in the healthcare delivery environment.  Automating or tightly managing anti-fragile systems and processes will likely make them LESS productive and efficient.  Would the bequest have happened if that physician was tasked and bonused to spend only 5.5 minutes per patient encounter?  Six sigma management here will cause the opposite of the desired results.

I think a lot more can be written on this subject, particularly from an operational standpoint.   Systems and processes in healthcare can be labeled fragile, robust, or anti-fragile as defined above.  Fragile components should have human input reduced to the bare minimum possible, and then be optimized to the hilt.  Expect them to break – but that’s OK – have a plan & team ready for dealing with it, fix it fast, and re-optimize until the next failure.  Robust systems should undergo some optimization, have some resilience or redundancy built in – and then be left the heck alone!  Anti-fragile systems should focus on people, and great caution should be used not only in optimization, but also in the metrics used to manage these systems – lest you take an anti-fragile process, force it into a fragile paradigm, and cause failure of that system and process.  It is the medical equivalent of forcing a square peg into a round hole.  I suspect that when an anti-fragile process fails, this is why.

Follow up to “The Etiquette of Help”

c.f. mark ong at ganyfd.com
Superior Mesenteric Angiogram demonstrating a right colonic bleed.

I came across this wonderful piece by Bruce Davis MD on Physician’s Weekly.com about “The Etiquette of Help”. How do you help a colleague emergently in a surgical procedure where things go wrong? As proceduralists, we are always cognizant that this is a possibility.

“Any Surgeon to OR 6 STAT. Any Surgeon to OR 6 STAT.


No surgeon wants to hear or respond to a call like that. It means someone is in deep kimchee and needs help right away.”


I was called about an acute lower GI bleed with a strongly positive bleeding scan. I practice in a resort area, and an extended family had come here with their patriarch, a man in his late 50’s. (Identifying details changed/withheld – image above is NOT from this case). He had been feeling woozy in the hot sun, went to the men’s room, evacuated a substantial amount of blood, and collapsed.


As an interventional radiologist, I was asked to perform an angiogram and embolize the bleeder if possible. The patient was brought to the cath lab; I gained access to the right femoral artery, and then consecutively selected the celiac, superior mesenteric, and inferior mesenteric arteries to evaluate abdominal blood supply. The briskly bleeding vessel was identifiable in the right colonic distribution as an end branch off the ileocolic artery. I guided my catheter, and then threaded a smaller micro-catheter through it, towards the vessel that was bleeding.


When you embolize a vessel, you are cutting off blood flow. Close off too large a region, and the bowel will die. Also, collateral vessels in the colon will resupply the bleeding vessel, so you have to be precise.


Advancing a microcatheter under fluoroscopy to an end vessel is slow, painstaking work requiring multiple wire exchanges and contrast injections. After one injection, I asked my assisting scrub tech to hand me back the wire.

“Sir, I’m sorry. I dropped the wire on the floor.”

“That’s OK. Just open up another one.”

“Sir, I’m sorry. That was the last one in the hospital.”

“There’s an art to coming in to help a colleague in trouble. Most of us have been in that situation, both giving and receiving help. A scheduled case that goes bad is different from a trauma. In trauma, you expect the worst. Your thinking and expectations are already looking for trouble. In a routine case, trouble is an unwelcome surprise, and even an experienced surgeon may have difficulty shifting from routine to crisis mode.”


We inquired how quickly we could get another wire. It would take hours, if we were lucky. The patient was still actively bleeding and requiring increasing fluid and blood support to maintain pressure. After a few creative attempts at solving this problem, it was clear that it was not going to be solved by me, today, in that room. It was time to pull the trigger and make the call the interventionalist dreads – the call to the surgeon.


The general surgeon came down to the angio suite and I explained what was happening. I marked the bowel with a dye to assist him in surgery, and sent the patient with him to the OR. The patient was operated on within 30 minutes of leaving my cath lab, and OR time was perhaps 45 minutes. After the procedure was done, the surgeon remarked to me that it was one of the easiest resections ever, as he knew exactly where to go from my work.  The surgeon never said anything negative to me, and we had a very good working relationship thereafter.

“The first thing to remember when stepping into a bad situation is that you are the cavalry. You didn’t create the situation, and recriminations and blame have no place in the room. You need to be the calm center to a storm that started before you got involved. Sometimes that’s all that is needed. A fresh perspective, a few focused questions, and the operating surgeon can calm down and get back on track.”


I saw the patient the next day, sitting up with a large smile on his face. He explained to me how happy he was that he had come here for vacation, that it was the trip of a lifetime for him, and that he was looking forward to attending his youngest daughter’s wedding later that year. He told me he lived in a rural Midwest area, hours from a very small hospital without an interventionalist, and if this had happened at home, well, who knows?


If I had not objectively assessed my inability to finish the case because of equipment issues, well, who knows?


If I had been prideful and unwilling to accept my limitations at that time, well, who knows?


If I had been more concerned with my reputation or what my partners would think, well, who knows?


I sincerely hope that my patient has enjoyed many years of happiness with his family in his bucolic rural Midwestern home. I will never see him again, but I do think of him from time to time.

The danger of choosing the wrong metric : The VA Scandal

The Veterans Affairs (VA) scandal has been newsworthy lately.  The facts about the VA scandal will be forthcoming in August, but David Brooks made some smart inferences back on May 16th on NPR’s Week In Politics:

BROOKS: Yeah, he’s (Shinseki) in hot water. He’s been there since the beginning. So I don’t know if I’d necessarily want to bet on him. But, you know, I do have some sympathy for the VA. It’s obviously not a good thing to doctor and cook the books, but you – there is a certain fundamental reality here, which is the number of primary care visits over the last three years at this place rose 50 percent. The number of primary care physicians rose nine percent.
And so there’s just a backlog, and if you put a sort of standard in place that you have to see everybody in 14 days but you don’t provide enough physicians to actually do that, well, people are going to start cheating. And so there is a more fundamental problem here than just the cheating.

An administrative failure was made by mandating that patients be seen within 14 days but not providing the staffing to do so.  The rule designed to promote a high level of care had ‘unintended consequences.’  However, I do have some sympathy for an institution which depends on Congress for funding in a political process where funds can be yanked, redistributed, or earmarked based on political priorities.
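The arithmetic behind Brooks’ point is worth making explicit. A minimal sketch in Python, using only the two growth figures he quotes (the function name is my own, and the calculation assumes visits were spread evenly across physicians):

```python
def per_physician_load_change(visit_growth: float, physician_growth: float) -> float:
    """Fractional change in visits per physician, given fractional growth
    in total visits and in the number of physicians."""
    return (1 + visit_growth) / (1 + physician_growth) - 1


# Figures quoted by Brooks: visits up 50 percent, physicians up 9 percent.
load_change = per_physician_load_change(visit_growth=0.50, physician_growth=0.09)
print(f"Visits per physician changed by {load_change:.1%}")
```

That works out to roughly 38% more visits per physician over three years, with the 14-day standard held fixed.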

More concerning, multiple centers may have been complicit in hiding the impossibility of fulfilling the mandate, and whistleblowers were actively retaliated against.

I need to disclaim here that I both trained and worked at the VA as a physician.  I have tremendous respect for the veterans who seek care there, and I had great pride working there and in being in a place to give service to these men and women who gave service to us.  The level of care in the VA system is generally thought to be good, by myself and others.

As I’ve written before in The Measure is the Metric and Productivity in Medicine – what’s real and what’s fake?, the selection of metrics is important because those metrics will be followed by the organization, particularly if performance evaluations and bonuses are tied to the metrics.  Ben Horowitz, partner at Andreessen Horowitz, astutely notes the following from his experience as CEO at Opsware and an employee at HP (1):

At a basic level, metrics are incentives.  By measuring quality, features, and schedule and discussing them at every staff meeting, my people focused intensely on those metrics to the exclusion of other goals.  The metrics did not describe the real goals and I distracted the team as a result.

And if he didn’t get the point across clearly enough (2):

Some things that you will want to encourage will be quantifiable, and some will not.  If you report on the quantitative goals and ignore the qualitative ones, you won’t get the qualitative goals, which may be the most important ones.  Management purely by numbers is sort of like painting by numbers – it’s strictly for amateurs.
At HP, the company wanted high earnings now and in the future.  By focusing entirely on the numbers, HP got them now by sacrificing the future…
By managing the organization as though it were a black box, some divisions at HP optimized the present at the expense of their downstream competitiveness.  The company rewarded managers for achieving short-term objectives in a manner that was bad for the company.  It would have been better to take into account the white box.  The white box goes beyond the numbers and gets into how the organization produced the numbers.  It penalizes managers who sacrifice the future for the short-term and rewards those who invest in the future even if that investment cannot be easily measured.

I’ll have to wait until the official report on the VA scandal is released before commenting on why the failure occurred.  However, it does seem to me to be a case of managing by the black box, as Ben Horowitz explained so adeptly.  His writing is recommended.

1.  Ben Horowitz, The Hard Thing about Hard Things, HarperCollins 2014, p.132

2.  Ibid., p.132-133


A conversation with Farzad Mostashari MD

I participated in a webinar with Farzad Mostashari MD, ScM, former director of the ONC (Office of the National Coordinator for Health IT), sponsored by the data analytics firm Wellcentive.  He is now a visiting fellow at the Brookings Institution.  Farzad spoke on points made in a recent article in the American Journal of Accountable Care, Four Key Competencies for Physician-led Accountable Care Organizations.

The hour-and-a-half length lent itself well to a Q&A format, and the webinar basically turned into a small group consulting session with this very knowledgeable policy leader!

1.  Risk Stratification.  Begin using the EHR data by ‘hot spotting.’  Hot spotting refers to a technique of identifying outliers in medical care and evaluating these outliers to find out why they are consuming resources significantly beyond that of the average.  The Oliver Wyman folks wrote a great white paper that references Dr. Jeffrey Brenner of the Camden Coalition, who identified the 1% of Medicaid patients responsible for 30% of the city’s medical costs.  Farzad suggests that data mining should go further and “identify populations of ‘susceptibles’ with patterns of behavior that indicate impending clinical decomposition & lack of resilience.”   He further suggests that we go beyond an insurance-like “risk score” to understand how and why these patients fail, and then apply targeted interventions to prevent susceptibles from failing and overutilizing healthcare resources in the process.  My takeaway is that in the transition from volume to value, bundled payments and ACO-style payments will incentivize physicians to share and manage this risk, transferring onto them a role traditionally filled only by insurers.

2.  Network Management.  Data mining the EHR enables organizations to look at provider and resource utilization within a network (c.f. the recent Medicare physician payments data release).  By analyzing this data, referral management can be performed.   By sending patients specifically to those providers who have the best outcomes / lowest costs for that disease, the ACO or insurer can meet shared savings goals.  This would also help to prevent over-utilization – by changing existing referral patterns and excluding those providers who always choose the highest-cost option for care (c.f. the recent Medicare payment data for ophthalmologists performing intraocular drug injections – wide variation in costs).  This IS happening – Aetna’s CEO Mark Bertolini said so specifically during his HIMSS 2014 keynote.   To my understanding, network analysis is mathematically difficult (think eigenfunctions, eigenvalues, and linear algebra) – but that won’t stop a determined implementer (it didn’t stop Facebook, Google, or Twitter).  Also included in this topic were workflow management, which is sorely broken in current EHR implementations, clinical decision support tools (like ACRSelect), and traditional six sigma process analytics.

3.  ADT Management.  This was something new.  Using the admission/discharge/transfer (ADT) data from the HL7 data feed, you could ‘push’ that data to regional health systems.  This achieves a useful degree of data exchange that is otherwise not present without a regional data exchange.   Patients who bounce from one ER to the next could be identified this way.  It’s also useful to push to the primary care physicians (PCPs) managing those patients.  Today, when PCPs function almost exclusively on an outpatient basis and hospitalists manage the patient while in the hospital, the PCP often doesn’t know about a patient’s hospitalization until the patient presents to the office.  Follow-up care in the first week after hospitalization may help to prevent readmissions. According to Farzad, there is a financial incentive to do so – a discharge alert can enable a primary care practice to ensure that every discharged patient has a telephone follow-up within 48 hours and an office visit within 7 days, which would qualify for a $250 “transition in care” payment from Medicare.  (aside – I wasn’t aware of this. I’m not a PCP, and I would carefully check Medicare billing criteria for eligibility conditions before implementing, as consequences could be severe.  Don’t just take my word for it, as I may be misquoting/misunderstanding and Medicare billers are ultimately responsible for what they bill for.  This may be limited to ACOs.  Do your own due diligence.)

4.  Patient outreach and engagement.  One business point is that for the ACO to profit, patients must be retained.  Patient satisfaction may be as important to the business model as the interventions the ACO is performing, particularly as the ACO model suggests a shift to up-front costs and back-end recovery through shared savings.  If you as an ACO invest in a patient, only to lose that patient to a competing ACO, you will eat those sunk costs and let your competitor have the benefit of those improvements in care!  To maintain patient satisfaction and engagement, ACOs can draw on behavioral economics (think Cass Sunstein’s Nudges.gov paper), gamification (Jane McGonigal), and A/B testing (Tim Ferriss) marketing techniques.  Basically, we’re applying customer-centric marketing to healthcare, with not only the total lifetime revenue of the patient considered, but also the total lifetime cost!
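The hot spotting described in point 1 can be sketched in a few lines of Python. This is only a minimal illustration under my own assumptions: the function name, the patient IDs, and the cost figures are all synthetic, and a real analysis would pull costs from claims or EHR data and go on to ask why each outlier is an outlier.

```python
def hot_spot(costs_by_patient: dict, top_fraction: float = 0.01):
    """Return the highest-cost patients and their share of total cost.

    `costs_by_patient` maps a patient ID to total cost over some period.
    `top_fraction` is the slice of the population to flag (1% by default).
    """
    ranked = sorted(costs_by_patient.items(), key=lambda item: item[1], reverse=True)
    n_top = max(1, round(len(ranked) * top_fraction))  # always flag at least one patient
    top = ranked[:n_top]
    total = sum(costs_by_patient.values())
    share = sum(cost for _, cost in top) / total if total else 0.0
    return [patient_id for patient_id, _ in top], share


# Synthetic population: 99 ordinary patients and one very high utilizer.
costs = {f"p{i:03d}": 700.0 for i in range(1, 100)}
costs["p000"] = 29_700.0

outliers, share = hot_spot(costs)
print(outliers, f"{share:.0%} of total cost")
```

The synthetic numbers are chosen so that one patient in a hundred accounts for 30% of total cost, mirroring the Camden figure quoted above. The ranking itself is the easy part; the hard part is the follow-up investigation Farzad describes.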

It was a very worthwhile discussion and thanks to Wellcentive for hosting it!