What Medicine can learn from Wall Street – Part 2 – evolution of data analysis

If you missed the first part, read it here. Note: for the HIT crowd reading this, I'll offer a (rough) comparison to the HIMSS stages (1-7).

1.     Descriptive Analytics based upon historical data.

Hand Drawn Chart of the Dow Jones Average

     This was the most basic use of data analysis. When newspapers printed price data (Open-High-Low-Close, or OHLC), that data could be charted (on graph paper!) and interpreted using basic technical analysis, which was mostly lines drawn upon the chart. (1) Simple formulas such as year-over-year (YOY) percentage returns could be calculated by hand. This information was merely descriptive and had no bearing upon future events. Getting information into a computer required data entry by hand, and operator errors could throw off the accuracy of your data. Computers lived in the accounting department, with the data being used to record positions and profit and loss (P&L). At month's end, a large run of data would produce a dot-matrix, computer-generated accounting statement.
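The year-over-year return mentioned above is a one-line formula. A minimal sketch in Python (the prices are invented for illustration, not real market data):

```python
def yoy_return(prev_year_close: float, this_year_close: float) -> float:
    """Year-over-year (YOY) percentage return between two year-end closes."""
    return (this_year_close - prev_year_close) / prev_year_close * 100.0

# Illustrative prices only.
print(yoy_return(100.0, 112.5))  # → 12.5
```

The same arithmetic a clerk once did on graph paper; the computer just removes the transcription errors.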
     A good analogue to this system would be older laboratory reporting systems, where laboratory test values were sent to a dedicated lab computer. If the test equipment interfaced with the computer (via IEEE-488 or RS-232 interfaces), the values were sent automatically. If not, data entry clerks had to enter these values. Once in the system, data could be accessed by terminals throughout the hospital. Normal ranges were typically included, with an asterisk indicating an abnormal value. The computer database would be updated once a day (end-of-day data). For more rapid results, you would have to go to the lab yourself and ask. On the operations side, a Lotus 1-2-3 spreadsheet on the finance team's computer – quarterly charges, accounts receivable, and perhaps a few very basic metrics – would be available to the finance department and CEO for periodic review.
     For years, this delayed, descriptive data was the standard.  Any inference would be provided by humans alone, who connected the dots.  A rough equivalent would be HIMSS stage 0-1.

2.     Improvements in graphics, computing speed, storage, connectivity.

     Improvements in processing speed and power (per Moore's Law), falling memory and storage prices, and improved device connectivity resulted in more readily available data. Near real-time price data was available, but relatively expensive ($400 per month or more per exchange, with dedicated hardware necessary for receipt – a full vendor package could readily run thousands of dollars a month from a low-cost competitor, and much more if you were a full-service institution). An IBM PC XT with enough computing power and storage ($3000) could now chart this data. The studies that Ed Seykota once ran on weekends would run on the PC – but analysis was still manual. The trader would have to sort through hundreds of 'runs' of the data to find the combination of parameters which led to the most profitable (successful) strategies, and then apply them to the market going forward. More complex statistics could be calculated – such as Sharpe ratios, CAGR, and maximum drawdown – and these were developed and diffused over time into wider usage. Complex financial products such as options could now be priced more accurately in near real time thanks to algorithmic advances (such as the binomial pricing model).
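For a sense of what those three statistics involve, here is a sketch of the standard textbook formulas; the sample numbers in the test values are invented, and real implementations would annualize the Sharpe ratio and handle edge cases:

```python
import math

def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by the sample standard deviation of returns."""
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / len(excess)
    var = sum((x - mean) ** 2 for x in excess) / (len(excess) - 1)
    return mean / math.sqrt(var)

def cagr(start_value, end_value, years):
    """Compound annual growth rate over a holding period."""
    return (end_value / start_value) ** (1.0 / years) - 1.0

def max_drawdown(equity_curve):
    """Largest peak-to-trough decline, as a fraction of the prior peak."""
    peak, worst = equity_curve[0], 0.0
    for value in equity_curve:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

# Doubling $100 over 10 years is about 7.2% per year.
print(round(cagr(100.0, 200.0, 10), 4))
# A dip from a peak of 120 to 90 is a 25% drawdown.
print(max_drawdown([100.0, 120.0, 90.0, 110.0]))  # → 0.25
```

Each of these fits on a spreadsheet row, which is exactly how they spread in this era.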
     The health care corollary would be early in-house electronic record systems tied into the hospital's billing system. Some patient data was present, but in siloed databases with limited connectivity. To actually use the data, you would ask IT for a data dump, which would then be uploaded into Excel for basic analysis. Data came from different systems, and combining it was challenging. Because of the difficulty in curating the data (think massive spreadsheets with pivot tables), this could be a full-time job for an analyst or team of analysts, and careful a priori selection of what data to follow and what to discard was needed. The quality of the analysis improved, but it was still human-labor intensive, particularly because of large data sets and the difficulty in collecting the information. For analytic tools, think Microsoft Excel or Minitab.
     This corresponds to HIMSS stage 2-3.

3.     Further improvement in technology correlates with algorithmic improvement.
     With new levels of computing power, analysis of data became quick and relatively cheap, allowing automated analysis. Take the same data set of computed results from price/time data that was analyzed by hand before; now apply an automated algorithm to run through ALL possible combinations of the included parameters. This is brute-force optimization. The best solve for the data set is found, and a trader is more confident that the model will be profitable going forward.
     For example, consider ACTV. (2) Running a brute-force optimization on this security with a moving average over the last 2 years yields a profitable trading strategy that returns 117% with the ideal solve. Well, on paper that looks great. What could be done to make it even MORE profitable? Perhaps you could add a stop loss. Do another optimization, and the theoretical return increases. Want more? Sure. Change the indicator and re-optimize. Now your hypothetical return soars. Why would you ever want to do anything else? (3,4)
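Mechanically, a brute-force optimization of this kind is just an exhaustive loop over parameter values. A sketch – the price series and the simple long-only rule below are invented for illustration (ACTV data is no longer available):

```python
def strategy_return(prices, window):
    """Toy long-only rule: hold while price sits above its moving average."""
    total = 0.0
    for i in range(window, len(prices) - 1):
        moving_avg = sum(prices[i - window:i]) / window
        if prices[i] > moving_avg:               # in the market today
            total += prices[i + 1] - prices[i]   # capture tomorrow's move
    return total

def brute_force(prices, windows):
    """Try every candidate moving-average length; keep the most profitable."""
    results = {w: strategy_return(prices, w) for w in windows}
    best = max(results, key=results.get)
    return best, results[best]

# A steadily rising synthetic series -- any trend-following rule looks great here.
prices = [float(p) for p in range(1, 50)]
print(brute_force(prices, [2, 5, 10]))  # → (2, 46.0)
```

Note how flattering the synthetic uptrend is: the shortest window "wins" simply because it keeps you in the market longest, which is precisely the kind of in-sample artifact the next paragraphs warn about.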
     But it's not as easy as it sounds. The best of the optimized models would work for a while, and then stop. The worst would immediately diverge and lose money from day 1 – never recovering. Most importantly: what did we learn from this experience? We learned that how the models were developed mattered. And to understand this, we need to go into a bit of math.
    Looking at security prices, you can model (approximate) the price activity as a function, F(X) = the squiggles of a chart. The model can be as complex or as simple as desired. Above, we start with a simple model (the moving average) and make it progressively more complex, adding additional rules and conditions. As we do so, the accuracy of the model increases, so the profitability increases as well. However, as we increase the accuracy of the model, we use up degrees of freedom, making the model more rigid and less resilient.
     Hence the system trader's curse – everything works great on paper, but when applied to the market, the more complex the rules and the less robustly the data is tested, the more likely the system will fail due to a phenomenon known as over-fitting. Take a look at the 3D graph below, which shows a profitability model of the above analysis:
     You will note that there is a spike in profitability using a 5-day moving average at the left of the graph, but profitability sharply falls off after that, rises a bit, and then craters. There is a much broader plateau of profitability in the middle of the graph, where many values are consistently and similarly profitable. Changes in market conditions could quickly invalidate the more profitable 5-day moving average model, but a model with a value chosen from the middle of the chart might be more consistently profitable over time. While more evaluation would need to be done, the less profitable (but still profitable) model is said to be more 'robust'.
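One way to express "prefer the plateau over the spike" in code is to score each parameter by the average profitability of its neighborhood rather than by its own value alone. The return figures below are invented to mirror the graph described, and the scoring rule is a simple sketch, not a standard method:

```python
def most_robust(results, radius=1):
    """Pick the window whose neighborhood-average return is highest."""
    windows = sorted(results)

    def neighborhood_avg(w):
        vals = [results[v] for v in windows if abs(v - w) <= radius]
        return sum(vals) / len(vals)

    return max(windows, key=neighborhood_avg)

# Invented % returns: a narrow 117% spike at window 5, a broad plateau near 20.
results = {5: 117, 6: -20, 7: -10, 20: 66, 21: 60, 22: 61}
print(max(results, key=results.get))  # → 5   (the fragile spike wins raw optimization)
print(most_robust(results))           # → 20  (the plateau wins once neighbors count)
```

The raw optimizer picks the spike; the neighborhood score picks the plateau – the trade of a few points of paper profit for consistency.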

To combat this, better statistical sampling methods were utilized – namely cross-validation, where a model fitted on an in-sample set is tested against an out-of-sample set for performance. This gave a system which was less prone to immediate failure, i.e. more robust. A balance between profitability and robustness can be struck, netting you the sweet spot in the training vs. test-set performance curve I've posted before.
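A minimal walk-forward sketch of that idea, reusing a toy moving-average rule on invented prices: the parameter is chosen only on the in-sample window, then scored on the out-of-sample window that follows.

```python
def strat_return(prices, window):
    """Toy rule: hold while price is above its window-day moving average."""
    total = 0.0
    for i in range(window, len(prices) - 1):
        if prices[i] > sum(prices[i - window:i]) / window:
            total += prices[i + 1] - prices[i]
    return total

def walk_forward(prices, windows, train_len, test_len):
    """List of (chosen_window, out_of_sample_return) per walk-forward step."""
    steps = []
    for start in range(0, len(prices) - train_len - test_len + 1, test_len):
        train = prices[start:start + train_len]
        test = prices[start + train_len:start + train_len + test_len]
        best = max(windows, key=lambda w: strat_return(train, w))  # fit in-sample only
        steps.append((best, strat_return(test, best)))             # score out-of-sample
    return steps

prices = [float(p) for p in range(1, 41)]  # synthetic uptrend
print(walk_forward(prices, [2, 5], train_len=20, test_len=10))  # → [(2, 7.0), (2, 7.0)]
```

The key discipline is in the `best = ...` line: the test window never influences the parameter choice, so the out-of-sample numbers are an honest (if still imperfect) preview of live performance.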
So why didn't everyone do this? Quick answer: they did. And with everyone analyzing the same data set of end-of-day historical price data in the same way, many people began to reach the same conclusions. This created an 'observer effect' where you had to be first to market to execute your strategy, or trade in a market liquid enough (think the S&P 500 index) that the impact of your trade (if you were a small enough trader – this doesn't work for a large institutional trader) would not affect the price. A classic case of 'the early bird gets the worm'.
     The important point is that WE ARE HERE in healthcare.  We have moderately complex computer systems that have been implemented largely due to Meaningful Use concerns, bringing us to between HIMSS stages 4-7.  We are beginning to use the back ends of computer systems to interface with analytic engines for useful descriptive analytics that can be used to inform business and clinical care decisions.  While this data is still largely descriptive, some attempts at predictive analytics have been made.  These are largely proprietary (trade secrets) but I have seen some vendors beginning to offer proprietary models to the healthcare community (hospitals, insurers, related entities) which aim at predictive analytics.  I don’t have specific knowledge of the methods used to create these analytics, but after the experience of Wall Street, I’m pretty certain that a number of them are going to fall into the overfitting trap.  There are other, more complex reasons why these predictive analytics might not work (and conversely, good reasons why they may), which I’ll cover in future posts.  
     One final point – the application of predictive analytics to healthcare will succeed where it fails on Wall Street, for a specific reason. On Wall Street, discovering and exploiting a relationship causes the relationship to disappear. That is the nature of arbitrage – market forces reduce arbitrage opportunities since they represent 'free money', and once enough people are doing it, it is no longer profitable. However, biological organisms don't respond to gaming the system in that manner. For a conclusive diagnosis, there may exist an efficacious treatment that is consistently reproducible. In other words, for a particular condition in a particular patient with a particular set of characteristics (age, sex, demographics, disease processes, genetics), if the condition is accurately diagnosed and the treatment competently executed, we can expect a reproducible biologic response – optimally, a total cure. And that reproducible response applies to the processes present in the complex dynamic systems that comprise our healthcare delivery system. That is where the opportunity lies in applying predictive analytics to healthcare.

(1) Technical Analysis of Stock Trends, Edwards and Magee, 8th Edition, St. Lucie Press
(2) ACTIVE Technologies, acquired (taken private) by Vista Equity Partners and delisted on 11/15/2013.  You can’t trade this stock.   
(3) Head of Trading, First Chicago Bank, personal communication
(4) Reminder – see the disclaimer for this blog!  And if you think you are going to apply this particular technique to the markets to be the next George Soros, I’ve got a piece of the Brooklyn Bridge to sell you.

The trap of overfitting data – future directions in the blog (2014)

Quick thought for this Monday morning….

Overfitting

I’m taking Trevor Hastie’s and Robert Tibshirani’s fantastic Statistical Learning course online through Stanford.
The slide here is great – and shows the danger of complex models and overfitting. For the data scientists – cross-validation. If any system traders or finance people are reading, think walk-forward analysis.

Basically, what the graph says is that when you apply a better and better model with higher levels of refinement to your system, you ‘fit’ your established data more accurately.  However, because the system is more complex, it is more rigid and less flexible (degrees of freedom, anyone?) and less resilient.  Tracks the data better, but works less well in practice.  That’s why the red line starts going up again as the scale of complexity goes from low to high.  Does this resonate with any of the process improvement (PI) people who are six-sigma trained?  Once you scoop up the low-hanging fruit and pass that first or second sigma in iteration, things get tougher.  A lot tougher.  
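The shape of that curve is easy to reproduce on toy data. Here is a sketch using numpy – the data is synthetic noise around a sine wave, chosen only to mimic the slide: training error keeps falling as the polynomial degree rises, while test error typically stops improving and turns back up.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, x.size)  # truth + noise

# Random split: 30 points to fit, 10 held out for testing.
idx = rng.permutation(x.size)
tr, te = idx[:30], idx[30:]

train_mse, test_mse = {}, {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x[tr], y[tr], degree)   # least-squares fit on train only
    pred = np.polyval(coeffs, x)
    train_mse[degree] = float(np.mean((pred[tr] - y[tr]) ** 2))
    test_mse[degree] = float(np.mean((pred[te] - y[te]) ** 2))
    print(f"degree {degree}: train MSE {train_mse[degree]:.3f}, "
          f"test MSE {test_mse[degree]:.3f}")
```

Training error is guaranteed to be non-increasing here because each higher-degree model nests the lower one; it is the held-out error that tells you when the extra degrees of freedom stopped helping.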

This is the trap of curve-fitting (also known as overfitting).  More in this case is less, as the model fails to be predictive.  

Where I’m going in this blog is to integrate some of these concepts with resource allocation problems in healthcare; and their resulting effect on patient care.   I’m particularly interested in applying these techniques to predictive analytics in healthcare.  I think we can learn a great deal from the practical applications of these computational statistic tools which have been successfully applied (I know because I started my career doing it) to the markets on Wall St.   What medicine can learn from Wall St. is a topic I intend to cover  in a series of posts. But healthcare is not the markets, and can’t be approached entirely similarly.   The costs of error are catastrophic in terms of lives – it’s not just about money.  

I hope you stay with me as I develop this theme. 

Image from: https://class.stanford.edu/courses/HumanitiesScience/StatLearning/Winter2014/about

The Measure is the Metric

There is a maxim in management circles to use data-rich methods of management.  Peter Drucker is reputed to have said, “What gets measured gets managed.” Clearly better than managing by the hem of one’s skirt (or seat of one’s pants), data-driven management allows for assessment of measured items.
It is interesting to consider the perturbations of this statement:
-if it can be measured, it can be managed (implying causality)
-if it can’t be measured, it can’t be managed (negative causality)
-if it can’t be measured, it doesn’t matter (reductio ad absurdum)
You can pick for yourself where in the spectrum you lie, and how far from Drucker’s original statement you are.
 
But there is another issue in measurement that isn’t as well addressed – the influence that the measure itself has on what is being measured. This is what is known as an observer effect in physics – simply measuring perturbs the system. The Heisenberg uncertainty principle is often cited this way (that’s actually NOT what Heisenberg’s principle says, but that’s beyond the scope of this discussion).
 
So, let’s acknowledge that observation, or measurement, changes the very thing being measured. An observation, or ‘measure’, of X (insert variable here – productivity, speed, outcome, etc.) is performed. It is then compared to a standard, or ‘metric’.
 
For a process or a person, there may or may not be established standards of measurement. Therefore, a baseline or initial measurement becomes the metric against which future measurements are compared. As process improvement or skill improvement happens (hopefully), subsequent measurements should improve in both accuracy and value.
 
Let’s consider a human measure and its associated metric.  A manager may wish to evaluate his employees by comparing their productivity to an established range of productivity.  The employee is being measured, and is being compared to a metric.
 
But employees aren’t stupid. Even if they have not been told that they are being measured, when they see the difference between their performance reviews and their peers’ performance reviews, they figure it out. And those employees whose performance reviews didn’t sit right with them become more diligent in their work, to achieve a better performance review next time. Some employees will even figure out that they are being evaluated, and up their game before the performance review.

 

Positive feedback loop in a simple system

So, by the mere act of being measured, we change what is being measured.  The measure is the metric.

And it shouldn’t be too hard to figure out that WHAT you measure and WHAT you choose to be the metric are more important than you think.

OODA loops – a definition and thoughts on application to healthcare

John Boyd's OODA loop

John Boyd was a US military pilot who became a military strategist. His chief contribution to military theory was contained in a large slide presentation and one essay, but his teachings heavily influenced those who train our military commanders and are incorporated into US military strategy and tactics. Boyd characterized the decision-making process as a continuous (iterative) cycle of Observation – Orientation – Decision – and Action. This OODA loop is the mechanism enabling adaptation and therefore survival.

The observation process involves data gathering, orientation is analysis and synthesis, decision is determination of a course of action, and action is the execution of that decision, with resultant consequence. Boyd felt that any model is incomplete (including our own perception of reality) and must be continuously refined or adapted as new data arrives. This occurs against a backdrop of increasing entropy (disorder, uncertainty) in any system once it perturbs from its initial point, which we perceive incompletely and imprecisely due to our human limitations. There’s actually a not insubstantial amount of philosophic thought in that!

It is easiest to conceive of the OODA loop in the setting of a dogfight between two fighter pilots. Each pilot is feeling out the other, maneuvering in the way that will best give him the kill while ensuring his own survival. The OODA loop is, by definition, reactive. But the victorious pilot will out-think his opponent by leaving his own OODA loop and getting inside his opponent’s, and therefore predict what that pilot is likely to do. That ability to predict with reasonable certainty allows the winning pilot to best his opponent, all other things (skill, plane, etc.) being similar.

OODA loops have been applied successfully in business, sports, and particularly in litigation.

In healthcare, physicians have their own worldview or OODA loop. They observe patients, orient their differential, decide on a diagnosis and treatment, and then act on that treatment and observe their results, trying something else if unsuccessful. More experienced or better clinicians have very well-developed, almost algorithmic, OODA loops. They also are functioning in their environment (the practice, the wards, the OR) where they have very specific, well-developed skills to assess when something is amiss – patient flow, quietness, absence of something as opposed to glaring signs (like alarms, etc…).

You could hardly expect the healthcare administrator to have the same OODA loop. Note the Cultural Traditions in the blue box – very different for both! So when the two interact, so do these loops, and may lead to some unsatisfying conversations if there is no empathy between the administrator and the physician. The physician communicates to the administrator, and the administrator communicates back, but neither is understanding what the other is saying. It probably falls upon the more emotionally intelligent of the two to try to get inside the OODA loop of the other to facilitate a truly constructive conversation. Without practiced understanding, the physician risks being labeled ‘disruptive’ and the administrator risks being thought of as ‘unfair’.

Venkatesh Rao, of the sublime ribbonfarm blog, has an older post on his Tempo book blog about OODA’s backstory. I have ‘borrowed’ his OODA diagram here.

I finally got around to writing the companion update to this post – it concerns OODA loops and AI.  I invite you to read it.