What medicine can learn from Wall Street – Part 3 – The dynamics of time

This is a somewhat challenging post, with cross-discipline correlations, some unfamiliar terminology, and new concepts.  There is a payoff!

You can recap part 1 and part 2 here. 

The crux of this discussion is time.  Understanding the progression towards shorter and shorter time frames on Wall Street enables us to draw parallels and differences in medical care delivery, particularly pertaining to processes and data analytics.  This is relevant because some vendors tout real-time capabilities in health care data analysis, which may be less useful than one thinks.

In trading, the best profit is a riskless one: a profit that occurs simply by being present, is reliable and reproducible, and exposes the trader to no risk.  Meet arbitrage.  Years ago, it was possible for the same security to trade at different prices on different exchanges, as there was no central marketplace.  A network of traders could execute a buy of a stock for $10 in New York and then sell those same shares on the Los Angeles exchange for $11.  If one imagines a 1000-share transaction, a $1 profit per share yields $1000.  It was made by the head trader holding up two phones to his head and saying ‘buy’ into one and ‘sell’ into the other.*  These relationships could be exploited over longer periods of time and represented an information deficit.  However, as more traders learned of them, the opportunities became harder to find as greater numbers pursued them.  This price arbitrage kept prices reasonably similar before centralized, computerized exchanges and data feeds.
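The arithmetic of that two-phone trade is simple enough to sketch (the figures are the ones from the example above, not real quotes):

```python
def arbitrage_profit(buy_price, sell_price, shares):
    """Profit from buying on one exchange and simultaneously selling on another."""
    return (sell_price - buy_price) * shares

# The New York / Los Angeles example: 1,000 shares, $1 spread.
print(arbitrage_profit(10.00, 11.00, 1000))  # 1000.0
```

Riskless, as long as both legs execute; the risk creeps in when one phone says ‘done’ and the other doesn’t.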

As information flow increased, organizations became larger and more effective, and time frames for executing profitable arbitrages decreased.  This led traders to develop simple predictive algorithms, like Ed Seykota did, detailed in part 1.  New instruments re-opened the profit possibility for a window of time, which eventually closed.  The development of futures, options, indexes, all the way to closed exchanges (ICE, etc…) created opportunities for profit which eventually became crowded.  Since the actual arbitrages were mathematically complex (futures have an implied interest rate, options require a solution of multiple partial differential equations, and indexes require summing instantaneously hundreds of separate securities) a computational model was necessary as no individual could compute the required elements quickly enough to profit reliably.  With this realization, it was only a matter of time before automated trading (AT) happened, and evolved into high-frequency trading with its competing algorithms operating without human oversight on millisecond timeframes.
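As a flavor of why these later arbitrages required computation, here is a minimal sketch of the futures case using the standard cost-of-carry model; all inputs are hypothetical, and real index arbitrage adds financing, dividends, and execution details this omits:

```python
import math

def futures_fair_value(spot, rate, dividend_yield, t_years):
    """Cost-of-carry fair value of an equity future: F = S * e^((r - q) * T)."""
    return spot * math.exp((rate - dividend_yield) * t_years)

def implied_carry_rate(spot, future, dividend_yield, t_years):
    """Invert the formula: back out the interest rate a futures price implies."""
    return math.log(future / spot) / t_years + dividend_yield

# Hypothetical: $100 index, 5% rate, 2% dividend yield, 6 months out.
f = futures_fair_value(100.0, 0.05, 0.02, 0.5)
print(round(f, 4))                                    # 101.5113
print(round(implied_carry_rate(100.0, f, 0.02, 0.5), 4))  # 0.05
```

When the quoted future drifted from this fair value, the spread was the arbitrage; spotting it reliably meant computing it faster than the next desk.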

The journey from daily prices, to ever shorter intervals over the trading day, to millisecond prices was driven by the availability of good data and reliable computing which could be counted on to act on those flash prices.  What was once a game of location (geographical arbitrage) turned into a game of speed (competitive pressures on geographical arbitrage), then into a game of predictive analytics (proprietary trading and trend following), then into a more complex game of predictive analytics (statistical arbitrage), and ultimately turned back into a game of speed and location (high-frequency trading).

The following chart shows a probability analysis of an ATM (at-the-money) straddle position on IBM.  This is an options position.  It is not important to understand the instrument, only what the image shows.  For IBM, the expected variation in price at one standard deviation (+/- 1 s.d.) is plotted below.  As time (days) increases along the X axis, the expected range widens, i.e. becomes less precise.

credit: TD Ameritrade
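The widening cone in the chart follows from volatility scaling with the square root of time.  A minimal sketch, using a hypothetical $180 stock with 20% annualized volatility (not IBM’s actual figures) and 252 trading days per year:

```python
import math

def one_sd_range(price, annual_vol, days, trading_days=252):
    """+/- 1 s.d. expected price range after `days`, under the usual
    approximation: expected move ~= price * vol * sqrt(t)."""
    move = price * annual_vol * math.sqrt(days / trading_days)
    return price - move, price + move

# The band widens as we look further out in time.
for d in (1, 30, 90):
    lo, hi = one_sd_range(180.0, 0.20, d)
    print(d, round(lo, 2), round(hi, 2))
```

The key property: the range at 90 days is wider than at 1 day, and the midpoint stays at today’s price. Keep that picture in mind, because the healthcare analogue below behaves in exactly the opposite way.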

Is there a similar corollary for health care?

Yes, but.

First, recognize the distinction between the simple price-time data of the markets and the rich, complex, multivariate data of healthcare.

Second, assuming a random walk hypothesis, security price movement is unpredictable; at best, one can only calculate that the next price will fall in a range defined by a number of standard deviations according to one’s model, as seen in the picture above.  You cannot make this argument in healthcare, because a patient’s disease is not a random walk.  Disease follows prescribed pathways and natural histories, which allow us to make diagnoses and implement treatment options.

It is instructive to consider Clinical Decision Support tools.  Please note that these tools are not a substitute for expert medical advice (and my mention does not imply endorsement).  See Esagil and Diagnosis Pro.  If you enter “abdominal pain” into either of the algorithms, you’ll get back a list of 23 differentials (woefully incomplete) in Esagil and 739 differentials (more complete, but too many to be of help) in Diagnosis Pro.  Yet this is a typical presentation to a physician – a patient complains of “abdominal pain” and the differential must be narrowed.
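The narrowing itself can be sketched as set filtering.  The four-disease ‘knowledge base’ below is a made-up toy, nothing like a real CDS tool’s differential list:

```python
# Hypothetical mini-knowledge base: diagnosis -> typical findings.
DIFFERENTIAL = {
    "appendicitis":    {"abdominal pain", "rlq tenderness", "fever"},
    "cholecystitis":   {"abdominal pain", "ruq tenderness", "fever"},
    "gastroenteritis": {"abdominal pain", "diarrhea"},
    "renal colic":     {"abdominal pain", "flank pain", "hematuria"},
}

def narrow(findings):
    """Keep only diagnoses whose typical findings include everything observed."""
    return [dx for dx, feats in DIFFERENTIAL.items() if findings <= feats]

print(narrow({"abdominal pain"}))                      # all four remain
print(narrow({"abdominal pain", "fever"}))             # two remain
print(narrow({"abdominal pain", "fever", "ruq tenderness"}))  # one remains
```

Each new finding can only shrink the candidate list: the opposite of the widening price cone above.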

At the onset, there is a wide differential diagnosis.  The possibility that the pain is a red herring and the patient really has some other, unsuspected, disease must be considered.  While there are a good number of diseases with a pathognomonic presentation, uncommon presentations of common diseases are more frequent than common presentations of rare diseases.

In comparison to the trading analogy above, where expected price movement is generally restricted to a quantifiable range based on the observable statistics of the security over time, a de novo presentation of a patient could be anything; the range of possibilities is quite large.

Take, for example, a patient that presents to the ER complaining “I don’t feel well.”  When you question them, they tell you that they are having severe chest pain that started an hour and a half ago.  That puts you into the acute chest pain diagnostic tree.

Reverse Tree

With acute chest pain, there is a list of differentials that needs to be excluded (or ‘ruled out’), some quite serious.  A thorough history and physical is done, taking 10-30 minutes.  Initial labs are ordered (5-30 minutes if done as a rapid in-ER test, longer if sent to the main laboratory), an EKG and CXR (chest X-ray) are done for their speed (10 minutes each), and the patient is sent to CT for a CTA (CT angiogram) to rule out a PE (pulmonary embolism).  This is a useful test because it will not only show the presence or absence of a clot, but will also allow a look at the lungs to exclude pneumonias, effusions, dissections, and malignancies.  Estimate that the wait time for the CTA is at least 30 minutes.

The ER doctor then reviews the results (5 minutes): troponins are negative, excluding a heart attack (MI); the CT scan eliminated PE, pneumonia, dissection, pneumothorax, effusion, and malignancy in the chest; the chest X-ray excludes fracture; and the normal EKG excludes arrhythmia, gross valvular disease, and pericarditis.  The main diagnoses left are GERD, pleurisy, referred pain, and anxiety.  The ER doctor goes back to the patient (10 minutes).  The patient doesn’t appear anxious and reports no stressors, so panic attack is unlikely.  No history of reflux, so GERD is unlikely.  No abdominal pain component, and labs were negative, so abdominal pathologies are unlikely.  Point tenderness is present on the physical exam at the costochondral junction, and the patient is diagnosed with costochondritis.  The patient is then discharged with a prescription for pain control (30 minutes).
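Summing the stated step times (serially; in practice some steps overlap) gives a rough door-to-discharge estimate for this encounter:

```python
# Step times from the chest-pain walkthrough above, in minutes.
# Ranges where the text gives them, single values otherwise.
steps = {
    "history & physical": (10, 30),
    "initial labs":       (5, 30),
    "EKG":                (10, 10),
    "chest x-ray":        (10, 10),
    "CTA wait":           (30, 30),   # "at least 30" -- treated as the floor
    "review results":     (5, 5),
    "re-examine patient": (10, 10),
    "discharge":          (30, 30),
}

best = sum(lo for lo, hi in steps.values())
worst = sum(hi for lo, hi in steps.values())
print(best, worst)  # 110 155
```

Roughly two to two-and-a-half hours, and every step bought diagnostic certainty: time spent narrowed the tree.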

Ok, if you’ve stayed with me, here’s the payoff.

As we proceed down the decision tree, the number of possibilities narrows in medicine.

In comparison, with price-time data the range of potential prices increases as you proceed forward in time.

So, in healthcare the potential diagnosis narrows as you proceed down the x-axis of time.  Therefore, time is both one’s friend and enemy – friend, as it provides for the diagnostic and therapeutic interventions which establish the patient’s disease process; enemy, as payment models in medicine favor making that diagnostic and treatment process as quick as possible (at least for hospital inpatients).

We’ll continue this in part IV and compare its relevance to portfolio trading.

*As an aside, the phones in trading rooms had a switch on the handheld receiver – you would push it in to talk.  That way, the other party would not know that you were conducting an arbitrage!  They were often slammed down and broken by angry traders – one of the manager’s jobs was to keep a supply of extras in his desk, and they were not hard-wired but plugged in by a jack expressly for that purpose!

**Yes, for the statisticians reading this, I know that there is an implication of a Gaussian distribution that may not hold.  I would suspect the successful houses have adjusted for this and instituted non-parametric models as well.  Again, this is not a trading, medical, or financial advice blog.

 

The danger of choosing the wrong metric: The VA Scandal

The Veterans Affairs scandal has been newsworthy lately.  The facts about the VA scandal will be forthcoming in August, but David Brooks made some smart inferences back on May 16th on NPR’s Week In Politics:

BROOKS: Yeah, he’s (Shinseki) in hot water. He’s been there since the beginning. So I don’t know if I’d necessarily want to bet on him. But, you know, I do have some sympathy for the VA. It’s obviously not a good thing to doctor and cook the books, but you – there is a certain fundamental reality here, which is the number of primary care visits over the last three years at this place rose 50 percent. The number of primary care physicians rose nine percent.
And so there’s just a backlog, and if you put a sort of standard in place that you have to see everybody in 14 days but you don’t provide enough physicians to actually do that, well, people are going to start cheating. And so there is a more fundamental problem here than just the cheating.

An administrative failure was made by mandating that patients be seen within 14 days without providing the staffing to do so.  A rule designed to promote a high level of care had ‘unintended consequences.’  Still, I do have some sympathy for an institution which depends on procurement from Congress for funding, in a political process where funds can be yanked, redistributed, or earmarked based on political priorities.

More concerning, multiple centers may have been complicit with the impossibility of fulfilling the mandate, and whistleblowers were actively retaliated against.

I need to disclaim here that I both trained and worked at the VA as a physician.  I have tremendous respect for the veterans who seek care there, and I had great pride working there and in being in a place to give service to these men and women who gave service to us.  The level of care in the VA system is generally thought to be good, by myself and others.

As I’ve written before in The Measure is the Metric and Productivity in Medicine – what’s real and what’s fake?, the selection of metrics is important because those metrics will be followed by the organization, particularly if performance evaluations and bonuses are tied to the metrics.  Ben Horowitz, partner at Andreessen Horowitz, astutely notes the following from his experience as CEO at Opsware and an employee at HP (1):

At a basic level, metrics are incentives.  By measuring quality, features, and schedule and discussing them at every staff meeting, my people focused intensely on those metrics to the exclusion of other goals.  The metrics did not describe the real goals and I distracted the team as a result.

And if he didn’t get the point across clearly enough (2):

Some things that you will want to encourage will be quantifiable, and some will not.  If you report on the quantitative goals and ignore the qualitative ones, you won’t get the qualitative goals, which may be the most important ones.  Management purely by numbers is sort of like painting by numbers – it’s strictly for amateurs.
At HP, the company wanted high earnings now and in the future.  By focusing entirely on the numbers, HP got them now by sacrificing the future…
By managing the organization as though it were a black box, some divisions at HP optimized the present at the expense of their downstream competitiveness.  The company rewarded managers for achieving short-term objectives in a manner that was bad for the company.  It would have been better to take into account the white box.  The white box goes beyond the numbers and gets into how the organization produced the numbers.  It penalizes managers who sacrifice the future for the short-term and rewards those who invest in the future even if that investment cannot be easily measured.

I’ll have to wait until the official report on the VA scandal is released before commenting on why the failure occurred.  However, it does seem to me as a case of failure of the black box, as Ben Horowitz explained so adeptly.  His writing is recommended.

1.  Ben Horowitz, The Hard Thing about Hard Things, HarperCollins 2014, p.132

2. Ibid., pp. 132-133

 

What Big Data visualization analytics can learn from radiology

As I research part III of the “What Healthcare can learn from Wall Street” series (which will probably turn into a Part III, Part IV, and Part V), I was thinking about visualization tools in big data and how they can be used to analyze large data sets rapidly (relatively) by a human (or a deep unsupervised learning type algorithm) – and it came to me that we radiologists have been doing this for years.
If you have ever watched a radiologist reading at a PACS station (a high-end computer system which displays images quickly) you will see them scroll at a blindingly fast speed through a large series of multiple anatomic images to arrive at a diagnosis or answer a specific question.  [N.B. if you haven’t, you really should – it’s quite cool!]  Stacked upon each other, these images assemble a complete anatomic picture of the area of data acquisition.

What the radiologist is doing while going over the images is comparing the expected appearance of a reference standard to that visualized image to find discrepancies.  The data set looks like THIS:

It’s important to understand that each pixel on the screen represents not a point but a volume, called a voxel.  The reconstruction algorithms can sometimes over- or under-emphasize the appearance of the voxel, so the data is usually reconstructed in multiple axes.  This improves diagnostic accuracy and confidence.

Also, the voxel is not a boolean (binary) zero or one variable – it is a scalar corresponding to a grey-scale value.

So, in data science thinking, what a radiologist is doing is examining a four-dimensional space (X, Y, Z, voxel grayscale) for relevant patterns and deviance from those patterns (essentially a subtractive algorithm).  A fifth dimension can be added by including changes over time (comparison to a previous similar study at some prior point in time).
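That subtractive read can be sketched in code.  The 2×2×2 ‘volumes’ below are stand-ins for a real CT stack, and the deviation threshold is arbitrary:

```python
# Compare a study volume against a reference volume, voxel by voxel,
# and flag large grayscale deviations -- the "subtractive" pattern search.
reference = [[[100, 100], [100, 100]],
             [[100, 100], [100, 100]]]
study     = [[[102, 100], [100, 100]],
             [[100, 100], [100, 160]]]   # one voxel deviates sharply

def flag_deviations(ref, img, threshold=20):
    """Return (x, y, z) coordinates where |study - reference| > threshold."""
    hits = []
    for x, plane in enumerate(img):
        for y, row in enumerate(plane):
            for z, v in enumerate(row):
                if abs(v - ref[x][y][z]) > threshold:
                    hits.append((x, y, z))
    return hits

print(flag_deviations(reference, study))  # [(1, 1, 1)]
```

The radiologist, of course, carries the ‘reference volume’ in their head, which is rather harder to serialize.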

Rapid real-time pattern recognition in five variables on large data sets.  Done successfully day-in and day-out visually by your local radiologist.

 

Initial evaluation of a complex data set can give you something like this multiple scatter plot which I don’t find too useful:

Multiple scatter plots

Now, this data set, to me with my orientation and training, becomes much more useful:

A cursory visual inspection yields a potential pattern (the orange circles), which to me suggests a possible model, drawn in blue.  That curve looks parabolic, which suggests a polynomial linear model might be useful for describing that particular set of data; we can model it like this and then run the dataset in R to prove or disprove our hypothesis.
Polynomial Linear Model
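The post suggests testing the eyeballed parabola in R (something like `lm(y ~ x + I(x^2))`); the same quadratic least-squares fit can be sketched in plain Python via the normal equations.  The data here are synthetic, generated from a known parabola so the fit can be checked:

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = b0 + b1*x + b2*x^2 via the normal equations."""
    X = [[1.0, x, x * x] for x in xs]
    # Build X^T X and X^T y.
    A = [[sum(X[i][r] * X[i][c] for i in range(len(xs))) for c in range(3)]
         for r in range(3)]
    b = [sum(X[i][r] * ys[i] for i in range(len(xs))) for r in range(3)]
    # Solve the 3x3 system by Gaussian elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, 3):
            f = A[r][col] / A[col][col]
            for c in range(col, 3):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    beta = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        beta[r] = (b[r] - sum(A[r][c] * beta[c] for c in range(r + 1, 3))) / A[r][r]
    return beta

# Synthetic data lying exactly on y = 1 + 2x + 3x^2.
xs = [0, 1, 2, 3, 4]
ys = [1 + 2 * x + 3 * x * x for x in xs]
print([round(v, 6) for v in fit_quadratic(xs, ys)])  # [1.0, 2.0, 3.0]
```

It is ‘linear’ in the coefficients even though the curve is a parabola, which is why ordinary least squares applies.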
So, what I’m suggesting here is that by visually presenting complex data in a format of up to five dimensions (three axes X, Y, Z; a point with grayscale corresponding to a normalized value; and a fifth, comparative dimension), complex patterns can be discovered visually, potentially quickly and on a screening basis, and appropriate models can then be tested to discover if they hold water.  I’ll save the nuts and bolts of this for a later post, but when a large dataset is evaluated (like an EHR), dimension reduction operations can allow focusing down on fewer variables to yield a more visualization-friendly dataset.

And I’m willing to bet even money that if an analyst becomes intimately familiar with the dataset and visualization, as they spend more time with it and understand it better, they will be able to pick out relationships that will be absolutely mind-blowing.

Processes and Modeling – a quick observation

Is it not somewhat obvious to the folks reading this blog that this:

simplified ER Process

 

Is the same thing as this:

While I might be skewered for oversimplifying the process (and it is oversimplified – greatly), the fundamental principles are the same.  LOS, for those not familiar with the term, is Length of Stay, also known as Turnaround Time (the former is usually in days, the latter in minutes or hours).

Out of curiosity, is anyone reading this blog willing to admit they are using something similar, or have tried to use something similar and failed?  I would love to know people’s thoughts on this.

A conversation with Farzad Mostashari MD

I participated in a webinar with Farzad Mostashari MD, ScM, former director of the ONC (Office of the National Coordinator for Health IT), sponsored by the data analytics firm Wellcentive.  He is now a visiting fellow at the Brookings Institution.  Farzad spoke on points made in a recent article in the American Journal of Accountable Care, Four Key Competencies for Physician-led Accountable Care Organizations.

The hour-and-a-half session lent itself well to a Q&A format, and basically turned into a small group consulting session with this very knowledgeable policy leader!

Discussed:
1.  Risk Stratification.  Begin using the EHR data by ‘hot spotting.’  Hot spotting refers to a technique of identifying outliers in medical care and evaluating these outliers to find out why they are consuming resources significantly beyond the average.  The Oliver Wyman folks wrote a great white paper that references Dr. Jeffrey Brenner of the Camden Coalition, who identified the 1% of Medicaid patients responsible for 30% of the city’s medical costs.  Farzad suggests that data mining should go further and “identify populations of ‘susceptibles’ with patterns of behavior that indicate impending clinical decompensation & lack of resilience.”  He further suggests that we go beyond an insurance-like “risk score” to understand how and why these patients fail, and then apply targeted interventions to prevent susceptibles from failing and overutilizing healthcare resources in the process.  My takeaway: in the transition from volume to value, bundled payments and ACO-style payments will incentivize physicians to share and manage this risk, transferring onto them a role traditionally filled only by insurers.

2.  Network Management.  Data mining the EHR enables organizations to look at provider and resource utilization within a network (c.f. the recent Medicare physician payments data release).  By analyzing this data, referral management can be performed.  By sending patients specifically to those providers who have the best outcomes / lowest costs for a disease, the ACO or insurer can meet shared savings goals.  This would also help prevent over-utilization – by changing existing referral patterns and excluding those providers who always choose the highest-cost option for care (c.f. the recent Medicare payment data for ophthalmologists performing intraocular drug injections – wide variation in costs).  This IS happening – Aetna’s CEO, Mark Bertolini, said so specifically during his HIMSS 2014 keynote.  To my understanding, network analysis is mathematically difficult (think eigenvectors, eigenvalues, and linear algebra) – but that won’t stop a determined implementer (it didn’t stop Facebook, Google, or Twitter).  Also included in this topic were workflow management, which is sorely broken in current EHR implementations, clinical decision support tools (like ACRSelect), and traditional six sigma process analytics.

3.  ADT Management.  This was something new.  Using the admission/discharge/transfer data from the HL7 data feed, you could ‘push’ that data to regional health systems.  It achieves a useful degree of data exchange not otherwise present without a regional data exchange.  Patients who bounce from one ER to the next could be identified this way.  It’s also useful to push this data to the primary care physicians (PCPs) managing those patients.  Today, where PCPs function almost exclusively on an outpatient basis and hospitalists manage the patient while in the hospital, the PCP often doesn’t know about a patient’s hospitalization until they present to the office.  Follow-up care in the first week after hospitalization may help to prevent readmissions.  According to Farzad, there is a financial incentive to do so – a discharge alert can enable a primary care practice to ensure that every discharged patient has a telephone follow-up within 48 hours and an office visit within 7 days, which would qualify for a $250 “transition in care” payment from Medicare.  (Aside – I wasn’t aware of this.  I’m not a PCP, and I would check Medicare billing criteria closely for eligibility conditions before implementing, as consequences could be severe.  Don’t just take my word for it, as I may be misquoting/misunderstanding, and Medicare billers are ultimately responsible for what they bill for.  This may be limited to ACOs.  Do your own due diligence.)

4.  Patient outreach and engagement.  One business point is that for the ACO to profit, patients must be retained.  Patient satisfaction may be as important to the business model as the interventions the ACO is performing, particularly as the ACO model suggests a shift to up-front costs and back-end recovery through shared savings.  If you as an ACO invest in a patient, only to lose that patient to a competing ACO, you will let your competitor have the benefit of those improvements in care while you eat the sunk costs!  To maintain patient satisfaction and engagement, there are marketing techniques from behavioral economics (think Cass Sunstein’s Nudges.gov paper), gamification (Jane McGonigal), and A/B testing (Tim Ferriss).  Basically, we’re applying customer-centric marketing to healthcare, with not only the total lifetime revenue of the patient considered, but also the total lifetime cost!
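The hot-spotting idea in point 1 is, at its core, a sort-and-share calculation.  The costs below are invented to echo the shape of the Camden finding, not taken from it:

```python
# Hypothetical annual costs: 99 typical patients plus one extreme outlier.
costs = {"pt%03d" % i: 1_000 for i in range(99)}
costs["pt099"] = 500_000

total = sum(costs.values())
# The top 1% of patients by cost (here, 1 of 100).
top_1pct = sorted(costs.values(), reverse=True)[: max(1, len(costs) // 100)]
share = sum(top_1pct) / total
print(f"top 1% of patients -> {share:.0%} of spend")  # 83% here
```

The real analytic work, as Farzad notes, is not finding these outliers (that part is trivial) but understanding why they decompensate and intervening before they do.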

It was a very worthwhile discussion and thanks to Wellcentive for hosting it!  

What Medicine can learn from Wall Street – Part 2 – evolution of data analysis

If you missed the first part, read it  here.  Note: For the HIT crowd reading this, I’ll offer (rough) comparison to the HIMSS stages (1-7).

1.     Descriptive Analytics based upon historical data.

Hand Drawn Chart of the Dow Jones Average

     This was the most basic use of data analysis.   When newspapers printed price data (Open-High-Low-Close, or OHLC), that data could be charted (on graph paper!) and interpreted using basic technical analysis, which was mostly lines drawn upon the chart. (1)  Simple formulas such as year-over-year (YOY) percentage returns could be calculated by hand.  This information was merely descriptive and had no bearing upon future events.  To get information into a computer required data entry by hand, and operator errors could throw off the accuracy of your data.  Computers lived in the accounting department, with the data being used to record positions and profit and loss (P&L).  At month’s end a large run of data would produce a computer-generated accounting statement.
     A good analogue to this system would be older laboratory reporting systems, where laboratory test values were sent to a dedicated lab computer.  If the test equipment interfaced with the computer (via IEEE-488 or RS-232 interfaces), the values were sent automatically.  If not, data entry clerks had to enter these values.  Once in the system, data could be accessed by terminals throughout the hospital.  Normal ranges were typically included, with an asterisk indicating an abnormal value.  The computer database would be updated once a day (end-of-day type data).  For more rapid results, you would have to go to the lab yourself and ask.  On the operations side, a Lotus 1-2-3 spreadsheet of quarterly charges, accounts receivable, and perhaps a few very basic metrics would be available on the finance team’s computer for periodic review by the finance department and CEO.
     For years, this delayed, descriptive data was the standard.  Any inference would be provided by humans alone, who connected the dots.  A rough equivalent would be HIMSS stage 0-1.

2.     Improvements in graphics, computing speed, storage, connectivity.

     Improvements in processing speed & power (after Moore’s Law), cheapening memory and storage prices, and improved device connectivity resulted in more readily available data.  Near real-time price data was available, but relatively expensive ($400 per month or more per exchange, with dedicated hardware necessary for receipt – a full vendor package could readily run thousands of dollars a month from a low-cost competitor, and much more if you were a full-service institution).  An IBM PC XT with enough computing power & storage ($3000) could now chart this data.  The studies that Ed Seykota ran on weekends would run on the PC – but analysis was still manual.  The trader would have to sort through hundreds of ‘runs’ of the data to find the combination of parameters which led to the most profitable (successful) strategies, and then apply them to the market going forward.  More complex statistics could be calculated – such as Sharpe ratios, CAGR, and maximum drawdown – and these were developed and diffused over time into wider usage.  Complex financial products such as options could now be priced more accurately in near-real time with algorithmic advances (such as the binomial pricing model).
     The health care corollary would be in-house early electronic record systems tied in to the hospital’s billing system.  Some patient data was present, but in siloed databases with limited connectivity.  To actually use the data you would ask IT for a data dump which would then be uploaded into Excel for basic analysis.  Data would come from different systems and combining it was challenging.  Because of the difficulty in curating the data (think massive spreadsheets with pivot tables), this could be a full-time job for an analyst or team of analysts, and careful selection of what data was being followed and what was discarded would need to be considered, a priori.  The quality of the analysis improved, but was still human labor intensive, particularly because of large data sets & difficulty in collecting the information.  For analytic tools think Excel by Microsoft or Minitab.
     This corresponds to HIMSS stage 2-3.

3.     Further improvement in technology correlates with algorithmic improvement.
     With new levels of computing power, analysis of data became quick and relatively cheap, allowing automated analysis.  Take the same data set of price/time data that was analyzed by hand before, and now apply an automated algorithm to run through ALL possible combinations of included parameters.  This is brute-force optimization.  The best solve for the data set is found, and a trader is more confident that the model will be profitable going forward.
     For example, consider ACTV(2).  Running a brute force optimization on this security with a moving average over the last 2 years yields a profitable trading strategy that returns 117% with the ideal solve.  Well, on paper that looks great.  What could be done to make it even MORE profitable?  Perhaps you could add a stop loss.  Do another optimization and theoretical return increases.  Want more?  Sure.  Change the indicator and re-optimize.  Now your hypothetical return soars.  Why would you ever want to do anything else? (3,4)
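A brute-force optimization of this kind can be sketched in a few lines.  The price series here is synthetic and deterministic, and the strategy (‘long when price is above its n-day moving average, flat otherwise’) is a deliberately naive stand-in for the real thing:

```python
import math

# Synthetic price series: a gentle uptrend with a cyclical wobble.
prices = [100 + 10 * math.sin(i / 10.0) + i * 0.05 for i in range(500)]

def ma_return(prices, n):
    """Total return of 'long when price > n-day moving average, flat otherwise'."""
    equity = 1.0
    for i in range(n, len(prices) - 1):
        ma = sum(prices[i - n:i]) / n
        if prices[i] > ma:
            equity *= prices[i + 1] / prices[i]
    return equity - 1.0

# Brute force: try every moving-average length and keep the best.
results = {n: ma_return(prices, n) for n in range(2, 60)}
best_n = max(results, key=results.get)
print(best_n, round(results[best_n], 3))
```

The ‘best’ parameter found this way is fit to one history, which is precisely why the paper returns in the ACTV example looked so seductive.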
     But it’s not as easy as it sounds.  The best of the optimized models would work for a while, and then stop.  The worst would immediately diverge and lose money from day 1 – never recovering.  Most importantly: what did we learn from this experience?  We learned that how the models were developed mattered.  And to understand this, we need to go into a bit of math.
    Looking at security prices, you can model (approximate) the price activity as a function, F(X)= the squiggles of a chart.  The model can be as complex or simple as desired.  Above, we start with a simple model (the moving average), and make it progressively more complex adding additional rules and conditions.  As we do so, the accuracy of the model increases, so the profitability increases as well.  However, as we increase the accuracy of the model, we use up degrees of freedom, making the model more rigid and less resilient.
     Hence the system trader’s curse – everything works great on paper, but when applied to the market, the more complex the rules and the less robustly the data is tested, the more likely the system will fail due to a phenomenon known as over-fitting.  Take a look at the 3D graph below, which shows a profitability model of the above analysis:
     You will note that there is a spike in profitability using a 5 day moving average at the left of the graph, but profitability sharply falls off after that, rises a bit, and then craters.  There is a much broader plateau of profitability in the middle of the graph, where many values are consistently and similarly profitable.  Changes in market conditions could quickly invalidate the more profitable 5 day moving average model, but a model with a value chosen in the middle of the chart might be more consistently profitable over time.  While more evaluation would need to be done, the less profitable (but still profitable) model is said to be more ‘Robust’.

To combat this, better statistical sampling methods were utilized – namely cross-validation, where a model developed on an in-sample set is tested against an out-of-sample set for performance.  This gave a system which was less prone to immediate failure, i.e. more robust.  A balance between profitability and robustness can be struck, netting you the sweet spot in the Training vs. Test-set performance curve I’ve posted before.
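A minimal sketch of that in-sample/out-of-sample discipline, reusing the same naive moving-average strategy on a synthetic series (true walk-forward testing re-optimizes over rolling windows, which this omits):

```python
import math

def ma_return(prices, n):
    """Total return of 'long when price > n-day moving average, flat otherwise'."""
    equity = 1.0
    for i in range(n, len(prices) - 1):
        if prices[i] > sum(prices[i - n:i]) / n:
            equity *= prices[i + 1] / prices[i]
    return equity - 1.0

prices = [100 + 10 * math.sin(i / 10.0) + i * 0.05 for i in range(500)]
cut = int(len(prices) * 0.7)
train, test = prices[:cut], prices[cut:]

# Optimize only on the in-sample window...
best_n = max(range(2, 60), key=lambda n: ma_return(train, n))
# ...then judge the chosen parameter on data the optimizer never saw.
print(best_n, round(ma_return(test, best_n), 3))
```

A parameter that shines in-sample but falls apart out-of-sample is exactly the over-fit model described above, caught before it costs real money.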
So why didn’t everyone do this?  Quick answer: they did.  And with everyone analyzing the same data set of end-of-day historical price data in the same way, many people began to reach the same conclusions.  This created an ‘observer effect’ where you had to be first to market to execute your strategy, or trade in a market liquid enough (think the S&P 500 index) that the impact of your trade (if you were a small enough trader – this doesn’t work for a large institutional trader) would not affect the price.  A classic case of ‘the early bird gets the worm’.
     The important point is that WE ARE HERE in healthcare.  We have moderately complex computer systems that have been implemented largely due to Meaningful Use concerns, bringing us to between HIMSS stages 4-7.  We are beginning to use the back ends of computer systems to interface with analytic engines for useful descriptive analytics that can be used to inform business and clinical care decisions.  While this data is still largely descriptive, some attempts at predictive analytics have been made.  These are largely proprietary (trade secrets) but I have seen some vendors beginning to offer proprietary models to the healthcare community (hospitals, insurers, related entities) which aim at predictive analytics.  I don’t have specific knowledge of the methods used to create these analytics, but after the experience of Wall Street, I’m pretty certain that a number of them are going to fall into the overfitting trap.  There are other, more complex reasons why these predictive analytics might not work (and conversely, good reasons why they may), which I’ll cover in future posts.  
One final point – application of predictive analytics to healthcare will succeed in an area where it fails on Wall Street, for a specific reason.  On Wall Street, discovering and exploiting a relationship causes the relationship to disappear.  That is the nature of arbitrage – market forces erode arbitrage opportunities because they represent ‘free money’, and once enough people pursue one, it is no longer profitable.  Biological organisms, however, do not respond to being ‘gamed’ in that way.  For a conclusive diagnosis, there may exist an efficacious treatment whose effect is consistently reproducible.  In other words, for a particular condition in a particular patient with a particular set of characteristics (age, sex, demographics, disease processes, genetics), an accurate diagnosis and competently executed treatment can be expected to produce a reproducible biologic response, optimally a total cure.  The same reproducibility applies to the processes present in the complex dynamic systems that comprise our healthcare delivery system.  That is where the opportunity lies in applying predictive analytics to healthcare.

(1) Technical Analysis of Stock Trends, Edwards and Magee, 8th Edition, St. Lucie Press
(2) ACTIVE Technologies, acquired (taken private) by Vista Equity Partners and delisted on 11/15/2013.  You can’t trade this stock.   
(3) Head of Trading, First Chicago Bank, personal communication
(4) Reminder – see the disclaimer for this blog!  And if you think you are going to apply this particular technique to the markets to be the next George Soros, I’ve got a piece of the Brooklyn Bridge to sell you.

What medicine can learn from Wall Street – Part I – History of analytics

We, in healthcare, lag in computing technology and sophistication vs. other fields.  The standard excuses given are: healthcare is just too complicated, doctors and staff won’t accept new ways of doing things, everything is fine as it is, etc…  But we are shifting to a new high-tech paradigm in healthcare, with ubiquitous computing supplanting traditional care delivery models.  Medicine has a ‘deep moat’ – both regulatory and through educational barriers to entry.  However, the same was said of the specialized skill sets of the financial industry.  Wall St. has pared its staffing down, has automated many jobs, and continues to do so.  More product (money) is being handled by fewer people than before – an increase in real productivity.

Computing power on Wall Street in the 1960’s-1970’s meant large mainframe and minicomputer systems used for back-office operations.  Most traders operated on ‘seat of the pants’ hunches and guesses, longer-term macro-economic plays, or used their privileged position as market-makers to make frequent small profits.  One of the first traders to use computing was Ed Seykota, who applied Richard Donchian’s trend-following techniques to the commodity markets.  Ed would run computer programs on an IBM 360 on weekends, and over six months tested four systems with variations (100 combinations), ultimately developing an exponential moving average trading system that would turn a $5,000 account into $15,000,000.(1)  Ed would run his program, wait for the output, and then manually select the best system for his needs (usually the most profitable).  He had access to delayed, descriptive data which required his analysis for a decision.
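The flavor of such a system can be sketched briefly.  This is a generic EMA-crossover rule on synthetic prices — an illustration of the technique, not Seykota’s actual system, which was never published in full:

```python
# Sketch of a simple exponential-moving-average (EMA) trend-following rule.
# Prices and smoothing constants are invented for illustration.
import random

def ema(prices, alpha):
    """Exponentially weighted moving average of a price list."""
    out = [prices[0]]
    for p in prices[1:]:
        out.append(alpha * p + (1 - alpha) * out[-1])
    return out

def ema_crossover_signals(prices, fast=0.3, slow=0.05):
    """+1 = long, -1 = short, based on fast EMA vs. slow EMA."""
    f, s = ema(prices, fast), ema(prices, slow)
    return [1 if fi > si else -1 for fi, si in zip(f, s)]

random.seed(1)
prices = [100.0]
for _ in range(249):
    prices.append(prices[-1] * (1 + random.gauss(0.0005, 0.01)))

signals = ema_crossover_signals(prices)
# Hold yesterday's signal over today's price change:
pnl = sum(signals[i - 1] * (prices[i] - prices[i - 1])
          for i in range(1, len(prices)))
print(f"net P&L on synthetic data: {pnl:.2f}")
```

A run like this over 100 parameter combinations, scored by profitability, is essentially what took a weekend on an IBM 360.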

In the 1980’s-1990’s computing power increased with the PC, and text-only displays evolved into graphical ones.  Systems traders became some of the most profitable traders in large firms.  Future decisions were being made on historical data – early predictive analytics.  On balance, well-designed systems traded by experienced traders were successful more often than not.  Testing was faster, but still not fast (a single security run on an x386 IBM PC took about 8 hours).  As more traders began to use the same systems, the systems worked less well.  This was due to an ‘observer effect’: traders rushing to exploit a particular advantage quickly caused that advantage to disappear!  The system trader’s ‘edge’, or profitability, was constantly declining, and new markets or circumstances were sought.  ‘Program’ trades were accused of causing the 1987 stock market crash.

There were some notable failures in market analysis – Fast Fourier Transforms (FFTs) being one.  With enough computing power, you could fit an FFT to the market perfectly – but it would hardly ever work going forward.  The FFT fails because it presumes a cyclical formula, and the markets, while cyclical, are not predictably so.  An interesting phenomenon: the better the in-sample FFT fit, the quicker and more badly it fell apart going forward.  That is the signature of curve-fitting.  ‘Fractals’ were all the rage later and failed just as miserably, for the same reason.  As an aside, this explains why simpler linear models in regression analysis are frequently ‘better’ than a high-degree polynomial or spline fit to the data, particularly for predictive analytics.  The closer you fit the data, the less robust the model becomes and the more prone it is to real-world failure.
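The trap is easy to reproduce.  In this sketch (a synthetic noisy trend, all numbers invented), a polynomial flexible enough to pass through every historical point fits the past perfectly, while a plain least-squares line tends to extrapolate far more sanely one step ahead:

```python
# Sketch of the curve-fitting trap: a model that passes through every
# in-sample point "fits" perfectly, then fails when asked to extrapolate.
import random

def lagrange_fit(xs, ys):
    """Return the unique polynomial through all points, as a callable."""
    def p(x):
        total = 0.0
        for i, yi in enumerate(ys):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xs[i] - xj)
            total += term
        return total
    return p

def linear_fit(xs, ys):
    """Ordinary least-squares line, as a callable."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return lambda x: my + slope * (x - mx)

random.seed(7)
xs = list(range(8))
ys = [100 + 0.5 * x + random.gauss(0, 1) for x in xs]  # noisy uptrend

exact, line = lagrange_fit(xs, ys), linear_fit(xs, ys)
true_next = 100 + 0.5 * 8  # the underlying trend, one step ahead
print(f"perfect-fit model predicts {exact(8):.1f}, "
      f"linear model predicts {line(8):.1f}, trend value {true_next:.1f}")
```

The degree-7 fit has zero in-sample error, yet its one-step forecast is driven almost entirely by noise — the same mechanism that sank the FFT and fractal systems.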

Further advances in computing and computational statistics followed in the 1990’s-2000’s.  Accurate real-time market data became widely available and institutionally ubiquitous, and time frames became shorter and shorter.  Programs running on daily data were switched to multi-hour, hourly, and then minute intervals.  The trend-following programs of the past failed as the market became choppier, and anti-trend (mean reversion) systems became popular.  Enter the quants – the statisticians.(2)  With fast, cheap, near-ubiquitous computing, the scope of the systems expanded.  Many securities could now be analyzed at once, and imbalances exploited – hence the popularity of ‘pairs’ trading.  Real-time calculation of indices created index arbitrage programs, which could execute without human intervention.
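The statistic behind a basic pairs/mean-reversion trade can be sketched as a z-score on the spread between two related securities.  The spread data and the entry threshold below are invented for illustration; real desks use far more careful models:

```python
# Sketch of the mean-reversion statistic behind pairs trading: bet on the
# spread reverting when it strays too far from its historical mean.
import statistics

def pairs_signal(spread_history, z_entry=2.0):
    """-1 = short the spread, +1 = long it, 0 = stand aside."""
    mu = statistics.fmean(spread_history)
    sigma = statistics.stdev(spread_history)
    z = (spread_history[-1] - mu) / sigma
    if z > z_entry:
        return -1   # spread unusually wide: expect it to narrow
    if z < -z_entry:
        return 1    # spread unusually tight: expect it to widen
    return 0

# A spread that has sat near zero, then jumps -- a stretched pair:
history = [0.1, -0.2, 0.0, 0.15, -0.1, 0.05, -0.05, 0.1, 0.0, 4.0]
print("signal on stretched spread:", pairs_signal(history))
```

The ‘edge’ is the assumption that the two legs stay related — exactly the kind of relationship that erodes once enough traders exploit it.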

The index arbitrage (index-arb) programs relied on speed and proximity to the exchanges for an execution advantage.  Statistical arbitrage (stat-arb) programs were the next development.  These evolved into today’s high-frequency trading programs (HFTs), which dominate systems trading.  These programs are tested extensively on existing data and then let loose on the markets with only high-level oversight.  They make thousands of trading decisions a second, incur real profits and losses, and compete against other HFT algorithms in a Darwinian environment where the winners make money and are adapted further, and the losers are dismissed with a digital death.  Master governing algorithms coordinate the individual algorithms.(4)

The floor traders, specialists, market-makers, and scores of support staff that once participated in the daily business have been replaced by glowing boxes sitting in a server rack next to the exchange.  

That is not to say automated trading algorithms are perfect.  A rogue algorithm with insufficient oversight forced the sale of Knight Capital Group (KCG) in 2012.(3)  The lesson here is significant – there ARE going to be errors once automated algorithms are in greater use; it is inevitable.

So, reviewing the history, what happened on Wall St.?
1.  First came descriptive analytics based upon historical data.
2.  Graphical interfaces improved.
3.  Improving technology led to more complicated algorithms which overfit the data. (WE ARE HERE)
4.  Improving data accuracy led to real-time analytics.
5.  Real-time analytics led to shorter analysis timeframes.
6.  Shorter analysis timeframes led to dedicated trading algorithms operating with only high-level human supervision.
7.  Master algorithms were created to coordinate the efforts of individual trading algorithms.

Next post, I’ll show the corollaries in health care and use it to predict where we are going.

  

(1) Jack Schwager, Market Wizards, Ed Seykota interview pp151-174.
(2) David Aronson, Evidence-based Technical Analysis, Wiley 2007
(3) Wall St. Journal, Trading Error cost firm $440 million, Marketbeat  

(4) Personal communication, HFT trader (name withheld)

Cost Shifting in Healthcare

 

There is a widely held belief, perhaps unspoken but no less strongly held, that the healthcare business is a zero-sum game.

Consider how healthcare dollars are generated.  A hospital system, care facility, or provider serves a surrounding area, termed a catchment area.  The covered lives in the catchment are expected to generate a certain amount of healthcare expenditures on an aggregate, population basis, which insurers and hospital systems model for budgetary purposes.  Given the number of people in the catchment area, their age, socio-economic status, general degree of illness, and type of insurance, finance professionals and actuaries can estimate the expected healthcare dollars flowing from payors (insurers, government) to providers and facilities on a per-patient basis.
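Reduced to arithmetic, that estimate is a simple expected value over risk segments.  Every figure below is invented for illustration; real actuarial models adjust for age bands, morbidity, and payer mix in far more detail:

```python
# Toy expected-value model of catchment-area healthcare spend.
# All segment shares and per-person costs are hypothetical.
covered_lives = 50_000

# (share of population, expected annual spend per person, in dollars)
segments = {
    "healthy adults":  (0.60,  2_000),
    "chronic disease": (0.25,  9_000),
    "high utilizers":  (0.10, 25_000),
    "catastrophic":    (0.05, 90_000),
}

expected_per_capita = sum(share * cost for share, cost in segments.values())
total_expected = covered_lives * expected_per_capita
print(f"expected spend: ${expected_per_capita:,.0f} per life, "
      f"${total_expected:,.0f} for the catchment")
```

It is this fixed aggregate number — split between you and your competitors — that makes the business feel zero-sum.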

While modifiers, complications, and co-morbidities can alter the actual billing for a particular patient and encounter, in aggregate most in the healthcare industry assume these care dollars will be captured either by their system or by a competitor.  Hence, zero-sum.  That understanding probably accounts for the ‘me too’ effect in healthcare: once one system purchases a gamma knife, the other system will too, unwilling to let the competitor capture those lives and let the resulting profit strengthen one system over the other.

But this zero-sum mentality trickles down from the CEO level to employees, particularly middle management.  Consider the service line manager – given a fixed budget, bonused on cost savings versus that budget ceiling.  You have value-added services that earn revenue.  However, you also have compliance-related, non-value-added mandatory services which are essentially costs.  What’s one way to improve the service line budget?  Keep the value-added work and pawn off the non-value-added work as much as possible on someone else.  By having your clinicians bill separately for services, and requiring through medical staff privileges that ‘cherry picking’ is not allowed, you ensure your clinicians will serve the indigent as well as the insured – but you don’t have to pay them for that work.  By requiring department chairpeople to design standard orders, you avoid having to hire consultants to do the same thing.  Cost-shifting onto the non-employed physician is a well-known phenomenon.

Don’t think that it doesn’t work the other way, however!  On a busy Friday afternoon, a family practitioner sends a complicated elderly patient to the ER with a weak complaint which requires evaluation.  When it is time to discharge the patient, the family members can’t be found, and the physician, who does not have privileges at the hospital, won’t answer the phone.  An economist would argue that each of these individuals acted in their own best interest, but the cost to the patient, the system, and the payor is high.

As physicians become employed by hospital systems, the situation gets more complex.  Cost-shifting behavior dies slowly, but the mid-level administrator shifting costs to another service line manager to meet budgetary or productivity goals is now merely shifting them within the system.  Without an institutional understanding of why this behavior is maladaptive, and management processes to prevent it, the result is that the employed physician is cost-shifted upon – and that person has lost the ability to cost-shift back to maintain equilibrium, by virtue of employment.  This is a problem, because it can cause physician dissatisfaction, declining quality of care, and ultimately physician burnout.  And currently there does not seem to be any governance model in place to prevent this (at least, I’m unaware of one).  What will ultimately happen is that service lines will be missing key players, resulting in missed revenue opportunities for the system – essentially giving the competition the edge – even as budgets and productivity goals are met.  This will leave most executives scratching their heads, as the relationship is not directly seen.  The bottom line is that you can’t cost-shift onto yourself.  Systems employing physicians in significant numbers would be wise to learn this quickly.

Resource Misallocation in Graduate Medical Education


I hope it is clear to anyone reading this blog that the current state of the residency match is a terrible waste of human capital.

If it becomes widely known that you need to spend $320,000 on four years of medical school just to compete for a shot at a residency, the ‘best and brightest’ will take one look at that, say “No thank you” and re-orient to careers that do not subject them to an inordinate degree of personal and professional risk.  Medical students will then be picked from 1) the truly wealthy, 2) the uninformed, and 3) the desperate, looking for a lottery ticket.

I am mentoring a young physician who falls into that gap and has been unable to secure an internship.  Once upon a time, this physician would have slid readily into a less competitive specialty – pediatrics, family practice, etc…  But now, their ability to practice medicine in the future is truly in jeopardy.  This is a bright person with an Ivy League background and a winning personality, but coming from a lower-tier medical school.  Their dream of being a physician is at risk of becoming a nightmare.  And the terrible thing is that this individual’s story is no longer a fluke.  The terrible state of Graduate Medical Education (GME) in the United States needs to be addressed.

P.S. Any program directors needing to fill a slot with a great intern, contact me.

 

Productivity in medicine – what’s real and what’s fake?

Let’s think about provider productivity.  As an armchair economist, I apologize to any PhD economists who feel I am oversimplifying things.
Why is productivity good?  It has driven the rise in the standard of living over the last 200 years.  Economic output depends on two variables: the number of individuals producing goods and services, and how much each can produce – productivity.  Technology supercharges productivity.  A 50-member platform company now outproduces the corporation of 40 years ago, which needed a small army of people to achieve a lower output.  We live better lives because of productivity.

We strive for productivity in health care.  More patients seen per hour, more patients treated.  Simple enough.  But productivity focused on the number of patients seen per hour does not necessarily maintain quality of care as that metric increases.  A study of back-office workers in banking showed that when the workers were overloaded, they sped up, but the quality of their work decreased (defects).  Banking is not healthcare, granted, but in finance defects are recognized and corrected fairly quickly [“Excuse me, but where is my money?”].  In patient care, defects may take longer to surface and be harder to attribute to any one factor.  Providers usually have a differential diagnosis for their patient’s presenting complaints, and a careful review of the history and medical record can significantly narrow that differential.  Physician extenders can allow providers to see patients more effectively, with routine care shunted to the extender.  For a harried clinician, however, testing can also serve as a physician extender of sorts.  It increases diagnostic accuracy, at a cost to the patient (monetary and time) and the payor (monetary).  It is hardly fraudulent.  But is it waste?  And since it usually requires a repeat visit, is it rework?  Possibly yes, to both.

The six-minute-per-encounter clinician who uses testing as a physician extender will likely show higher RVU production than one who diligently reviews the medical record for half an hour and sees only 10 patients a day.  But who is providing better care?  If outcomes were evaluated, I suspect there would be either no difference between the two or a slight outcome edge for the higher-testing provider; a cost/benefit analysis would be needed to judge whether that edge justifies the added spend.  Ultimately, if you account for all costs to the system, the provider who generates more defects, waste, and re-work is usually less efficient in aggregate, even though individually measured productivity may be high.  See: ‘The measure is the metric‘.  Right now, insurers are data mining to see which providers have the best outcomes and lowest costs for specific disease processes, and will steer patients preferentially to them (Aetna CEO, keynote speech, HIMSS 2014).

One of my real concerns is that we are training an entire generation of providers in this volume-oriented, RVU-production approach.  These folks may be high performers now, but when the value shift comes, they are going to have to learn a whole new set of skills.  More worrisome, entire practices are being optimized under Six Sigma processes for greatest productivity.  Such a practice will have real trouble adapting to value-based care, because doing so represents a cultural shift.  That could hamper a health system’s ability to pivot from volume to value, with a resulting loss of competitiveness.

In the volume to value world, there are two types of productivity:

  • Fake productivity: High RVU generators who do so by cost shifting, waste, re-work, defects.
  • True productivity: Consistent RVU generators who follow efficient testing, appropriate # of follow-up visits, and have the good outcomes to prove it.

I am sure that most providers want to work in the space of true productivity – after all, it represents the ideal model learned as students.  Fake productivity is simply a maladaptive response to external pressures, and shouldn’t be conflated with true productivity.