{"id":8810,"date":"2014-04-11T10:28:18","date_gmt":"2014-04-11T14:28:18","guid":{"rendered":"http:\/\/n2value.com\/blog\/?p=8810"},"modified":"2014-04-14T08:07:05","modified_gmt":"2014-04-14T12:07:05","slug":"what-medicine-can-learn-from-wall-street-part-2-evolution-of-data-analysis","status":"publish","type":"post","link":"https:\/\/n2value.com\/blog\/what-medicine-can-learn-from-wall-street-part-2-evolution-of-data-analysis\/","title":{"rendered":"What Medicine can learn from Wall Street &#8211; Part 2 &#8211; evolution of data analysis"},"content":{"rendered":"<p><span style=\"font: 13.0px Arial;\">If you missed the first part, read it\u00a0 <\/span><a title=\"Part 1 - what medicine can learn from wall street\" href=\"http:\/\/n2value.com\/blog\/what-medicine-can-learn-from-wall-street-part-i-history-of-analytics\/\" target=\"_blank\"><span style=\"font: 13.0px Arial; color: #042eee;\">here<\/span><\/a><span style=\"font: 13.0px Arial;\">.<\/span>\u00a0 Note: <span style=\"font: 13.0px Arial;\">For the HIT crowd reading this, I\u2019ll offer (rough) comparison to the\u00a0<\/span><a title=\"HIMSS EMRAM\" href=\"http:\/\/www.himssanalytics.org\/emram\/emram.aspx\" target=\"_blank\"><span style=\"font: 13.0px Arial; color: #042eee;\">HIMSS stages (1-7)<\/span><\/a><span style=\"font: 13.0px Arial;\">.<\/span><\/p>\n<h3><span style=\"font: 13.0px Arial;\">1. 
\u00a0 \u00a0\u00a0<span style=\"text-decoration: underline;\">Descriptive Analytics based upon historical data.<\/span><\/span><\/h3>\n<figure id=\"attachment_8818\" aria-describedby=\"caption-attachment-8818\" style=\"width: 252px\" class=\"wp-caption alignleft\"><a href=\"http:\/\/n2value.com\/blog\/wp-content\/uploads\/2014\/04\/Dowbyhand.png\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-8818\" alt=\"Hand Drawn Chart of the Dow Jones Average \" src=\"http:\/\/n2value.com\/blog\/wp-content\/uploads\/2014\/04\/Dowbyhand-300x199.png\" width=\"252\" height=\"168\" srcset=\"https:\/\/n2value.com\/blog\/wp-content\/uploads\/2014\/04\/Dowbyhand-300x199.png 300w, https:\/\/n2value.com\/blog\/wp-content\/uploads\/2014\/04\/Dowbyhand.png 536w\" sizes=\"auto, (max-width: 252px) 100vw, 252px\" \/><\/a><figcaption id=\"caption-attachment-8818\" class=\"wp-caption-text\">Hand Drawn Chart of the Dow Jones Average<\/figcaption><\/figure>\n<p><span style=\"font: 13.0px Arial;\">\u00a0 \u00a0\u00a0 This was the most basic use of data analysis. \u00a0 When newspapers printed price data (Open-High-Low-Close or OHLC), that data could be charted (on graph paper!) and interpreted using basic technical analysis, which was mostly lines drawn upon the chart. (1)\u00a0 Simple formulas such as year-over-year (YOY) percentage returns could be calculated by hand. This information was merely descriptive and had no bearing upon future events.\u00a0 To get information into a computer required data entry by hand, and operator errors could throw off the accuracy of your data.\u00a0 Computers lived in the accounting department, with the data being used to record position and profit and loss (P&amp;L). 
\u00a0 At month\u2019s end a large run of data would produce a computer-generated accounting statement.<a href=\"http:\/\/n2value.com\/blog\/wp-content\/uploads\/2014\/04\/dot-matrix.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignright  wp-image-8817\" alt=\"dot matrix\" src=\"http:\/\/n2value.com\/blog\/wp-content\/uploads\/2014\/04\/dot-matrix-300x298.jpg\" width=\"204\" height=\"204\" srcset=\"https:\/\/n2value.com\/blog\/wp-content\/uploads\/2014\/04\/dot-matrix-300x298.jpg 300w, https:\/\/n2value.com\/blog\/wp-content\/uploads\/2014\/04\/dot-matrix-150x150.jpg 150w, https:\/\/n2value.com\/blog\/wp-content\/uploads\/2014\/04\/dot-matrix-1024x1018.jpg 1024w\" sizes=\"auto, (max-width: 204px) 100vw, 204px\" \/><\/a><\/span><br \/>\n<span style=\"font: 13.0px Arial;\">\u00a0 \u00a0 \u00a0A good analogue to this system would be older laboratory reporting systems where laboratory test values were sent to a dedicated lab computer. \u00a0If the test equipment interfaced with the computer (via <a title=\"IEEE-488\" href=\"https:\/\/en.wikipedia.org\/wiki\/IEEE-488\" target=\"_blank\">IEEE-488<\/a> &amp; <a title=\"RS-232\" href=\"https:\/\/en.wikipedia.org\/wiki\/RS-232\" target=\"_blank\">RS-232<\/a> interfaces) the values were sent automatically. \u00a0If not, data entry clerks had to enter these values. \u00a0Once in the system, data could be accessed by terminals throughout the hospital. \u00a0Normal ranges were typically included, with an asterisk indicating the value was abnormal. \u00a0The computer database would be updated once a day (end of day type data). \u00a0For more rapid results, you would have to go to the lab yourself and ask.\u00a0 On the operations side, a Lotus 1-2-3 spreadsheet on the finance team\u2019s computer of\u00a0 quarterly charges, accounts receivable, and perhaps a few very basic metrics would be available to the finance department and CEO for periodic review. 
\u00a0<\/span><br \/>\n<span style=\"font: 13.0px Arial;\">\u00a0 \u00a0 \u00a0For years, this delayed, descriptive data was the standard. \u00a0Any inference would be provided by humans alone, who connected the dots. \u00a0A rough equivalent would be HIMSS stage 0-1.<\/span><\/p>\n<h3><span style=\"font: 13.0px Arial;\">2. \u00a0 \u00a0\u00a0<span style=\"text-decoration: underline;\">Improvements in graphics, computing speed, storage, connectivity.<\/span><\/span><\/h3>\n<p><span style=\"font: 13.0px Arial;\">\u00a0 \u00a0 \u00a0Improvements in processing speed &amp; power (after <a title=\"Moore's Law\" href=\"https:\/\/simple.wikipedia.org\/wiki\/Moore%27s_law\" target=\"_blank\">Moore\u2019s Law<\/a>), cheapening memory and storage prices, and improved device connectivity resulted in more readily available data.\u00a0 Near real-time price data was available, but relatively expensive ($400 per month or more per exchange with dedicated hardware necessary for receipt &#8211; a full vendor package could readily run thousands of dollars a month from a low cost competitior, and much more if you were a full service institution).\u00a0\u00a0\u00a0 An IBM PC XT of enough computing power &amp; storage ($3000) could now chart this data. \u00a0The studies that <a title=\"What medicine can learn from Wall Street \u2013 Part I \u2013 History of analytics\" href=\"http:\/\/n2value.com\/blog\/what-medicine-can-learn-from-wall-street-part-i-history-of-analytics\/\" target=\"_blank\">Ed Seykota<\/a> ran on weekends would run on the PC &#8211; but analysis was still manual. The trader would have to sort through hundreds of \u2018runs\u2019 of the data to find the combination of parameters which led to the most profitable (successful) strategies, and then apply them to the market going forward. \u00a0More complex statistics could be calculated &#8211; such as Sharpe Ratios, CAGR, and maximum drawdown &#8211; and these were developed and diffused over time into wider usage. 
\u00a0Complex financial products such as options could now be priced more accurately in near real time with algorithmic advances (such as the <a title=\"Binomial Pricing Model\" href=\"https:\/\/en.wikipedia.org\/wiki\/Binomial_options_pricing_model\" target=\"_blank\">binomial pricing model<\/a>).<\/span><br \/>\n<span style=\"font: 13.0px Arial;\">\u00a0 \u00a0 \u00a0The health care analogue would be in-house early electronic record systems tied in to the hospital&#8217;s billing system. \u00a0Some patient data was present, but in siloed databases with limited connectivity. \u00a0To actually use the data you would ask IT for a data dump which would then be loaded into Excel for basic analysis. \u00a0Data would come from different systems and combining it was challenging.\u00a0 Because of the difficulty in curating the data (think massive spreadsheets with pivot tables), this could be a full-time job for an analyst or team of analysts, and careful selection of what data was being followed and what was discarded would need to be considered a priori.\u00a0 The quality of the analysis improved, but was still human labor intensive, particularly because of large data sets &amp; difficulty in collecting the information.\u00a0 For analytic tools think Excel by Microsoft or Minitab.<\/span><br \/>\n<span style=\"font: 13.0px Arial;\">\u00a0 \u00a0 \u00a0This corresponds to HIMSS stage 2-3.<\/span><\/p>\n<h3><span style=\"font: 13.0px Arial;\">3. 
\u00a0 \u00a0 <span style=\"text-decoration: underline;\">Further improvement in technology correlates with algorithmic improvement.<\/span><\/span><\/h3>\n<p><a href=\"http:\/\/n2value.com\/blog\/wp-content\/uploads\/2014\/04\/Pretty.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignright size-medium wp-image-8825\" alt=\"Pretty\" src=\"http:\/\/n2value.com\/blog\/wp-content\/uploads\/2014\/04\/Pretty-300x218.png\" width=\"300\" height=\"218\" srcset=\"https:\/\/n2value.com\/blog\/wp-content\/uploads\/2014\/04\/Pretty-300x218.png 300w, https:\/\/n2value.com\/blog\/wp-content\/uploads\/2014\/04\/Pretty.png 556w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><br \/>\n<span style=\"font: 13.0px Arial;\">\u00a0 \u00a0\u00a0 With new levels of computing power, analysis of data became quick and relatively cheap, allowing automated analysis. \u00a0Take the same data set of computed results from price\/time data that was analyzed by hand before, and now apply an automated algorithm to run through ALL possible combinations of included parameters.\u00a0 This is brute-force optimization. \u00a0 The best solve for the data set is found, and a trader is more confident that the model will be profitable going forward.<\/span><br \/>\n<a href=\"http:\/\/n2value.com\/blog\/wp-content\/uploads\/2014\/04\/ACTV5MA.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft size-thumbnail wp-image-8840\" alt=\"ACTV5MA\" src=\"http:\/\/n2value.com\/blog\/wp-content\/uploads\/2014\/04\/ACTV5MA-150x150.jpg\" width=\"150\" height=\"150\" \/><\/a><span style=\"font: 13.0px Arial;\">\u00a0 \u00a0 \u00a0For example, consider ACTV (2). \u00a0Running a brute-force optimization on this security with a moving average over the last 2 years yields a profitable trading strategy that returns 117% with the ideal solve. \u00a0Well, on paper that looks great. \u00a0What could be done to make it even MORE profitable? \u00a0Perhaps you could add a stop loss. 
\u00a0Do another optimization and the theoretical return increases. \u00a0Want more? \u00a0Sure.\u00a0 Change the indicator and re-optimize. \u00a0Now your hypothetical return soars. \u00a0Why would you ever want to do anything else?<\/span> (3,4)<br \/>\n<span style=\"font: 13.0px Arial;\">\u00a0 \u00a0 \u00a0But it&#8217;s not as easy as it sounds.\u00a0 The best of the optimized models would work for a while, and then stop. \u00a0The worst would immediately diverge and lose money from day 1 &#8211; never recovering.\u00a0 <strong>Most importantly: what did we learn from this experience<\/strong>?\u00a0 <strong>We learned that how the models were developed mattered<\/strong>.\u00a0 <\/span><span style=\"font: 13.0px Arial;\">And to understand this, we need to go into a bit of math.<\/span><br \/>\n<span style=\"font: 13.0px Arial;\">\u00a0 \u00a0 Looking at security prices, you can model (approximate) the price activity as a function, F(X) = the squiggles of a chart. \u00a0The model can be as complex or simple as desired. \u00a0Above, we start with a simple model (the moving average), and make it progressively more complex by adding additional rules and conditions. \u00a0As we do so, the accuracy of the model on the historical data increases, so the backtested profitability increases as well. 
\u00a0However, as we increase the accuracy of the model, we use up\u00a0<\/span><a title=\"Degrees of Freedom\" href=\"https:\/\/en.wikipedia.org\/wiki\/Degrees_of_freedom_%28statistics%29\" target=\"_blank\"><span style=\"font: 13.0px Arial; color: #042eee;\">degrees of freedom<\/span><\/a><span style=\"font: 13.0px Arial;\">, making the model more rigid and less resilient.<\/span><br \/>\n<span style=\"font: 13.0px Arial;\">\u00a0 \u00a0 \u00a0Hence the system trader\u2019s curse &#8211; everything works great on paper, but when applied to the market, the more complex the rules, and the less robustly the data is tested, the more likely the system will fail due to a phenomenon known as over-fitting.<\/span>\u00a0 <span style=\"font: 13.0px Arial;\">Take a look at the 3D graph below which shows a profitability model of the above analysis:<\/span><a href=\"http:\/\/n2value.com\/blog\/wp-content\/uploads\/2014\/04\/3D-optimization.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-8839\" alt=\"3D optimization\" src=\"http:\/\/n2value.com\/blog\/wp-content\/uploads\/2014\/04\/3D-optimization-1024x502.jpg\" width=\"768\" height=\"376\" srcset=\"https:\/\/n2value.com\/blog\/wp-content\/uploads\/2014\/04\/3D-optimization-1024x502.jpg 1024w, https:\/\/n2value.com\/blog\/wp-content\/uploads\/2014\/04\/3D-optimization-300x147.jpg 300w, https:\/\/n2value.com\/blog\/wp-content\/uploads\/2014\/04\/3D-optimization.jpg 1236w\" sizes=\"auto, (max-width: 768px) 100vw, 768px\" \/><\/a><br \/>\n<span style=\"font: 13.0px Arial;\">\u00a0 \u00a0\u00a0 You will note that there is a spike in profitability using a 5 day moving average at the left of the graph, but profitability sharply falls off after that, rises a bit, and then craters.\u00a0 There is a much broader plateau of profitability in the middle of the graph, where many values are consistently and similarly profitable.\u00a0 Changes in market conditions could quickly invalidate the more 
profitable 5 day moving average model, but a model with a value chosen in the middle of the chart might be more consistently profitable over time.\u00a0 While more evaluation would need to be done, the less profitable (but still profitable) model is said to be more &#8216;Robust&#8217;.<br \/>\n<\/span><\/p>\n<p><span style=\"font: 13.0px Arial;\">To combat over-fitting, better statistical sampling methods were utilized, namely\u00a0<\/span><a title=\"Cross-Validation\" href=\"https:\/\/en.wikipedia.org\/wiki\/Cross-validation_%28statistics%29\" target=\"_blank\"><span style=\"font: 13.0px Arial; color: #042eee;\">cross-validation<\/span><\/a><span style=\"font: 13.0px Arial;\">\u00a0where a model fitted on an in-sample set is then tested on a held-out, out-of-sample set for performance. \u00a0This gave a system which was less prone to immediate failure, i.e. more robust. \u00a0A balance between profitability and robustness can be struck, netting you the sweet spot in the <a title=\"Where I\u2019m going with this blog\" href=\"http:\/\/n2value.com\/blog\/where-im-going-with-this-blog\/\" target=\"_blank\">Training vs. Test-set performance curve<\/a> I\u2019ve posted before.<\/span><br \/>\n<span style=\"font: 13.0px Arial;\">So why didn\u2019t everyone do this?\u00a0 Quick answer: they did. \u00a0And with everyone analyzing the same data set of end-of-day historical price data in the same way, many people began to reach the same conclusions as each other. \u00a0This created an <a title=\"The Measure is the Metric\" href=\"http:\/\/n2value.com\/blog\/the-measure-is-the-metric\/\" target=\"_blank\">\u2018observer effect\u2019<\/a> where you had to be first to market to execute your strategy, or trade in a market that was liquid enough (think the S&amp;P 500 index) that the impact of your trade (if you were a small enough trader &#8211; doesn\u2019t work for a large institutional trader) would not affect the price. \u00a0Classic case of \u2018the early bird gets the worm\u2019. 
<\/span><a href=\"http:\/\/n2value.com\/blog\/wp-content\/uploads\/2014\/04\/here1.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignright size-thumbnail wp-image-8832\" alt=\"here\" src=\"http:\/\/n2value.com\/blog\/wp-content\/uploads\/2014\/04\/here1-150x150.jpg\" width=\"150\" height=\"150\" srcset=\"https:\/\/n2value.com\/blog\/wp-content\/uploads\/2014\/04\/here1-150x150.jpg 150w, https:\/\/n2value.com\/blog\/wp-content\/uploads\/2014\/04\/here1-300x300.jpg 300w, https:\/\/n2value.com\/blog\/wp-content\/uploads\/2014\/04\/here1.jpg 500w\" sizes=\"auto, (max-width: 150px) 100vw, 150px\" \/><\/a><br \/>\n<span style=\"font: 13.0px Arial;\">\u00a0 \u00a0 \u00a0The important point is that <em>WE ARE HERE<\/em> in healthcare.\u00a0 We have moderately complex computer systems that have been implemented largely due to <a title=\"Meaningful Use\" href=\"http:\/\/wiki.galenhealthcare.com\/Meaningful_Use\" target=\"_blank\">Meaningful Use<\/a> concerns, bringing us to between HIMSS stages 4-7. \u00a0We are beginning to use the back ends of computer systems to interface with analytic engines for useful descriptive analytics that can be used to inform business and clinical care decisions. \u00a0While this data is still largely descriptive, some attempts at predictive analytics have been made. \u00a0These are largely proprietary (trade secrets) but I have seen some vendors beginning to offer proprietary models to the healthcare community (hospitals, insurers, related entities) which aim at predictive analytics. \u00a0I don\u2019t have specific knowledge of the methods used to create these analytics, but after the experience of Wall Street, I\u2019m pretty certain that a number of them are going to fall into the overfitting trap. \u00a0There are other, more complex reasons why these predictive analytics might not work (and conversely, good reasons why they may), which I\u2019ll cover in future posts. 
\u00a0<\/span><br \/>\n<span style=\"font: 13.0px Arial;\">\u00a0 \u00a0 \u00a0One final point &#8211; application of predictive analytics to healthcare will succeed in the area where it fails on Wall Street for a specific reason. \u00a0On Wall Street, the relationship once discovered and exploited causes the relationship to disappear. \u00a0That is the nature of arbitrage &#8211; market forces reduce arbitrage opportunities since they represent \u2018free money\u2019 and once enough people are doing it, it is no longer profitable. \u00a0However, biological organisms don\u2019t response to gaming the system in that manner. \u00a0For a conclusive diagnosis, there may exist an efficacious treatment that is consistently reproducible. \u00a0In other words, for a particular condition in a particular patient with a particular set of characteristics (age, sex, demographics, disease processes, genetics) if accurately diagnosed and competently executed, we can expect a reproducible biologic response, optimally a total cure of the individual.\u00a0 And that reproducible response applies to processes present in the complex dynamic systems that comprise our healthcare delivery system.\u00a0 That is where the opportunity lies in applying predictive analytics to healthcare.<\/span><\/p>\n<p><span style=\"font: 13.0px Arial;\">(1) Technical Analysis of Stock Trends, Edwards and Magee, 8th Edition, St. Lucie Press<\/span><br \/>\n<span style=\"font: 13.0px Arial;\">(2) ACTIVE Technologies, acquired (taken private) by Vista Equity Partners and delisted on 11\/15\/2013. \u00a0You can\u2019t trade this stock. 
\u00a0\u00a0 <\/span><br \/>\n<span style=\"font: 13.0px Arial;\">(3) Head of Trading, First Chicago Bank, personal communication<\/span> <br style=\"font: 13.0px Arial;\" \/> <span style=\"font: 13.0px Arial;\">(4) Reminder &#8211; <a title=\"About me\" href=\"http:\/\/n2value.com\/blog\/about-me\/\" target=\"_blank\">see the disclaimer for this blog<\/a>!\u00a0 And if you think you are going to apply this particular technique to the markets to be the next George Soros, <a title=\"How to buy a piece of the brooklyn bridge\" href=\"http:\/\/www.museumofhoaxes.com\/hoax\/Hoaxipedia\/Brooklyn_Bridge_Scams\/\" target=\"_blank\">I&#8217;ve got a piece of the Brooklyn Bridge to sell you<\/a>.<br \/>\n<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>If you missed the first part, read it\u00a0 here.\u00a0 Note: For the HIT crowd reading this, I\u2019ll offer (rough) comparison to the\u00a0HIMSS stages (1-7). 1. \u00a0 \u00a0\u00a0Descriptive Analytics based upon historical data. \u00a0 \u00a0\u00a0 This was the most basic use of data analysis. 
\u00a0 When newspapers printed price data (Open-High-Low-Close or OHLC), that data could [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"New: What Medicine can learn from Wall Street - Part 2 - evolution of data analysis  http:\/\/wp.me\/p4mtfP-2i6 #HITsm","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","enabled":false},"version":2}},"categories":[4,8,2,6],"tags":[],"class_list":["post-8810","post","type-post","status-publish","format-standard","hentry","category-data-science","category-finance","category-healthcare","category-process-analytics"],"jetpack_publicize_connections":[],"aioseo_notices":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p4mtfP-2i6","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/n2value.com\/blog\/wp-json\/wp\/v2\/posts\/8810","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/n2value.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/n2value.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/n2value.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/n2value.com\/blog\/wp-json\/wp\/v2\/comments?post=8810"}],"version-history":[{"count":27,"href":"https:\/\/n2value.com\/blog\/wp-json\/wp\/v2\/post
s\/8810\/revisions"}],"predecessor-version":[{"id":9018,"href":"https:\/\/n2value.com\/blog\/wp-json\/wp\/v2\/posts\/8810\/revisions\/9018"}],"wp:attachment":[{"href":"https:\/\/n2value.com\/blog\/wp-json\/wp\/v2\/media?parent=8810"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/n2value.com\/blog\/wp-json\/wp\/v2\/categories?post=8810"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/n2value.com\/blog\/wp-json\/wp\/v2\/tags?post=8810"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}