{"id":2386,"date":"2014-03-10T11:28:58","date_gmt":"2014-03-10T15:28:58","guid":{"rendered":"http:\/\/n2value.com\/blog\/?p=2386"},"modified":"2018-02-02T07:07:13","modified_gmt":"2018-02-02T12:07:13","slug":"where-im-going-with-this-blog","status":"publish","type":"post","link":"https:\/\/n2value.com\/blog\/where-im-going-with-this-blog\/","title":{"rendered":"The trap of overfitting data &#8211; future directions in the blog (2014)"},"content":{"rendered":"<p><span style=\"font: 13.0px Arial;\">Quick thought for this Monday morning\u2026.<\/span><\/p>\n<p><a href=\"http:\/\/n2value.com\/blog\/wp-content\/uploads\/2014\/03\/Overfitting-.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2387\" src=\"http:\/\/n2value.com\/blog\/wp-content\/uploads\/2014\/03\/Overfitting-.jpg\" alt=\"Overfitting\" width=\"755\" height=\"566\" srcset=\"https:\/\/n2value.com\/blog\/wp-content\/uploads\/2014\/03\/Overfitting-.jpg 755w, https:\/\/n2value.com\/blog\/wp-content\/uploads\/2014\/03\/Overfitting--300x224.jpg 300w\" sizes=\"auto, (max-width: 755px) 100vw, 755px\" \/><\/a><\/p>\n<p><span style=\"font: 13.0px Arial;\">I\u2019m taking Trevor Hastie&#8217;s and Robert Tibshirani\u2019s fantastic Statistical Learning course online through Stanford. <\/span><br \/>\n<span style=\"font: 13.0px Arial;\">The slide here is great &#8211; and shows the danger of complex models and overfitting. \u00a0 For the data scientists &#8211; cross validation. \u00a0If any system traders or finance people are reading, think walk-forward analysis. \u00a0<\/span><\/p>\n<p><span style=\"font: 13.0px Arial;\">Basically, what the graph says is that when you apply a more and more complex model, with higher levels of refinement, to your system, you \u2018fit\u2019 your established data more accurately. \u00a0However, because the model is more complex, it has more freedom to bend to the quirks of that data (degrees of freedom, anyone?) and is less <span style=\"text-decoration: underline;\">resilient<\/span>. \u00a0It tracks the data better, but works less well in practice. \u00a0That\u2019s why the red line starts going up again as the scale of complexity goes from low to high. \u00a0Does this resonate with any of the process improvement (PI) people who are six-sigma trained? \u00a0Once you scoop up the low-hanging fruit and pass that first or second sigma in iteration, things get tougher. \u00a0A <strong>lot<\/strong> tougher. \u00a0<\/span><\/p>\n<p><span style=\"font: 13.0px Arial;\">This is the trap of curve-fitting (also known as <span style=\"text-decoration: underline;\">overfitting<\/span>). \u00a0More in this case is less, as the model fails to be predictive. \u00a0<\/span><\/p>\n<p><span style=\"font: 13.0px Arial;\">Where I\u2019m going in this blog is to integrate some of these concepts with resource allocation problems in healthcare, and their resulting effect on patient care. \u00a0 I\u2019m particularly interested in applying these techniques to predictive analytics in healthcare. \u00a0I think we can learn a great deal from the practical applications of these computational statistics tools, which have been successfully applied (I know because I started my career doing it) to the markets on Wall St. \u00a0 What medicine can learn from Wall St. is a topic I intend to cover in a series of posts. But healthcare is not the markets, and can\u2019t be approached in entirely the same way. \u00a0 The costs of error are catastrophic in terms of lives &#8211; it&#8217;s not just about money. 
\u00a0<\/span><\/p>\n<p><span style=\"font: 13.0px Arial;\">I hope you stay with me as I develop this theme.\u00a0<\/span><\/p>\n<p><span style=\"font: 13.0px Arial;\">Image from: <\/span><span style=\"font: 13.0px Arial; color: #042eee;\"><span style=\"text-decoration: underline;\">https:\/\/class.stanford.edu\/courses\/HumanitiesScience\/StatLearning\/Winter2014\/about<\/span><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Quick thought for this Monday morning\u2026. I\u2019m taking Trevor Hastie&#8217;s and Robert Tibshirani\u2019s fantastic Statistical Learning course online through Stanford. The slide here is great &#8211; and shows the danger of complex models and overfitting. \u00a0 For the data scientists &#8211; cross validation. \u00a0If any system traders or finance people are reading, think walk-forward analysis. [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"New post on #datascience in #healthcare: Where I'm going with this 
-","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","enabled":false},"version":2}},"categories":[4,2,6],"tags":[],"class_list":["post-2386","post","type-post","status-publish","format-standard","hentry","category-data-science","category-healthcare","category-process-analytics"],"jetpack_publicize_connections":[],"aioseo_notices":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p4mtfP-Cu","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/n2value.com\/blog\/wp-json\/wp\/v2\/posts\/2386","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/n2value.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/n2value.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/n2value.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/n2value.com\/blog\/wp-json\/wp\/v2\/comments?post=2386"}],"version-history":[{"count":5,"href":"https:\/\/n2value.com\/blog\/wp-json\/wp\/v2\/posts\/2386\/revisions"}],"predecessor-version":[{"id":13639,"href":"https:\/\/n2value.com\/blog\/wp-json\/wp\/v2\/posts\/2386\/revisions\/13639"}],"wp:attachment":[{"href":"https:\/\/n2value.com\/blog\/wp-json\/wp\/v2\/media?parent=2386"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/n2value.com\/blog\/wp-json\/wp\/v2\/categories?post=2386"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/n2value.com\/blog\/wp-json\/wp\/v2\/tags?post=2386"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}