Why is PCA sensitive to outliers?

















There are many posts on this SE that discuss robust approaches to Principal Component Analysis (PCA), but I cannot find a single good explanation of why PCA is sensitive to outliers in the first place.











































      machine-learning pca outliers














      edited 3 hours ago






























      asked 3 hours ago









      Psi























          1 Answer



          accepted










          One of the reasons is that PCA can be thought of as a low-rank decomposition of the data that minimizes the sum of squared $L_2$ norms of the residuals of the decomposition. That is, if $Y$ is your data ($m$ vectors of $n$ dimensions) and $X$ is the PCA basis ($k$ vectors of $n$ dimensions), then the decomposition minimizes
          $$\lVert Y-XA \rVert^2_F = \sum_{j=1}^{m} \lVert Y_j - X A_{j.} \rVert^2 $$
          Here $A$ is the matrix of coefficients of the PCA decomposition, $A_{j.}$ holds the $k$ coefficients of the $j$-th data vector $Y_j$, and $\lVert \cdot \rVert_F$ is the Frobenius norm of a matrix.



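          Because PCA minimizes squared $L_2$ norms (i.e. a quadratic loss), it has the same issue as least-squares regression or fitting a Gaussian: it is sensitive to outliers. Each residual enters the objective squared, so a single point far from the bulk of the data can contribute more to the sum than all the other points combined, and the fitted basis moves toward it.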
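To make the argument concrete, here is a minimal NumPy sketch, not part of the original answer. It assumes the data are stored with one observation per row and are centered before the SVD. The helper name `first_pc`, the synthetic data, and the position of the outlier are all made up for illustration. It compares the leading principal direction with and without one extreme point, and measures how much of the total squared norm that one point accounts for.

```python
import numpy as np


def first_pc(Y):
    """Return the leading principal direction of Y (one observation per row).

    Hypothetical helper for illustration: centers the data and takes the
    first right singular vector, which is the first PCA basis vector.
    """
    Yc = Y - Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    return Vt[0]


rng = np.random.default_rng(0)

# 200 synthetic points lying almost exactly along the x-axis.
clean = np.column_stack([rng.normal(size=200), 0.05 * rng.normal(size=200)])

# The same data plus a single extreme point far off that axis.
outlier = np.array([[0.0, 100.0]])
contaminated = np.vstack([clean, outlier])

pc_clean = first_pc(clean)
pc_contaminated = first_pc(contaminated)

# Angle between the two leading directions (the sign of a PC is arbitrary,
# so compare using the absolute value of the dot product).
cos_angle = min(1.0, abs(float(pc_clean @ pc_contaminated)))
angle_deg = float(np.degrees(np.arccos(cos_angle)))

# Share of the total squared (Frobenius) norm of the centered data that is
# due to the single outlier: this is why a quadratic loss cannot ignore it.
centered = contaminated - contaminated.mean(axis=0)
sq_norms = (centered ** 2).sum(axis=1)
outlier_share = float(sq_norms[-1] / sq_norms.sum())

print(f"angle between leading PCs: {angle_deg:.1f} degrees")
print(f"outlier share of total squared norm: {outlier_share:.3f}")
```

In this setup one point out of 201 accounts for most of the squared norm, so the leading direction swings from roughly the x-axis to roughly the y-axis. A loss that grows more slowly than quadratically would weight that point far less, which is the motivation behind the robust PCA variants mentioned in the question.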

























          edited 32 mins ago









          dsaxton











          answered 47 mins ago









          sega_sai













          • Thank you so much, what an awesome reply! This was exactly what I was looking for and everything makes so much sense the way you explained it!!
            – Psi
            38 mins ago

















