What is the most fair way to scale grades?












7














In many universities, professors scale or "curve" grades at the end to ensure (among other things) that there is no grade inflation. I'm interested in studying "fair" ways of doing this from a mathematical standpoint.



Let $S = \{X_1, X_2, \cdots, X_k\}$ where $X_i \in [0,100]$ be the multiset of grades for a given class. A $\textit{scale}$ $S'$ of $S$ is some other multiset $S'=\{\phi(X_1), \phi(X_2), \cdots, \phi(X_k)\}$ where $\phi:[0,100] \to [0,100]$ is some function. We say a scale is fair if $\phi$ is monotone increasing. Given two fair scales $S'$ and $S''$ with respective scale-functions $\phi, \psi$, we say $S'$ is fairer than $S''$ if $\sum_i |\phi(X_i) - X_i| \leq \sum_i |\psi(X_i) - X_i|$.
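To make the definitions concrete, here is a minimal Python sketch of the two ingredients: the total deviation $\sum_i |\phi(X_i) - X_i|$ and a monotonicity check on the observed grades. The function names, the sample class, and the two candidate scales (`flat`, `stretch`) are made up for illustration and are not part of the model itself.

```python
from typing import Callable, Sequence


def total_deviation(grades: Sequence[float], phi: Callable[[float], float]) -> float:
    """Sum of |phi(x) - x| over the class; smaller means 'fairer' in this model."""
    return sum(abs(phi(x) - x) for x in grades)


def is_order_preserving(grades: Sequence[float], phi: Callable[[float], float]) -> bool:
    """Check that phi is (weakly) increasing on the observed grades only."""
    xs = sorted(grades)
    ys = [phi(x) for x in xs]
    return all(y1 <= y2 for y1, y2 in zip(ys, ys[1:]))


# Hypothetical class with mean 60.
grades = [40.0, 55.0, 60.0, 85.0]

# Two hypothetical candidate scales, both mapping [0, 100] into [0, 100].
flat = lambda x: min(x + 10.0, 100.0)            # add 10 points, capped at 100
stretch = lambda x: 100.0 - 0.75 * (100.0 - x)   # close a quarter of the gap to 100

for name, phi in [("flat", flat), ("stretch", stretch)]:
    scaled = [phi(x) for x in grades]
    print(name, sum(scaled) / len(scaled), total_deviation(grades, phi),
          is_order_preserving(grades, phi))
```

On this sample both candidates raise the mean to 70 and have the same total deviation, so the comparison above would rank them as equally fair.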




Let us suppose that the professor wants to scale the grades such that the mean grade is $70 \pm 5\%$. Given the above definitions, which scale function $\phi$ should he choose to ensure the scale is as fair as possible? If there's not a simple function that always works, is there an algorithm or a strategy that might be helpful?




This is, of course, but one model. There are also issues of subjectivity associated with the word "fairness". Perhaps there's some notion of "fairness" that this model doesn't quite capture. If so, please mention it.
My opinion is that the "fairest" way of scaling is one that preserves the original order and disturbs the original dataset as little as possible.



One other possible notion (which you may consider if you are interested, but not specifically the one I've chosen to ask about) is considering the double sum $$\sum_{i,k} \left||\phi(X_i) - X_i| - |\phi(X_k) - X_k|\right|$$ and trying to minimize this among all possible (fair/monotone) scale functions $\phi$. With my original model above, a scale is "fair" if it doesn't disturb the original dataset much. With this second model, a scale may disturb the original dataset a lot, but it still might be quite fair so long as students' grades are all altered by a similar amount (for instance, a fixed scale of $20\%$).
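As a small illustration of this second notion, the following Python sketch evaluates the double sum for two made-up scales on a made-up class. The name `pairwise_unevenness` and the sample data are assumptions introduced only for this example.

```python
from typing import Callable, Sequence


def pairwise_unevenness(grades: Sequence[float], phi: Callable[[float], float]) -> float:
    """Double sum over all ordered pairs (i, k) of | |phi(X_i)-X_i| - |phi(X_k)-X_k| |."""
    shifts = [abs(phi(x) - x) for x in grades]
    return sum(abs(a - b) for a in shifts for b in shifts)


# Hypothetical class and two hypothetical scales.
grades = [40.0, 55.0, 60.0, 85.0]
flat = lambda x: min(x + 10.0, 100.0)            # same shift for everyone here
stretch = lambda x: 100.0 - 0.75 * (100.0 - x)   # larger shift for lower grades

print(pairwise_unevenness(grades, flat))     # every student moves by 10, so the sum is 0
print(pairwise_unevenness(grades, stretch))  # shifts differ, so the sum is positive
```

A flat shift scores zero under this measure as long as the cap at 100 does not bind for anyone, which matches the remark that a fixed scale counts as fair here even when it moves the data a lot.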




Feel free to discuss other mathematically rigorous notions of "fair" scaling which you believe are pertinent, or possibly cite relevant literature.

































  • If it weren't for the restriction that $\phi : [0,100] \to [0,100]$, then a trivial solution in the form $\phi(x) = x+c$ would work.
    – JimmyK4542
    20 hours ago










  • Perhaps related: academia.stackexchange.com/questions/61038/…
    – HRSE
    6 hours ago






  • This might be presented as a math problem and considered as such, but ultimately this is not something you can universally answer. There are multiple factors that one takes into account when devising a grading scheme (and it starts with the creation of the test; the scoring system is just one part of a whole), and they can be considered fair or unfair by different people. I also think that this question would be better suited at the Academia SO.
    – MatthewRock
    5 hours ago
















statistics optimization
















asked 20 hours ago









MathematicsStudent1122























3 Answers


















20














I think the unfortunate truth is that the only fair scaling of grades is to not scale them at all.



Outside of the mathematical framework you want to consider, curving or scaling grades can only penalize those students who work hard and would have otherwise received high grades. Particularly in the case of a flat curve (where everyone gets $+x\%$), I find it to be the definition of unfairness that someone could receive an A when they only did enough correct work to earn a B, or heaven forbid a C.



But the unfairness doesn't extend to just the student body; if there are scholarships tied to GPAs on the line, organizations might end up misspending money on students that aren't actually doing the work that they should. Employers might end up passing over a candidate with fewer credentials (but who would be a better fit) because they think that someone else has a better transcript. And so on...



But, even in the context of the model you have presented, in both of the metrics you have proposed the map $\phi:[0,100]\rightarrow[0,100]$ which is the "fairest" is just the identity map. By not curving at all, you are always guaranteed to be fair.



Now, you can argue that for the identity scale to be fair the professor has to do their job correctly and adequately, and that we tolerate curves because universities cannot promise that professors are doing their jobs well. But I think the solution should simply be to fire those people who can't teach, or at the very least not let them teach anything, rather than alter the metric by which we judge mastery of topics, particularly when the rest of society has to use that metric to decide who gets the contract to build that bridge (or any other "important" function that an individual might serve).




















  • 1) this is not an attempt to answer the question, and 2) the points made herein are both seriously impractical ("fire those who can't teach") and just plain wrong ("I find it to be the definition of unfairness..."). In particular, what does "they only did enough work to receive a B or C" mean? They haven't gotten a grade yet. And finally, your answer seems to ignore the main point of grades: not as an absolute measure of ability, but as a relative measure of how well a student succeeds in a particular subject when compared to his peers, as opposed to how well the student does in a different...
    – DreamConspiracy
    9 hours ago






  • ... subject when compared to his peers. Curving grades is the only way to make this comparison fair. One example that illustrates this quite well is my high school physics class. Because the class was effectively mandatory, the average grade on most exams (even in an honors class at a very good school) was only about 60%. To assign all these hardworking students D's and F's is, in my opinion, the definition of unfairness.
    – DreamConspiracy
    9 hours ago










  • @DreamConspiracy "The main point of grades: not as an absolute measure of ability, but as a relative measure of how well a student succeeds in a particular subject when compared to his peers". Granted, the American system is very foreign to me, but where I come from students are assessed in absolute terms, not relative to each other. In an exam, for instance, each question is worth a percentage and the grade is the sum of these percentages.
    – Git Gud
    6 hours ago










  • @GitGud but this is not in absolute terms either. If students were genuinely being assessed in absolute terms, then they would be given the final exam of the 4th year course at the end of their first class. In reality, students are being assessed relative to what they are expected to know and be able to do. The argument in favor of curving is then that if grades are too low, too much was expected of them, and if grades are too high, not enough was expected of them. In other words, the class average becomes the only viable way to measure the feasibility of the exam.
    – DreamConspiracy
    5 hours ago






  • I agree with @DreamConspiracy and am truly amazed how many people support this answer. There is a reason why reference letter submission forms for students always contain questions about percentiles. Not rescaling grades to match some department standard is a terrible idea both because it incentivizes students to take "sweet" courses and because it makes educational success unnecessarily random. After all, to my knowledge schools tend not to add random noise to students' grades after teachers report scores, do they?
    – HRSE
    3 hours ago



















2














This is a cute little problem. I have several things to say about it. Before I do, let's introduce some notation.



Define $d_\phi=\sum_{i=1}^n|\phi(X_i)-X_i|$, and let $[a,b]$ denote the target range for the class average. (You have set $[a,b]=[65,75]$, but the numerical values don't really matter for the structure of the problem.) Without loss of generality, suppose $X_1\leq X_2\leq\cdots\leq X_n$.



(1) Notice that we don't really need to find a function on $[0,100]$. Rather, we just need a function from $S$ into $[0,100]$.



Obviously, if $\mathbb{E}(S)\in[a,b]$ then we let $\phi$ be the identity operator. The remaining cases are where $\mathbb{E}(S)<a$ or $\mathbb{E}(S)>b$. But...



(2) Note that under realistic circumstances we must always have $\phi(X_i)\geq X_i$. With this additional constraint, it may not be possible to find $\phi$ satisfying $\mathbb{E}[\phi(S)]\in[a,b]$. In particular, if $\mathbb{E}(S)>b$ and $\phi$ is anything but the identity map, then $\phi$ will only decrease fairness (i.e., increase $d_\phi$) while moving the class average further away from the target range. The only case that remains is where $\mathbb{E}(S)<a$.



(3) If $\mathbb{E}(S)<a$ then we can minimize $d_\phi$ subject to the constraint $\mathbb{E}[\phi(S)]\in[a,b]$ by guaranteeing
$$\sum_{i=1}^n\phi(X_i)=na.$$
Clearly, such a function $\phi$ exists, and is not unique.
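To spell out why hitting the lower endpoint $a$ exactly is optimal (a short check, assuming as in (2) that $\phi(X_i)\geq X_i$ for every $i$): the absolute values drop out, so
$$d_\phi=\sum_{i=1}^n\bigl(\phi(X_i)-X_i\bigr)=n\,\mathbb{E}[\phi(S)]-n\,\mathbb{E}(S)\geq n\bigl(a-\mathbb{E}(S)\bigr),$$
with equality exactly when $\mathbb{E}[\phi(S)]=a$, that is, when $\sum_{i=1}^n\phi(X_i)=na$. Every monotone $\phi$ with $\phi(X_i)\geq X_i$ that attains the mean $a$ therefore has the same value of $d_\phi$, which is one way to see that the minimizer cannot be unique.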



(4) Note that ideally we would also wish to minimize the quantity
$$\|\phi(S)-S\|_\infty=\max_i|\phi(X_i)-X_i|.$$
In fact, in real life I should think that this is a greater priority than minimizing $d_\phi$. However, it turns out that there is a function $\phi$ which will minimize both. For instance, we could simply find $c\geq 0$ such that $\mathbb{E}(S+c)=a$, provided $X_n\leq 100-c$. Of course, this may not work in general since we might have $X_n>100-c$.



Fortunately, this is not a great difficulty. The function $\phi=\phi_c$ is now given by the following:
$$\phi_c(X_i)=\left\{\begin{array}{ll}X_i+c&\text{ if }X_i<100-c,\\100&\text{ if }X_i\geq100-c,\end{array}\right.$$
where $c\geq 0$ is chosen so that $\mathbb{E}[\phi_c(S)]=a$. There is a unique solution to this problem, and although it is annoying to compute in general, it's quite easy to compute given some concrete set $S$.
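For a concrete class, the shift $c$ can be found numerically, because the mean of the capped grades is continuous and nondecreasing in $c$. The sketch below uses plain bisection. The function names, the tolerance, and the sample grades are assumptions made for illustration, and it presumes all grades lie in $[0,100]$ and that $a\leq 100$.

```python
from typing import List, Sequence


def capped_mean(grades: Sequence[float], c: float, cap: float = 100.0) -> float:
    """Mean of min(x + c, cap) over the class."""
    return sum(min(x + c, cap) for x in grades) / len(grades)


def solve_shift(grades: Sequence[float], target: float,
                cap: float = 100.0, tol: float = 1e-9) -> float:
    """Smallest c >= 0 (up to tol) with capped_mean(grades, c) >= target."""
    if not grades:
        raise ValueError("need at least one grade")
    if target > cap:
        raise ValueError("target mean cannot exceed the cap")
    if capped_mean(grades, 0.0, cap) >= target:
        return 0.0  # mean is already high enough; leave grades alone
    lo, hi = 0.0, cap  # at c = cap every nonnegative grade is capped
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if capped_mean(grades, mid, cap) < target:
            lo = mid
        else:
            hi = mid
    return hi


def apply_shift(grades: Sequence[float], c: float, cap: float = 100.0) -> List[float]:
    """Apply phi_c to every grade."""
    return [min(x + c, cap) for x in grades]


# Hypothetical class with mean 54.5 and target lower bound a = 65.
grades = [20.0, 40.0, 60.0, 98.0]
c = solve_shift(grades, 65.0)
print(c, apply_shift(grades, c))
```

In this sample the top grade hits the cap, so the other three students absorb the rest of the shift: $3c + 2 = 42$, giving $c = 40/3 \approx 13.33$ rather than the uncapped $10.5$.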



(5) Let's get back to real life. A grading scale is stipulated by the syllabus, which is a contract between instructor and students. And although it is technically permitted for an instructor to go Darth Vader and alter the deal at the last minute, it's almost always a very bad idea.



If you have any freedom for curving, you should look at the students rather than use a silly math formula. You should ask yourself, "judging from my impression of his work, is Joe student ready to pass this course?" People like to pretend that grading is objective. It's not. You have to make judgment calls. Math can help you with that, but at the end of the day you have to make your best call.



























  • I completely disagree with your last paragraph. A despot may also say "Some people would like taxation to be objective. It's not, I tax whomever I (don't) like." Our evaluations of students have massive impacts on their lifetime outcomes and even knowing that it's a "Joe" rather than a "Mary" or an "Abdul" has been shown to greatly distort such evaluations. We should do everything we can to make this procedure as objective as possible. Or would you like to be told that you only went to community college rather than Harvard just because your math teacher made your exam a bit tough that year?
    – HRSE
    3 hours ago



















0














This answer proposes a different definition of a "fair scale" and a "fair exam" from the one proposed in the question.



We can approach this question from an information-theoretic standpoint. In this perspective, grades should be informative about some underlying ability of students to solve certain problems. As such, there is a "true distribution" of the various skills and abilities students have. Unfortunately, these qualities are most likely multidimensional, but we need to "compress" them into an ordinal scale. This entails making some strange judgments such as "making a typo in an equation is 0.3 times as bad as accidentally multiplying both sides by zero". But suppose we have obtained some acceptable scale expressed as integer scores from 0 to 100. I am suspicious of cardinal scales and therefore I attach only ordinal meaning to these numbers (for now).



Importantly, if we observed the results for the entire student pool there would be no need to ever rescale. The need to rescale arises because we observe different exams (information structures about the hypothetical score of students on "the true scale") for different parts of the student pool and want to make the scores between students comparable. In particular, if I believe I have set two similarly difficult exams for two random samples of 1000 students, but in one exam all students receive 0 points and in the other all students receive 100 points, then I should revise my belief about whether I truly set similarly difficult exams. If the sample size for each group is only 10 students, I won't update my belief as much and will be more reluctant to rescale the exam.



Now let's say for simplicity that we have observed the scores of two exams for the entire pool (or for each exam a sufficiently large sample) of students. Let's suppose that a density estimate of the distribution of scores of each exam is given by $s_1:[0,100]\rightarrow \mathbb{R}$ and $s_2:[0,100]\rightarrow \mathbb{R}$ with $\int s_i(x)\,dx=1$. Now we are looking for transformations $t_1:[0,100]\rightarrow [0,100]$ and $t_2:[0,100]\rightarrow [0,100]$ to make the exams comparable.



As posted in the question, there is a strong case for preserving the order of the scores, thus $t_1$ and $t_2$ should be strictly increasing functions. However, I do not see why there should be a strong case for maintaining point differences or maximizing an objective as given in the question. If we were unable to set an exam such that the distribution of scores is equal to our target distribution for a large student sample, then there is no good reason to attach any cardinal meaning to these scores. However, we want to a) make the scores comparable to each other and b) lose as little information in this process as possible. I therefore propose to impose that



a) Comparability holds: $s_i(t_i(x)) = p^*(x)$ for all $x\in [0,100], i\in \{1,2\}$ and



b) Minimal information loss holds:
$$p^* = \arg\min_p \sum_i D_{KL}(p,s_i)$$
where $D_{KL}$ is the Kullback-Leibler divergence between the distributions.



In practice, we of course observe the different exams year by year so one has to fix $p$ before the exams are taken. My simple rule of thumb for this is to try to get as close as possible to the maximum entropy distribution (uniform) of scores with both exams and adjusted scores. Ideally, I would only want to report percentiles to the department office. Unfortunately, rules such as "students below some cutoff fail and need to retake" prevent me from doing this and require choosing a different $p^*$ instead. (This means that to minimize the KL distance one has to adjust exams to match the target distribution rather than the other way around.) Also, it is hard to explain to students why one uses a crazy wiggle of a function to rescale scores, so $t_i$ tends to be "smoothed out" a bit.
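As a rough illustration of an order-preserving map onto a uniform target, the following Python sketch replaces each score by its mid-rank percentile within its own exam. This is only a simplified stand-in: it works with the empirical distribution instead of a density estimate, it fixes the target to be uniform rather than solving for $p^*$, and the function name and the two sample exams are made up.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence


def percentile_rescale(scores: Sequence[float]) -> List[float]:
    """Map each score to 100 * its mid-rank empirical CDF value within the exam.

    Distinct scores keep their order and tied scores stay tied.
    """
    ordered = sorted(scores)
    n = len(ordered)
    rescaled = []
    for x in scores:
        below = bisect_left(ordered, x)         # number of scores strictly below x
        at_or_below = bisect_right(ordered, x)  # number of scores <= x
        rescaled.append(100.0 * (below + at_or_below) / (2 * n))
    return rescaled


# Two hypothetical exams: one turned out hard, one turned out easy.
hard_exam = [10.0, 20.0, 20.0, 35.0, 50.0]
easy_exam = [60.0, 75.0, 75.0, 90.0, 100.0]

print(percentile_rescale(hard_exam))
print(percentile_rescale(easy_exam))
```

Both sample exams come out as the same list of rescaled scores, which is the sense in which the two groups become comparable. With only five students per exam, of course, this is exactly the small-sample situation in which such a rescaling deserves little trust.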



tl;dr: My own idea on "fair grading and scaling" is that the exam should be designed to have a maximum entropy of scores from an ex ante perspective. Ex post, once I learn that an exam was too hard/easy I look for an order preserving map which yields the targeted distribution if many students have taken the exam. Once there are only few students in class... ...things become complicated.



























    Your Answer





    StackExchange.ifUsing("editor", function () {
    return StackExchange.using("mathjaxEditing", function () {
    StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
    StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
    });
    });
    }, "mathjax-editing");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "69"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    noCode: true, onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f3052313%2fwhat-is-the-most-fair-way-to-scale-grades%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    3 Answers
    3






    active

    oldest

    votes








    3 Answers
    3






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    20














    I think the unfortunate truth is that the only fair scaling of grades is to not scale them at all.



    Outside of the mathematical framework you want to consider, curving or scaling grades can only penalize those students who work hard and would have otherwise received high grades. Particularly in the case of a flat curve (where everyone gets $+x%$), I find it to be the definition of unfairness that someone could receive an A when they only did enough correct work to earn a B, or heaven forbid a C.



    But the unfairness doesn't extend to just the student body; if there are scholarships tied to GPAs on the line, organizations might end up misspending money on students that aren't actually doing the work that they should. Employers might end up passing over a candidate with fewer credentials (but who would be a better fit) because they think that someone else has a better transcript. And so on...



    But, even in the context of the model you have presented, in both of the metrics you have proposed the map $phi:[0,100]rightarrow[0,100]$ which is the "fairest" is just the identity map. By not curving at all, you are always guaranteed to be fair.



    Now, you can argue that in order for the identity scale to be fair the professor has to do their job correctly and adequately, and that the inability of universities to promise that professors are doing their jobs well is why we tolerate curves, but I think the solution should simply be to fire those people who can't teach, or at the very least don't let them teach anything, rather than alter the metric by which we judge mastery of topics, particularly when the rest of society has to use that metric to decide who gets the contract to build that bridge (or any other "important" function that an individual might serve).






    share|cite|improve this answer










    New contributor




    ImNotTheGuy is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
    Check out our Code of Conduct.











































































    edited 20 hours ago






























    answered 20 hours ago









    ImNotTheGuy









    • 1




      1) this is not an attempt to answer the question, and 2) the points made herein are both seriously impractical ("fire those who can't teach") and just plain wrong ("I find it to be the definition of unfairness..."). In particular, what does "they only did enough work to receive a B or C" mean? They haven't gotten a grade yet. And finally, your answer seems to ignore the main point of grades: not as an absolute measure of ability, but as a relative measure of how well a student succeeds in a particular subject when compared to his peers, as opposed to how well the student does in a different...
      – DreamConspiracy
      9 hours ago






    • 1




      ... subject when compared to his peers. Curving grades is the only way to make this comparison fair. One example that illustrates this quite well is my high school physics class. Because the class was effectively mandatory, the average grade on most exams (even in an honors class at a very good school) was only about 60%. To assign all these hardworking students D's and F's is, in my opinion, the definition of unfairness.
      – DreamConspiracy
      9 hours ago










    • @DreamConspiracy "The main point of grades: not as an absolute measure of ability, but as a relative measure of how well a student succeeds in a particular subject when compared to his peers". Granted, the American system is very foreign to me, but where I come from students are assessed in absolute terms, not relative to each other. In an exam, for instance, each question is worth a percentage and the grade is the sum of these percentages.
      – Git Gud
      6 hours ago










    • @GitGud but this is not in absolute terms either. If students were genuinely being assessed in absolute terms, then they would be given the final exam of the 4th year course at the end of their first class. In reality, students are being assessed relative to what they are expected to know and be able to do. The argument in favor of curving is then that if grades are too low, too much was expected of them, and if grades are too high, not enough was expected of them. In other words, the class average becomes the only viable way to measure the feasibility of the exam.
      – DreamConspiracy
      5 hours ago






    • 1




      I agree with @DreamConspiracy and am truly amazed how many people support this answer. There is a reason why reference letter submission forms for students always contain questions about percentiles. Not rescaling grades to match some department standard is a terrible idea both because it incentivizes students to take "sweet" courses and because it makes educational success unnecessarily random. After all, to my knowledge schools tend not to add random noise to students' grades after teachers report scores, do they?
      – HRSE
      3 hours ago

































    2














    This is a cute little problem. I have several things to say about it. Before I do, let's introduce some notation.



    Define $d_\phi=\sum_{i=1}^n|\phi(X_i)-X_i|$, and let $[a,b]$ denote the target range for the class average. (You have set $[a,b]=[65,75]$, but the numerical values don't really matter to the structure of the problem.) Without loss of generality, suppose $X_1\leq X_2\leq\cdots\leq X_n$.



    (1) Notice that we don't really need to find a function on $[0,100]$. Rather, we just need a function from $S$ into $[0,100]$.



    Obviously, if $\mathbb{E}(S)\in[a,b]$ then we let $\phi$ be the identity map. The remaining cases are where $\mathbb{E}(S)<a$ or $\mathbb{E}(S)>b$. But...



    (2) Note that under realistic circumstances we must always have $\phi(X_i)\geq X_i$. With this additional constraint, it may not be possible to find $\phi$ satisfying $\mathbb{E}[\phi(S)]\in[a,b]$. In particular, if $\mathbb{E}(S)>b$ and $\phi$ is anything but the identity map, then $\phi$ will only decrease fairness (i.e., increase $d_\phi$) while moving the class average further away from the target range. The only case that remains is where $\mathbb{E}(S)<a$.



    (3) If $\mathbb{E}(S)<a$ then we can minimize $d_\phi$ subject to the constraint $\mathbb{E}[\phi(S)]\in[a,b]$ by guaranteeing
    $$\sum_{i=1}^n\phi(X_i)=na.$$
    Clearly, such a function $\phi$ exists, and is not unique.
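
    To spell out why hitting the lower endpoint $a$ is enough (a short check that assumes only the constraint $\phi(X_i)\geq X_i$ from (2)): every term of $d_\phi$ is then non-negative, so
    $$d_\phi=\sum_{i=1}^n\bigl(\phi(X_i)-X_i\bigr)=n\bigl(\mathbb{E}[\phi(S)]-\mathbb{E}(S)\bigr)\geq n\bigl(a-\mathbb{E}(S)\bigr),$$
    with equality exactly when $\mathbb{E}[\phi(S)]=a$. Any monotone $\phi$ with $\phi(X_i)\geq X_i$ that lands the mean on $a$ therefore attains the same minimal value of $d_\phi$, which is why the minimizer is not unique.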



    (4) Note that ideally we would also wish to minimize the quantity
    $$\|\phi(S)-S\|_\infty=\max_i|\phi(X_i)-X_i|.$$
    In fact, in real life I should think that this is a greater priority than minimizing $d_\phi$. However, it turns out that there is a function $\phi$ which will minimize both. For instance, we could simply find $c\geq 0$ such that $\mathbb{E}(S+c)=a$, provided $X_n\leq 100-c$. Of course, this may not work in general since we might have $X_n>100-c$.



    Fortunately, this is not a great difficulty. The function $\phi=\phi_c$ is now given by the following:
    $$\phi_c(X_i)=\left\{\begin{array}{ll}X_i+c&\text{ if }X_i<100-c,\\100&\text{ if }X_i\geq100-c,\end{array}\right.$$
    where $c$ is chosen so that $\mathbb{E}[\phi_c(S)]=a$. There is a unique solution to this problem, and although it is annoying to compute in general, it's quite easy to compute given some concrete set $S$.
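
    One way to compute $c$ for a concrete $S$ is bisection, since the curved mean is continuous and non-decreasing in $c$. The sketch below is only an illustration under stated assumptions: the names `capped_shift` and `curve` are made up here, grades are assumed to lie in $[0,100]$, and the tolerance is arbitrary. It uses nothing beyond the Python standard library.

```python
def capped_shift(grades, target_mean, cap=100.0, tol=1e-9):
    """Find c >= 0 such that the mean of min(x + c, cap) equals target_mean.

    Hypothetical helper: assumes every grade lies in [0, cap].
    """
    n = len(grades)
    if n == 0:
        raise ValueError("grades must be non-empty")
    if target_mean > cap:
        raise ValueError("target mean cannot exceed the cap")
    if sum(grades) / n >= target_mean:
        return 0.0  # mean already at or above the target: use the identity map

    def curved_mean(c):
        return sum(min(x + c, cap) for x in grades) / n

    # curved_mean(0) < target_mean <= cap == curved_mean(cap), so a root exists.
    lo, hi = 0.0, cap
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if curved_mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return hi


def curve(grades, c, cap=100.0):
    """Apply phi_c: add c to every grade and cap the result."""
    return [min(x + c, cap) for x in grades]


grades = [50, 60, 70, 95]
c = capped_shift(grades, 75)
print(round(c, 4), [round(g, 2) for g in curve(grades, c)])
```

    For these four made-up grades the top student is capped at 100, so the remaining three absorb the shift: $100+180+3c=300$ gives $c=20/3\approx 6.67$, which is what the bisection converges to.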



    (5) Let's get back to real life. A grading scale is stipulated by the syllabus, which is a contract between instructor and students. And although it is technically permitted for an instructor to go Darth Vader and alter the deal at the last minute, it's almost always a very bad idea.



    If you have any freedom for curving, you should look at the students rather than use a silly math formula. You should ask yourself, "judging from my impression of his work, is Joe Student ready to pass this course?" People like to pretend that grading is objective. It's not. You have to make judgment calls. Math can help you with that, but at the end of the day you have to make your best call.



























    • I completely disagree with your last paragraph. A despot may also say "Some people would like taxation to be objective. It's not, I tax whomever I (don't) like." Our evaluations of students have massive impacts on their lifetime outcomes and even knowing that it's a "Joe" rather than a "Mary" or an "Abdul" has been shown to greatly distort such evaluations. We should do everything we can to make this procedure as objective as possible. Or would you like to be told that you only went to community college rather than Harvard just because your math teacher made your exam a bit tough that year?
      – HRSE
      3 hours ago






























    2












    2








    2






    This is a cute little problem. I have several things to say about it. Before I do, let's introduce some notation.



    Define $d_phi=sum_{i=1}^n|phi(X_i)-X_i|$, and let $[a,b]$ denote the target class average. (You have set $[a,b]=[65,75]$, but the numerical values don't really matter as to the structure of the problem.) Without loss of generality, suppose $X_1leq X_2leqcdotsleq X_n$.



    (1) Notice that we don't really need to find a function on $[0,100]$. Rather, we just need a function from $S$ into $[0,100]$.



    Obviously, if $mathbb{E}(S)in[a,b]$ then we let $phi$ be the identity operator. The remaining cases are where $mathbb{E}(S)<a$ or $mathbb{E}(S)>b$. But...



    (2) Note that under realistic circumstances we must always have $phi(X_i)geq X_i$. With this additional constraint, it may not be possible to find $phi$ satisfying $mathbb{E}[phi(S)]in A$. In particular, if $mathbb{E}(S)>b$ and $phi$ is anything but the identity map, then $phi$ will only decrease fairness (i.e., increase $d_phi$) while separating the class average further away from the target range. The only case that remains is where $mathbb{E}(S)<a$.



    (3) If $mathbb{E}(S)<a$ then we can minimize $d_phi$ subject to the constraint $mathbb{E}[phi(S)]in[a,b]$ by guaranteeing
    $$sum_{i=1}^inftyphi(X_i)=na.$$
    Clearly, such a function $phi$ exists, and is not unique.



    (4) Note that ideally we would also wish to minimize the quantity
    $$\|\phi(S)-S\|_\infty=\max_i|\phi(X_i)-X_i|.$$
    In fact, in real life I should think that this is a greater priority than minimizing $d_\phi$. However, it turns out that there is a function $\phi$ which will minimize both. For instance, we could simply find $c\geq 0$ such that $\mathbb{E}(S+c)=a$, provided $X_n\leq 100-c$. Of course, this may not work in general since we might have $X_n>100-c$.



    Fortunately, this is not a great difficulty. The function $\phi=\phi_c$ is now given by the following:
    $$\phi_c(X_i)=\left\{\begin{array}{ll}X_i+c&\text{ if }X_i<100-c,\\100&\text{ if }X_i\geq100-c,\end{array}\right.$$
    where $c$ is chosen so that $\mathbb{E}[\phi_c(S)]=a$. There is a unique solution to this problem, and although it is annoying to compute in general, it's quite easy to compute given some concrete set $S$.
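
For a concrete class, the value of $c$ in $\phi_c$ can be found numerically. The following is a minimal sketch, not a definitive implementation: the function name `capped_shift`, the tolerance `tol`, and the choice of bisection are assumptions made here for illustration. Bisection applies because the capped class mean is continuous and non-decreasing in $c$, and the sketch assumes all grades lie in $[0,100]$.

```python
def capped_shift(grades, target_mean, cap=100.0, tol=1e-9):
    """Find c >= 0 such that mean(min(x + c, cap)) equals target_mean.

    Returns (c, scaled_grades). Assumes every grade lies in [0, cap].
    """
    n = len(grades)
    if n == 0:
        raise ValueError("need at least one grade")
    if target_mean > cap:
        raise ValueError("target mean cannot exceed the cap")

    def capped_mean(c):
        # Class mean after adding c to every grade and capping at `cap`.
        return sum(min(x + c, cap) for x in grades) / n

    if capped_mean(0.0) >= target_mean:
        # Already at or above the target: keep the identity map.
        return 0.0, list(grades)

    lo, hi = 0.0, cap  # capped_mean(cap) == cap >= target_mean
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if capped_mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    c = hi
    return c, [min(x + c, cap) for x in grades]


# Hypothetical class with mean 64 and target lower bound a = 70.
c, scaled = capped_shift([40, 55, 60, 70, 95], 70)
# Here the top grade is capped at 100, so c solves 4c + 325 = 350.
```

In this made-up example the top student hits the cap, so the remaining four students absorb the whole shift and $c$ comes out at about 6.25 rather than the uncapped value of 6.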



    (5) Let's get back to real life. A grading scale is stipulated by the syllabus, which is a contract between instructor and students. And although it is technically permitted for an instructor to go Darth Vader and alter the deal at the last minute, it's almost always a very bad idea.



    If you have any freedom for curving, you should look at the students rather than use a silly math formula. You should ask yourself, "judging from my impression of his work, is Joe student ready to pass this course?" People like to pretend that grading is objective. It's not. You have to make judgment calls. Math can help you with that, but at the end of the day you have to make your best call.
















    answered 19 hours ago









    Ben W













    • I completely disagree with your last paragraph. A despot may also say "Some people would like taxation to be objective. It's not, I tax whomever I (don't) like." Our evaluations of students have massive impacts on their lifetime outcomes and even knowing that it's a "Joe" rather than a "Mary" or an "Abdul" has been shown to greatly distort such evaluations. We should do everything we can to make this procedure as objective as possible. Or would you like to be told that you only went to community college rather than Harvard just because your math teacher made your exam a bit tough that year?
      – HRSE
      3 hours ago











































    This answer proposes a different definition of a "fair scale" and a "fair exam" from the one proposed in the question.



    We can approach this question from an information-theoretic standpoint. From this perspective, grades should be informative about students' underlying ability to solve certain problems. As such, there is a "true distribution" of the various skills and abilities students have. Unfortunately, these qualities are most likely multidimensional, but we need to "compress" them into an ordinal scale. This entails making some strange judgments such as "making a typo in an equation is 0.3 times as bad as accidentally multiplying both sides by zero". But suppose we have obtained some acceptable scale expressed as integer scores from 0 to 100. I am suspicious of cardinal scales and therefore I attach only ordinal meaning to these numbers (for now).



    Importantly, if we observed the results for the entire student pool, there would be no need ever to rescale. The need to rescale arises because we observe different exams (information structures about the hypothetical score of students on "the true scale") for different parts of the student pool and want to make the scores comparable between students. In particular, if I believe I have set two similarly difficult exams for two random samples of 1000 students, but in one exam all students receive 0 points and in the other all students receive 100 points, then I should revise my belief about whether I truly set similarly difficult exams. If the sample size for each group is only 10 students, I won't update my belief as much and will be more reluctant to rescale the exam.



    Now let's say for simplicity that we have observed the scores of two exams for the entire pool (or for each exam a sufficiently large sample) of students. Let's suppose that a density estimate of the distribution of scores of each exam is given by $s_1:[0,100]\rightarrow \mathbb{R}$ and $s_2:[0,100]\rightarrow \mathbb{R}$ with $\int s_i(x)\,dx=1$. Now we are looking for transformations $t_1:[0,100]\rightarrow [0,100]$ and $t_2:[0,100]\rightarrow [0,100]$ to make the exams comparable.



    As posted in the question, there is a strong case for preserving the order of the scores, thus $t_1$ and $t_2$ should be strictly increasing functions. However, I do not see why there should be a strong case for maintaining point differences or optimizing an objective such as the one given in the question. If we were unable to set an exam such that the distribution of scores is equal to our target distribution for a large student sample, then there is no good reason to attach any cardinal meaning to these scores. However, we want to a) make the scores comparable to each other and b) lose as little information in this process as possible. I therefore propose to impose that



    a) Comparability holds: $s_i(t_i(x)) = p^*(x)$ for all $x\in [0,100]$, $i\in \{1,2\}$, that is, the rescaled scores of both exams follow the common target distribution $p^*$, and



    b) Minimal information loss holds:
    $$p^* = \arg \min_p \sum_i D_{KL}(p,s_i)$$
    where $D_{KL}$ is the Kullback-Leibler divergence between the distributions.
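
Condition b) can be illustrated on discretized score histograms. The sketch below rests on an assumption that the answer does not state: it reads $D_{KL}(p,s_i)$ as $\sum_j p_j\log(p_j/s_{i,j})$, with $p$ in the first argument. Under that reading the minimizer is the normalized geometric mean of the $s_i$; with the arguments swapped it would be the arithmetic mean instead. The function names and the two four-bin histograms are made up for illustration, and all bins are assumed strictly positive.

```python
import math


def kl(p, q):
    """Discrete D_KL(p || q) = sum_j p_j * log(p_j / q_j); assumes q_j > 0."""
    return sum(pj * math.log(pj / qj) for pj, qj in zip(p, q) if pj > 0)


def kl_minimizer(hists):
    """Minimizer of sum_i D_KL(p || s_i): the normalized geometric mean."""
    m = len(hists)
    bins = len(hists[0])
    geo = [math.exp(sum(math.log(s[j]) for s in hists) / m) for j in range(bins)]
    total = sum(geo)
    return [g / total for g in geo]


def objective(p, hists):
    """The quantity minimized in condition b), under the stated reading."""
    return sum(kl(p, s) for s in hists)


# Two made-up histograms: exam 1 skews low, exam 2 skews high.
s1 = [0.4, 0.3, 0.2, 0.1]
s2 = [0.1, 0.2, 0.3, 0.4]
p_star = kl_minimizer([s1, s2])
arithmetic_mean = [(u + v) / 2 for u, v in zip(s1, s2)]
# p_star attains a lower objective than the arithmetic mean of s1 and s2.
```

The geometric mean follows because $\sum_i D_{KL}(p\|s_i)$ equals $m\,D_{KL}(p\|q)$ plus a constant, where $q$ is the normalized geometric mean of the $m$ histograms.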



    In practice, of course, we observe the different exams year by year, so one has to fix $p^*$ before the exams are taken. My simple rule of thumb for this is to try to get as close as possible to the maximum entropy distribution (uniform) of scores with both exams and adjusted scores. Ideally, I would only want to report percentiles to the department office. Unfortunately, rules such as "students below some cutoff fail and need to retake" prevent me from doing this and require choosing a different $p^*$ instead. (This means that to minimize the KL divergence one has to adjust exams to match the target distribution rather than the other way around.) Also, it is hard to explain to students why one uses a crazy wiggle of a function to rescale scores, so $t_i$ tends to be "smoothed out" a bit.
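
The "report only percentiles" idea corresponds to an order-preserving map onto an approximately uniform target. Here is a minimal sketch under several assumptions: the name `percentile_map` is hypothetical, the empirical distribution of the observed scores stands in for the density estimate $s_i$, ties are resolved by mid-ranks, and no smoothing is applied.

```python
from bisect import bisect_left, bisect_right


def percentile_map(scores, top=100.0):
    """Map each raw score to its mid-rank percentile on [0, top].

    Tied raw scores get the same scaled score, and a strictly higher raw
    score always gets a strictly higher scaled score.
    """
    ordered = sorted(scores)
    n = len(ordered)

    def t(x):
        below = bisect_left(ordered, x)         # scores strictly below x
        at_or_below = bisect_right(ordered, x)  # scores at or below x
        return top * (below + at_or_below) / (2 * n)

    return [t(x) for x in scores]


# Made-up class of four students, two of them tied at 50.
scaled = percentile_map([35, 50, 50, 80])
# scaled == [12.5, 50.0, 50.0, 87.5]
```

For a non-uniform target $p^*$, one would compose this map with the inverse cumulative distribution function of $p^*$; with few students the resulting step function is exactly the kind of "crazy wiggle" that tends to get smoothed out.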



    tl;dr: My own idea of "fair grading and scaling" is that the exam should be designed to have maximum entropy of scores from an ex ante perspective. Ex post, once I learn that an exam was too hard or too easy, I look for an order-preserving map which yields the targeted distribution, provided many students have taken the exam. When there are only a few students in the class, things become complicated.
















        answered 4 hours ago









        HRSE






























