
In the field of machine learning, and specifically the problem of statistical classification, a confusion matrix, also known as an error matrix, is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one; in unsupervised learning it is usually called a matching matrix.
Each row of the matrix represents the instances in an actual class while each column represents the instances in a predicted class, or vice versa – both variants are found in the literature. The diagonal of the matrix therefore represents all instances that are correctly predicted. The name stems from the fact that it makes it easy to see whether the system is confusing two classes (i.e. commonly mislabeling one as another).
It is a special kind of contingency table, with two dimensions ("actual" and "predicted"), and identical sets of "classes" in both dimensions (each combination of dimension and class is a variable in the contingency table).
Example
Given a sample of 12 individuals, 8 that have been diagnosed with cancer and 4 that are cancer-free, where individuals with cancer belong to class 1 (positive) and non-cancer individuals belong to class 0 (negative), we can display that data as follows:
| Individual Number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Actual Classification | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
Assume that we have a classifier that distinguishes between individuals with and without cancer in some way. We can take the 12 individuals and run them through the classifier, which then makes 9 accurate predictions and misses 3: 2 individuals with cancer wrongly predicted as being cancer-free (samples 1 and 2), and 1 person without cancer wrongly predicted to have cancer (sample 9).
| Individual Number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Actual Classification | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| Predicted Classification | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 |
Notice that, if we compare the actual classification set to the predicted classification set, there are 4 different outcomes that could result in any particular column. First, if the actual classification is positive and the predicted classification is positive (1,1), this is called a true positive result because the positive sample was correctly identified by the classifier. Second, if the actual classification is positive and the predicted classification is negative (1,0), this is called a false negative result because the positive sample is incorrectly identified by the classifier as being negative. Third, if the actual classification is negative and the predicted classification is positive (0,1), this is called a false positive result because the negative sample is incorrectly identified by the classifier as being positive. Fourth, if the actual classification is negative and the predicted classification is negative (0,0), this is called a true negative result because the negative sample is correctly identified by the classifier.
We can then perform the comparison between actual and predicted classifications and record the outcome for each individual in a new result row of the table:
| Individual Number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Actual Classification | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| Predicted Classification | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 |
| Result | FN | FN | TP | TP | TP | TP | TP | TP | FP | TN | TN | TN |
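To make the mapping concrete, here is a minimal Python sketch that reproduces the result row above from the two classification lists; the `outcome` helper is illustrative, not a library function.

```python
# The actual and predicted labels from the tables above.
actual    = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
predicted = [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]

def outcome(a: int, p: int) -> str:
    """Map an (actual, predicted) pair to its outcome type."""
    if a == 1:
        return "TP" if p == 1 else "FN"
    return "FP" if p == 1 else "TN"

results = [outcome(a, p) for a, p in zip(actual, predicted)]
print(results)
# ['FN', 'FN', 'TP', 'TP', 'TP', 'TP', 'TP', 'TP', 'FP', 'TN', 'TN', 'TN']
```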
The template for any binary confusion matrix uses the four kinds of results discussed above (true positives, false negatives, false positives, and true negatives) along with the positive and negative classifications. The four outcomes can be formulated in a 2×2 confusion matrix, as follows:
| Total population = P + N | Predicted positive (PP) | Predicted negative (PN) |
|---|---|---|
| Actual positive (P) | True positive (TP) | False negative (FN) |
| Actual negative (N) | False positive (FP) | True negative (TN) |
The outcome labels in the three data tables above match the entries of this confusion matrix, making it easy to relate the two.
Now, we can simply total up each type of result, substitute into the template, and create a confusion matrix that will concisely summarize the results of testing the classifier:
| Total: 8 + 4 = 12 | Predicted cancer: 7 | Predicted non-cancer: 5 |
|---|---|---|
| Actual cancer: 8 | 6 | 2 |
| Actual non-cancer: 4 | 1 | 3 |
In this confusion matrix, of the 8 samples with cancer, the system judged that 2 were cancer-free, and of the 4 samples without cancer, it predicted that 1 did have cancer. All correct predictions are located on the diagonal of the table, so it is easy to visually inspect the table for prediction errors, as they are represented by values outside the diagonal. By summing the 2 rows of the confusion matrix, one can also deduce the total number of positive (P) and negative (N) samples in the original dataset: P = TP + FN = 6 + 2 = 8 and N = FP + TN = 1 + 3 = 4.
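Continuing the earlier sketch, the totals can be tallied and arranged into the 2×2 layout. Note that library helpers such as scikit-learn's `sklearn.metrics.confusion_matrix` sort labels in ascending order, so with labels 0 and 1 the negative class appears in the first row and column there.

```python
from collections import Counter

# Tally the outcome types from the result row above and arrange them
# into the 2x2 template (rows = actual class, columns = predicted class).
results = ["FN", "FN", "TP", "TP", "TP", "TP",
           "TP", "TP", "FP", "TN", "TN", "TN"]
counts = Counter(results)                 # {'TP': 6, 'TN': 3, 'FN': 2, 'FP': 1}

matrix = [[counts["TP"], counts["FN"]],   # actual positive row
          [counts["FP"], counts["TN"]]]   # actual negative row
print(matrix)                             # [[6, 2], [1, 3]]

# Row sums recover the class totals: P = TP + FN = 8, N = FP + TN = 4.
P, N = sum(matrix[0]), sum(matrix[1])
```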
Table of confusion
In predictive analytics, a table of confusion (sometimes also called a confusion matrix) is a table with two rows and two columns that reports the number of true positives, false negatives, false positives, and true negatives. This allows more detailed analysis than simply observing the proportion of correct classifications (accuracy). Accuracy will yield misleading results if the data set is unbalanced; that is, when the numbers of observations in different classes vary greatly.
For example, if there were 95 cancer samples and only 5 non-cancer samples in the data, a particular classifier might classify all the observations as having cancer. The overall accuracy would be 95%, but a closer look shows that the classifier has a 100% recognition rate (sensitivity) for the cancer class and a 0% recognition rate for the non-cancer class. The F1 score is even more unreliable in such cases, here yielding over 97.4%, whereas informedness removes such bias and yields 0, the probability of an informed decision for any form of guessing (here, always guessing cancer).
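These numbers are easy to verify; the following sketch recomputes them from the counts stated in the example.

```python
# 95 cancer (positive) samples, 5 non-cancer (negative) samples, and a
# classifier that always predicts cancer.
TP, FN = 95, 0   # all 95 positives predicted positive
FP, TN = 5, 0    # all 5 negatives also predicted positive

accuracy = (TP + TN) / (TP + TN + FP + FN)   # 0.95
tpr = TP / (TP + FN)                         # sensitivity = 1.0
tnr = TN / (TN + FP)                         # specificity = 0.0
f1 = 2 * TP / (2 * TP + FP + FN)             # ~0.9744
informedness = tpr + tnr - 1                 # 0.0
print(accuracy, tpr, tnr, round(f1, 4), informedness)
```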
According to Davide Chicco and Giuseppe Jurman, the most informative metric to evaluate a confusion matrix is the Matthews correlation coefficient (MCC).
Other metrics can be included in a confusion matrix, each of them having its own significance and use.
| Total population = P + N | Predicted positive (PP) | Predicted negative (PN) | Informedness, bookmaker informedness (BM) = TPR + TNR − 1 | Prevalence threshold (PT) = (√(TPR × FPR) − FPR) / (TPR − FPR) |
|---|---|---|---|---|
| Actual positive (P) | True positive (TP), hit | False negative (FN), miss, underestimation | True positive rate (TPR), recall, sensitivity (SEN), probability of detection, hit rate, power = TP/P = 1 − FNR | False negative rate (FNR), miss rate, type II error = FN/P = 1 − TPR |
| Actual negative (N) | False positive (FP), false alarm, overestimation | True negative (TN), correct rejection | False positive rate (FPR), probability of false alarm, fall-out, type I error = FP/N = 1 − TNR | True negative rate (TNR), specificity (SPC), selectivity = TN/N = 1 − FPR |
| Prevalence = P/(P + N) | Positive predictive value (PPV), precision = TP/(TP + FP) = 1 − FDR | False omission rate (FOR) = FN/(TN + FN) = 1 − NPV | Positive likelihood ratio (LR+) = TPR/FPR | Negative likelihood ratio (LR−) = FNR/TNR |
| Accuracy (ACC) = (TP + TN)/(P + N) | False discovery rate (FDR) = FP/(TP + FP) = 1 − PPV | Negative predictive value (NPV) = TN/(TN + FN) = 1 − FOR | Markedness (MK), deltaP (Δp) = PPV + NPV − 1 | Diagnostic odds ratio (DOR) = LR+/LR− |
| Balanced accuracy (BA) = (TPR + TNR)/2 | F1 score = (2 × PPV × TPR)/(PPV + TPR) = 2 TP/(2 TP + FP + FN) | Fowlkes–Mallows index (FM) = √(PPV × TPR) | Matthews correlation coefficient (MCC) = √(TPR × TNR × PPV × NPV) − √(FNR × FPR × FOR × FDR) | Threat score (TS), critical success index (CSI), Jaccard index = TP/(TP + FN + FP) |
- P: the number of real positive cases in the data
- TP (true positive): a test result that correctly indicates the presence of a condition or characteristic
- FN (false negative, type II error): a test result which wrongly indicates that a particular condition or attribute is absent
- N: the number of real negative cases in the data
- TN (true negative): a test result that correctly indicates the absence of a condition or characteristic
- FP (false positive, type I error): a test result which wrongly indicates that a particular condition or attribute is present
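As a worked example, the following sketch computes several of the metrics in the table above from the four counts of the cancer example (TP = 6, FN = 2, FP = 1, TN = 3). The variable names mirror the abbreviations in the table; this is an illustrative sketch, not a library API.

```python
import math

TP, FN, FP, TN = 6, 2, 1, 3
P, N = TP + FN, FP + TN

TPR = TP / P                    # sensitivity / recall  = 0.75
TNR = TN / N                    # specificity           = 0.75
PPV = TP / (TP + FP)            # precision             ~ 0.857
NPV = TN / (TN + FN)            #                       = 0.6
FNR, FPR = 1 - TPR, 1 - TNR
FDR, FOR = 1 - PPV, 1 - NPV

accuracy = (TP + TN) / (P + N)              # 0.75
balanced_accuracy = (TPR + TNR) / 2         # 0.75
f1 = 2 * PPV * TPR / (PPV + TPR)            # = 2*TP/(2*TP + FP + FN) = 0.8
informedness = TPR + TNR - 1                # 0.5
markedness = PPV + NPV - 1                  # ~ 0.457
mcc = (math.sqrt(TPR * TNR * PPV * NPV)
       - math.sqrt(FNR * FPR * FOR * FDR))
# mcc ~ 0.478, which agrees with the conventional form
# (TP*TN - FP*FN) / sqrt((TP+FP)*(TP+FN)*(TN+FP)*(TN+FN))
```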
Confusion matrices with more than two categories
The confusion matrix is not limited to binary classification and can be used with multi-class classifiers as well. Whereas the confusion matrices discussed above have only two conditions (positive and negative), the table below, for example, summarizes the communication of a whistled language between two speakers, with zero values omitted for clarity.
| Vowel produced \ Perceived vowel | i | e | a | o | u |
|---|---|---|---|---|---|
| i | 15 | 1 | | | |
| e | 1 | 1 | | | |
| a | | | 79 | 5 | |
| o | | | 4 | 15 | 3 |
| u | | | | 2 | 2 |
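A multi-class matrix is built the same way as the binary one: count (actual, predicted) pairs. The sketch below uses a handful of invented pairs for illustration; they are not the data behind the vowel table above.

```python
from collections import Counter

classes = ["i", "e", "a", "o", "u"]
# Hypothetical (produced, perceived) observations.
pairs = [("i", "i"), ("i", "e"), ("e", "i"), ("a", "a"), ("a", "o"), ("o", "u")]

counts = Counter(pairs)
matrix = [[counts[(actual, predicted)] for predicted in classes]
          for actual in classes]
for label, row in zip(classes, matrix):
    print(label, row)
```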
See also
- Positive and negative predictive values
References
- Stehman, Stephen V. (1997). "Selecting and interpreting measures of thematic classification accuracy". Remote Sensing of Environment. 62 (1): 77–89. Bibcode:1997RSEnv..62...77S. doi:10.1016/S0034-4257(97)00083-7.
- Powers, David M. W. (2011). "Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation". Journal of Machine Learning Technologies. 2 (1): 37–63. S2CID 55767944.
- Opitz, Juri (2024). "A Closer Look at Classification Evaluation Metrics and a Critical Reflection of Common Evaluation Practice". Transactions of the Association for Computational Linguistics. 12: 820–836. arXiv:2404.16958. doi:10.1162/tacl_a_00675.
- Provost, Foster; Fawcett, Tom (2013). Data science for business: what you need to know about data mining and data-analytic thinking (1. ed., 2. release ed.). Beijing Köln: O'Reilly. ISBN 978-1-4493-6132-7.
- Fawcett, Tom (2006). "An Introduction to ROC Analysis" (PDF). Pattern Recognition Letters. 27 (8): 861–874. Bibcode:2006PaReL..27..861F. doi:10.1016/j.patrec.2005.10.010. S2CID 2027090.
- Ting, Kai Ming (2011). Sammut, Claude; Webb, Geoffrey I. (eds.). Encyclopedia of machine learning. Springer. doi:10.1007/978-0-387-30164-8. ISBN 978-0-387-30164-8.
- Brooks, Harold; Brown, Barb; Ebert, Beth; Ferro, Chris; Jolliffe, Ian; Koh, Tieh-Yong; Roebber, Paul; Stephenson, David (2015-01-26). "WWRP/WGNE Joint Working Group on Forecast Verification Research". Collaboration for Australian Weather and Climate Research. World Meteorological Organisation. Retrieved 2019-07-17.
- Chicco D, Jurman G (January 2020). "The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation". BMC Genomics. 21 (1): 6-1–6-13. doi:10.1186/s12864-019-6413-7. PMC 6941312. PMID 31898477.
- Tharwat A. (August 2018). "Classification assessment methods". Applied Computing and Informatics. 17: 168–192. doi:10.1016/j.aci.2018.08.003.
- Chicco D, Toetsch N, Jurman G (February 2021). "The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation". BioData Mining. 14 (13): 13. doi:10.1186/s13040-021-00244-z. PMC 7863449. PMID 33541410.
- Rialland, Annie (August 2005). "Phonological and phonetic aspects of whistled languages". Phonology. 22 (2): 237–271. CiteSeerX 10.1.1.484.4384. doi:10.1017/S0952675705000552. S2CID 18615779.