10.4. Summary#
10.4.1. Terminology Review#
Use the flashcards below to help you review the terminology introduced in this chapter.
10.4.2. Key Take-Aways#
Binary Decisions from Continuous Data: Non-Bayesian Approaches
We considered binary hypothesis tests in which the output of a system is a continuous random variable that depends on the input or hidden state of the system, which is binary in nature.
We first showed the form of the likelihoods for this scenario and then showed how to find the maximum-likelihood (ML) decision rule.
We generalized the ML decision rule to a likelihood ratio test that allows a tradeoff between the probability of false alarm and probability of miss.
Receiver operating characteristic (ROC) curves plot the probability of detection (also called the True Positive Rate, or TPR) as a function of the probability of false alarm (also called the False Positive Rate, or FPR). They can be used to visualize the overall performance of a detector. The closer the curve is to the top-left corner, the better the detector’s performance.
The overall detector performance can be quantified by the area under the curve (AUC), which is found by integrating the probability of detection over the probability of false alarm along the ROC curve. AUC takes values between 1/2 and 1. The minimum of 1/2 can be achieved without using the observed value at all, and the maximum of 1 corresponds to a perfect detector that never makes a false alarm or a miss.
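As a concrete illustration of the two items above, the following is a minimal sketch (not code from this chapter) that sweeps the threshold of a likelihood ratio test for two assumed Normal likelihoods, traces the resulting ROC curve, and estimates the AUC numerically. The means, standard deviation, and threshold grid are illustrative assumptions.

```python
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

# Assumed (illustrative) likelihoods: X | H0 ~ N(0, 1), X | H1 ~ N(2, 1).
mu0, mu1, sigma = 0.0, 2.0, 1.0
f0, f1 = stats.norm(mu0, sigma), stats.norm(mu1, sigma)

# For Normal likelihoods with equal sigma, the likelihood ratio test reduces to
# comparing X against a threshold gamma; sweeping gamma trades P_FA against P_M.
gammas = np.linspace(mu0 - 6 * sigma, mu1 + 6 * sigma, 2001)
p_fa = f0.sf(gammas)   # P(X > gamma | H0): probability of false alarm (FPR)
p_d  = f1.sf(gammas)   # P(X > gamma | H1): probability of detection (TPR)

# AUC: integrate P_D as a function of P_FA along the ROC curve.
order = np.argsort(p_fa)
auc = trapezoid(p_d[order], p_fa[order])
print(f"AUC is approximately {auc:.3f}")   # about 0.92 for these assumed parameters
```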
Point Conditioning
Point conditioning occurs when we condition on a continuous random variable taking on a particular value. This has to be handled differently than when the conditioning event has a nonzero probability.
Given a continuous random variable \(X\), the conditional probability of some event \(A\) given that \(X=x\) can be calculated as \begin{equation*} P\left(A \vert X=x \right) = \frac{f_X(x|A)}{f_X(x)} P(A), \end{equation*} provided that \(f_X(x|A)\) is defined and \(f_X(x) \ne 0\).
If \(f_X(x)\) is not known, it can be calculated using the Law of Total Probability as \begin{equation*} f_X(x) = \sum_i f_X(x |A_i) P(A_i). \end{equation*}
Combining the two equations above gives a point-conditioning form of Bayes’ Rule: \begin{equation*} P(A_i|X = x) = \frac{f_X(x|A_i) P(A_i)}{ \sum_{j=0}^{n-1}{f_X(x|A_j) P(A_j)} }, \end{equation*} where the events \(A_0, A_1, \ldots, A_{n-1}\) partition the sample space.
For many applications, the probabilities and densities in the above form of Bayes’ Rule have the following interpretations: \(P(A_i |X=x)\) is an a posteriori probability, \(f_X(x|A_i)\) is a likelihood, and \(P(A_i)\) is an a priori probability.
A Law of Total Probability with Point Conditioning is \begin{equation*} P(A) = \int_{-\infty}^{\infty} P\left( A |X=x\right) f_X(x) ~dx. \end{equation*}
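To make these point-conditioning formulas concrete, here is a minimal sketch (with assumed priors and Normal likelihoods, not values from the chapter) that evaluates the likelihoods at an observed value, forms \(f_X(x)\) via the Law of Total Probability, and applies the point-conditioning form of Bayes’ Rule.

```python
import numpy as np
from scipy import stats

# Assumed binary-input model (illustrative values):
priors = np.array([0.7, 0.3])              # P(A_0), P(A_1): a priori probabilities
likes  = [stats.norm(0.0, 1.0),            # f_X(x | A_0)
          stats.norm(2.0, 1.0)]            # f_X(x | A_1)

x = 1.2                                     # the observed value we condition on
weighted = np.array([f.pdf(x) for f in likes]) * priors   # f_X(x|A_i) P(A_i)
f_x = weighted.sum()                        # f_X(x) by the Law of Total Probability
posterior = weighted / f_x                  # P(A_i | X = x) by Bayes' Rule
print(posterior, posterior.sum())           # a posteriori probabilities, sum to 1
```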
Optimal Bayesian Decision Making with Continuous Random Variables
Given a finite collection of discrete input events \(A_0, A_1, \ldots, A_{n-1}\) and an observed value \(X=x\), where the distribution of \(X\) depends on the input event \(A_i\), the MAP decision rule chooses the \(A_i\) that maximizes the a posteriori probability: \begin{align*} \hat{A} & = \arg \max_{A_i} P(A_i |X=x) \\ & = \arg \max_{A_i} \frac{ f_X(x|A_i) P(A_i)}{ f_X(x)} \\ & = \arg \max_{A_i} f_X(x|A_i) P(A_i). \end{align*}
The last expression shows that the MAP rule compares weighted likelihoods, where the weights are just the corresponding a priori probabilities.
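A minimal sketch of this comparison, reusing the assumed priors and Normal likelihoods from the previous sketch: since \(f_X(x)\) is common to every hypothesis, the MAP decision only needs the weighted likelihoods.

```python
import numpy as np
from scipy import stats

priors = np.array([0.7, 0.3])                          # assumed a priori probabilities
likes  = [stats.norm(0.0, 1.0), stats.norm(2.0, 1.0)]  # assumed likelihoods

def map_decide(x):
    """Return the index i that maximizes the weighted likelihood f_X(x|A_i) P(A_i)."""
    return int(np.argmax([f.pdf(x) * p for f, p in zip(likes, priors)]))

print(map_decide(0.5), map_decide(1.8))   # decides 0 for x near mu_0, 1 for x near mu_1
```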
For a binary input system with likelihoods that are Normal with equal \(\sigma\) but different means, the MAP decision rule determines a single threshold \(\gamma\) such that we make one decision if \(X>\gamma\) and the opposite decision if \(X \le \gamma\).
For a binary input system with likelihoods that are Normal with equal \(\sigma\) but different means \(\mu_0\) and \(\mu_1\), the ML rule determines a single threshold, which is equal to the average of the conditional means, \begin{equation*} \gamma = \frac{\mu_0 + \mu_1}{2}. \end{equation*}
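The sketch below checks these threshold statements numerically under the same assumed parameters: the MAP threshold is the point where the weighted likelihoods cross, and with equal priors it reduces to the ML threshold \((\mu_0+\mu_1)/2\).

```python
import numpy as np
from scipy import stats, optimize

mu0, mu1, sigma = 0.0, 2.0, 1.0                       # assumed conditional means and std dev
f0, f1 = stats.norm(mu0, sigma), stats.norm(mu1, sigma)

def threshold(p0, p1):
    """Find the crossing point of f_X(x|A_1) P(A_1) and f_X(x|A_0) P(A_0)."""
    g = lambda x: f1.pdf(x) * p1 - f0.pdf(x) * p0
    return optimize.brentq(g, mu0 - 5 * sigma, mu1 + 5 * sigma)

print(threshold(0.5, 0.5))   # equal priors (ML rule): (mu0 + mu1) / 2 = 1.0
print(threshold(0.7, 0.3))   # unequal priors shift the MAP threshold above 1.0
```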