
- Calculus, Volume 2: Multi-Variable Calculus and Linear Algebra with Applications to Differential Equations and Probability
- Tom M. Apostol
- Second Edition
- 1991
- 978-1-119-49676-2
1.14 Construction of orthogonal sets. The Gram-Schmidt process
$\quad$ Every finite-dimensional linear space has a finite basis. If the space is Euclidean, we can always construct an orthogonal basis by a procedure known as the Gram-Schmidt orthogonalization process.
$\quad$ Theorem 1.13. $\quad$ Orthogonalization Theorem.
$\quad$ Let $x_1,$ $x_2,$ $\dots,$ be a finite or infinite sequence of elements in a Euclidean space $V,$ and let $L(x_1, \dots, x_k)$ denote the subspace spanned by the first $k$ of these elements. Then there is a corresponding sequence of elements $y_1, y_2, \dots,$ in $V$ which has the following properties for each integer $k:$
$\quad \text{(a)}$ The element $y_k$ is orthogonal to every element in the subspace $L(y_1, \dots, y_{k-1}).$
$\quad \text{(b)}$ The subspace spanned by $y_1, \dots, y_k$ is the same as that spanned by $x_1, \dots, x_k:$ $$L(y_1, \dots, y_k) = L(x_1, \dots, x_k).$$
$\quad \text{(c)}$ The sequence $y_1, y_2, \dots,$ is unique, except for scalar factors. That is, if $y_1', y_2', \dots,$ is another sequence of elements in $V$ satisfying properties (a) and (b), then for each $k$ there is a scalar $c_k$ such that $y'_k = c_ky_k.$
$\quad$ Proof. $\quad$ Let $y_1 = x_1.$ Then (a), (b), and (c) hold trivially when $k = 1.$ We proceed by induction on $r \geq 1:$ suppose the elements $y_1, \dots, y_r$ have been chosen so that (a) and (b) are satisfied when $k = r.$ Define $y_{r + 1}$ by:
\begin{align*}
y_{r + 1} &= x_{r + 1} - \sum_{i=1}^r a_iy_i,
\end{align*}
where $a_1, \dots, a_{r}$ are scalars to be determined. For $j \leq r,$ the inner product $(y_j, y_{r+1})$ is given by:
\begin{align*}
(y_j, y_{r+1}) &= \left(y_j, x_{r+1} - \sum_{i=1}^r a_iy_i \right)
\\
&= (y_j, x_{r+1}) - \sum_{i=1}^r a_i \left(y_j, y_i\right)
\end{align*}
Since property (a) holds for $k = r,$ the element $y_j$ is orthogonal to every $y_i$ with $i \leq r$ and $i \neq j,$ so $$ (y_j, y_{r+1}) = (y_j, x_{r+1}) - a_j(y_j, y_j).$$
If $y_j = O,$ both terms on the right-hand side vanish, so $(y_j, y_{r+1}) = 0$ for any choice of $a_j;$ in this case we take $a_j = 0.$ If $y_j \neq O,$ then by positivity $(y_j, y_j) > 0,$ and we can define $a_j$ as:
\begin{align*}
(1.15) \qquad a_j &= \frac{(y_j, x_{r+1})}{(y_j, y_j)}.
\end{align*}
This gives us:
\begin{align*}
(y_j, y_{r+1}) &= (y_j, x_{r+1}) - a_j(y_j, y_j)
\\
&= (y_j, x_{r+1}) - \frac{(y_j, x_{r+1})}{(y_j, y_j)}(y_j, y_j)
\\
&= (y_j, x_{r+1}) - (y_j, x_{r+1})
\\
&= 0.
\end{align*}
Since this holds for all $j \leq r,$ the element $y_{r+1}$ is orthogonal to each of $y_1, \dots, y_r$ and hence to every element of $L(y_1, \dots, y_r).$ This proves part (a) for $k = r + 1.$
$\quad$ To prove part (b), we must show that $$L(y_1, \dots, y_{r+1}) = L(x_1, \dots, x_{r+1}).$$
By the induction hypothesis, $L(y_1, \dots, y_r) = L(x_1, \dots, x_r),$ and this subspace is contained in both $L(y_1, \dots, y_{r+1})$ and $L(x_1, \dots, x_{r+1}).$ Returning to the definition of $y_{r+1},$
\begin{align*}
(1.14) \qquad y_{r+1} &= x_{r+1} - \sum_{i=1}^r a_iy_i,
\end{align*}
we see that $y_{r+1}$ is a linear combination of $x_{r+1}$ and $y_1, \dots, y_r,$ all of which lie in $L(x_1, \dots, x_{r+1});$ hence $$L(y_1, \dots, y_{r+1}) \subseteq L(x_1, \dots, x_{r+1}).$$ Likewise, $(1.14)$ expresses $x_{r+1}$ as a linear combination of $y_1, \dots, y_{r+1},$ so $$L(x_1, \dots, x_{r+1}) \subseteq L(y_1, \dots, y_{r+1}).$$
This proves part (b) for $k = r+1;$ by induction, (a) and (b) hold for every $k.$
$\quad$ Now we prove (c), also by induction. For $k = 1$ we have $L(y_1') = L(y_1),$ so $y_1' = c_1y_1$ for some scalar $c_1.$ Assume (c) is true for $k = r,$ so that each $y_i'$ with $i \leq r$ is a scalar multiple of $y_i.$ Now consider the element $y_{r+1}'.$ Because it satisfies part (b), it lies in $L(y_1, \dots, y_{r+1}),$ so we can write it as a linear combination of $y_1, \dots, y_{r+1}:$
\begin{align*}
y_{r+1}' &= \sum_{i=1}^{r+1}c_iy_i
\\
&= \sum_{i=1}^r c_iy_i + c_{r+1}y_{r+1}
\end{align*}
To show that $y_{r+1}'$ is a scalar multiple of $y_{r+1},$ it suffices to show that $z_r = \sum_{i=1}^r c_iy_i = O.$ Since $z_r = y_{r+1}' - c_{r+1}y_{r+1},$ taking the inner product of $z_r$ with itself gives:
\begin{align*}
(z_r, z_r) &= (z_r, y_{r+1}' - c_{r+1}y_{r+1})
\\
&= (z_r, y_{r+1}') - (z_r, c_{r+1}y_{r+1})
\end{align*}
But from part (a), both $y_{r+1}'$ and $y_{r+1}$ are orthogonal to $y_1, \dots, y_r,$ and $z_r$ lies in $L(y_1, \dots, y_r),$ so both inner products on the right are zero. Hence $(z_r, z_r) = 0,$ and by positivity $z_r = \sum_{i=1}^r c_iy_i = O.$ Therefore $y_{r+1}' = c_{r+1}y_{r+1},$ which proves part (c) for $k = r+1. \quad \blacksquare$
$\quad$ In the foregoing construction, suppose that $y_{r+1} = O$ for some $r.$ Then $(1.14)$ shows that $x_{r+1}$ is a linear combination of $y_1, \dots, y_r,$ and hence of $x_1, \dots, x_r,$ so the elements $x_1, \dots, x_{r+1}$ are dependent. In other words, if $x_1, \dots, x_k$ are independent, then the corresponding elements $y_1, \dots, y_k$ are nonzero. In this case the coefficients $a_i$ in $(1.14)$ are given by $(1.15),$ and the formulas for $y_1, \dots, y_k$ become
\begin{align*}
(1.16) \qquad y_1 = x_1, \qquad y_{r+1} &= x_{r+1} - \sum_{i=1}^r \frac{(x_{r+1}, y_i)}{(y_i, y_i)}y_i \qquad \text{for} \quad r = 1, 2, \dots, k-1.
\end{align*}
These formulas describe the Gram-Schmidt process for constructing an orthogonal set of nonzero elements $y_1, \dots, y_k$ which spans the same subspace as a given independent set $x_1, \dots, x_k.$ In particular, if $x_1, \dots, x_k$ is a basis for a finite-dimensional Euclidean space, then $y_1, \dots, y_k$ is an orthogonal basis for the same space. If, in addition, we normalize each element $y_i,$ that is, divide $y_i$ by its norm $\|y_i\|,$ we obtain an orthonormal basis for the space spanned by $x_1, \dots, x_k.$ Therefore, as a corollary to Theorem 1.13, we have:
$\quad$ Theorem 1.14. $\quad$ Every finite-dimensional Euclidean space has an orthonormal basis.
$\quad$ If $x$ and $y$ are elements in a Euclidean space, with $y \neq O,$ the element $$\frac{(x, y)}{(y, y)}y$$ is called the projection of $x$ along $y$. In the Gram-Schmidt process $(1.16),$ we construct the element $y_{r+1}$ by subtracting from $x_{r+1}$ the projection of $x_{r+1}$ along each of the earlier elements $y_1, \dots, y_r.$
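$\quad$ As a concrete illustration of the process $(1.16)$ (a small worked example, taking real 3-space with the ordinary dot product as the inner product), consider the independent vectors $x_1 = (1, 1, 0),$ $x_2 = (0, 1, 1),$ $x_3 = (1, 0, 1).$ Subtracting from each $x_{r+1}$ its projections along the earlier elements $y_1, \dots, y_r$ gives:
\begin{align*}
y_1 &= x_1 = (1, 1, 0),
\\
y_2 &= x_2 - \frac{(x_2, y_1)}{(y_1, y_1)}y_1 = (0, 1, 1) - \tfrac{1}{2}(1, 1, 0) = \left(-\tfrac{1}{2}, \tfrac{1}{2}, 1\right),
\\
y_3 &= x_3 - \frac{(x_3, y_1)}{(y_1, y_1)}y_1 - \frac{(x_3, y_2)}{(y_2, y_2)}y_2 = (1, 0, 1) - \tfrac{1}{2}(1, 1, 0) - \tfrac{1}{3}\left(-\tfrac{1}{2}, \tfrac{1}{2}, 1\right) = \left(\tfrac{2}{3}, -\tfrac{2}{3}, \tfrac{2}{3}\right).
\end{align*}
Each $y_i$ is orthogonal to the ones before it, and $L(y_1, y_2, y_3) = L(x_1, x_2, x_3).$ Normalizing yields the orthonormal basis $\tfrac{1}{\sqrt{2}}(1, 1, 0),$ $\tfrac{1}{\sqrt{6}}(-1, 1, 2),$ $\tfrac{1}{\sqrt{3}}(1, -1, 1),$ in accordance with Theorem 1.14.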