
- Calculus, Volume 2: Multi-Variable Calculus and Linear Algebra with Applications to Differential Equations and Probability
- Tom M. Apostol
- Second Edition
- 1991
- 978-1-119-49676-2
1.14 Construction of orthogonal sets. The Gram-Schmidt process
$\quad$ Every finite-dimensional linear space has a finite basis. If the space is Euclidean, we can always construct an orthogonal basis by a procedure known as the Gram-Schmidt orthogonalization process.
$\quad$ Theorem 1.13. $\quad$ Orthogonalization Theorem.
$\quad$ Let $x_1,$ $x_2,$ $\dots,$ be a finite or infinite sequence of elements in a Euclidean space $V,$ and let $L(x_1, \dots, x_k)$ denote the subspace spanned by the first $k$ of these elements. Then there is a corresponding sequence of elements $y_1, y_2, \dots,$ in $V$ which has the following properties for each integer $k:$
$\quad \text{(a)}$ The element $y_k$ is orthogonal to every element in the subspace $L(y_1, \dots, y_{k-1}).$
$\quad \text{(b)}$ The subspace spanned by $y_1, \dots, y_k$ is the same as that spanned by $x_1, \dots, x_k:$ $$L(y_1, \dots, y_k) = L(x_1, \dots, x_k).$$
$\quad \text{(c)}$ The sequence $y_1, y_2, \dots,$ is unique, except for scalar factors. That is, if $y_1', y_2', \dots,$ is another sequence of elements in $V$ satisfying properties (a) and (b), then for each $k$ there is a scalar $c_k$ such that $y'_k = c_ky_k.$
$\quad$ Proof. $\quad$ Let $y_1 = x_1.$ Then (a), (b), and (c) hold trivially when $k = 1.$ We proceed by induction on $r \geq 1:$ suppose the elements $y_1, \dots, y_r$ have been chosen so that (a) and (b) are satisfied when $k = r.$ Define $y_{r + 1}$ by:
\begin{align*}
y_{r + 1} &= x_{r + 1} - \sum_{i=1}^r a_iy_i,
\end{align*}
where $a_1, \dots, a_{r}$ are scalars to be determined. For $j \leq r,$ the inner product $(y_j, y_{r+1})$ is given by:
\begin{align*}
(y_j, y_{r+1}) &= \left(y_j, x_{r+1} - \sum_{i=1}^r a_iy_i \right)
\\
&= (y_j, x_{r+1}) - \sum_{i=1}^r a_i \left(y_j, y_i\right)
\end{align*}
Since property (a) holds for $k = r,$ the element $y_j$ is orthogonal to every $y_i$ with $i \leq r$ and $i \neq j,$ so $$ (y_j, y_{r+1}) = (y_j, x_{r+1}) - a_j(y_j, y_j).$$
If $y_j = O,$ both terms on the right-hand side vanish, so $(y_j, y_{r+1}) = 0$ for any choice of $a_j;$ in this case we take $a_j = 0.$ If $y_j \neq O,$ then by positivity $(y_j, y_j) > 0,$ and we can define $a_j$ as:
\begin{align*}
(1.15) \qquad a_j &= \frac{(y_j, x_{r+1})}{(y_j, y_j)}.
\end{align*}
This gives us:
\begin{align*}
(y_j, y_{r+1}) &= (y_j, x_{r+1}) - a_j(y_j, y_j)
\\
&= (y_j, x_{r+1}) - \frac{(y_j, x_{r+1})}{(y_j, y_j)}(y_j, y_j)
\\
&= (y_j, x_{r+1}) - (y_j, x_{r+1})
\\
&= 0.
\end{align*}
Since this holds for all $j \leq r,$ the element $y_{r+1}$ is orthogonal to each of $y_1, \dots, y_r$ and hence to every element of $L(y_1, \dots, y_r).$ This proves part (a) for $k = r + 1.$
$\quad$ To prove part (b), we must show that $$L(y_1, \dots, y_{r+1}) = L(x_1, \dots, x_{r+1}).$$
By the induction hypothesis, $L(y_1, \dots, y_r) = L(x_1, \dots, x_r),$ and this subspace is contained in both $L(y_1, \dots, y_{r+1})$ and $L(x_1, \dots, x_{r+1}).$ Returning to the definition of $y_{r+1},$
\begin{align*}
(1.14) \qquad y_{r+1} &= x_{r+1} - \sum_{i=1}^r a_iy_i,
\end{align*}
we see that $y_{r+1}$ is a linear combination of $x_{r+1}$ and $y_1, \dots, y_r,$ all of which lie in $L(x_1, \dots, x_{r+1});$ hence $$L(y_1, \dots, y_{r+1}) \subseteq L(x_1, \dots, x_{r+1}).$$ Likewise, $(1.14)$ expresses $x_{r+1}$ as a linear combination of $y_1, \dots, y_{r+1},$ so $$L(x_1, \dots, x_{r+1}) \subseteq L(y_1, \dots, y_{r+1}).$$
This proves part (b) for $k = r+1;$ by induction, (a) and (b) hold for every $k.$
$\quad$ Now we prove (c), also by induction. For $k = 1$ we have $L(y_1') = L(y_1),$ so $y_1' = c_1y_1$ for some scalar $c_1.$ Assume (c) is true for $k = r,$ so that each $y_i'$ with $i \leq r$ is a scalar multiple of $y_i.$ Now consider the element $y_{r+1}'.$ Because it satisfies part (b), it lies in $L(y_1, \dots, y_{r+1}),$ so we can write it as a linear combination of $y_1, \dots, y_{r+1}:$
\begin{align*}
y_{r+1}' &= \sum_{i=1}^{r+1}c_iy_i
\\
&= \sum_{i=1}^r c_iy_i + c_{r+1}y_{r+1}
\end{align*}
To show that $y_{r+1}'$ is a scalar multiple of $y_{r+1},$ it suffices to show that $z_r = \sum_{i=1}^r c_iy_i = O.$ Since $z_r = y_{r+1}' - c_{r+1}y_{r+1},$ taking the inner product of $z_r$ with itself gives:
\begin{align*}
(z_r, z_r) &= (z_r, y_{r+1}' - c_{r+1}y_{r+1})
\\
&= (z_r, y_{r+1}') - (z_r, c_{r+1}y_{r+1})
\end{align*}
But from part (a), both $y_{r+1}'$ and $y_{r+1}$ are orthogonal to $y_1, \dots, y_r,$ and $z_r$ lies in $L(y_1, \dots, y_r),$ so both inner products on the right are zero. Hence $(z_r, z_r) = 0,$ and by positivity $z_r = \sum_{i=1}^r c_iy_i = O.$ Therefore $y_{r+1}' = c_{r+1}y_{r+1},$ which proves part (c) for $k = r+1. \quad \blacksquare$
$\quad$ In the foregoing construction, suppose that $y_{r+1} = O$ for some $r.$ Then $(1.14)$ shows that $x_{r+1}$ is a linear combination of $y_1, \dots, y_r,$ and hence of $x_1, \dots, x_r,$ so the elements $x_1, \dots, x_{r+1}$ are dependent. In other words, if $x_1, \dots, x_k$ are independent, then the corresponding elements $y_1, \dots, y_k$ are nonzero. In this case the coefficients $a_i$ in $(1.14)$ are given by $(1.15),$ and the formulas for $y_1, \dots, y_k$ become
\begin{align*}
(1.16) \qquad y_1 = x_1, \qquad y_{r+1} &= x_{r+1} - \sum_{i=1}^r \frac{(x_{r+1}, y_i)}{(y_i, y_i)}y_i \qquad \text{for} \quad r = 1, 2, \dots, k-1.
\end{align*}
These formulas describe the Gram-Schmidt process for constructing an orthogonal set of nonzero elements $y_1, \dots, y_k$ which spans the same subspace as a given independent set $x_1, \dots, x_k.$ In particular, if $x_1, \dots, x_k$ is a basis for a finite-dimensional Euclidean space, then $y_1, \dots, y_k$ is an orthogonal basis for the same space. If, in addition, we normalize each element $y_i,$ that is, divide $y_i$ by its norm $\|y_i\|,$ we obtain an orthonormal basis for the space spanned by $x_1, \dots, x_k.$ Therefore, as a corollary to Theorem 1.13, we have:
$\quad$ Theorem 1.14. $\quad$ Every finite-dimensional Euclidean space has an orthonormal basis.
$\quad$ If $x$ and $y$ are elements in a Euclidean space, with $y \neq O,$ the element $$\frac{(x, y)}{(y, y)}y$$ is called the projection of $x$ along $y$. In the Gram-Schmidt process $(1.16),$ we construct the element $y_{r+1}$ by subtracting from $x_{r+1}$ the projection of $x_{r+1}$ along each of the earlier elements $y_1, \dots, y_r.$
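$\quad$ As a concrete illustration of the process $(1.16)$ (a small worked example, taking real 3-space with the ordinary dot product as the inner product), consider the independent vectors $x_1 = (1, 1, 0),$ $x_2 = (0, 1, 1),$ $x_3 = (1, 0, 1).$ Subtracting from each $x_{r+1}$ its projections along the earlier elements $y_1, \dots, y_r$ gives:
\begin{align*}
y_1 &= x_1 = (1, 1, 0),
\\
y_2 &= x_2 - \frac{(x_2, y_1)}{(y_1, y_1)}y_1 = (0, 1, 1) - \tfrac{1}{2}(1, 1, 0) = \left(-\tfrac{1}{2}, \tfrac{1}{2}, 1\right),
\\
y_3 &= x_3 - \frac{(x_3, y_1)}{(y_1, y_1)}y_1 - \frac{(x_3, y_2)}{(y_2, y_2)}y_2 = (1, 0, 1) - \tfrac{1}{2}(1, 1, 0) - \tfrac{1}{3}\left(-\tfrac{1}{2}, \tfrac{1}{2}, 1\right) = \left(\tfrac{2}{3}, -\tfrac{2}{3}, \tfrac{2}{3}\right).
\end{align*}
Each $y_i$ is orthogonal to the ones before it, and $L(y_1, y_2, y_3) = L(x_1, x_2, x_3).$ Normalizing yields the orthonormal basis $\tfrac{1}{\sqrt{2}}(1, 1, 0),$ $\tfrac{1}{\sqrt{6}}(-1, 1, 2),$ $\tfrac{1}{\sqrt{3}}(1, -1, 1),$ in accordance with Theorem 1.14.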