Consider a Hamiltonian that can be broken up into an exactly solvable piece \(\hat{H}_0\) and a small correction piece \(\hat{V}\): \[\hat{H}=\hat{H}_0+V,\] where \(\hat{H}_0\) satisfies \[\hat{H}_0\left|k,n_k\right\rangle_0=E_k^0\left|k,n_k\right\rangle.\] Here, \(n_k=1,2,\ldots d(k)\) with \(d(k)\) the degeneracy of the \(k\)th energy level. We assume each state \(\left|k,n_k\right\rangle\) is part of an orthonormal basis set. We can always achieve this via a Gram-Schmidt orthogonalization procedure from a set of eigenvectors which span the degenerate subspace.
Our eigenvalue equation is \[(\hat{H}_0+\hat{V})\left|k,n_k\right\rangle=E_{k,n_k}\left|k,n_k\right\rangle.\] The perturbation \(\hat{V}\) may totally lift the degeneracy, partially lift the degeneracy, or not lift it at all. Follow the Rayleigh-Schrödinger strategy, we rewrite the energy eigenvalue equation as \[(E_k^0-\hat{H}_0)\left|k,n_k\right\rangle=(\hat{V}-\Delta E_{k,n_k})\left|k,n_k\right\rangle.\] We need to "protect" with projection operators to invert. But now, the projection operators for the degenerate subspace involve a sum over \(d(k)\) terms. Define \[\hat{P}_k=\sum_{n_k=1}^{d(k)}\left|k,n_k\right\rangle_0\left\langle k,n_k\right|_0,\] which projects onto the \(E_k^0\) subspace. Check: \[\hat{P}_k^2=\sum_{n_k=1}^{d(k)}\sum_{n'_k=1}^{d(k)}\left|k,n_k\right\rangle_0\left\langle k,n_k\right|_0\left|k,n'_k\right\rangle_0\left\langle k,n'_k\right|_0=\sum_{n_k=1}^{d(k)}\left|k,n_k\right\rangle_0\left\langle k,n_k\right|_0=\hat{P}_k,\] because the energy eigenstates inside the degenerate subspace form an orthonormal set. Since \(\hat{P}_k^2=\hat{P}_k\), this is indeed a projection operator. Define \(\hat{Q}_k=1-\hat{P}_k\) to project onto a subspace orthogonal to the degenerate subspace.
Note further that \[[\hat{P}_k,\hat{H}_0]_-=[\hat{Q}_k,\hat{H}_0]_-=0\] so, we multiply the Schrödinger equation by \(\hat{Q}_k\), \[(E_k^0-\hat{H}_0)\hat{Q}_k\left|k,n_k\right\rangle=\hat{Q}_n(\hat{V}-\Delta E_{k,n_k})\left|k,n_k\right\rangle.\] But, \(\hat{Q}_k\left|k,n_k\right\rangle=\left|k,n_k\right\rangle_\perp\), so \[\left|k,n_k\right\rangle_\perp=\frac{\hat{Q}_k}{E_k^0-\hat{H}_0}(\hat{V}-\Delta E_{k,n_k})\left|k,n_k\right\rangle\] and since \(\left|k,n_k\right\rangle=\left|k,n_k\right\rangle_\parallel+\left|k,n_k\right\rangle_\perp=\hat{P}_k\left|k,n_k\right\rangle+\hat{Q}_k\left|k,n_k\right\rangle\) \[\implies\left|k,n_k\right\rangle=\left|k,n_k\right\rangle_{\parallel}+\frac{\hat{Q}_k}{E_k^0-\hat{H}_0}(\hat{V}-\Delta E_{k,n_k})\left|k,n_k\right\rangle.\] Solving for \(\left|k,n_k\right\rangle\) gives \[\boxed{\left|k,n_k\right\rangle=\left(\mathbb{I}-\frac{\hat{Q}_k}{E_k^0-\hat{H}_0}(\hat{V}-\Delta E_{k,n_k})\right)^{-1}\left|k,n_k\right\rangle_\parallel}.\]
In non-degenerate perturbation theory \(\left|k,n_k\right\rangle_\parallel\) was known since there was only one state \[\left|k\right\rangle_\parallel=\hat{P}_k\left|k\right\rangle=\left|k\right\rangle_0\left\langle k\middle|k\right\rangle_0=\left|k\right\rangle_0\quad \text{(nondegenerate case)}.\] But in degenerate perturbation theory, we don’t know a priori the direction in the \(E_k^0\) subspace for the wavefunctions \[\left|k,n_k\right\rangle_\parallel=\sum_{n'_k=1}^{d(k)}\left|k,n'_k\right\rangle_0\underbrace{\left\langle k,n'_k\middle|k,n_k\right\rangle}_{\text{do not know these}},\] and in general, we do not know these inner product coefficients. So we need an equation to find them! Recall that \[\left|k,n_k\right\rangle_\perp=\hat{Q}_k\left(\mathbb{I}-\frac{\hat{Q}_k}{E_k^0-\hat{H}_0}(\hat{V}-\Delta E_{k,n_k})\right)^{-1}\left|k,n_k\right\rangle_\parallel.\] For the parallel space, we try to project the eigenvalue equation into the degenerate subspace. This will not solve the whole problem, but will allow us to potentially find the right directions outside of that space that the perturbation takes us. We have \[\begin{aligned} \hat{P}_k\hat{V}\left|k,n_k\right\rangle=& \ \hat{P}_k(E_k^0-\hat{H}_0+\Delta E_{k,n_k})\left|k,n_k\right\rangle \\ =& \ \Delta E_{k,n_k}\hat{P}_k\left|k,n_k\right\rangle \\ =& \ \Delta E_{k,n_k}\left|k,n_k\right\rangle_\parallel. \end{aligned}\] This is not yet in the parallel subspace only, so we use the perturbation expansion expression to rewrite this inside the parallel subspace via \[\hat{P}_k\hat{V}\left[\mathbb{I}-\frac{\hat{Q}_k}{E_k^0-\hat{H}_0}(\hat{V}-\Delta E_{k,n_k})\right]^{-1}\left|k,n_k\right\rangle_\parallel=\Delta E_{k,n_k}\left|k,n_k\right\rangle_\parallel.\] Note how the operator acts on a state in the parallel subspace and results in an eigenvalue equation that stays in the parallel subspace. This is a valid eigenvalue equation in a \(d(k)\)-dimensional space. It does not find the exact eigenvalue, but is the first step in solving the whole problem. We can insert a \(\hat{P}_k\) next to the ket, since it projects parallel to the \(k\)-subspace. So, \[\left\{\hat{P}_k\hat{V}\left[\mathbb{I}-\frac{\hat{Q}_k}{E_k^0-\hat{H}_0}(\hat{V}-\Delta E_{k,n_k})\right]^{-1}\hat{P}_k-\Delta E_{k,n_k}\right\}\left|k,n_k\right\rangle_\parallel=0\] or, more compactly \[(\hat{L}_{k,n_k}-\Delta E_{k,n_k})\left|k,n_k\right\rangle_\parallel=0\] where \[\hat{L}_{k,n_k}\equiv\hat{P}_k\hat{V}\left[\mathbb{I}-\frac{\hat{Q}_k}{E_k^0-\hat{H}_0}(\hat{V}-\Delta E_{k,n_k})\right]^{-1}\hat{P}_k.\] Note that we usually wish to re-express this as a finite-dimensional matrix eigenvalue equation to solve.
This equation allows us to find \(\Delta E_{k,n_k}\) and \(\left|k,n_k\right\rangle_\parallel\). To see how this works we write an expansion in powers of \(\hat{V}\). \[\Delta E_{k,n_k}=E_k-E_k^0=E_{k,n_k}^{(1)}+E_{k,n_k}^{(2)}+E_{k,n_k}^{(3)}+\cdots\] \[\hat{L}_{k,n_k}=\hat{L}_{k,n_k}^{(1)}+\hat{L}_{k,n_k}^{(2)}+\cdots\] \[\left|k,n_k\right\rangle_\parallel=\left|k,n_k\right\rangle_\parallel^{(1)}+\left|k,n_k\right\rangle_\parallel^{(2)}+\cdots\] It is convention to start from (1) here. At lowest order, \[\begin{aligned} &\Delta E_{k,n_k}=\Delta E_{k,n_k}^{(1)} \\ &\hat{L}_{k,n_k}=\hat{L}_{k,n_k}^{(1)}=\hat{P}_k\hat{V}\hat{P}_k \\ &\implies(\hat{L}_{k,n_k}^{(1)}-E_{k,n_k}^{(1)})\left|k,n_k\right\rangle_\parallel^{(1)}=0 \end{aligned}\] This gives us \[\hat{P}_k\hat{V}\hat{P}_k\left|k,n_k\right\rangle_\parallel^{(1)}=E_{k,n_k}^{(1)}\left|k,n_k\right\rangle_\parallel^{(1)}\] which is a “Schrödinger-like” equation in the degenerate subspace. Expand in terms \(\{\left|k,n_k\right\rangle_0\}\) to get \[\boxed{\sum_{n''_k}\left\langle k,n'_k\middle|\hat{V}\middle|k,n''_k\right\rangle_0\left\langle k,n''_k\middle|k,n_k\right\rangle_\parallel^{(1)}=E_{k,n_k}^{(1)}\left\langle k,n'_k\middle|k,n_k\right\rangle_\parallel}\] This is a \(d(k)\times d(k)\) matrix equation for the first-order energy shift and for the parallel directions.
Let’s take a break to look at the simplest example. This example has the degenerate subspace equal to the entire space. \[\hat{H}_0=\begin{pmatrix}a & 0\\0 & a\end{pmatrix}, \ \hat{V}=\begin{pmatrix}V_{11} & V_{12}\\ V_{21} & V_{22}\end{pmatrix}\] The first-order matrix equation is \[\det(\hat{V}-E^{(1)}\mathbb{I})=0\] \[\det\begin{pmatrix}V_{11}-E^{(1)} & V_{12}\\ V_{21} & V_{22}-E^{(1)}\end{pmatrix}=0\] \[E^{(1) \ 2}-(V_{11}+V_{22})E^{(1)}+V_{11}V_{22}-V_{12}V_{21}=0\] Therefore, \[\boxed{E^{(1)}=\frac{V_{11}+V_{22}}{2}\pm\frac{1}{2}\sqrt{(V_{11}-V_{22})^2+4V_{12}V_{21}}}\] We find the directions of \(\left|k,n_k\right\rangle_\parallel\) by finding the eigenvectors (See Homework). Note that this formula cannot be expanded in a power series for small \(V\). This is one of the reasons why the degenerate theory is more complicated than the nondegenerate theory.
Now, suppose the first-order perturbation theory lifts all of the degeneracies \(E_{k,n_k}^{(1)}\ne E_{k,n'_k}^{(1)}\) for all \(n_k,n'_k\). So, \[\hat{P}_k\hat{V}\hat{P}_k\left|k,n_k\right\rangle_\parallel^{(1)}=E_{k,n_k}^{(1)}\left|k,n_k\right\rangle_\parallel^{(1)}\] with all \(E_{k,n_k}^{(1)}\) different. Now look at second order. \[\begin{aligned} \hat{L}_k=& \ \hat{L}_k^{(1)}+\hat{L}_k^{(2)}=\hat{P}_k\hat{V}\hat{P}_k+\hat{P}_k\hat{V}\frac{\hat{Q}_k}{E_k^0-\hat{H}_0}(\hat{V}-\Delta E_{k,n_k}^{(1)})\hat{P}_k \\ =& \ \hat{P}_k\hat{V}\hat{P}_k+\hat{P}_k\hat{V}\frac{\hat{Q}_k}{E_k^0-\hat{H}_0}\hat{V}\hat{P}_k \end{aligned}\] and the second-order equation is \[(\hat{P}_k\hat{V}\hat{P}_k-E_{k,n_k}^{(1)})\left|k,n_k\right\rangle_\parallel^{(1)}+(\hat{P}_k\hat{V}\hat{P}_k-E_{k,n_k}^{(1)})\left|k,n_{k}\right\rangle_\parallel^{(2)}+\left(\hat{P}_k\hat{V}\frac{\hat{Q}_k}{E_k^0-\hat{H}_0}\hat{V}\hat{P}_k-E_{k,n_k}^{(2)}\right)\left|k,n_k\right\rangle_\parallel^{(1)}=0\] Multiply by \(\left\langle k,n'_k\right|\) to get \[\begin{gathered} \left\langle k,n'_k\right|(\hat{P}_k\hat{V}\hat{P}_k-E_{k,n_k}^{(1)})\left|k,n_k\right\rangle_\parallel^{(1)}+\left\langle k,n'_k\right|(\hat{P}_k\hat{V}\hat{P}_k-E_{k,n_k}^{(1)})\left|k,n_{k}\right\rangle_\parallel^{(2)}\\+\left\langle k,n'_k\right|\left(\hat{P}_k\hat{V}\frac{\hat{Q}_k}{E_k^0-\hat{H}_0}\hat{V}\hat{P}_k-E_{k,n_k}^{(2)}\right)\left|k,n_k\right\rangle_\parallel^{(1)}=0 \end{gathered}\] Note \(\left\langle k,n_k\middle|k,n'_k\right\rangle_\parallel^{(1)}=\delta_{n_k,n'_k}\) (by convention) and \(\hat{P}_k\hat{V}\hat{P}_k\left|k,n_k\right\rangle_\parallel^{(1)}=E_{k,n_k}^{(1)}\left|k,n_k\right\rangle_\parallel^{(1)}\). So, \[(E_{k,n_k}^{(1)}-E_{k,n'_k}^{(1)})\left\langle k,n'_k\middle|k,n_k\right\rangle_\parallel^{(2)}+\left\langle k,n'_k\middle|\hat{P}_k\hat{V}\frac{\hat{Q}_k}{E_k^0-\hat{H}_0}\hat{V}\hat{P}_k\middle|k,n_k\right\rangle_\parallel^{(1)}-E_{k,n_k}^{(2)}\delta_{n_k,n'_k}=0.\] To find \(E_{k,n_k}^{(2)}\), set \(n'_k=n_k\) to get \[E_{k,n_k}^{(2)}=\left\langle k,n_k\middle|\hat{P}_k\hat{V}\frac{\hat{Q}_k}{E_k^0-\hat{H}_0}\hat{V}\hat{P}_k\middle|k,n_k\right\rangle_\parallel^{(1)}\] which is the formula for non-degenerate second-order perturbation theory within the basis found at first order. Once the degeneracy is lifted, subsequent expansions look like the non-degenerate formulas.