It's possible that dropping the >2.0 gap rule would produce a sudden improvement, but I suspect it's much more complicated than that.
First, I think they give more weight to match results between opponents with a smaller rating gap. So it's not as if every match with a gap under 2.0 is weighted the same and every match with a gap over 2.0 counts for nothing, with a hard cutoff in between. My guess is that the weight decreases gradually and reaches zero at 2.0. Changing this part of their algorithm would then not be a simple rule switch. It would mean changing their weighting formula, which could have hard-to-predict consequences.
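To make the difference concrete, here is a rough sketch of the two weighting schemes I have in mind. This is purely my own illustration, not UTR's actual formula: the function names, the linear shape of the taper, and the `max_gap` parameter are all assumptions.

```python
def hard_cutoff_weight(rating_a, rating_b, max_gap=2.0):
    """Hypothetical 'simple rule': full weight below max_gap, nothing at or above it."""
    gap = abs(rating_a - rating_b)
    return 1.0 if gap < max_gap else 0.0


def tapered_weight(rating_a, rating_b, max_gap=2.0):
    """Hypothetical linear taper: weight 1.0 at zero gap, shrinking to 0.0 at max_gap.

    The linear shape is an assumption; the real curve (if there is one) is unknown.
    """
    gap = abs(rating_a - rating_b)
    return max(0.0, 1.0 - gap / max_gap)


# Compare the two schemes across a range of rating gaps.
for opponent in (8.0, 8.5, 9.0, 9.9, 10.0, 11.0):
    print(opponent, hard_cutoff_weight(8.0, opponent), tapered_weight(8.0, opponent))
```

With a taper like this there is no single switch to flip. "Dropping the 2.0 rule" would mean picking a new curve, and that changes the weight of every match, not just the lopsided ones.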
Second, I think the larger issue is probably with the iterations that @schmke has described. Because everyone's rating affects all of their opponents' ratings and vice versa, there's a circularity that can be difficult to handle mathematically. Re-running the calculations iteratively is one way to handle it, with each iteration bringing the ratings closer to convergence. Most active players probably don't change much after each iteration, so UTR does not bother running many of them every single day. But then there are the weird edge cases you've identified, who probably need at least several more iterations to settle down to something reasonable.
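Here is a toy version of that kind of iteration, just to show why a badly seeded player needs more passes than everyone else. Nothing here is UTR's real method: the `margin` values (how many rating points one player outperformed the other by), the plain averaging, and the `damping` factor are all made up for the sketch.

```python
def iterate_ratings(ratings, matches, max_iters=1000, tol=1e-6, damping=0.5):
    """Repeatedly re-estimate every rating from opponents' current ratings.

    matches: list of (player_a, player_b, margin), where margin is a
    hypothetical measure of how many rating points a outperformed b by.
    Returns (ratings, iterations_used).
    """
    ratings = dict(ratings)
    for iteration in range(1, max_iters + 1):
        # Each match implies a rating for both players, given the opponent's rating.
        implied = {player: [] for player in ratings}
        for a, b, margin in matches:
            implied[a].append(ratings[b] + margin)
            implied[b].append(ratings[a] - margin)

        updated = {}
        for player, old in ratings.items():
            if implied[player]:
                target = sum(implied[player]) / len(implied[player])
                # Move only part of the way toward the target so the
                # updates settle down instead of bouncing back and forth.
                updated[player] = (1 - damping) * old + damping * target
            else:
                updated[player] = old  # no matches, nothing to update

        biggest_change = max(abs(updated[p] - ratings[p]) for p in ratings)
        ratings = updated
        if biggest_change < tol:
            return ratings, iteration
    return ratings, max_iters


# Three established players plus "D", whose starting rating is far too low.
start = {"A": 10.0, "B": 9.0, "C": 8.0, "D": 1.0}
matches = [("A", "B", 1.0), ("B", "C", 1.0), ("A", "C", 2.0), ("C", "D", 3.0)]

after_one, _ = iterate_ratings(start, matches, max_iters=1)
settled, passes = iterate_ratings(start, matches)
print("gap C-D after one pass:", after_one["C"] - after_one["D"])
print("gap C-D once settled:  ", settled["C"] - settled["D"], "after", passes, "passes")
```

In this toy, D's one result says D should sit 3 points below C, but a single pass leaves the gap well off that, and D's bad starting value also drags C (and through C, everyone else) around before things settle. That's the same flavor of problem as the edge cases above: a handful of passes is fine for players who are already close to right, but not for the outliers.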