I'm starting to believe in WTN, at least more than UTR.
I don't like what I'm hearing about WTN handing out starting #s based on age/gender, but among the people I play against or see in tournaments, it seems solid.
UTR, for example, had a girl at 4.42 (for some reason I could see both of her decimals), but she's been on a tear in women's and mixed, even winning a 4.0 women's tournament as a 3.5 player.
UTR had several people at 5.xx that she beat. WTN and TR both reflected her level much more accurately. The argument that she's just beating low-ranked players is a non sequitur. She's beating the opponents ahead of her, many of whom UTR rates above her!
WTN is weak at the self-rating stage, over-relying on built-in assumptions (like the age penalty), but it gains momentum and is probably the most accurate of the rating systems for players with a substantial amount of data in the system. It quickly recovers accuracy if you log matches against opponents of different ages.
UTR is pretty solid for singles when players are active and playing a lot of matches. But outside of that, UTR kills itself with bad assumptions. By continuing to adjust a player's rating after they go inactive, it compromises the accuracy of the whole system and makes it unstable. And the stupid 2-unit gap rule in the algo, along with a bug in how it calculates the opponent-strength and partner-strength adjustments in doubles, makes it useless for mixed doubles: it ignores a large fraction of mixed matches, which are the only matches with both genders on the same court at the same time. It overweights opponent strength, so it is more a rating of your strength of schedule than of you as a player, giving an illusion of accuracy. That's how you end up with stupid results like Jenson Brooksby briefly rising to UTR World #1 this summer.
TR is pretty solid for rec-level play in both singles and doubles, with the bonus of separating out mixed. The algo is simple, logical and easy to reverse engineer. It doesn't separate singles and doubles ratings, so it can be misleading if you don't review the match history to estimate those ratings separately yourself. It's also a bit less sticky than the other rating systems because it only averages your last several match ratings, so it has more recency bias and is more sensitive to outlier results.
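To see why averaging only the last several matches makes a rating more sensitive to one outlier, here's a rough sketch. It assumes each match already has a per-match rating and uses a hypothetical window of 5 matches; TR's actual per-match formula and window size aren't modeled here, so treat this as an illustration of the averaging effect only.

```python
from collections import deque
from statistics import mean

def rolling_rating(match_ratings, window=5):
    """Rating = average of the most recent `window` per-match ratings.
    The window size of 5 is a hypothetical choice, not TR's real value."""
    recent = deque(maxlen=window)  # older matches fall out automatically
    for r in match_ratings:
        recent.append(r)
    return mean(recent)

def long_run_rating(match_ratings):
    """For comparison: average over every match ever played (very sticky)."""
    return mean(match_ratings)

# Ten steady matches at 3.5, then one outlier result at 4.5.
matches = [3.5] * 10 + [4.5]
print(rolling_rating(matches))   # 3.7
print(long_run_rating(matches))  # about 3.59
```

With a short window the single outlier moves the rating by 0.2, while the all-history average barely moves, which is the recency bias described above.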