Our primary lab here at totalnetsolutions.net is a bit space-constrained (RAM constantly 50% overcommited in ESX), so we have it limited to only 2 domains, a parent, and a child. There’s 2 sites, one of them is a “BranchOffice” with only an RODC. This serves us well for the majority of customer cases, especially since the DHCP pool serves the BranchOffice site, which is routed via DSL to the CorpOffice site.

Recently we needed a 3rd domain (2nd child) for testing a particular user/group creation scheme for a customer. Upon completing testing in August, to reduce RAM utilization on the ESX host (and because it makes the AD Schema even more “real-world”, we dcpromo’ed the DC for the 3rd domain, deleting the entire domain. We deleted the objects, and ran the ntdsutil metadata cleanup, like good AD admins. Then we watched the logs fill up with Information events only for a few weeks, and forgot about the episode.

The thing was: the DC in the 3rd domain, because it was short-lived, was left with its DHCP address (it was never a DNS server), so it got added to the BranchOffice site. According to this 2009 Technet AD Troubleshooting Blog entry from Ingolfur Strangeland, RODCs won’t register as the Intersite Topology Generator (ISTG) for the site, but they will perform that role for themselves. (Background on the ISTG.) Because of this, the *new* DC, which was writeable, took over the role of the ISTG for the site, and made sure that all the replication connections went through it, as the only writeable DC in the site.

The problem is, we removed this server. When we removed the writeable DC in the BranchOffice site, there was no other server available to write “I’m now the ISTG” and replicate that setting outbound to the other site(s)… because the only remaining server was an RODC.

We discovered the problem 2 months later (Rob had a baby and wasn’t paying any attention to the lab for a while) when, after creating 1,000,000 new users in the lab, replication was surprisingly slow, but only into the Branch office. So slow that when we joined new computers, they’d boot up with “Service Principal Unknown” errors, and AD users couldn’t log in. In Active Directory Sites and Services we saw that the ISTG Server and Site were both “Invalid”.  This post from 2011 discusses how to move this, but not if it works for RODCs… the good news is that it does:

  1. open the Configuration container, like with adsiedit.msc
  2. Expand CN=Sites, and then the site with the broken ISTG (likely the site with the RODC)
  3. Double-click to open the properties of CN=NTDS Settings
  4. Find the value: “InterSiteTopologyGenerator” and paste in the full DN (from the Configuration Container, not the RootDN) of the RODC
    1. This is the “distinguishedName” value of the CN=<servername>,CN=<sitename>,CN=Sites,CN=Configuration object of the server in the site in question that *should* be the ISTG.
  5. Click “OK” to Save, use ‘repadmin’ or dssite.msc (AD Sites and Services) to force replication and wait 15 minutes (or your own inter-site replication time)