The UPS on the lab servers is near the end of its life, so we’ve changed the shutdown script in power outages to simply hard-power off some of the servers (using Rob’s fork of ESXIDown from https://github.com/docsmooth/esxidown ). Of course, the second DC had to be one of them, so that there would be enough battery left for the SQL server and FSMO / PDC to shut down cleanly. THis means that the second DC often fails to boot properly when the power comes back after our yearly ComEd outage.

So, how do you fix this without doing the research every time? Thankfully, modern versions of Windows can be configured to boot to recovery mode. If you log in and launch a command prompt, you can run the following commands to find and repair problems:

  1. Launch the command prompt.

    recovery mode cmd.exe window example

    Launching CMD.EXE in recovery window

  2. navigate to your drive with the NTDS.DIT file (on our lab server, it’s always d:\windows\ntds:
    d:
    cd\windows\ntds
  3. Run an ESENTUTL checksum on the Active Directory database file ntds.dit:
    d:\windows\system32\esentutl.exe /k ntds.dit

    ESENTUTL.exe completed checksum

    ESENTUTL.exe completed checksum

  4. Even though that’s successful, you’ll probably fail an integrity check:
    d:\windows\system32\esentutl.exe /g edb
    On our server, that always generates the error:

    The database is not up-to-date. Integrity check may find that this database is corrupt because data from the log files has yet to be placed in the database. It is strongly recommended the database is brought up-to-date before continuing! Do you wish to abort the operation?

  5. If the checksum passed and you get this error, try rebooting into Directory Services Repair Mode – exit the command prompt and hit “Restart” and then pound on F8 to get the DSRM boot option.
  6. If DSRM bluescreens, then you need to go deeper into esentutl.exe to repair your DB. If this is NOT your only DC, you will have data loss, but it should replicate from other DCs back into this one, so it shouldn’t be a huge problem.
    If DSRM works, try an ntdsutil in cmd.exe:
    ntdsutil.exe
    activate instance ntds
    files
    recover
  7. If ntdsutil file recovery errors with

    Could not initialize the Jet engine: Jet Error -543.
    Failed to open DIT for AD DS/LDS instance NTDS. Error -21478418113

    then you need to do more work with esentutl, but can continue inside DSRM, rather than having to reboot.

    If you couldn’t boot into DSRM, continue as below, but from the recovery install CMD.exe. I’ll continue these commands from THAT pathing, since it turns out not bothering to jump into DSRM makes for a faster recovery, with ensured data loss. I don’t care abuot data loss in our lab though.

  8. Try to recover the database: d:\windows\system32\esenutl.exe /ml d:\windows\ntds\edb
  9. If that doesn’t work, delete the edb.log files, then try recovery again. You’ll get an error like

    Operation terminated with error -501 (JET_errLogFileCorrupt, Log file is corrupt) after 2.123 Seconds.

    So backup your logs to a new location or delete them outright:
    mkdir log-backup
    move edb*.log log-backup
    move edb.chk log-backup

  10. Recreate the log files with a hard recovery of the database:
    d:\windows\system32\esentutl.exe /p d:\windows\ntds\ntds.dit
    You’ll get an error saying you should only run this on damaged or corrupt databases. The checks before this have proven that that is the case in this situation, so click “OK”.
  11. This should restore the database and complete successfully. Reboot and test it. If it doesn’t work, or doesn’t boot, try again in all offline mode without jumping to DSRM.
  12. If that still doesn’t work:
  13. Recheck the database integrity:
    cd \windows\ntds
    esentutl /g: ntds.dit
  14. Do a database repair again:d:\windows\system32\esentutl.exe /p ntds.dit
  15. And reboot when that completes successfully. You *should* now boot properly, and the edb.chk and edb.log files should get rebuilt.

If you are setting up a cross-forest trust with selective authentication (which requires a Windows Server 2003 Native mode level forest and domain), don’t forget to grant the “Allowed to Authenticate” right to the users from the trusted domain to the servers they’ll need access to in your domain. The error messages you’ll get back (replicated here in my test VM domains) don’t really say much helpful.

System Error 317 has occurred. The system cannot find message text for message number 0x*** in the message file for ***.

System Error 317

Further information about adding the “Allowed to Authenticate” right to the trusted users is available at Microsoft TechNet. If you have the opportunity to raise your forest and domain functional levels to take advantage of this, I highly recommend it. But I recommend also (even more strongly) documenting precisely what you set.