The UPS on the lab servers is near the end of its life, so we’ve changed the shutdown script in power outages to simply hard-power off some of the servers (using Rob’s fork of ESXIDown from https://github.com/docsmooth/esxidown ). Of course, the second DC had to be one of them, so that there would be enough battery left for the SQL server and FSMO / PDC to shut down cleanly. THis means that the second DC often fails to boot properly when the power comes back after our yearly ComEd outage.

So, how do you fix this without doing the research every time? Thankfully, modern versions of Windows can be configured to boot to recovery mode. If you log in and launch a command prompt, you can run the following commands to find and repair problems:

  1. Launch the command prompt.

    recovery mode cmd.exe window example

    Launching CMD.EXE in recovery window

  2. navigate to your drive with the NTDS.DIT file (on our lab server, it’s always d:\windows\ntds:
    d:
    cd\windows\ntds
  3. Run an ESENTUTL checksum on the Active Directory database file ntds.dit:
    d:\windows\system32\esentutl.exe /k ntds.dit

    ESENTUTL.exe completed checksum

    ESENTUTL.exe completed checksum

  4. Even though that’s successful, you’ll probably fail an integrity check:
    d:\windows\system32\esentutl.exe /g edb
    On our server, that always generates the error:

    The database is not up-to-date. Integrity check may find that this database is corrupt because data from the log files has yet to be placed in the database. It is strongly recommended the database is brought up-to-date before continuing! Do you wish to abort the operation?

  5. If the checksum passed and you get this error, try rebooting into Directory Services Repair Mode – exit the command prompt and hit “Restart” and then pound on F8 to get the DSRM boot option.
  6. If DSRM bluescreens, then you need to go deeper into esentutl.exe to repair your DB. If this is NOT your only DC, you will have data loss, but it should replicate from other DCs back into this one, so it shouldn’t be a huge problem.
    If DSRM works, try an ntdsutil in cmd.exe:
    ntdsutil.exe
    activate instance ntds
    files
    recover
  7. If ntdsutil file recovery errors with

    Could not initialize the Jet engine: Jet Error -543.
    Failed to open DIT for AD DS/LDS instance NTDS. Error -21478418113

    then you need to do more work with esentutl, but can continue inside DSRM, rather than having to reboot.

    If you couldn’t boot into DSRM, continue as below, but from the recovery install CMD.exe. I’ll continue these commands from THAT pathing, since it turns out not bothering to jump into DSRM makes for a faster recovery, with ensured data loss. I don’t care abuot data loss in our lab though.

  8. Try to recover the database: d:\windows\system32\esenutl.exe /ml d:\windows\ntds\edb
  9. If that doesn’t work, delete the edb.log files, then try recovery again. You’ll get an error like

    Operation terminated with error -501 (JET_errLogFileCorrupt, Log file is corrupt) after 2.123 Seconds.

    So backup your logs to a new location or delete them outright:
    mkdir log-backup
    move edb*.log log-backup
    move edb.chk log-backup

  10. Recreate the log files with a hard recovery of the database:
    d:\windows\system32\esentutl.exe /p d:\windows\ntds\ntds.dit
    You’ll get an error saying you should only run this on damaged or corrupt databases. The checks before this have proven that that is the case in this situation, so click “OK”.
  11. This should restore the database and complete successfully. Reboot and test it. If it doesn’t work, or doesn’t boot, try again in all offline mode without jumping to DSRM.
  12. If that still doesn’t work:
  13. Recheck the database integrity:
    cd \windows\ntds
    esentutl /g: ntds.dit
  14. Do a database repair again:d:\windows\system32\esentutl.exe /p ntds.dit
  15. And reboot when that completes successfully. You *should* now boot properly, and the edb.chk and edb.log files should get rebuilt.

I recently booted up a long-powered-off test system for a customer, and realized it was still CentOS 4.7. up2date gives one of these error messages:


Error: Bad/Outdated Mirrorlist and Baseurl

[Errno -1] Header is not complete.
Trying other mirror.

To fix, first run this to update to a newer version of yum:

for i in \
libxml2-2.6.16-12.6.i386.rpm \
libxml2-python-2.6.16-12.6.i386.rpm \
readline-4.3-13.i386.rpm \
python-2.3.4-14.7.el4.i386.rpm \
python-elementtree-1.2.6-5.el4.centos.i386.rpm \
sqlite-3.3.6-2.i386.rpm \
python-sqlite-1.1.7-1.2.1.i386.rpm \
elfutils-0.97.1-5.i386.rpm \
popt-1.9.1-32_nonptl.i386.rpm \
rpm-libs-4.3.3-32_nonptl.i386.rpm \
rpm-4.3.3-32_nonptl.i386.rpm \
rpm-python-4.3.3-32_nonptl.i386.rpm \
python-urlgrabber-2.9.8-2.noarch.rpm \
yum-metadata-parser-1.0-8.el4.centos.i386.rpm \
yum-2.4.3-4.el4.centos.noarch.rpm
do
rpm -Uvh http://archive.kernel.org/centos/4.9/os/i386/CentOS/RPMS/${i};
done

(note: the order is important for the dependencies, and the trailing “\” prevents bash from spitting out interpretation errors, but makes it readable.)

Then edit /etc/yum.repos.d/CentOS-Base.repo: comment out mirrorlist directives, and in each enabled section add baseurl=http://archive.kernel.org/centos/4.9/os/$basearch, baseurl=http://archive.kernel.org/centos/4.9/updates/$basearch, etc.

Note the change from vault.centos.org to archive.kernel.org/centos – the CentOS Vault is handling RPM headers differently than yum in CentOS 4.9 expects. The Kernel.org archive does not.

Many are the times we’ve run into DNS configuration problems with Microsoft AD.  After being asked for advice a few more times than normal this year, I’ve pulled together several emails for this list of “Troubleshooting Microsoft AD-integrated DNS” highlights below.  We’ll first cover the generic topics of checking the configuration of your server configuration,  then the configuration of the zones themselves. For each topic, we’ll do a checklist followed by an explanation.

Server configuration:

Checklist

  1. Is the server (Windows 2003 or higher) pointing to itself for primary DNS in the network configuration?
  2. If a standalone DC: Does the server have *no* secondary DNS in the network configuration?
  3. If there are multiple DCs: Does the server list only other DCs in the secondary DNS server list in the advanced network configuration?
  4. Does the server have proper forwarders in the DNS server configuration (to the parent domain or to the ISP, but not both)?
  5. In a command prompt, run the following:
    ipconfig /registerdns
    net stop netlogon
    net start netlogon
  6. Read DNS and System logs to make sure there are no issues being reported.
  7. wait 20 minutes

Explanation

One of the major problems we run into is that customers will put the ISP DNS servers in the network configuration on the DC, not in the DNS Forwarders list in the DNS Server configuration.  The DC *is* a DNS server.  It needs to talk to itself, so that it can register crucial DNS settings in its own database.  If its own database can’t find the information requested (such as www.google.com), then the DNS Server service is responsible for looking that data up, and then caching it so that it’s readily available for other clients, too.  This misconfiguration also has the problem of generating DDNS update requests back to the ISP DNS servers, which are ignored at best, and a security leak at worst (like for military/government installations).

I like to tell my Unix customers “the first rule of administering Active Directory is to go get another cup of coffee.” This forces them to take their hands off the keyboard and wait for cross-site replication (hopefully) before making another change.  It’s a good reminder for the seasoned Windows admins, as well.

Zone Configuration

Reverse Lookup Zones

We’ll cover reverse lookup zones before forward lookup zones, for two reasons: 1) customers screw up reverse lookup configuration much more often than forward lookup configuration ; 2) no SRV records in Reverse zones (normally).

Checklist

If you have non-Microsoft DNS servers or multiple AD domains in your environment

  1. Does the server have reverse DNS zones defined?
  2. Does any *other* server (in the DNS Forwarders configuration list) have the same reverse DNS zone defined?
  3. Do the defined reverse zones allow “unsecured dynamic updates”?
  4. Are all IP subnets in your network defined as reverse DNS zones on the primary DNS servers (the last forwarders in the network before the ISP)?
  5. Do you have aging and scavenging turned on in the server settings?  If so (you should), do you have all clients automatically renewing their records (Windows clients will by default)?

If you only have a single AD domain, or no non-Microsoft DNS servers

  1. Does the server have reverse DNS zones defined for all IP subnets (including IPv6) in your network?
  2. Do those reverse DNS zones allow dynamic updates?
  3. Is aging of old records enabled with sane no-refresh and refresh values  in the reverse zones?

Explanation

Each DNS Zone is a database.  There can only be one authoritative owner of the database, defined by the SOA record on the Zone.  Any other DNS servers get their information from this SOA, either by normal queries, or by zone transfer (AD replication does a kind of zone transfer).  If two servers are set up with the same zone (create 0.168.192.in-addr.arpa reverse DNS zone in dns1.contoso.com and ns1.worldwidetoys.com, for example), then there is no mechanism to transfer the information between those two servers.

For example: any individual client will only talk to the DNS server it’s configured to talk to (client1.contoso.com gets its DNS info from dns1.contoso.com and winxp1.worldwidetoys.com gets its information from ns1.worldwidetoys.com). Each client will also send updates only to its own DNS server.  This means that client1.contoso.com will register its IP 192.168.0.10 with dns1.contoso.com, and winxp1.worldwidetoys.com will register its IP 192.168.0.20 with ns1.worldwidetoys.com.  These two records will never be synched between dns1.contoso.com and ns1.worldwidetoys.com.  Therefore, when winxp1.worldwidetoys.com asks ns1.worldwidetoys.com “who has 192.168.0.10?”, ns1.worldwidetoys.com will answer “nobody!”.

The DNS admin must fix this problem by manually registering all of the records from ns1.worldwidetoys.com in the zone stored in dns1.contoso.com, deleting the 0.168.192.in-addr.arpa zone from ns1.worldwidetoys.com, and then setting up a forwarder or conditional forwarder to dns1.contoso.com.  Now, that same query results in ns1.worldwidetoys.com looking in its own database, finding no answer, and reaching out to its forwarders to ask, “who has 192.168.0.10?”.  Similarly, when winxp1.worldwidetoys.com goes to register 192.168.0.20, it is directed, via the SOA record, to send that registration to dns1.contoso.com.  This is why reverse zones often need to allow unsecured dynamic updates.

Forward Lookup Zones

I have a customer who needs this much data now – I’ll follow up with the Forward Lookup zones in a separate post later this week.

(Originally drafted November 2nd, 2007, finally finished and posted much later)
As I posted last night, we built a new Fedora Core 7 box last night for PHP testing. Whenever at all possible, I leave SELinux enabled on new systems in Enforcing mode. Oracle 10g hasn’t had any issues with it, Oracle 11i EBusiness Suite hasn’t had any issues with it, and my NFS and FTP servers run without at hitch. The Oracle systems are RHEL4 (Red Hat Enterprise Linux 4), and the NFS and FTP servers are RHEL5.

However, this new PHP webserver caused a few glitches. I feel a little silly for not catching this as being an SELinux problem earlier, but since it’s caused 0 issues in 9 months of use in production, I didn’t even consider it initially.

What we initially saw was 0 errors from PHP – all the pages would run without error. PHP.ini has the following lines:

sendmail_from = from@domain.com
sendmail_path = /usr/sbin/sendmail -t -i

and testing cat mail.txt | /usr/sbin/sendmail -t -i as a non-root user delivered mail properly as well. Combine that with /var/log/maillog being completely empty for every test page loaded, and it was sure that the mail wasn’t getting TO postfix (our preferred localhost MTA).

So, I looked at the /var/log/httpd/error_log for apache and found:

sh: /usr/sbin/sendmail: Permission denied
sh: /usr/sbin/sendmail: Permission denied
sh: /usr/sbin/sendmail: Permission denied
sh: /usr/sbin/sendmail: Permission denied
sh: /usr/sbin/sendmail: Permission denied

But I knew that non-root users could access sendmail as defined in php.ini, so I finally decided to tail /var/log/messages and saw:

Nov 2 11:05:41 $(servername) setroubleshoot: SELinux is preventing the sh from using potentially mislabeled files sendmail.postfix (sendmail_exec_t). For complete SELinux messages. run sealert -l c9001c48-5d48-4b7c-9fd7-8400544daa8f

So now to fix it…
This is surprisingly simple, actually. The sad part is, we had this problem, fixed it, forgot about it, had it again, and I blogged it… and lost the post. so this has been sitting in my “drafts” folder for about 10 months now:
setsebool httpd_can_sendmail=true
service httpd restart
service postfix restart

And retry sending mail. There’s a few posts about sendmail and having to change permissions on home directories or on “main.cf”, but I use postfix, and not sendmail, so I don’t know how effective or necessary those changes are.

 

(Edit: repost on 2/23/2012 because of a DB problem losing the original)

I ran into a problem 2 years ago where I couldn’t remember the native packet capture tool for Solaris and couldn’t install tcpdump, so i thought I’d put down as many as many native packet capture commands as I knew, by OS, in a single place.  I’ll update this as I find more, since there’s hundreds of Operating systems out there.

  • AIX: iptrace: /usr/sbin/iptrace [ -a ] [ -b ][ -e ] [ -u ] [ -PProtocol_list ] [ -iInterface ] [ -pPort_list ] [ -sHost [ -b ] ] [ -dHost ] [ -L Log_size ] [ -B ] [ -T ] [ -S snap_length] LogFile
  • FreeBSD: tcpdump (I think): tcpdump [ -adeflnNOpqRStuvxX ] [ -c count ] [ -C file_size ] [ -F file ] [ -i interface ] [ -m module ] [ -r file ] [ -s snaplen ] [ -T type ] [ -w file ] [ -E algo:secret ] [ expression ]
  • HP-UX: nettl: nettl requires a daemon start, and other setup: /usr/sbin/nettl -traceon kind… -entity subsystem… [-card dev_name…] [-file tracename] [-m bytes] [-size portsize] [-tracemax maxsize] [-n num_files] [-mem init_mem [max_mem]] [-bind cpu_id] [-timer timer_value]
  • Linux 2.4 and higher:
    • tcpdump (some distros): tcpdump [ -AdDefKlLnNOpqRStuUvxX ] [ -c count ] [ -C file_size ] [ -G rotate_seconds ] [ -F file ] [ -i interface ] [ -m module ] [ -M secret ] [ -r file ] [ -s snaplen ] [ -T type ] [ -w file ] [ -W filecount ] [ -E spi@ipaddr algo:secret,… ] [ -y datalinktype ] [ -z postrotate-command ] [ -Z user ] [ expression ]
    • wireshark (some distros, used to be called “ethereal”): GUI-config, no command-line, use tethereal (now tshark) for that
    • tshark: tshark [ -a <capture autostop condition> ] … [ -b <capture ring buffer option>] … [ -B <capture buffer size (Win32 only)> ]  [ -c <capture packet count> ] [ -C <configuration profile> ] [ -d <layer type>==<selector>,<decode-as protocol> ] [ -D ] [ -e <field> ] [ -E <field print option> ] [ -f <capture filter> ] [ -F <file format> ] [ -h ] [ -i <capture interface>|- ] [ -l ] [ -L ] [ -n ] [ -N <name resolving flags> ] [ -o <preference setting> ] … [ -p ] [ -q ] [ -r <infile> ] [ -R <read (display) filter> ] [ -s <capture snaplen> ] [ -S ] [ -t ad|a|r|d|e ] [ -T pdml|psml|ps|text|fields ] [ -v ] [ -V ] [ -w <outfile>|- ] [ -x ] [ -X <eXtension option>] [ -y <capture link type> ] [ -z <statistics> ] [ <capture filter> ]
  • Mac OSX: tcpdump (among others): tcpdump [ -adeflnNOpqRStuvxX ] [ -c count ] [ -C file_size ] [ -F file ] [ -i interface ] [ -m module ] [ -r file ] [ -s snaplen ] [ -T type ] [ -w file ] [ -E algo:secret ] [ expression ]
  • Solaris: snoop: snoop [ -aPDSvVNC ] [ -d device ] [ -s snaplen ] [ -c maxcount ] [ -i filename ] [ -o filename ] [ -n filename ] [ -t [ r | a | d ] ] [ -p first [ , last ] ] [ -x offset [ , length ] ] [ expression ]
  • Windows 2000, XP, 2003, Vista, 2008 and beyond:

Any others anyone wants added (or corrected), just comment or email and I’ll update this.
(Edit 7/29/08 – change tcpdump link)
(Edit 10/13/08 – add tshark info, thanks Jefferson!, and wireshark on Windows)
(Edit 2/23/2012 – repost since a DB problem lost this post.  Thanks wayback machine!)