The mysterious case of the failed AD domain join

February 11, 2023 | Windows AutoPilot


Sometimes I do some weird stuff, but this wasn’t one of those times. All I was trying to do was run simple Configuration Manager task sequences for bare metal deployment of Windows 10 and Windows 11, nothing fancy (not even any MDT integration, not that there’s anything wrong with that). But a strange thing happened: The Windows 10 task sequence worked fine, while the Windows 11 one failed to join AD. It’s the same domain, no OU specified, same join account. So how could one succeed and the other fail?

To get to that answer, it’s useful to provide another clue: The Windows 11 computer that was failing had an unexpected computer name assigned to it. I didn’t specify a name in the task sequence, so where did that come from? Well, I had previously done a bare metal deployment using Tanium Provision, which uses offline domain join (ODJ) to join the computer to AD, and it generates computer names using a pattern. I had specified a pattern of “TAN-%RAND:8%” to get a name that started with “TAN-” and ended with eight random digits. But how was that name ending up on the Configuration Manager-deployed OS?
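To make the naming pattern concrete, here is a minimal Python sketch of how a pattern such as “TAN-%RAND:8%” could be expanded. It is not Tanium Provision’s implementation: the function name `expand_name_pattern` is hypothetical, and the token syntax is inferred only from the single example above (`%RAND:n%` becomes n random digits).

```python
import random
import re

# Token syntax inferred from the one example in the text; an assumption,
# not a documented grammar.
_RAND_TOKEN = re.compile(r"%RAND:(\d+)%")

# NetBIOS computer names are limited to 15 characters.
MAX_NAME_LENGTH = 15


def expand_name_pattern(pattern, rng=None):
    """Replace each %RAND:n% token with n random decimal digits."""
    rng = rng or random.Random()

    def _digits(match):
        count = int(match.group(1))
        return "".join(rng.choice("0123456789") for _ in range(count))

    name = _RAND_TOKEN.sub(_digits, pattern)
    if len(name) > MAX_NAME_LENGTH:
        raise ValueError(f"{name!r} is longer than {MAX_NAME_LENGTH} characters")
    return name
```

Calling `expand_name_pattern("TAN-%RAND:8%")` yields a 12-character name: the literal prefix “TAN-” followed by eight digits.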

While thinking about that, I figured I would check the NetSetup.log on the computer to see why the AD domain join failed during the Configuration Manager task sequence. That showed an error 2732. Ugh, the dreaded (and brand new) 2732 return code. (One of these days I’m going to write a series of blogs where each one is about a particular error code. It’s fertile ground.) So what does error 2732 mean? Easy enough to look it up: “An account with the same name exists in Active Directory. Re-using the account was blocked by security policy.”

OK, I get the gist of it: The computer name that Configuration Manager was using was already in AD, and that computer account couldn’t be reused. But since I was using the domain Administrator account to do the join, and it has full rights to all objects in AD, why didn’t it just replace the computer (or technically, just reset the computer account password so that it could be used again)? That’s related to a Microsoft update released back in October 2022:

KB5020276—Netjoin: Domain join hardening changes

The critical text:

New behavior 

Once you install the October 11, 2022, or later Windows cumulative updates on a client computer, during domain join, the client will perform additional security checks before attempting to reuse an existing computer account.

Algorithm:

  1. Account reuse attempt will be permitted if the user attempting the operation is the creator of the existing account.
  2. Account reuse attempt will be permitted if the account was created by a member of domain administrators.
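The two rules quoted above can be sketched as a small decision function. This is only an illustration of the quoted text, not the actual client-side check: the `ExistingAccount` type, the `reuse_permitted` function, and the CONTOSO account names are all hypothetical, and the real check looks up the creator and group membership in AD rather than taking them as inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExistingAccount:
    """Hypothetical view of a computer account that already exists in AD."""
    name: str
    creator: str                   # identity that created the account
    creator_is_domain_admin: bool  # assumed to be resolved by the caller


def reuse_permitted(account, joining_user):
    """Apply the two quoted rules; reuse is refused if neither matches."""
    # Rule 1: the user attempting the join created the existing account.
    if account.creator.lower() == joining_user.lower():
        return True
    # Rule 2: the existing account was created by a domain administrator.
    if account.creator_is_domain_admin:
        return True
    return False


# Account created by a delegated, non-admin identity (hypothetical names).
existing = ExistingAccount("TAN-12345678", "CONTOSO\\PROVSRV$", False)
print(reuse_permitted(existing, "CONTOSO\\Administrator"))
```

Note that the rights of the joining user do not appear in either rule. What matters is who created the existing account, so even an account with full rights in AD can be refused.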

OK, so how does that apply in my situation? First, I’m using Windows 10 and 11 images that are fully patched, so they include this change. Next, when I did the initial deployment using Tanium Provision, the service that performed the offline domain join was not running as a domain administrator account. Instead, it was running as LocalSystem, which means it authenticated to AD as the computer account of the machine the service was installed on, and that account had been delegated rights to the default OU for the AD domain. So the existing computer account was created by a non-Domain Admin, and the account now trying to reuse it (the domain Administrator) is not its creator. Neither condition is met, so the join is being legitimately blocked due to this change.

But why? I’ve read through a whole bunch of posts about the underlying CVE-2022-38042 that this is designed to address, and it’s still not entirely obvious what this change solves. To exploit the vulnerability, you need a user account and password that has rights to a computer object in AD. So, I’m guessing this involves using those credentials to “hijack” any computer account in AD that the user has rights to, and perhaps then taking on the rights of that computer account, which could be a member of one or more AD groups? The change makes this “slightly” harder (since there’s a registry key workaround, the exploit would need to do one more step); perhaps this workaround will be removed at some future point. And if the user account already had Domain Admin rights, then you wouldn’t need to try this, since you would already have the needed rights.

Alright, so back to my issue. I ran into this because a computer account that was created by a non-Domain Admin was about to be reused by a join performed by a Domain Admin, and that’s being blocked, hence the 2732 error. But where is this computer name coming from? Well, back to my original use of this VM: I deployed the machine using Tanium Provision, which joined the device to AD. I then manually installed the Configuration Manager client (for other random purposes). Now, when I boot the computer using PXE or boot media, it’s not an unknown computer, it’s an existing client. So the _SMSTSMachineName variable is being set to the name from that Configuration Manager client record, and since I didn’t set OSDComputerName, that _SMSTSMachineName value is used. (The VM that I was deploying Windows 10 to had never been a Configuration Manager client, so it got a random name instead.)
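The name selection described here boils down to a precedence order, sketched below in Python. This models only the behavior as described in this post, not the task sequence engine itself: the function `resolve_computer_name` is hypothetical, and treating an unknown computer as having no _SMSTSMachineName value at all is a simplifying assumption.

```python
from typing import Mapping, Optional, Tuple


def resolve_computer_name(variables: Mapping[str, str]) -> Tuple[Optional[str], str]:
    """Return (name, source) following the precedence described in the post.

    `variables` stands in for the task sequence variables. A name of None
    means nothing was supplied, so the OS ends up with a random name.
    """
    explicit = variables.get("OSDComputerName", "").strip()
    if explicit:
        # An explicitly set OSDComputerName always wins.
        return explicit, "OSDComputerName"

    inherited = variables.get("_SMSTSMachineName", "").strip()
    if inherited:
        # Existing client record: the old name is carried forward.
        return inherited, "_SMSTSMachineName"

    # Simplifying assumption: an unknown computer supplies no name here.
    return None, "random"
```

With only `_SMSTSMachineName` populated from an old client record, the stale name is what reaches the domain join, which is how a name generated by one tool can resurface in a deployment done by another.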

How do I fix the problem? In my case, the simple answer is to delete the existing Configuration Manager computer object so the computer is again an unknown computer. Then the first join will succeed because there is no existing object, and subsequent join attempts via Configuration Manager will succeed because the first join was done by a Domain Admin and the subsequent one was as well.

But let’s say you were a “security conscious” IT admin. You would be using an AD join account that wasn’t a member of Domain Admins; you would just delegate it the needed rights. And as a result, you’re going to run into this all the time, at least once your OS images have been updated to include the October 2022 cumulative update (or later). Hence you’ll probably find yourself reading articles like this one to implement a workaround.

For those using Tanium Provision, we ran into the same problem, which we fixed in a different way. Previously, when creating an offline domain join blob for a new computer, we would change the name by adding a numeric suffix any time we got an error 2224 (the account already exists), because we didn’t want to replace the existing computer account. (Really, replacing it is a risky thing to do, depending on the rights of the join account you are using. If you use a Domain Admin account, it’s especially bad. Imagine if you let the user type in a computer name, and they specified the name of an existing domain controller. That would effectively kill the DC and you’d have a new workstation with the old name, and you’d be having a very bad day.) But we were only checking for return code 2224, not the new 2732 that effectively means the same thing. We’ve added logic now to check for both return codes and will be deploying that change soon.
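The retry logic described above might look something like the following Python sketch. It is not the Tanium Provision code: `provision_with_unique_name`, the `try_provision` callback (standing in for whatever creates the offline domain join blob and returns its status code), the suffix format, and the attempt limit are all hypothetical.

```python
NERR_USER_EXISTS = 2224            # the account already exists
NERR_ACCOUNT_REUSE_BLOCKED = 2732  # reuse blocked by the hardening change

# Both codes are treated as "this name is taken, try another one".
NAME_TAKEN_CODES = {NERR_USER_EXISTS, NERR_ACCOUNT_REUSE_BLOCKED}

MAX_NAME_LENGTH = 15  # NetBIOS computer name limit


def provision_with_unique_name(base_name, try_provision, max_attempts=10):
    """Call try_provision(name) until it returns 0, varying the name.

    try_provision is a caller-supplied function returning an integer
    status code (0 for success).
    """
    for attempt in range(max_attempts):
        suffix = "" if attempt == 0 else str(attempt)
        # Trim the base so the suffixed name still fits in 15 characters.
        candidate = base_name[: MAX_NAME_LENGTH - len(suffix)] + suffix
        code = try_provision(candidate)
        if code == 0:
            return candidate
        if code not in NAME_TAKEN_CODES:
            # Any other failure is not a naming problem, so stop retrying.
            raise RuntimeError(f"provisioning {candidate!r} failed with code {code}")
    raise RuntimeError(f"no free name found after {max_attempts} attempts")
```

Handling both codes the same way means an existing computer account is never overwritten: the new machine simply gets a different name.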


