Machine Replacements, Part the First

Aug 13, 2005 12:10

My current project is rolling out brand new PCs to a number of employees - specifically the ones with really old PCs. Like everything else around here, we've fucked it up. Here's how.

  • The machines and LCD monitors were ordered in three batches of 21. The first two have been delivered, the last one is in limbo.
  • A deployment lab with a Ghost server and a KVM/network switch was set up on a physically isolated network segment. The KVM can handle 4 PCs at once.
  • The PCs, when delivered by Shipping, were unboxed and stacked in the deployment lab, as were the new LCD monitors.
  • The empty boxes are stored in the hall until somebody takes them to the dumpster, because once the PCs are removed, Shipping won't touch them: they're garbage, not shipping.
  • A "corporate standard" Ghost image was provided on DVD by head office. This image had to be completely rebuilt from scratch as it contained various software either unnecessary or incompatible with our branch office environment. The image has had to be modified (sometimes extensively) during the deployment process, requiring all previously deployed PCs to be manually updated.
  • The Ghost server is used to multicast the standard disk image to the new PCs, four at a time. The image is sysprepped beforehand with a sysprep.inf file that includes the license key, the local admin password, etc. It does not name the PC or join it to the domain.
  • Once imaged, the new PCs are moved to a domain-connected network segment, rebooted, given a name and joined to the domain. Originally they were attached to the NT4 domain, but now they are joined to the new AD domain. Only the IT manager can do this, because neither he nor head office can figure out how to delegate "join machine to domain rights" to an account in a trusted NT4 domain (namely, mine).
  • The new PCs are named after the user they are to be deployed to. In many parts of the corporate compound, one PC is shared by as many as six people. In these cases, the PC is either named after a user picked at random from the group or given a name that describes its function.
  • The PCs are deployed according to an 18-month-old inventory spreadsheet which purports to identify which network PCs run at less than 500 MHz. It is wrong about 25% of the time. The machines to be replaced are scattered across a quarter-mile-wide corporate compound.
  • User settings are migrated from the old PCs to the new PCs using Windows XP's User State Migration Tool.
  • After deployment, the technician must reset the user's password, log the password reset for security purposes, log in as the user and set up their Outlook 2003 profile and Citrix profile, and install any user-specific software.
  • Post-deployment, problems relating to Java, the XP SP2 firewall and tighter security, and the PCs being in the AD domain while the user accounts are still in the NT4 domain are resolved on a case-by-case basis.

Got all that? Okay, now here's why things aren't working, in case you haven't already figured it out.

    1. Lack of planning. The overall deployment was done under the rubric of "Shit, the new PCs are here! Quick, get them out into the compound!". We had (and continue to have) little or no information about where our existing PCs are, who's using them, and what for. Although we have a fairly sophisticated outsourced web-based inventory system, the information it gathers is not user-modifiable, and they stopped using it a year and a half ago when it started having some problems registering some of the PCs. Changes to the deployment procedure are being made constantly, on an ad hoc basis without any forethought to their implications, and they aren't being tested before being rolled out.
    2. Trying to do too many things at once. An Active Directory deployment and a new PC rollout are two different projects, and by conflating them we've just caused more headaches as we try to deal with the tighter security restrictions of 2003 AD (people can't print, for instance, because AD won't let you print to a printer outside your forest. All the new PCs are in the AD. All the printers are in the old NT4 domain.)
    Okay, sit back, grab a fresh cup of joe, and I'll tell you how we should have done this.

    The Road Ahead

    First of all, the lack of any real information about our current environment should have sent up a red flag right away. While a real inventory management system is an absolute necessity in a branch this size, waiting for that would have put this off forever. Better to focus on what was needed for just this project: a list of the PCs that needed to be replaced. It should have been obvious from the start that an 18-month-old, deprecated inventory system was not going to do it.

    Fortunately, this compound can be divided into two physical areas: "offices" and "manufacturing". The offices are concentrated in one area of the compound, but the manufacturing PCs are scattered throughout the compound and frequently exist in safety zones requiring protective equipment. Office workers generally have one primary PC that only they use; manufacturing personnel frequently roam around using many different PCs. Since previous PCs have been bought in batches at specific times, it is usually trivial to tell how fast a PC is here simply by looking at the case.

    Right away, the project is divided into two phases: offices and manufacturing. A quick inventory of the office PCs is done via a walkaround to the various cubicles. A partial inventory of the manufacturing PCs is done for those parts of the compound where PCs tend to cluster, such as foremen's offices.
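If the walkaround results go into even a simple script instead of another spreadsheet, pulling out the "needs replacing" list for each phase becomes mechanical. Here is a minimal Python sketch, assuming each purchase batch can be recognized by its case as described above. The case names and speeds in `CASE_SPEED_MHZ`, along with `InventoryRow` and `replacement_list`, are made up for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical lookup: case model (i.e. purchase batch) -> CPU speed in MHz.
# The post only says speed can be read off the case; these entries are invented.
CASE_SPEED_MHZ = {
    "beige-tower-1998": 300,
    "beige-tower-1999": 450,
    "black-minitower-2002": 1800,
}


@dataclass
class InventoryRow:
    location: str     # cubicle, foreman's office, etc.
    area: str         # "offices" or "manufacturing"
    case_model: str   # key into CASE_SPEED_MHZ
    users: list = field(default_factory=list)


def needs_replacement(row: InventoryRow, threshold_mhz: int = 500) -> bool:
    """True if the PC's purchase batch is slower than the threshold."""
    speed = CASE_SPEED_MHZ.get(row.case_model)
    if speed is None:
        # Fail loudly: an unknown case means the walkaround needs a second look.
        raise ValueError(f"unknown case model at {row.location}: {row.case_model!r}")
    return speed < threshold_mhz


def replacement_list(rows, area):
    """PCs in one phase's area ("offices" or "manufacturing") that must be replaced."""
    return [r for r in rows if r.area == area and needs_replacement(r)]


rows = [
    InventoryRow("Cube 12", "offices", "beige-tower-1999", ["asmith"]),
    InventoryRow("Cube 14", "offices", "black-minitower-2002", ["bjones"]),
    InventoryRow("Foreman office B", "manufacturing", "beige-tower-1998", ["crew-b"]),
]
print([r.location for r in replacement_list(rows, "offices")])  # ['Cube 12']
```

Raising on an unknown case, rather than silently skipping it, is deliberate: a PC nobody can classify is exactly the kind of gap that sank the old spreadsheet.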

    With this inventory in hand, we now know how many of the new PCs and monitors have to be delivered to each building. A secure area in each building is identified (a supervisor's office with a locking door will do) and a drop-off plan is given to shipping indicating how many boxes are to be delivered to that secure area in each building. The remaining PCs are stored in the IT workbench area.
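The drop-off plan falls straight out of that inventory. This hypothetical `drop_off_plan` helper counts seats per building and works out what is left over for the IT workbench. The figure of one PC box plus one monitor box per seat is an assumption.

```python
from collections import Counter


def drop_off_plan(replacements, total_pcs, boxes_per_seat=2):
    """Plan deliveries to each building's secure area.

    replacements: iterable of (building, location) pairs from the inventory.
    total_pcs: number of new PCs on hand.
    boxes_per_seat: assumed one PC box plus one monitor box per seat.
    Returns (plan, workbench): plan maps building -> {"pcs", "boxes"} and
    workbench is the number of PCs left over for the IT workbench area.
    """
    per_building = Counter(building for building, _location in replacements)
    allocated = sum(per_building.values())
    if allocated > total_pcs:
        raise ValueError(f"need {allocated} PCs but only {total_pcs} are on hand")
    plan = {
        building: {"pcs": n, "boxes": n * boxes_per_seat}
        for building, n in sorted(per_building.items())
    }
    return plan, total_pcs - allocated


plan, workbench = drop_off_plan(
    [("Admin", "Cube 12"), ("Admin", "Cube 13"), ("Plant 2", "Foreman office B")],
    total_pcs=42,
)
print(plan, workbench)
```

Here 42 stands for the two batches of 21 delivered so far. The building names are placeholders.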

    Standard and Automatic

    The "corporate standard" image presents a political problem, but a "standard" image that actually breaks a local branch network isn't a very well-thought-out standard. In this case, we set up a meeting with the head office IT managers responsible for the standard and identify the software and policies which are truly global standards (as opposed to what the head office branch happens to need locally). A prototype image is set up and rigorously tested in the production network to identify any potential problems. Particular attention is paid to XP SP2 firewall issues, IE 6 security enhancements, and any software which may not respond well to disk image-based deployment (such as network clients which only generate a GUID upon install).

    A completely unattended install process is developed that will install Windows XP SP2 and all standard applications silently and without any human interaction. This install process is standardized and placed under change control. It is then used to create a standard deployment disk image for the new PCs. A single unattended setup file serves as the basis for both the unattended install and the sysprep.inf file. This file provides the license key and the corporate identification string, gives the PC a name selected from a predefined list, sets the local Administrator password to the corporate standard, and joins the PC to the NT4 domain.
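Rendering a per-machine answer file can be sketched as follows. The section and key names (`[GuiUnattended] AdminPassword`, `[UserData] ProductKey`/`FullName`/`OrgName`/`ComputerName`, `[Identification] JoinDomain`/`DomainAdmin`/`DomainAdminPassword`) follow Microsoft's Windows XP unattended-setup documentation as I understand it, so verify them against the reference for your service pack. The following are all assumptions: putting the corporate identification string in `OrgName`, the `NamePool` class, and every sample value. How each rendered file reaches its PC is left open.

```python
from collections import deque


class NamePool:
    """Hands out computer names from a predefined list, one per new PC."""

    def __init__(self, names):
        self._names = deque(names)

    def next_name(self):
        if not self._names:
            raise RuntimeError("predefined name list exhausted")
        return self._names.popleft()


def render_sysprep_inf(computer_name, product_key, org_name, admin_password,
                       nt4_domain, join_account, join_password):
    """Render a minimal Windows XP-style sysprep.inf as text.

    Section/key names are those documented for XP unattended setup;
    check them against the deployment reference for your service pack.
    """
    sections = {
        "GuiUnattended": {"AdminPassword": f'"{admin_password}"'},
        "UserData": {
            "ProductKey": product_key,
            "FullName": f'"{org_name}"',
            "OrgName": f'"{org_name}"',  # assumed home for the corporate ID string
            "ComputerName": computer_name,
        },
        "Identification": {
            "JoinDomain": nt4_domain,
            "DomainAdmin": join_account,
            "DomainAdminPassword": join_password,
        },
    }
    lines = []
    for section, keys in sections.items():
        lines.append(f"[{section}]")
        lines.extend(f"{key}={value}" for key, value in keys.items())
        lines.append("")  # blank line between sections
    return "\r\n".join(lines)  # Windows line endings


pool = NamePool(["BR-PC-001", "BR-PC-002"])
print(render_sysprep_inf(pool.next_name(), "XXXXX-XXXXX-XXXXX-XXXXX-XXXXX",
                         "Example Corp", "changeme", "BRANCHNT4",
                         "joinacct", "joinpass"))
```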

    Aside: the PC naming convention was set based on the perceived need by head office for helpdesk technicians to find a particular person's PC in the browse list easily for remote control purposes. Since approximately half of the PCs in our branch office are not assigned to one specific person, this was a bad decision from the beginning. Windows XP's Remote Assistance Request feature allows a user to direct the helpdesk technicians to their PC by sending a one-click email, and there are many simple ways of matching a user to a machine. Since PCs tend to move from user to user over the course of their lifetime, the current naming scheme implies constant updating of the AD machine account names. Naming the PCs after their hardware serial numbers is the best solution, but can cause wrinkles in deployment.
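The post doesn't spell out the wrinkles, but two likely ones are length and collisions. NetBIOS computer names max out at 15 characters, and many serial numbers plus a prefix are longer, so truncation can make two machines collide. On XP Professional, `wmic bios get serialnumber` will usually report the serial. Here is a sketch of a naming helper under those assumptions. The `PC-` prefix and the choice to keep the tail of the serial are arbitrary.

```python
import re

NETBIOS_MAX = 15  # NetBIOS computer names are limited to 15 characters


def name_from_serial(serial, prefix="PC-"):
    """Build a computer name from a hardware serial number."""
    room = NETBIOS_MAX - len(prefix)
    if room < 1:
        raise ValueError("prefix leaves no room for the serial number")
    # Keep only letters, digits and hyphens, which are safe in NetBIOS and DNS names.
    cleaned = re.sub(r"[^A-Za-z0-9-]", "", serial).upper()
    if not cleaned:
        raise ValueError(f"serial {serial!r} has no usable characters")
    # Assumption: the tail of a serial is the most distinctive part, so keep it.
    return prefix + cleaned[-room:]


def assign_names(serials, prefix="PC-"):
    """Map serial -> name, refusing to hand out the same name twice."""
    seen = {}  # name -> serial that claimed it
    for serial in serials:
        name = name_from_serial(serial, prefix)
        if name in seen:
            raise ValueError(f"{serial!r} and {seen[name]!r} both truncate to {name}")
        seen[name] = serial
    return {serial: name for name, serial in seen.items()}
```

A collision fails the whole batch instead of quietly renaming one machine. That is the point where a human should pick a different rule.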

    Note that we are not joining the PCs to the AD in this step. Combining the PC rollout with an AD migration is a mistake, and one we rectify here by rolling out all the corporate PCs and then migrating them (and the user accounts) to the AD as a separate project.

    A PXE-boot server is built with a boot menu that provides two options: deploy a disk image or perform an unattended install. This server is deployed into the production network. The boot menu should be password protected so as to prevent users from accidentally (or deliberately) re-installing their PCs without IT guidance.

    Divide and Conquer

    Phase One: The Offices

    With the necessary infrastructure in place in the back office, we turn to the front office. In Phase One, we divide the office staff into groups of people with similar software and environment needs. An unattended software install package containing the specific software for each group is created and stored alongside the standard image unattended installs. All unattended software install packages are placed under change control.
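A first cut at the groups can come from listing each user's software beyond the standard image and bucketing identical lists. This Python sketch groups by exact match only, and the package names are placeholders. Small groups still need to be merged by hand into "similar" ones.

```python
from collections import defaultdict


def group_by_software(needs):
    """Group users whose extra software (beyond the standard image) is identical.

    needs: dict of user -> iterable of package names.
    Returns dict of frozenset(packages) -> sorted list of users.
    """
    groups = defaultdict(list)
    for user, packages in needs.items():
        groups[frozenset(packages)].append(user)  # order of packages doesn't matter
    return {packages: sorted(users) for packages, users in groups.items()}


needs = {
    "asmith": ["cad-suite"],
    "bjones": ["cad-suite"],
    "cdoe": ["erp-client", "diagram-tool"],
    "dlee": ["diagram-tool", "erp-client"],
}
for packages, users in group_by_software(needs).items():
    print(sorted(packages), users)
```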

    Aside: in our environment, the unattended installs of Outlook 2003 failed when the user logged in because the alias in the Exchange server did not match the userids, which were changed as the result of a merger. When I say unattended, I really mean unattended: the helpdesk technicians should not have to log in to anyone's PC to install or configure any aspect of the software. If you have back office issues like our mismatched alias issue, fix those problems before you do anything else.
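Mismatches like ours can be caught before rollout by comparing each logon name with its Exchange alias. This is a minimal sketch that assumes you can export the pairs from your directory somehow; how you do that is left open. `alias_mismatches` is a hypothetical helper.

```python
def alias_mismatches(accounts):
    """Find accounts whose Exchange alias differs from the logon name.

    accounts: iterable of (logon_name, exchange_alias) pairs; alias may be None.
    Comparison is case-insensitive and ignores surrounding whitespace.
    """
    bad = [
        (logon, alias)
        for logon, alias in accounts
        if (alias or "").strip().lower() != logon.strip().lower()
    ]
    return sorted(bad, key=lambda pair: pair[0].lower())


print(alias_mismatches([("asmith", "ASmith"), ("bjones", "bob.jones"), ("cdoe", None)]))
```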

    Each group is assigned a date on which their PCs will be replaced, and multiple warning notices are sent out to the users reminding them of any preprocessing steps necessary (such as moving files to network storage, if they've been sloppy about it in the past). The actual replacement of hardware is done after business hours to minimize disruption of the users' workday.
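Scheduling the warning notices is simple date arithmetic. This sketch assumes notices go out 10, 3 and 1 days ahead (arbitrary offsets). It also assumes that a notice landing on a weekend should move back to the Friday before, and that duplicates created by that shift collapse into one notice.

```python
from datetime import date, timedelta


def previous_weekday(day):
    """Move a Saturday or Sunday back to the preceding Friday."""
    while day.weekday() >= 5:  # Monday == 0 ... Saturday == 5, Sunday == 6
        day -= timedelta(days=1)
    return day


def reminder_dates(deploy_day, offsets_days=(10, 3, 1)):
    """Business days on which to send warning notices before a group's deployment."""
    # A set removes duplicates when two offsets land on the same Friday.
    return sorted({previous_weekday(deploy_day - timedelta(days=d)) for d in offsets_days})


print(reminder_dates(date(2005, 8, 24)))
```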

    Roll Out

    The actual replacement process is defined as follows, and is placed under change control:
    1. The login script is amended for the specific deployment group to run the group unattended software package install and the User State Migration Tool on login for the users whose PCs are being replaced. The script logic refrains from running these packages unless the user is logging on to the specific new PC that is being deployed.
    2. A helpdesk technician logs in to the user's PC and runs the User State Migration Tool to capture their environment. While the Microsoft white paper on USMT recommends "asking the users to run this tool before they log out for the night", it is assumed that they will forget. If you are running an environment where logout scripts as well as login scripts can be controlled, this may be all that is necessary to capture the users' environment.
    3. The old PCs are replaced with the new hardware. Each new PC in turn is set up, booted, and the PXE boot client used to initiate a disk image deployment. Any necessary updating of PC inventory records is performed. By doing this in turn, the new PCs will be ready for use by the time the last PC has been deployed. While the last PC is being imaged, the old PCs are moved to the secure drop-point and any boxes, styrofoam, etc., are disposed of (keeping the garbage in small, distributed amounts makes it much easier to dispose of than breaking down and removing 60 boxes and styrofoam inserts all at once).
    4. The next morning, as the users log in, their specific software package is automatically installed and their environment migrated to their PC. After an automatic reboot or two, they have a new PC with everything they need to begin work.
    As a side benefit, this deployment method can be reused later to roll out new PCs to any specific group.
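The login-script check in step 1 boils down to a small decision. The real script would probably be a batch file or similar running on the client. This Python sketch shows only the logic: `Deployment`, `login_actions` and the completion markers are hypothetical names, and the actual install and USMT commands are left out. The user and machine names would come from the `USERNAME` and `COMPUTERNAME` environment variables. The group package is installed before user state is restored, on the assumption that migrated application settings apply best when the application is already present.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    user: str           # logon name of the user being moved
    new_pc: str         # computer name of the PC deployed to them
    group_package: str  # unattended software package for their group


def login_actions(deployments, username, computername, completed):
    """Decide what the login script should run for this logon.

    deployments: dict of lower-case user name -> Deployment for the current group.
    completed: set of (user, step) markers already recorded (hypothetical; in
    practice a marker file or registry value on the new PC).
    """
    user = username.lower()
    d = deployments.get(user)
    # Only act when the user logs on to the specific new PC assigned to them.
    if d is None or d.new_pc.upper() != computername.upper():
        return []
    actions = []
    # Install applications first, then restore the captured user state.
    if (user, "package") not in completed:
        actions.append(("install_package", d.group_package))
    if (user, "loadstate") not in completed:
        actions.append(("restore_user_state", user))
    return actions
```

Because finished steps are skipped via the markers, the extra logins and reboots in step 4 don't trigger a second install.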

    Deployment is done on a group-by-group basis, with post-mortem analysis performed after each completed group. Any errors or issues discovered must be rectified and the deployment procedure amended before the next group is deployed.
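The "fix it before the next group" rule is easy to make explicit in whatever tracks the post-mortems. Here is a sketch, with `PostMortem` and `may_deploy_next` as made-up names. It also assumes that any issue found means the written procedure must have been amended.

```python
from dataclasses import dataclass, field


@dataclass
class PostMortem:
    group: str
    issues: dict = field(default_factory=dict)  # issue description -> resolved?
    procedure_amended: bool = False

    def open_issues(self):
        return sorted(issue for issue, resolved in self.issues.items() if not resolved)


def may_deploy_next(previous):
    """Gate the next group on the previous group's post-mortem."""
    if previous.open_issues():
        return False
    # A clean run needs no amendment; otherwise the procedure must be updated.
    return previous.procedure_amended or not previous.issues
```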

    End of Phase One

    A full post-mortem analysis of Phase One is performed. Any and all errors and issues are documented, resolved if possible, procedures amended if necessary, and incorporated into the project documentation archive.

    It's late, and that's it for the first part. I'll publish Part the Second in a day or two.

