Active Directory Replication Survival Guide
A definitive survival series for every Active Directory Engineer.
Microsoft Active Directory has evolved since the NT days, and engineers have designed and deployed it in many different designs and regions. One of the nightmares of any Systems Engineer / System Administrator who manages Active Directory is troubleshooting replication problems. Active Directory replication problems take many different shapes, and no single procedure or handbook resolves all of them. Drawing on my experience in handling Active Directory and answering numerous posts in the Microsoft TechNet forums (Directory Services, GPO and Networking), I thought of writing down the possible troubleshooting steps to resolve Active Directory replication problems. This article covers Active Directory replication and doesn't cover the basics of Active Directory.
Why Active Directory Replication Is Required?
Back in the NT days, the directory database was managed by the Primary Domain Controller (PDC), which was the only server that could accept changes and replicate them to the other domain controllers, the Backup Domain Controllers (BDCs). In the event of a PDC failure or shutdown, no changes were replicated to the BDCs, so this model did not provide data availability or fault tolerance. Microsoft restructured the directory design and introduced a distributed, multimaster replicated database, which means that changes made to one copy of the database are automatically replicated to all other copies. This provides data availability and fault tolerance.
What Data Gets Replicated?
When you promote a Windows Server 2003 / Windows Server 2008 server to a domain controller, whether for a new domain, an additional domain controller or a child domain, in a new forest, an existing forest or a domain tree, an Active Directory database is created. This database holds Active Directory objects, which represent users, computers, groups and so on. Every domain controller of a particular domain holds this database, which is replicated across the domain controllers in the forest. Data gets replicated when
• An object is created
• An object is modified
• An object is deleted
• An object is moved
A database is a collection of directory partitions, which are very important to understand when working with Active Directory replication. These are classified into
1. Schema Partition: Forest-level directory partition which holds the definitions of all the objects. Every domain controller in the forest holds a replica of the schema partition, but schema changes can be made only on the domain controller that holds the schema master role.
2. Configuration Partition: Forest-level directory partition which holds information specific to the topology and contains the data responsible for the functioning of the entire Active Directory.
3. Domain Partition: Domain-wide directory partition which stores data related to users, computers and groups, and is replicated to all the domain controllers of that particular domain.
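To make the three partitions easier to recognise in tool output, here is a minimal Python sketch that builds the distinguished name of each one. The helper names and the example.com domains are my own placeholders and not part of any Microsoft tool. The sketch assumes the usual naming, where the configuration partition sits under the forest root domain and the schema partition sits under the configuration partition.

```python
def dns_name_to_dn(dns_name):
    """Convert a DNS domain name such as 'corp.example.com' to 'DC=corp,DC=example,DC=com'."""
    return ",".join("DC=" + label for label in dns_name.split("."))


def partition_dns(forest_root, domain):
    """Return the distinguished names of the three partitions a DC in `domain` holds.

    forest_root and domain are DNS names; both are illustrative inputs.
    """
    root_dn = dns_name_to_dn(forest_root)
    return {
        # Forest-wide: the same on every domain controller in the forest.
        "schema": "CN=Schema,CN=Configuration," + root_dn,
        "configuration": "CN=Configuration," + root_dn,
        # Domain-wide: differs per domain.
        "domain": dns_name_to_dn(domain),
    }


if __name__ == "__main__":
    for name, dn in partition_dns("example.com", "child.example.com").items():
        print(name.ljust(14), dn)
```

These are the naming contexts you will see reported by the replication tools used later in this article.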
Components Involved In Replication Design:
From the above explanation, we have seen why we need replication and what data gets replicated across domain controllers. We will now see which components are required to replicate the database and the corresponding changes across the domain controllers. The following are the major components involved in successful Active Directory replication.
• Sites: A site represents computers located in a LAN or MAN. A site can consist of multiple IP subnets. Computers locate the nearest domain controller using sites. A site can contain domain controllers from different domains.
• Site Links: A site link establishes a logical path between two sites for replication to occur. Without site links, replication between sites cannot be performed.
• Bridgehead Servers: Administrators can assign a dedicated server to receive replication updates between sites; it ensures the updates are then replicated within its own site.
• Connection Objects: Replication partners are defined using connection objects. A connection is a one-way, inbound route between domain controllers.
• InterSite Topology Generator (ISTG): The first domain controller in the site holds the ISTG role, and the ISTG determines the replicas of the directory partitions (domain, configuration, schema).
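The way a computer is matched to a site through its IP subnet can be sketched in a few lines of Python. The subnets and site names below are made up for illustration. The rule that the most specific subnet wins when subnets overlap is an assumption of this sketch, so verify it against your own Sites and Services configuration.

```python
import ipaddress

# Hypothetical subnet-to-site assignments; real ones are defined under
# Sites > Subnets in Active Directory Sites and Services.
SUBNET_TO_SITE = {
    "10.1.0.0/16": "London",
    "10.1.50.0/24": "London-Lab",
    "10.2.0.0/16": "Chennai",
}


def find_site(client_ip, subnet_to_site=SUBNET_TO_SITE):
    """Return the site whose subnet contains client_ip, or None if no subnet matches.

    When several subnets contain the address, the longest prefix is preferred.
    """
    address = ipaddress.ip_address(client_ip)
    best = None  # (network, site) of the most specific match so far
    for cidr, site in subnet_to_site.items():
        network = ipaddress.ip_network(cidr)
        if address in network:
            if best is None or network.prefixlen > best[0].prefixlen:
                best = (network, site)
    return best[1] if best else None


if __name__ == "__main__":
    for ip in ("10.1.50.7", "10.1.2.3", "192.168.0.1"):
        print(ip, "->", find_site(ip))
```

A client whose address falls in no defined subnet gets no site from this lookup, which is why missing subnet definitions are worth checking when clients authenticate against distant domain controllers.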
There are 2 types of replication:
• Intrasite Replication: Replication occurs within the site; any changes to the database are communicated between the domain controllers within the site. Domain controllers use RPC as the communication medium.
• Intersite Replication: This type of replication occurs between sites. Intersite replication occurs between bridgehead servers and can use RPC / SMTP as the communication medium.
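The distinction between the two types can be written down as a small Python sketch. The domain controller names and their site placement are hypothetical, and the transports returned are simply the ones named above.

```python
# Hypothetical placement of domain controllers in sites.
DC_SITE = {"DC1": "London", "DC2": "London", "DC3": "Chennai"}


def replication_type(source_dc, dest_dc, dc_site=DC_SITE):
    """Classify replication between two DCs and list the transports available to it."""
    if dc_site[source_dc] == dc_site[dest_dc]:
        # Same site: domain controllers talk to each other over RPC.
        return "intrasite", ["RPC"]
    # Different sites: traffic flows between bridgehead servers.
    return "intersite", ["RPC", "SMTP"]


if __name__ == "__main__":
    print(replication_type("DC1", "DC2"))
    print(replication_type("DC1", "DC3"))
```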
I have listed the major components involved in replication with only a short introduction to each, since our main topic is troubleshooting replication problems. To understand more about these components, visit the following site: http://technet.microsoft.com/en-us/library/cc755994(WS.10).aspx
Troubleshooting Tips For Every Replication Issue
The troubleshooting steps below help administrators to troubleshoot and resolve replication errors successfully. I have written these steps to cover every component that can be responsible for replication errors.
• The first step is to analyze the ipconfig /all output from the domain controllers and the clients which are experiencing problems. Check the following in the result:
1. The Primary DNS Suffix is correct
2. WINS Proxy Enabled is set to "No"
3. IP Routing Enabled is set to "No"
4. The DNS servers do not point to external / public DNS servers
5. The domain controllers are not multihomed, which is not supported
• Point the primary DNS server IP to itself on the domain controllers and try restarting the DNS service.
• Make sure _msdcs.<domainname> exists in the DNS server.
• Make sure the domain controller A records are created in the forward lookup zone and the matching records in the reverse lookup zone.
• Make sure you create the appropriate delegation on the child domain controller where required.
• Make sure the Kerberos Key Distribution Center service is started and its startup type is set to Automatic.
• Make sure the DnsAdmins group has full control over the MicrosoftDNS tree.
• Make sure the DNSUpdateProxy security group has not been renamed.
• Make sure the ports listed at the following link are opened across the firewalls, which is a major check, and run the portqry utility to verify that the ports are able to communicate. http://technet.microsoft.com/en-us/library/dd772723(WS.10).aspx
• One of the utilities administrators can rely on is Dcdiag, the domain controller diagnostic tool. It can run DNS diagnostic tests and detect security configuration errors, and dcdiag /v (verbose) gives you the complete information required to troubleshoot. Skip the passed results and concentrate on the failed results in the output. You can use dcdiag for checking
1. Sysvol replication errors
2. DFS replication
3. DNS configuration errors
4. File Replication Service errors
5. RPC endpoint mapper errors
• Run repadmin /showreps on the domain controller and check for errors. The errors may vary, but make sure the DNS alias for the NTDS Settings object exists.
• General replication problems occur when a DC is decommissioned and a new server is promoted with the same name. In these cases, verify whether there are manually created connection objects under the NTDS Settings object of the old DC in Sites and Services. If they are present, remove them manually and recycle the KCC using repadmin /kcc. Also delete any records pertaining to the old DC in the DNS server (e.g. CNAME). Later, use repadmin /replsum to confirm that replication works fine.
• Replication errors can also relate to the secure channel, which might be corrupted at times. You can reset the secure channel using the netdom utility. Stop the KDC service and set it to Manual, then run the following command, start the KDC service again and test.
netdom resetpwd /server:<server ip> /userd:User /passwordd:*
• Repadmin is a useful utility to check the replication between domain controllers and to perform different actions such as creating the replication topology and forcing replication. To understand the different switches, please follow the link http://technet.microsoft.com/en-us/library/cc770963(WS.10).aspx. Make sure you run the utility with administrator privileges. Another important switch not to be missed is repadmin /replsummary.
• Lingering objects are one of the major sources of replication issues. They result in users being unable to log in, universal groups continuing to exist in the access token, Exchange mailboxes that cannot be created, and users being unable to access shares. Run the repadmin /removelingeringobjects command, which compares the objects on a domain controller against a reference domain controller and removes the lingering ones.
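Two of the checks above lend themselves to a small script: reading the ipconfig /all values from the first step, and keeping only the failed tests from dcdiag output. The following is a minimal Python sketch with helper names of my own. It assumes English-language output, ipconfig lines of the form "Label . . . : value" and dcdiag summary lines of the form "......... <server> failed test <TestName>", so adjust the patterns if your output looks different.

```python
import re

# Settings that the first troubleshooting step expects to be "No".
EXPECTED_NO = ("WINS Proxy Enabled", "IP Routing Enabled")


def parse_ipconfig(text):
    """Turn 'Label . . . . : value' lines from ipconfig /all into a dict."""
    settings = {}
    for line in text.splitlines():
        match = re.match(r"^\s*(.+?)(?:\s*\.)+\s*:\s*(.*)$", line)
        if match:
            # Keep the first occurrence: the global section precedes the adapters.
            settings.setdefault(match.group(1), match.group(2).strip())
    return settings


def check_ipconfig(text):
    """Return a list of human-readable problems found in ipconfig /all output."""
    settings = parse_ipconfig(text)
    problems = []
    if not settings.get("Primary Dns Suffix"):
        problems.append("Primary Dns Suffix is empty or missing")
    for label in EXPECTED_NO:
        value = settings.get(label, "")
        if value != "No":
            problems.append("%s should be No, found %r" % (label, value))
    return problems


def failed_dcdiag_tests(text):
    """Return (server, test) pairs for every 'failed test' line in dcdiag output."""
    return re.findall(r"\.{3,}\s+(\S+)\s+failed test\s+(\S+)", text)
```

Save the output of ipconfig /all and dcdiag /v to text files on the domain controller, read them in and pass the contents to check_ipconfig and failed_dcdiag_tests. An empty list from both means none of these particular checks found a problem, not that replication is healthy.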
The above troubleshooting steps are drawn from real-world scenarios and help administrators to resolve replication errors successfully.