I've got a situation with my SharePoint indexes. The main problem is that a full crawl takes about 15 hours if something goes wrong with them. I am ordering new hardware so I can rebuild the farm and put about twelve 64-bit cores on this task, but until then I occasionally have to deal with documents that users upload to a library and that, for some unknown reason, do not get indexed when the next incremental crawl runs. I can find no related errors in the crawl log, and the only ways I have found to get these files indexed are either to edit each file manually so that its modification date changes and the next incremental crawl picks it up, or to reset all crawled content and start a full crawl, which, as I said, takes 15 hours. I've been looking for a better alternative, and I want to quote Joel Oleson's blog:
The most creative but ugly scenario for making your index redundant is to create an additional farm that is a single box that has indexing services on it. That single box farm would have a copy of the index or have it’s own indexing schedule. Thus if a failure were to happen you could update the connection of the SSP to the other farm. Not sure who’d do this, but it’s possible.
Well, I may be one of the people who would do this, but before resorting to such an extreme measure, I have tried another alternative: good old-fashioned backups! I have set up my farm to back up the index twice a week, so that if something goes missing I can restore the index as it was a few days earlier and let the incremental crawl pick up the content added since then, rather than running a full crawl.
SharePoint Index Facts
Until recently, I believed that all the information used for searching was stored in the database. In fact, the index itself is stored on the hard drive of your indexing server, while the crawled metadata properties are stored in the database. To see where each is located:
- Open up Central Admin
- Click Shared Services Administration
- Open the drop-down menu for the Shared Services Provider in question and select Edit Properties
On this page, look at the section labeled Search Database. This specifies where the crawled metadata properties are stored. Now scroll down to the section labeled Index Server and look at the Path for index file location. This folder holds the full-text index files built from the crawled content. Together, the two provide users with search results, so a backup of only one of them is not a full backup of your search index. That rules out a plain SQL backup or a plain file backup, since the index spans both. The only option left is SharePoint's native backup capability.
How to Backup the Index
Central Admin has a friendly UI for running backups, but it is no help when you want to automate the process, so we will use the stsadm.exe tool instead. Note that these directions could easily be modified to back up the entire farm rather than just the index (for example, by omitting the -item parameter). Also note that Microsoft will not let you back up the index by itself: you can only back up the index by backing up the entire Shared Services Provider. Hopefully Microsoft will allow more granular backup options in the future.
- Choose a location to store the backups. For me, each backup requires about 5 GB of space, but this depends on the amount of indexed content, so select a location with enough space to hold all the backups you intend to keep. For this example, I will use this location:
\\MOSSDB\IndexBak
- Notice that I have chosen a network share for the path. This is because data will be written to this folder from both the index server and the server hosting the MOSS databases. For this reason, the SQL Server service that hosts the SharePoint databases must run under an account with write access to this share. Likewise, the SharePoint system account also needs write access here.
- Create a folder on your index server to hold the script files. For this example, I will use:
c:\BackupCmd
- In this example, I am assuming the name of your Shared Services Provider is SharedServices1; naturally, change this to your SSP's name. In the new folder, create a text file named SSPBackup.bat and put this command into it:
"C:\Program Files\Common Files\Microsoft Shared\web server extensions\12\BIN\stsadm.exe" -o backup -directory "\\MOSSDB\IndexBak" -backupmethod full -item "SharedServices1"
- Be sure to replace the directory path and the SSP name with those used by your farm. Now that the batch file is created, let's test it. Click Start -> Run, type cmd and press Enter. In the console window, type cd c:\BackupCmd and press Enter, then type SSPBackup and press Enter. Make sure the command runs without errors. If it fails, check both the share permissions on \\MOSSDB\IndexBak and the NTFS permissions on the folder itself.
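If you would rather run the backup from a script that checks permissions before calling stsadm, here is a minimal Python sketch. It assumes Python 3.9+ is installed on the index server, which nothing else in this post requires; the stsadm arguments are the same ones used in SSPBackup.bat, and BACKUP_DIR and SSP_NAME are placeholders for your own share and SSP name. Keep in mind that can_write only tests the account running the script, so the SQL Server service account's access still has to be checked separately.

```python
# Minimal sketch, assuming Python 3.9+ on the index server; paths and names are placeholders.
import subprocess
import tempfile

STSADM = r"C:\Program Files\Common Files\Microsoft Shared\web server extensions\12\BIN\stsadm.exe"
BACKUP_DIR = r"\\MOSSDB\IndexBak"   # placeholder: your backup share
SSP_NAME = "SharedServices1"        # placeholder: your SSP name


def build_backup_command(stsadm: str, directory: str, ssp: str) -> list[str]:
    """Same stsadm invocation as SSPBackup.bat, as an argument list (no shell quoting needed)."""
    return [stsadm, "-o", "backup", "-directory", directory,
            "-backupmethod", "full", "-item", ssp]


def can_write(directory: str) -> bool:
    """Create and delete a scratch file; False points to a permissions or path problem."""
    try:
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
        return True
    except OSError:
        return False


def run_backup() -> int:
    """Check write access for the current account, then run stsadm and return its exit code."""
    if not can_write(BACKUP_DIR):
        print(f"Cannot write to {BACKUP_DIR}; check the share and NTFS permissions.")
        return 1
    return subprocess.run(build_backup_command(STSADM, BACKUP_DIR, SSP_NAME)).returncode
```

A scheduled task could then call run_backup() instead of SSPBackup.bat; as with the batch file, review the command output the first few times it runs.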
How to Clean Up your Backups
Now let's discuss how to delete old backups as they age. Someone at Microsoft has written a handy script to handle this for us. To use it:
- Create a new text file in the BackupCmd folder named DeleteOldBackups.vbs and copy this code into it (the original script is also available from Microsoft):
' Title: BackupCleanUp
' Description: Deletes SharePoint 2007 backups that are older than a specified
' number of days and then removes the backups from the backup history.
Setlocale(1033)
Dim nNumberOfDays
Dim strTOCFile
Dim dtDeleteDate
Dim sTemp
Set objXML = CreateObject("Microsoft.XMLDOM")
Set objFS = CreateObject("Scripting.FileSystemObject")
Set objLog = objFS.OpenTextFile("BackupCleanUp.log",8,true)

' Validate command line arguments and initialize data.
If WScript.Arguments.Count = 2 Then
    If IsNumeric(WScript.Arguments(0)) Then
        nNumberOfDays = CInt(WScript.Arguments(0))
        dtDeleteDate = DateAdd("d",nNumberOfDays*-1,Now)
    Else
        WScript.Echo "<NumberOfDays> must be an integer value."
        WScript.Quit
    End If
    strTOCFile = WScript.Arguments(1)
Else
    WScript.Echo "Usage: BackupCleanUp <NumberOfDays> <PathToTOC>"
    WScript.Quit
End If

objLog.WriteLine(Now() &vbTab& "Start: Clean up backups older than " &nNumberOfDays& " days from " &strTOCFile& ".")

' Load the SharePoint backup and restore TOC file.
objXML.Async = false
objXML.Load(strTOCFile)
If objXML.ParseError.ErrorCode <> 0 Then
    objLog.WriteLine(Now() &vbTab& "Error: Could not load the SharePoint Backup / Restore History." &vbCrLf&_
        Now() &vbTab& "Reason: " &objXML.ParseError.Reason& ".")
    WScript.Quit
End If

' Delete backup nodes that are older than the deletion date.
' Walk the node list backwards so that RemoveChild does not cause nodes to be skipped.
Set colNodes = objXML.DocumentElement.ChildNodes
For i = colNodes.Length - 1 To 0 Step -1
    Set objNode = colNodes.Item(i)
    If CDate(objNode.SelectSingleNode("SPFinishTime").Text) < dtDeleteDate Then
        If objNode.SelectSingleNode("SPIsBackup").Text = "True" Then
            ' Remove the trailing backslash from the stored path before deleting the folder.
            sTemp = objNode.SelectSingleNode("SPBackupDirectory").Text
            sTemp = Left(sTemp, Len(sTemp) - 1)
            If objFS.FolderExists(sTemp) Then objFS.DeleteFolder sTemp
            objLog.WriteLine(Now() &vbTab& "Deleted: " &objNode.SelectSingleNode("SPBackupDirectory").Text& ".")
            objXML.DocumentElement.RemoveChild(objNode)
        End If
    End If
Next

' Save the XML file with the old nodes removed.
objXML.Save(strTOCFile)
objLog.WriteLine(Now() &vbTab& "Finish: Completed backup clean up.")
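- Now we need another batch file that runs this script. Create a file in the BackupCmd folder named SSPCleanUp.bat and paste these commands into it:
cd /d "%~dp0"
cscript DeleteOldBackups.vbs 13 "\\MOSSDB\IndexBak\spbrtoc.xml"
- This deletes every backup recorded in spbrtoc.xml that finished more than 13 days ago and removes its entry from the backup history. You can easily change 13 to any retention period you prefer. The first line switches to the batch file's own folder, so cscript can find DeleteOldBackups.vbs and the script's BackupCleanUp.log is written there, even when the batch file runs as a scheduled task.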
- Be sure to test this batch file to make sure it works. If necessary, you can edit the SPFinishTime dates in spbrtoc.xml so that some backups appear older.
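Before letting the cleanup delete anything, it can help to see which backups it would remove. The Python sketch below mirrors the selection logic of DeleteOldBackups.vbs as a dry run, plus a helper that shifts finish times back for testing on a copy of the history. It assumes, as the VBScript does, that each history entry has SPFinishTime, SPIsBackup and SPBackupDirectory children, and it assumes a US-English "MM/DD/YYYY HH:MM:SS" timestamp format (the VBScript forces locale 1033 for the same reason); compare DATE_FORMAT with a real entry in your own spbrtoc.xml before relying on it.

```python
# Dry-run sketch of the DeleteOldBackups.vbs selection logic (date format is an assumption).
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

# Assumed US-English timestamp format; check it against your own spbrtoc.xml.
DATE_FORMAT = "%m/%d/%Y %H:%M:%S"


def expired_backups(root: ET.Element, days: int, now: datetime) -> list[str]:
    """SPBackupDirectory of every backup entry that finished more than `days` before `now`."""
    cutoff = now - timedelta(days=days)
    expired = []
    for entry in root:  # one child element per backup or restore in the history
        finished = entry.findtext("SPFinishTime")
        if not finished:
            continue  # skip entries without a finish time instead of failing
        if (entry.findtext("SPIsBackup") == "True"
                and datetime.strptime(finished, DATE_FORMAT) < cutoff):
            expired.append(entry.findtext("SPBackupDirectory"))
    return expired


def backdate(root: ET.Element, days: int) -> None:
    """Shift every SPFinishTime back by `days`, for testing against a copy of the history."""
    for entry in root:
        node = entry.find("SPFinishTime")
        if node is not None and node.text:
            shifted = datetime.strptime(node.text, DATE_FORMAT) - timedelta(days=days)
            node.text = shifted.strftime(DATE_FORMAT)
```

For example, expired_backups(ET.parse(r"\\MOSSDB\IndexBak\spbrtoc.xml").getroot(), 13, datetime.now()) lists what a 13-day cleanup would remove without touching any files.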
How to Automate the Batch Files
Now that the batch files are working, it's time to automate them. Windows includes an easy-to-use tool for this called Scheduled Tasks.
- Click Start -> Control Panel -> Scheduled Tasks -> Add Scheduled Task
- Click Next, then click Browse and navigate to the SSPBackup.bat file we created in the c:\BackupCmd folder.
- Give the task a name like SSP Full Backup, select Weekly and click Next.
- Most of these options are up to you. For this example, select 7:00 PM, check Monday and Thursday, and click Next.
- On this page you will be prompted for credentials to run this command under. Use your SharePoint system account here. Click Next and click Finish.
- In the Scheduled Tasks window, you can now test the task by right-clicking it and selecting Run.
Use the same technique to automate SSPCleanUp.bat as well. You will likely want to run it about an hour after the backup, which in this example means 8:00 PM.
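To estimate how much space the backup share needs, the Python sketch below counts how many backups survive each cleanup with the Monday/Thursday 7:00 PM schedule, the 8:00 PM cleanup and the 13-day retention used above. The 5 GB per backup is the figure from my farm, and the sketch assumes each backup's finish time is close to its 7:00 PM start.

```python
# Rough capacity sketch; schedule and sizes are this post's example values.
from datetime import datetime, timedelta

BACKUP_WEEKDAYS = {0, 3}   # Monday and Thursday (datetime.weekday(): Monday == 0)
BACKUP_HOUR = 19           # 7:00 PM backup; its finish time is approximated by the start
RETENTION_DAYS = 13        # the first argument passed to DeleteOldBackups.vbs
GB_PER_BACKUP = 5          # observed size on my farm; yours will differ


def backups_kept(cleanup_at: datetime, lookback_days: int = 60) -> int:
    """Count scheduled backups that the cleanup running at `cleanup_at` would keep."""
    cutoff = cleanup_at - timedelta(days=RETENTION_DAYS)
    kept = 0
    for offset in range(lookback_days):
        run = (cleanup_at - timedelta(days=offset)).replace(
            hour=BACKUP_HOUR, minute=0, second=0, microsecond=0)
        # The cleanup deletes backups older than the cutoff, so keep run >= cutoff.
        if run.weekday() in BACKUP_WEEKDAYS and cutoff <= run <= cleanup_at:
            kept += 1
    return kept
```

A Monday cleanup such as backups_kept(datetime(2008, 1, 21, 20, 0)) gives 4, and a Thursday cleanup gives 4 as well, so plan for about 4 × 5 GB = 20 GB after each cleanup, plus one more backup (about 25 GB in total) during the hour between a new backup and the next cleanup.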
This blog post is starting to run long, so I will stop here. I will post again soon on how to restore from this backup. Until then, best of luck to you.
-Mark Gabriel