Generate a Bulk Database Update Script And Running It In Batches
Scenario
I have to update a table of users with the correct emails. The emails are located in another table. Some caveats:
- The number of rows affected are ~2.5 million.
- If the email is already present in the Users table, the update must not occur.
- The query must not lock the database.
Generating the Script
I used SQL to create the update script. I wanted a single UPDATE command for every row. First, I grabbed the IDs and emails for the affected rows and saved them into a temporary table using a SELECT...INTO statement. It takes a few minutes, but shouldn't be too long especially when the tables to be queried are indexed.
Here is the snippet for generating the UPDATE commands:
SELECT CONCAT('UPDATE Users SET Email = ''',
REPLACE(uam.Email, CHAR(39), CHAR(39)+CHAR(39)),
''' WHERE [Id] = ', temp.Id, '
AND NOT EXISTS
(SELECT Email FROM Users(nolock)
WHERE Email = ''',
REPLACE(temp.Email, CHAR(39), CHAR(39)+CHAR(39)),
''')')
as Update_Commands
FROM _TempAccountsToUpdate(nolock) temp
- The REPLACE bit is required to stop single quote characters (') in an email address from messing up the generated command.
- The NOT EXISTS clause ensures that if the email already exists in the Users database, the update command will not run.
The results of this query are saved into a SQL file.
Batching the Script
Next, I wanted to chunk the file into smaller files of 100 statements each. I figured that running 100 updates at a time wouldn't kill the database.
I installed Cygwin so I could use the split UNIX command. The snippet below takes the file C:\cleanup\bigfile.sql as input, and every 100 lines are saved into a new file with the filename fileXXX.sql.
For more explanation regarding the syntax, check out this StackOverflow answer.
split -l 100 -a 5 -d --additional-suffix=.sql C:\cleanup\bigfile.sql file
Running the Script
I wrote a simple Powershell script that loops through the files and executes the SQL commands. After a file is run, it sleeps for 2 seconds. I added the -o <OUTPUT_FILE> option so I had a log of which rows were updated successfully.
$counter = 1
while ($counter -lt 2500)
{
$file = "C:\cleanup\file$($counter).sql"
Write-Output "Running $($file)..."
# run script
& sqlcmd.exe -b -i $file -S "<SERVER_NAME>" -d "<DATABASE_NAME>" -U "<USERNAME>" -P "<PASSWORD>" -o "<OUTPUT_FILE>"
# increment counter
$counter++;
Write-Output "Sleeping for 2 seconds..."
# sleep
Start-Sleep -Milliseconds 2000
}
Write-Output "Finished."
Karsten Würth (@karsten.wuerth)