Generating a Bulk Database Update Script and Running It in Batches

code snippet May 07, 2019

Scenario

I have to update a table of users with the correct emails. The emails are located in another table. Some caveats:

  1. The number of rows affected is ~2.5 million.
  2. If the email is already present in the Users table, the update must not occur.
  3. The query must not lock the database.

Generating the Script

I used SQL to generate the update script itself, producing a single UPDATE command for every affected row. First, I grabbed the IDs and emails for the affected rows and saved them into a temporary table using a SELECT...INTO statement. This takes a few minutes, but shouldn't be too long, especially when the tables being queried are indexed.
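As a sketch of that step (AccountEmails and its columns are assumed names here, not the real schema):

```sql
-- Grab the IDs and corrected emails into a working table.
-- AccountEmails is a hypothetical name for the table that holds the
-- correct addresses; SELECT...INTO creates _TempAccountsToUpdate.
SELECT u.Id, ae.Email
INTO _TempAccountsToUpdate
FROM Users u WITH (NOLOCK)
INNER JOIN AccountEmails ae WITH (NOLOCK) ON ae.UserId = u.Id
WHERE u.Email <> ae.Email
```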

Here is the snippet for generating the UPDATE commands:


SELECT CONCAT('UPDATE Users SET Email = ''',
REPLACE(temp.Email, CHAR(39), CHAR(39) + CHAR(39)),
''' WHERE [Id] = ', temp.Id, '
AND NOT EXISTS
(SELECT Email FROM Users WITH (NOLOCK)
WHERE Email = ''',
REPLACE(temp.Email, CHAR(39), CHAR(39) + CHAR(39)),
''')')
AS Update_Commands
FROM _TempAccountsToUpdate AS temp WITH (NOLOCK)
  • The REPLACE bit is required to stop single quote characters (') in an email address from messing up the generated command.
  • The NOT EXISTS clause ensures that if the email already exists in the Users table, the update command will not run.

The results of this query are saved into a SQL file.
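For illustration, a temp-table row with Id 42 and the address o'brien@example.com (both assumed values) would generate a command like this, with the embedded quote doubled:

```sql
UPDATE Users SET Email = 'o''brien@example.com' WHERE [Id] = 42
AND NOT EXISTS
(SELECT Email FROM Users WITH (NOLOCK)
WHERE Email = 'o''brien@example.com')
```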

Batching the Script

Next, I wanted to chunk the file into smaller files of 100 statements each. I figured that running 100 updates at a time wouldn't kill the database.

I installed Cygwin so I could use the UNIX split command. The snippet below takes the file C:\cleanup\bigfile.sql as input and saves every 100 lines into a new file; the -d and -a 5 options produce five-digit numeric suffixes, so the output files are named file00000.sql, file00001.sql, and so on.

For more explanation regarding the syntax, check out this StackOverflow answer.


split -l 100 -a 5 -d --additional-suffix=.sql 'C:\cleanup\bigfile.sql' file

Running the Script

I wrote a simple PowerShell script that loops through the files and executes the SQL commands. After each file runs, the script sleeps for 2 seconds. I added the -o <OUTPUT_FILE> option so I had a log of which rows were updated successfully.


# split's -d option numbers its output from 00000, so start at zero
$counter = 0

# ~2.5 million statements at 100 per file = 25,000 files
while ($counter -lt 25000)
{
    # zero-pad to five digits to match split's -a 5 suffix
    $file = "C:\cleanup\file$($counter.ToString('00000')).sql"

    Write-Output "Running $($file)..."

    # run script
    & sqlcmd.exe -b -i $file -S "<SERVER_NAME>" -d "<DATABASE_NAME>" -U "<USERNAME>" -P "<PASSWORD>" -o "<OUTPUT_FILE>"

    # increment counter
    $counter++

    Write-Output "Sleeping for 2 seconds..."

    # sleep
    Start-Sleep -Seconds 2
}

Write-Output "Finished."

Photo by Karsten Würth (@karsten.wuerth) on Unsplash

Kristina Alberto

Software engineer from Sydney. I work at Domain Group.