Generate unique hash-based identifiers using the Python generator

You may need to populate a column with unique identifiers derived from existing column values—for example, when auto-incremented keys are not suitable because the IDs must reflect the actual data and remain consistent across repopulation.

This guide explains how to use the Python generator in dbForge Data Generator for SQL Server to generate such identifiers using a custom hash function.

Scenario

Suppose you have the actor_sample table and want to populate the Unique_ID column with identifiers derived from existing column values.

CREATE TABLE actor_sample (
  actor_id SMALLINT NOT NULL PRIMARY KEY
 ,first_name VARCHAR(45) NOT NULL
 ,last_name VARCHAR(45) NOT NULL
 ,last_update DATETIME NOT NULL DEFAULT (CURRENT_TIMESTAMP)
 ,Unique_ID VARCHAR(255) DEFAULT NULL
);

The Unique_ID column must meet the following requirements:

  • Each value must be unique for every distinct combination of first_name, last_name, and last_update.

  • Values must not be based on auto-increment, as they need to reflect the actual row data.

  • The same combination of column values must always produce the same identifier, even when the column is repopulated.

Solution

A hash function takes an input of arbitrary type and returns a fixed-length value—a hash code—that is unique to that input and always the same for the same input. This makes hash functions well-suited for generating identifiers that depend on row data and remain consistent across repopulation.

The following script uses Python’s hashlib module to compute a SHA-1 hash from the values of the three source columns.

import hashlib
t = "|%s|%s|%s|" % (first_name, last_name, str(last_update))
t1 = hashlib.sha1(t)
t1.hexdigest()

The script does the following:

  • Imports the hashlib module.

  • Converts non-string column values to strings and concatenates them into a single string using pipe characters as delimiters.

  • Computes a SHA-1 hash of the concatenated string and returns it as a hexadecimal string.

Generate data using the Python generator

1. On the toolbar, click New Data Generation.

2. On the Data Generator Project Properties page, specify the required connection and database, then click Next.

The Data Generator Project Properties page of the Data Generation Wizard with the connection and database specified

3. Optional: On the Options page, configure your data generation project, then click Open.

The Options page of the Data Generation Wizard with the data generation project configured

4. In the navigation tree of the data generation project, select the table you want to generate data for.

5. Select the column you need to assign the Python generator to (Unique_ID).

6. In the Generator list, select Python.

7. In the text box, enter the Python script and preview the generated data.

Data Generator project window with a table selected, the Unique_ID column highlighted, and the Python generator configured

8. At the top of the document, click the green arrow to populate the table with the generated data.

9. On the Output page of the Data Population Wizard, select Execute the data population script against the database, then click Next.

The Output page of the Data Population Wizard with the execute script option selected

10. On the Options page, configure data population options, then click Next.

The Options page of the Data Population Wizard with data population options configured

11. Optional: On the Additional Scripts page, specify scripts to run before or after data population, then click Next.

12. On the Summary page, review the action plan and warnings, then click Generate.

Summary page of the Data Population Wizard showing the Action Plan and Warnings tabs

After generation completes, the Unique_ID column is populated with hash-based identifiers derived from the source columns.