The topic provides the examples of IronPython scripts, which you can use to generate data with the Python script generator. You can use these scripts examples as a template for creating your own scripts.
The script generates life years based on the date of birth and date of death.
<?xml version="1.0" encoding="utf-8"?>
<Generators xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Generator>
<Category>Personal</Category>
<Name>Age</Name>
<ShortName>Age</ShortName>
<Description>generate number of years that are dependent on date of birth and date of death</Description>
<Type>SqlPythonGenerator</Type>
<ColumnTypes>
<Type>int</Type>
<Type>smallint</Type>
<Type>tinyint</Type>
</ColumnTypes>
<Matches>
<Match table=".*" column="^AGE$" priority="100" />
</Matches>
<Options>
<Expression># The generator calculates the age based on the date of birth and date of death.
import random
from System import *
from System import DateTime
from System import DBNull
from System.Globalization import CultureInfo
def main():
# DOB - name of the column in the current table that contains dates of birth.
# This name should be replaced with the actual name of the column of your table, for example DAYOFBIRTHDAY.
# DOD - name of the column in the current table that contains dates of death.
# This name should be replaced with the actual name of the column of your table, for example DAYOFDEATH.
bd = DOB
dd = DOD
if str(bd) == '' or bd == DBNull.Value or str(dd) == '' or dd == DBNull.Value:
return 0
birthdate = ''
deathdate = ''
if type(bd).__name__ == 'str':
birthdate = DateTime.ParseExact(bd,"dd/MM/yyyy", CultureInfo.InvariantCulture)
else: # else it's supposed to be DateTime
birthdate = bd
if type(dd).__name__ == 'str':
deathdate = DateTime.ParseExact(dd,"dd/MM/yyyy HH:mm:ss.fff", CultureInfo.InvariantCulture)
else: # else it's supposed to be DateTime
deathdate = dd
if deathdate < birthdate:
return 0
# Calculate the age based on the date of birth and date of death.
return Convert.ToInt32(Math.Truncate((deathdate - birthdate).TotalDays / 365))
</Expression>
<AllowNulls>false</AllowNulls>
</Options>
</Generator>
</Generators>
The generator generates dates of death based on the dates of birth with the possibility to set lifespan that depending on the weight list.
<?xml version="1.0" encoding="utf-8"?>
<Generators xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Generator>
<Category>Personal</Category>
<Name>Death Date</Name>
<ShortName>Death Date</ShortName>
<Description>fill with dates of death with different probability</Description>
<Type>SqlPythonGenerator</Type>
<ColumnTypes>
<Type>date</Type>
<Type>datetime</Type>
<Type>datetime2</Type>
<Type>smalldatetime</Type>
</ColumnTypes>
<Matches>
<Match table=".*" column="^Death.*Date$" priority="90" />
<Match table=".*" column="^Date.*Death$" priority="90" />
<Match table=".*" column="^Day.*Death$" priority="90" />
<Match table=".*" column=".*dead" priority="10" />
<Match table=".*" column="^DOD$" priority="20" />
<Match table=".*" column="^ddate$" priority="30" />
<Match table=".*" column="^dday$" priority="10" />
</Matches>
<Options>
<Expression>#This generator generates dates of death basing on the dates of birth with the option to set lifespan that is based on the weight list.
#It has been made by request of our user from forum http://forums.devart.com/viewtopic.php?f=23&t=33396
import random
from System import *
from System import DateTime
from System import DBNull
from System.Globalization import CultureInfo
def main():
# DOB - name of the column in the current table, that contains dates of birth.
# This name should be replaced with the actual name of the column of your table, for example DAYOFBIRTHDAY.
bd = DOB
if str(bd) == '':
return DBNull.Value
# "dd/MM/yyyy" - is the format of date of birth. If it differs, make the following modifications.
bdFormat = "dd/MM/yyyy"
x = random.randint(1, 100)
deltaDays = 0
# Setting of percentage ranges for ages. The sum of all ranges should be equal to 100%.
if x <= 5: # 5 %
deltaDays = random.randint(1, 3650) # 5 % - death at age 1..3650 days (1-10 years)
elif x > 5 and x <= 20: # 15 %
deltaDays = random.randint(3650, 18250) # 15 % - death at age 3650..18250 days (10-50 years)
else: # 80 %
deltaDays = random.randint(18250, 36500) # 80 % - death at age 18250..36500 days (50-100 years)
birthdate = ''
# Check of column type for the date of birth column (if it is a string column, the conversion to the DATE format should be performed).
if type(bd).__name__ == 'str':
birthdate = DateTime.ParseExact(bd,bdFormat, CultureInfo.InvariantCulture)
else: # else it's supposed to be DateTime
birthdate = bd
# Addition of lifespan (in days).
deathDate = birthdate.AddDays(deltaDays);
# If the date of death is bigger than the current date, the random quantity of days should be subtracted from the current date.
if deathDate >= DateTime.Now:
deathDate = DateTime.Now.AddDays(-random.randint(1, 300));
if deathDate < birthdate:
deathDate = birthdate
return deathDate.Date
</Expression>
<AllowNulls>false</AllowNulls>
</Options>
</Generator>
</Generators>
The script generates DUNS numbers using the Luhn algorithm.
<?xml version="1.0" encoding="utf-8"?>
<Generators xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Generator>
<Category>Business</Category>
<Name>Data Universal Numbering System Number (DUNS Number)</Name>
<ShortName>DUNS Number</ShortName>
<Description>481795235, 796124832, 213794852, 873491252, ...</Description>
<Type>SqlPythonGenerator</Type>
<ColumnTypes>
<Type>varchar</Type>
<Type>char</Type>
<Type>nvarchar</Type>
<Type>nchar</Type>
<Type>bigint</Type>
<Type>int</Type>
</ColumnTypes>
<Matches>
<Match table=".*" column="^DUNS$" priority="100" minlen="9" maxlen="10" />
<Match table=".*" column="DUNS.*code" priority="60" minlen="9" maxlen="10" />
</Matches>
<Options>
<Expression># This script generates DUNS numbers using the Luhn algorithm
import random
random.seed(config["seed"])
# Calculate checksum by mod 10
def luhn_checksum(input_number):
def digits_of(n):
return [int(d) for d in str(n)]
digits = digits_of(input_number)
odd_digits = digits[-1::-2]
even_digits = digits[-2::-2]
checksum = 0
checksum += sum(odd_digits)
for d in even_digits:
checksum += sum(digits_of(d*2))
return checksum % 10
def calculate_luhn(partial_input_number):
check_digit = luhn_checksum(int(partial_input_number) * 10)
return check_digit if check_digit == 0 else 10 - check_digit
def GetDigit():
value = random.sample(range(1, 10), 8)
return ''.join(str(x) for x in value)
def main():
i = 0
while i <= config["n_rows"]:
i = i + 1
d = GetDigit()
yield str(d) + str(calculate_luhn(d))
main()</Expression>
</Options>
</Generator>
<Generator>
<Category>Business</Category>
<Name>Data Universal Numbering System Number (DUNS Number)</Name>
<ShortName>DUNS Number</ShortName>
<Description>48-179-5235, 79-612-4832, 21-379-4852, 87-349-1252, ...</Description>
<Type>SqlPythonGenerator</Type>
<ColumnTypes>
<Type>varchar</Type>
<Type>char</Type>
<Type>nvarchar</Type>
<Type>nchar</Type>
<Type>text</Type>
<Type>ntext</Type>
<Type>varchar(max)</Type>
<Type>nvarchar(max)</Type>
</ColumnTypes>
<Matches>
<Match table=".*" column="^DUNS$" priority="100" minlen="11" />
<Match table=".*" column="DUNS.*code" priority="60" minlen="11" />
</Matches>
<Options>
<Expression># This script generates DUNS numbers using the Luhn algorithm
import random
random.seed(config["seed"])
def luhn_checksum(input_number):
def digits_of(n):
return [int(d) for d in str(n)]
digits = digits_of(input_number)
odd_digits = digits[-1::-2]
even_digits = digits[-2::-2]
checksum = 0
checksum += sum(odd_digits)
for d in even_digits:
checksum += sum(digits_of(d*2))
return checksum % 10
def calculate_luhn(partial_input_number):
check_digit = luhn_checksum(int(partial_input_number) * 10)
return check_digit if check_digit == 0 else 10 - check_digit
def GetDigit():
value = random.sample(range(1, 10), 8)
return ''.join(str(x) for x in value)
def formatOutPut(value):
s = '{0}{1}-{2}{3}{4}-{5}{6}{7}{8}'
return s.format(*list(value))
def main():
i = 0
while i <= config["n_rows"]:
i = i + 1
d = GetDigit()
yield formatOutPut(str(d) + str(calculate_luhn(d)))
main()</Expression>
</Options>
</Generator>
</Generators>
The script retrieves a Gravatar image url based on an email address.
<?xml version="1.0" encoding="utf-8"?>
<Generators xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Generator>
<Category>Personal</Category>
<Name>URL (Gravatar image)</Name>
<ShortName>URL (Gravatar img)</ShortName>
<Description>http://www.gravatar.com/avatar/6ffc45896e63ba040e769b8b9d150107?s=40&d=http%3A%2F%2Fwww.example.com%2Fdefault.jpg ;...</Description>
<Type>SqlPythonGenerator</Type>
<ColumnTypes>
<Type>varchar</Type>
<Type>char</Type>
<Type>nvarchar</Type>
<Type>nchar</Type>
<Type>text</Type>
<Type>ntext</Type>
<Type>varchar(max)</Type>
<Type>nvarchar(max)</Type>
</ColumnTypes>
<Matches>
<Match table=".*" column="^gravatar$" priority="100" minlen="150" />
</Matches>
<Options>
<Expression># This script retrieves a Gravatar image url based on an email address
import urllib, hashlib
# Set your variables here
email = "[email protected]"
# email = email_column_name
default = "http://www.example.com/default.jpg"
size = 40
# construct the url
gravatar_url = "http://www.gravatar.com/avatar/" + hashlib.md5(email.lower()).hexdigest() + "?"
gravatar_url += urllib.urlencode({'d':default, 's':str(size)})
gravatar_url</Expression>
</Options>
</Generator>
</Generators>
The script generates primary account numbers that have 19 digits long.
<?xml version="1.0" encoding="utf-8"?>
<Generators xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Generator>
<Category>Personal</Category>
<Name>Integrated Circuit Card ID (ICCID)</Name>
<ShortName>ICCID</ShortName>
<Description>8938605388545996967, 8937103007991618431, 8935602817534397104, ...</Description>
<Type>SqlPythonGenerator</Type>
<ColumnTypes>
<Type>varchar</Type>
<Type>char</Type>
<Type>nvarchar</Type>
<Type>nchar</Type>
<Type>bigint</Type>
<Type>decimal</Type>
</ColumnTypes>
<Matches>
<Match table=".*" column="^ICCID$" priority="105" minlen="19" />
</Matches>
<Options>
<Expression>import random
import csv
random.seed(config["seed"])
# Calculates luhn checksum
def luhn_checksum(input_number):
def digits_of(n):
return [int(d) for d in str(n)]
digits = digits_of(input_number)
odd_digits = digits[-1::-2]
even_digits = digits[-2::-2]
checksum = 0
checksum += sum(odd_digits)
for d in even_digits:
checksum += sum(digits_of(d*2))
return checksum % 10
def GetDigit(codes):
# Major industry identifier
MII = '89'
# Country code
CC = codes[random.randint(1, len(codes)-1)]
# Issuer identifier
randoms=[]
for i in range (4- len(CC)):
randoms.append(0)
II = ''.join(str(x) for x in randoms) + str(random.randint(1,5))
value = MII + CC + II
# Individual account identification
randoms=[]
for i in range (18- len(value)):
randoms.append(random.randint(0,9))
IAI = ''.join(str(x) for x in randoms)
value = value + IAI
return value
# Getting list of telephone country codes from file CountryCodes.csv
def countryCodes():
var_columnName = 'country_dial'
var_fileName = config["config_path"] +'\\'+ r"CountryCodes.csv"
with open(var_fileName,"rb") as csvfile:
var_reader = csv.DictReader(csvfile, delimiter=';', quotechar='"')
for row in var_reader:
yield str(row[var_columnName]).upper()
def main():
random.seed(config["seed"])
codes = list(countryCodes())
i = 0
while i <= config["n_rows"]:
i = i + 1
d = GetDigit(codes)
yield d + str(luhn_checksum(d))
main()
</Expression>
<AllowNulls>false</AllowNulls>
</Options>
</Generator>
</Generators>
The script reads data from a csv file, converts it to uppercase, and shuffles data in the list.
<?xml version="1.0" encoding="utf-8"?>
<Generators xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Generator>
<Category>Product</Category>
<Name>Language Code (ISO 639-1 alpha-2, UpperCase)</Name>
<ShortName>Language Code 2 (u)</ShortName>
<Description>ES, FR, ZH, UK, TH, ZU, EN, JA, NL, ...</Description>
<Type>SqlPythonGenerator</Type>
<ColumnTypes>
<Type>varchar</Type>
<Type>char</Type>
<Type>nvarchar</Type>
<Type>nchar</Type>
<Type>text</Type>
<Type>ntext</Type>
<Type>varchar(max)</Type>
<Type>nvarchar(max)</Type>
</ColumnTypes>
<Matches>
<Match table="Language" column="Lang.*id$" priority="21" minlen="2" />
</Matches>
<Options>
<Expression># This script reads data from a csv file, converts it to uppercase, and shuffles data in the list
import csv
import random
random.seed(config["seed"])
var_fileName = config["config_path"] +'\\'+ r"LanguageCodes.csv"
var_columnName = 'ISO639-1-Alpha-2'
def values():
with open(var_fileName,"rb") as csvfile:
var_reader = csv.DictReader(csvfile, delimiter=';', quotechar='"')
for row in var_reader:
yield str(row[var_columnName]).upper()
var_lst = list(values())
random.shuffle(var_lst)
var_lst</Expression>
</Options>
</Generator>
</Generators>
The script reads data from a csv file, converts it to uppercase, and shuffles data in the list.
<?xml version="1.0" encoding="utf-8"?>
<Generators xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Generator>
<Category>Product</Category>
<Name>Language Code (ISO 639-1 alpha-3, UpperCase)</Name>
<ShortName>Language Code 3 (u)</ShortName>
<Description>SPA, FRE, CHI, UKR, THA, ZUL, ENG, JPN, DUT, ...</Description>
<Type>SqlPythonGenerator</Type>
<ColumnTypes>
<Type>varchar</Type>
<Type>char</Type>
<Type>nvarchar</Type>
<Type>nchar</Type>
<Type>text</Type>
<Type>ntext</Type>
<Type>varchar(max)</Type>
<Type>nvarchar(max)</Type>
</ColumnTypes>
<Matches>
<Match table="Language" column="Lang.*id$" priority="31" minlen="3" />
</Matches>
<Options>
<Expression># This script reads data from a csv file, converts it to uppercase, and shuffles data in the list
import csv
import random
random.seed(config["seed"])
var_fileName = config["config_path"] +'\\'+ r"LanguageCodes.csv"
var_columnName = 'ISO639-1-Alpha-3b'
def values():
with open(var_fileName,"rb") as csvfile:
var_reader = csv.DictReader(csvfile, delimiter=';', quotechar='"')
for row in var_reader:
yield str(row[var_columnName]).upper()
var_lst = list(values())
random.shuffle(var_lst)
var_lst</Expression>
</Options>
</Generator>
</Generators>
The script dynamically generates placeholder images.
<?xml version="1.0" encoding="utf-8"?>
<Generators xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Generator>
<Category>IT</Category>
<Name>Lorem Pixel</Name>
<ShortName>Lorem Pixel</ShortName>
<Description>dynamically generates placeholder images (!time-consuming!)</Description>
<Type>SqlPythonGenerator</Type>
<ColumnTypes>
<Type>varbinary(max)</Type>
<Type>image</Type>
</ColumnTypes>
<Matches>
</Matches>
<Options>
<Expression>import clr
clr.AddReference("System.Net")
from System.Net import WebClient
import clr
clr.AddReference("System.IO")
from System.IO import MemoryStream, Stream
import clr
clr.AddReference("System.Drawing")
from System.Drawing import Image
clr.AddReference("System")
from System import Random
random = Random()
categories = ['abstract', 'animals', 'business', 'cats', 'city', 'food', 'nightlife' , 'fashion', 'people', 'nature', 'sports', 'technics', 'transport']
def getUrl():
varCategoryIndex = random.Next(len(categories))
varCategoryName =categories[varCategoryIndex]
varUrlTemplate = 'http://lorempixel.com/100/100/{0}/{1}'
varUrl = varUrlTemplate.format(varCategoryName,varCategoryName)
return varUrl
def main():
i = 0
while i <= config["n_rows"]:
webClient = WebClient()
dataImage = webClient.DownloadData(getUrl())
stream = MemoryStream(dataImage)
webClient.Dispose()
yield Image.FromStream(stream)
stream.Dispose()
main()</Expression>
</Options>
</Generator>
</Generators>
The script generates MD5 hash based on values in some string or a table column.
<?xml version="1.0" encoding="utf-8"?>
<Generators xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Generator>
<Category>IT</Category>
<Name>MD5 Hash</Name>
<ShortName>MD5 Hash</ShortName>
<Description>605669cab962bf944d99ce89cf9e58d9, ee3c6e8ed9c27c45f161ea416c997df8, 8625e1de7be14c39b1d14dc03d822497, ...</Description>
<Type>SqlPythonGenerator</Type>
<ColumnTypes>
<Type>varchar</Type>
<Type>char</Type>
<Type>nvarchar</Type>
<Type>nchar</Type>
<Type>text</Type>
<Type>ntext</Type>
<Type>varchar(max)</Type>
<Type>nvarchar(max)</Type>
</ColumnTypes>
<Matches>
</Matches>
<Options>
<Expression># This script generates MD5 hash based on values in some string or a table column
import hashlib
def HashIt():
# You can also use 'sha1', 'sha256', 'sha384', 'sha512'
m = hashlib.md5()
# You can use a column_name instead of 'Some Value'
m.update('Some Value')
return m.hexdigest()
HashIt()</Expression>
</Options>
</Generator>
</Generators>
For detailed instructions on how to generate related data elements, refer to our blog post.