LangGrant Windocks provides Windocks user guide for enterprise database teams, enabling fast database cloning and virtualization on standard infrastructure.
Introduction
Windocks Free and Standard editions deliver cross platform database subsetting, synthetic data generation, and database movement or migration. Get started with synthetic data, subsets, data move
Windocks Enterprise edition adds Windows SQL Server containers and database cloning (or virtualization), for delivery of complete writable database environments. These can include subsets, synthetic data, and database migration.
Get Started with deployment and installation
Articles on Synthetic data, Subsets and data move
Useful articles and sample code for synthetic data, subsets, copying data
Includes articles such as creating subsets or synthetic data yourself. Python samples for Synthetic Data Vault (SDV), bulk inserting into Snowflake, moving data from Azure to SQL Server instances, and more are available.
Additional resources
Best Practices
Troubleshooting
Configuration for large databases
Choosing the number of tables to process in parallel, and other parameters.
Release Notes
Docker containers
Editions and license limits
SQL Server to Snowflake microservice
Get Started
Subsetting, Synthetic data, and moving databases
Windocks is available for Windows, Linux, and Docker containers. It does not write to the source database but requires exclusive use of the source (no writes while being used by Windocks).
Subsetting is based on primary/foreign keys set in the source database. It handles circular dependencies and composite keys. Schema changes are detected between Transforms to allow database configurations to be updated for masking or synthetic data.
Synthetic data is based on Windocks fast synthetic engine along with SDV open source libraries. It is applied to a database subset. Support is available for customer provided Python libraries. Automated PII detection is not included in the initial release.
Windocks automates data type mapping between platforms and creates the target schema. Throughput reaches up to 20 million rows per minute. Incremental change capture is not included in the initial release. Repeated migration transform runs overwrite a previously written target database.
Choosing a Windocks deployment
Windocks involves a source and target database, and the Windocks service. All can reside on a single machine, on two, or on three separate machines. Working with larger databases (100 GB or more) benefits from a 3 machine architecture to maximize CPU and RAM for data operations.
Minimum resources for Windocks is 4 vCPU cores and 8 GB of RAM. For large databases we recommend 8 or more vCPU cores and 16 GB or more RAM. A detailed discussion on configuration for resource use is detailed in the large database configuration topic.
Installation
Windows
Download the installer zip file synth.zip and extract say to D:\synth
cd D:\synth
.\windockssyntheticinstaller.exe
Windocks is run as a service
Start using at http://localhost:5200 or http://<DnsOfServer>:5200
Change the port in windockssynthetic/config.json if you need
Set your administrator account name and password at
http://localhost:5200/firsttime
Linux
Recommended to use docker but also runs natively on Linux
Download the installer zip file synth.zip and extract
apt-get unzip
unzip the zip installer to a directory say ~/synth
cd ~/synth
./windockssyntheticinstaller
cd ../windockssynthetic
chmod 700 ./subsetsynthetic
# Run the application
./subsetsynthetic
# For background
nohup ./subsetsynthetic > outputlogfile &
Start using at http://<localhost:5200 or http://<DnsOfServer>:5200
Change the port in windockssynthetic/config.json if you need
Set your administrator account name and password at
http://localhost:5200/firsttime
Docker
Image is available on dockerhub
docker run –p 5200:5200 –d windocks/move_subset_synthetic_data
Start using at http://<localhost:5200 or http://<DnsOfServer>:5200
Set your administrator account name and password at
http://localhost:5200/firsttime
This release does not support environment variable for config variables. You have to docker exec into the container and see config/config.json
Kubernetes
In deployment yaml file, use the image windocks/move_subset_synthetic_data
After you deploy and run the service, set your administrator account name and password at
http://localhost:5200/firsttime
Uninstall steps
Windows
Open a command prompt as administrator
sc delete WindocksSubsetSynthetic
Delete the windockssynthetic and windocksdonotdelete directory
Linux
End the application
Ps –ef | grep subsetsynthetic
kill -9 <pid>
rm -rf windockssynthetic
Configuration
Check your configuration in config/config.json, you can just use the defaults to begin with.
For system resources, we recommend a minimum of 12GB RAM with 2 cores. We also recommend running the Windocks process on a machine different from the source database and target database instances. You can run them all on a single laptop if you wish.
If you are moving large databases refer to our system configuration guide online (under Resources). Consult it for multi-threading settings for parallel processing.
Restart the WindocksSubsetSynthetic service on Windows or the subsetsynthetic process / container on Linux after config changes.
Email support@windocks.com if you have any issues configuring
Usage
Windocks can move, subset, or generate synthetic data from these sources: SQL Server, Snowflake, Postgre, MySql, AzureSQL, Azure Managed Instance, RDS / Aurora. It can write to these targets: SQL Server, Snowflake, Postgre, MySql, AzureSQL, Azure Managed Instance, Amazon RDS / Aurora. You may use a different database type for the target. For example, customers use Windocks to move data from SQL Server to Snowflake and back, Azure to SQL Server instances, and SQL Server to Postgre.
- Start by creating a Connection, test it and save it
- You may create one or more data sources for a connection. Define a data source for your connection and save it. To generate synthetic data configure the details of what type of sensitive data you need.
- Decide if you would like to target the same or a different connection for copy/subset/synthetic. (Note that Snowflake for example is a single database connection and if that is your source, then you must define a different target connection. A SQL Server instance connection for example, can be used for both source and target)
- If targeting a different connection, create another connection, test it and save it
- Find your source connection, its data sources, and then create a transform for your data source. Specify if you want a subset (percentage of your source), full copy or generate synthetic data based on the source database.
- Run the transform. A 1TB database takes approximately 3-4 hours to copy completely. The same 1 TB database takes 2 to 3 minutes for a small subset. A 30 TB database (Snowflake Sample) takes 1 hour for a very small representative subset (0.0000001%) to be moved to SQL Server. Subsets and synthetic data have the same data distribution as the source and can be an easily generated representative database for testing.
- The report of the transform run is available in the transform details view. Select “More” in the list of transforms. You may also define WHERE clauses and primary key values to define your subset more narrowly. Data distributions are available from the Transform details view. config/config.json has details on the maximum row count for which distributions are generated. You can modify and restart the Windocks process / service.
- If using docker / kubernetes, the details of your connections and transform configurations will be written to a database in the container. If you delete the container, that data will be lost. The enterprise edition supports persistent storage on a volume you designate. Contact support@windocks.com
Security, permissions
Windocks provides its own authentication to use the Windocks service. You may choose to require https only (see config/config.json).
Source databases require read only permissions, target databases require create / write
Database authentication can be done at the operating system level. Alternatively you may save passwords on the server running Windocks. The encryption of these saved passwords is done by Microsoft middleware.
You may generate synthetic data for all your sensitive columns. This ensures your target databases do not have any sensitive data.
Transforms across different database types
While transforming a database from one type of system to another (such as SQL Server to Snowflake), Windocks takes care of mapping types and sizes for you automatically. However, there are a few things for you to check before running a transform like that:
- Size maximums of the values in column – Character and binary columns have maximum sizes that are different in different database platforms.
- Column name size maximums
- Row size maximums
If your target server cannot support the size of your source, then you have to truncate in your source.
Additional topics
Telemetry
Windocks collects the following data to enable customer technical support and for product and performance improvements. No data is shared with third parties.
- User identity
- User email
- Browser type
- Database schema (tables and relationships, without data)
- Errors
- Database type
- Features used
- Application version
- Edition
- Language
- Browser used to access the application
- Operating system
Windocks uses this telemetry data to:
- Provide product and technical support
- Fix errors in the software
- Improve the product
Windocks does not share this data with anyone outside of Windocks
Configuration for large databases
Windocks is a configurable asynchronous, multi-threaded service. The Configuration Guide (link below) provides guidance for the number of tables to process in parallel. It also covers the number of batches to run in parallel on these tables and the batch sizing.
Changes in the Windocks configuration are made in \Windockssynthetic\config\config.json. Restart the Windocks service following changes to the config.
Encryption
Managing encryption policies for lower level data environments is important. Windocks provides optional automated TDE encryption for SQL Server Transforms. Be sure to address encryption needs for other data types.
Licensing
Windocks free and standard editions are licensed according to the sum of source databases.
Best Practices
Start small for subsets and synthetic data. Initially target subsets of 100 MB or less and expand the size as needed.
When working with Subset Table filters and Where clauses extraneous data can be minimized. Select a minimum percentage (0.0000001).
For migrating large databases to Snowflake use Medium or larger data warehouse.
When running the Windocks service on the source database instance, it is important to limit RAM consumption by the source instance. A good rule of thumb is to divide RAM evenly between the source instance and the RAM available to the Windocks service.
Following a Transform review the post-transform report. Access it via the “More” button on the Transform tab. The report will include any of the following errors:
- Tables with rows missing in the target
- Tables where primary keys were not created
- Tables where foreign keys were not created
Release Notes
Subsetting
Circular dependencies, composite keys, and self referencing tables are handled automatically.
Subsetting is based on Primary and Foreign keys as set in the database.
Subsets of encrypted databases are delivered in unencrypted form. Exception: SQL Server where automated TDE encryption is available in the Transform.
History and Temporal tables are ignored.
External references and linked servers are ignored.
Indexes, stored procedures, views, and other objects are not included in the subset.
SQL Server Filestream data type is not supported.
Synthetic data
Automated PII data detection is not included in the initial release.
The fast Windocks synthetic data engine is configured by default. It is capable of populating databases of GBs in size. SDV and other Python engines are recommended for up to 3-5 tables of 50,000 rows or less.
Database Mover
It is designed to move schemas and data to the same database platform. You can move between any of the Windocks supported platforms (SQL Server, Postgre, MySQL, Snowflake, Aurora, RDS, Azure Managed Instance, and Azure SQL).
Primary keys in a source database may not map to the desired database target. If a source primary key does not have an equivalent in the target database, then the target database key is set as an nvarchar. As a result the system may not be able to create a corresponding primary key. Any errors are reported in the post-transform report accessed via the “more” button.
Column name and row length limits. Before performing a transform ensure that column name lengths in the source database are less than or equal to the max column name in target database. For example, PostGre max column name length is 59 which is less than SQL Server or Snowflake
SQL Server maximum column length is less than Snowflake.
MySQL max row length is 64K bytes. If your target is MySQL and source is not MySQL, make sure that the source has no rows in any table that exceeds the max row length of MySQL
Data types are mapped identically between source and target databases of the same type. If they are different, then the target data type will be set to the closest possible.
Column lengths for varchar and binary types are handled based on the supported lengths available in the target. Snowflake, for example, has large max varchar values. When moving data from snowflake to postgre, you must ensure that the actual values will fit in the postgre target.
Database moves include the same limitations as listed above in Subsetting. Schema and data only. No Filestream support. No TDE support initially.
MySql
In the target MySQL instance, please run
set global local_infile=true;
Troubleshooting
Error reporting: errors are summarized in post-transform report via the “more” button. The following errors are reported. The lack of these errors indicates a successful Transform.
- Tables with rows missing from the target
- Tables with Primary keys that were not successfully created
- Tables with foreign keys that were not successfully created
Logs: are available at \Windockssynthetic\bin\windocks
Insufficient memory: configurations that exceed system resources are reported in the post-Transform report. Review the configuration and reduce the number of tables in parallel. Also reduce the number of parallel batches per table and batch size.
More on Docker containers
Encrypted connections and records of transforms are stored in the container. Deleting the container will result in these details being lost.
SQL Server to Snowflake Microservice
A microservice is supported in Standard and Enterprise editions. It ensures high performance and reliable data movement. Contact support@windocks.com if this is needed.