Data Quality – Performance for Large Volume Datasets

How to Guide

Unmatched Performance
How we achieved it

  • DQ rules are pushed down to the warehouse. No data is extracted from the warehouse to validate rules.
  • DQ rules are specifically tuned for Snowflake.
  • All DQ rules execute natively on Snowflake.
  • DQ rules execute in parallel; the degree of parallelism depends on the size of the hardware.
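
The push-down idea in the list above can be illustrated with a minimal sketch. It is not the product's implementation: the rule names, the `orders` table, and the helper functions are hypothetical, and Python's built-in `sqlite3` stands in for Snowflake so the example runs anywhere. The point it shows is that each rule becomes part of a single aggregate SQL statement, so only a handful of counts leave the database, never the rows themselves.

```python
import sqlite3

# Hypothetical rules: name -> SQL predicate that a valid row must satisfy.
RULES = {
    "order_id_not_null": "order_id IS NOT NULL",
    "amount_non_negative": "amount >= 0",
}


def build_pushdown_query(table, rules):
    """Build one aggregate query that counts failing rows per rule."""
    checks = ", ".join(
        # A row fails when the predicate is false or NULL.
        f"SUM(CASE WHEN {predicate} THEN 0 ELSE 1 END) AS {name}"
        for name, predicate in rules.items()
    )
    return f"SELECT COUNT(*) AS total_rows, {checks} FROM {table}"


def run_rules(conn, table, rules):
    """Execute the query inside the database; only one summary row comes back."""
    cur = conn.execute(build_pushdown_query(table, rules))
    row = cur.fetchone()
    columns = [d[0] for d in cur.description]
    return dict(zip(columns, row))


# sqlite3 is only a stand-in for the warehouse (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (2, -5.0), (None, 3.0), (4, 0.0)],
)

result = run_rules(conn, "orders", RULES)
print(result)
```

Against a real warehouse the same query text would be sent over that warehouse's own connection; the amount of data returned stays one row regardless of table size, which is what keeps the validation cost inside the warehouse.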

Data sample

Item                 Description
Profiling            16 columns
Data Quality rules   18 rules

Execution Timings

Row count   Size     Compute warehouse    Time taken
6 M         160 MB   Snowflake X-Small    20 secs
60 M        1.6 GB   Snowflake Medium     1 min
600 M       16 GB    Snowflake Large      4 mins
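
To compare the three runs on a common scale, the timings above can be converted to rows processed per second. The short calculation below is a sketch that uses only the figures from the table, with "1 min" and "4 mins" taken as 60 and 240 seconds; since the reported times are rounded, the results are approximate.

```python
# (warehouse size, row count, elapsed seconds) taken from the timings table.
runs = [
    ("x-small", 6_000_000, 20),
    ("medium", 60_000_000, 60),
    ("large", 600_000_000, 240),
]


def rows_per_second(rows, seconds):
    """Average throughput for one run."""
    return rows / seconds


throughput = {name: rows_per_second(rows, secs) for name, rows, secs in runs}

for name, value in throughput.items():
    print(f"{name}: {value:,.0f} rows/sec")
```

On these figures the row count grows 100x from the smallest to the largest run while the elapsed time grows 12x, with a larger warehouse used at each step.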