uArrow DQ – Data Quality Dimensions

How-to Guide

uArrow Data Quality helps customers unlock the value of their data and build confidence in it.

According to Gartner, poor data quality is hitting organizations where it hurts: the average annual financial cost was $15 million in 2017. IBM, writing on extracting business value from data, estimates that businesses in the US alone lose $3.1 trillion annually due to poor data quality.

An enterprise's value is measured by the quality, performance and accuracy of its data, yet the cost of poor data quality is routinely underestimated. Poor data quality also leads to reputational damage and lost opportunities, which are intangible costs in this cloud world. As noted in HBR, bad data costs $3 trillion per year in the US alone!

Historically, data quality has required a huge amount of investment, time and tooling. uArrow Data Quality is a niche player in the data quality tools space that helps you define business rules, then measure, monitor and take action to unlock the intrinsic value of the data.

uArrow DQ also takes a push-down approach: our rule engine pushes the checks to the data source rather than moving the data to a third-party cloud. This way, we don’t need to move huge volumes of data across the network or cloud and enter the labyrinth of security hassles.
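To illustrate the push-down idea in general terms, here is a minimal sketch that uses an in-memory SQLite database as a stand-in data source. The table, column and function names are hypothetical, and this is not uArrow's rule engine; it only contrasts running a check inside the source with pulling every row out first.

```python
import sqlite3


def build_source() -> sqlite3.Connection:
    # Hypothetical stand-in for a customer's data source.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("A1", 10.0), ("A2", None), ("A3", 7.5), ("A4", None)],
    )
    return conn


def pushed_down_null_count(conn: sqlite3.Connection) -> int:
    # The check runs inside the source; only a single number leaves it.
    row = conn.execute(
        "SELECT COUNT(*) FROM orders WHERE amount IS NULL"
    ).fetchone()
    return row[0]


def pulled_null_count(conn: sqlite3.Connection) -> int:
    # The alternative: every row is moved out of the source, then checked.
    rows = conn.execute("SELECT order_id, amount FROM orders").fetchall()
    return sum(1 for _, amount in rows if amount is None)
```

Both functions return the same count, but the pushed-down version transfers one value instead of the whole table, which is the property the push-down approach relies on.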

There are multiple lenses through which to view and measure data quality. One of the most common industry practices is to measure the data quality score by data quality dimensions.

Data Quality Dimensions

The uArrow Data Quality product has a wide range of checks across the various data quality dimensions.

1. Completeness check.

This check ensures there are no gaps between the data that was supposed to be collected and the data that was actually collected.

Some of the checks uArrow supports:

  • Missing values in a column
  • Missing values in a table
  • Missing values in a JSON document
  • Missing values w.r.t. a reference population
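As a rough illustration of what the completeness checks above measure, the following sketch counts missing values over rows held as Python dictionaries. The function names and the rule that `None` and the empty string both count as missing are assumptions made for this example, not uArrow's definitions.

```python
import json

# Assumption: None and the empty string both count as "missing".
MISSING = (None, "")


def missing_in_column(rows, column):
    """Count rows whose value for `column` is missing."""
    return sum(1 for row in rows if row.get(column) in MISSING)


def missing_in_table(rows, columns):
    """Count missing values for every listed column of a table."""
    return {col: missing_in_column(rows, col) for col in columns}


def missing_json_keys(document, required_keys):
    """List required keys that are absent or empty in a JSON object."""
    data = json.loads(document)
    return [key for key in required_keys if data.get(key) in MISSING]


def missing_from_reference(rows, column, reference):
    """Return reference values that never appear in `column`."""
    present = {row.get(column) for row in rows}
    return set(reference) - present
```

The last function covers the reference-population case: it reports which members of the expected population have no record at all, rather than which records have empty fields.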

2. Consistency check.

This check ensures the data is consistent, meaning it respects a set of constraints.

Some of the checks uArrow supports:

  • Comparison of two fields:

      a) By relational operator (<, >, ==, !=)
      b) Inferred relationship
      c) Mutually exclusive
      d) Based on a set of values

  • Aggregation of a field in a table linked to a master table with a field in the master table.
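A few of the consistency checks listed above can be sketched as follows. This is only an illustration under assumed names and data shapes (rows as dictionaries, integer amounts); it covers the relational-operator comparison, the mutually exclusive case, and the aggregation against a master table.

```python
import operator

OPERATORS = {
    "<": operator.lt,
    ">": operator.gt,
    "==": operator.eq,
    "!=": operator.ne,
}


def compare_fields(rows, left, op, right):
    """Return the rows that violate `left op right`."""
    check = OPERATORS[op]
    return [row for row in rows if not check(row[left], row[right])]


def mutually_exclusive(rows, field_a, field_b):
    """Return the rows where both fields are populated at once."""
    return [
        row for row in rows
        if row[field_a] is not None and row[field_b] is not None
    ]


def aggregate_mismatches(detail_rows, master_rows, key, detail_field, master_field):
    """Return master keys whose stored total differs from the summed details."""
    totals = {}
    for row in detail_rows:
        totals[row[key]] = totals.get(row[key], 0) + row[detail_field]
    return [
        m[key] for m in master_rows
        if totals.get(m[key], 0) != m[master_field]
    ]
```

The exact equality in `aggregate_mismatches` suits integer or decimal amounts; with floating-point values a tolerance would be a safer choice.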

3. Uniqueness check.

This check ensures there are no duplicates in the data that was received.

Some of the checks uArrow supports:

  • Duplicates on a single column
  • Duplicates across a combination of N columns
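Both uniqueness checks reduce to counting how often each key occurs, where the key is built from one column or from several. A minimal sketch, with a hypothetical `find_duplicates` helper and rows held as dictionaries:

```python
from collections import Counter


def find_duplicates(rows, columns):
    """Map each key (a tuple of column values) seen more than once to its count."""
    counts = Counter(tuple(row[col] for col in columns) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


rows = [
    {"id": 1, "country": "SG"},
    {"id": 1, "country": "SG"},
    {"id": 1, "country": "US"},
]
single_column = find_duplicates(rows, ["id"])
two_columns = find_duplicates(rows, ["id", "country"])
```

Passing one column name gives the single-column check; passing N names gives the combined-key check, so the same function serves both bullets.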

4. Validity check.

This check ensures that information is in the expected format, follows the defined business rules, and is usable.

Some of the checks uArrow supports:

  • Data type check
  • Range check
  • Mandatory check
  • List of values check
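The four validity checks above can be expressed as per-field rules. The sketch below assumes a hypothetical rule format (a dictionary with optional `type`, `range`, `mandatory` and `allowed` entries); it is meant to show the idea, not uArrow's rule syntax.

```python
def validate(row, rules):
    """Return a list of human-readable rule violations for one row."""
    errors = []
    for field, rule in rules.items():
        value = row.get(field)
        if value is None:
            # Mandatory check: a missing value is only an error if required.
            if rule.get("mandatory"):
                errors.append(f"{field}: mandatory value missing")
            continue
        # Data type check: skip the remaining checks if the type is wrong.
        if "type" in rule and not isinstance(value, rule["type"]):
            errors.append(f"{field}: expected {rule['type'].__name__}")
            continue
        # Range check: inclusive lower and upper bounds.
        if "range" in rule:
            low, high = rule["range"]
            if not low <= value <= high:
                errors.append(f"{field}: outside range {low}..{high}")
        # List of values check.
        if "allowed" in rule and value not in rule["allowed"]:
            errors.append(f"{field}: not in list of values")
    return errors


# Hypothetical rules for an order record.
RULES = {
    "qty": {"type": int, "range": (1, 100), "mandatory": True},
    "status": {"type": str, "allowed": {"NEW", "SHIPPED"}},
}
```

Calling `validate({"qty": 500, "status": "LOST"}, RULES)` would report one range violation and one list-of-values violation, while a fully valid row returns an empty list.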

5. Accuracy check.

This check refers to the degree to which data describes the real-world scenario.

Some of the checks uArrow supports:

  • Precision check, for instance a Singapore phone number starts with +65
  • The age of a person cannot be > 200 or < 0
  • The order number must be exactly 10 digits long
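The three examples above translate directly into small predicates. The function names are made up for illustration, and the order-number rule assumes ASCII digits only.

```python
import re


def phone_is_singaporean(phone: str) -> bool:
    # Precision check: a Singapore number carries the +65 country code.
    return phone.startswith("+65")


def age_is_plausible(age: int) -> bool:
    # An age below 0 or above 200 cannot describe a real person.
    return 0 <= age <= 200


def order_number_is_valid(order_number: str) -> bool:
    # Exactly 10 ASCII digits, nothing before or after.
    return re.fullmatch(r"[0-9]{10}", order_number) is not None
```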

6. Integrity check.

Over the course of its journey, the data might have been transformed by many downstream systems. The integrity check verifies that the attributes are maintained correctly, even as data gets stored, processed and used in diverse systems.

Some of the checks uArrow supports:

  • Find an external (foreign) key in a single field of another table.
  • Find an external (foreign) key in another table using a combination of N fields.
  • Find an external (foreign) key in a flat file.
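All three integrity checks ask the same question: does every key in the child data exist in the parent data? A minimal sketch, assuming rows as dictionaries and a CSV flat file with a header row; the names `orphan_keys` and `rows_from_flat_file` are hypothetical.

```python
import csv
import io


def orphan_keys(child_rows, parent_rows, child_fields, parent_fields):
    """Return child keys (as tuples) that have no match in the parent rows."""
    parent_keys = {tuple(p[f] for f in parent_fields) for p in parent_rows}
    child_keys = (tuple(c[f] for f in child_fields) for c in child_rows)
    return [key for key in child_keys if key not in parent_keys]


def rows_from_flat_file(handle):
    """Read a CSV flat file (with a header row) into dictionaries."""
    return list(csv.DictReader(handle))


# Single-field lookup against a parent "table" loaded from a flat file.
customers = rows_from_flat_file(io.StringIO("id\n1\n2\n"))
orders = [{"customer_id": "1"}, {"customer_id": "3"}]
orphans = orphan_keys(orders, customers, ["customer_id"], ["id"])
```

A composite key works the same way by passing several field names in each list. Note that values read from a flat file are strings, so the child keys need the same type for the comparison to be meaningful.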