Data Quality

Uniqueness
Uniqueness means that no record or value is duplicated or overlaps, within a data set or across data sets.
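
A minimal sketch of a uniqueness check, assuming records are plain dictionaries and that a single field identifies each record. The function name `find_duplicates`, the `customer_id` field, and the sample rows are hypothetical and only illustrate the idea.

```python
from collections import Counter

def find_duplicates(records, key):
    """Return the key values that appear more than once in records."""
    counts = Counter(record[key] for record in records)
    return sorted(value for value, n in counts.items() if n > 1)

# Hypothetical sample data: customer_id 2 is stored twice.
customers = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 2, "email": "b@example.com"},
    {"customer_id": 2, "email": "b@example.com"},
]
duplicates = find_duplicates(customers, "customer_id")
```

An empty result means the chosen key is unique in that data set; overlaps across data sets could be checked the same way after combining their records.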

Completeness
Completeness measures whether the data delivers every required value, with nothing missing that should be available.
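
One possible way to express completeness as a number is the share of required values that are actually filled in. The sketch below assumes that `None` and the empty string both count as missing; the function name `completeness` and the sample fields are hypothetical.

```python
def completeness(records, required_fields):
    """Return the share of required values that are present (0.0 to 1.0)."""
    total = len(records) * len(required_fields)
    if total == 0:
        return 1.0  # nothing is required, so nothing is missing
    present = sum(
        1
        for record in records
        for field in required_fields
        # Assumption: None and "" both count as missing values.
        if record.get(field) not in (None, "")
    )
    return present / total

# Hypothetical sample data: two of the six required values are missing.
contacts = [
    {"name": "Ann", "phone": "555-0100"},
    {"name": "Bob", "phone": None},
    {"name": "", "phone": "555-0101"},
]
score = completeness(contacts, ["name", "phone"])
```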

Consistency
Data consistency refers to the uniformity of data as it moves across networks and applications. The same data values stored in different locations should not conflict with one another.
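
A rough sketch of a consistency check between two locations, assuming each location can be read into a dictionary keyed by the same identifier. The names `find_conflicts`, `crm`, and `billing` are hypothetical.

```python
def find_conflicts(source_a, source_b):
    """Return keys present in both sources whose values differ."""
    shared_keys = source_a.keys() & source_b.keys()
    return sorted(k for k in shared_keys if source_a[k] != source_b[k])

# Hypothetical sample data: the city stored for user u2 differs.
crm = {"u1": "Taipei", "u2": "Tainan", "u3": "Hsinchu"}
billing = {"u1": "Taipei", "u2": "Kaohsiung"}
conflicts = find_conflicts(crm, billing)
```

Keys that exist in only one source (here `u3`) are ignored on purpose; a missing record is closer to a completeness problem than to a conflict.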

Accuracy
Data should reflect actual, real-world conditions; accuracy is confirmed by checking values against a verifiable source.
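
As an illustration of checking against a verifiable source, the sketch below assumes a trusted reference table is available and reports the share of observed values that agree with it. The function name `accuracy` and both dictionaries are hypothetical.

```python
def accuracy(observed, reference):
    """Return the share of verifiable observed values that match the reference."""
    checked = [k for k in observed if k in reference]
    if not checked:
        return None  # nothing can be verified against the reference
    correct = sum(1 for k in checked if observed[k] == reference[k])
    return correct / len(checked)

# Hypothetical sample data: the capital recorded for JP is wrong.
observed_capitals = {"TW": "Taipei", "JP": "Kyoto", "FR": "Paris"}
trusted_capitals = {"TW": "Taipei", "JP": "Tokyo", "FR": "Paris"}
score = accuracy(observed_capitals, trusted_capitals)
```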

Validity
Data should be collected according to defined business rules and parameters, and should conform to the right format and fall within the right range.
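
A small sketch of a validity check covering both format and range. The business rules used here (an `order_date` written as YYYY-MM-DD and a `quantity` between 1 and 1000) are invented for illustration, as is the function name `validate_order`.

```python
import re

# Assumed format rule: dates are written as YYYY-MM-DD.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")

def validate_order(order):
    """Return a list of rule violations; an empty list means the order is valid."""
    errors = []
    if not DATE_PATTERN.fullmatch(str(order.get("order_date", ""))):
        errors.append("order_date must look like YYYY-MM-DD")
    quantity = order.get("quantity")
    # Assumed range rule: quantity is an integer from 1 to 1000.
    if type(quantity) is not int or not 1 <= quantity <= 1000:
        errors.append("quantity must be an integer from 1 to 1000")
    return errors

good = validate_order({"order_date": "2024-01-31", "quantity": 5})
bad = validate_order({"order_date": "31/01/2024", "quantity": 0})
```

The pattern only checks the shape of the date, not whether it is a real calendar date; parsing with `datetime.date.fromisoformat` would be a stricter option.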

Timeliness
Timely data is data that is available when it is required. Data may be updated in real time to ensure that it is readily available and accessible.
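
Timeliness can be approximated by checking how old the latest update is when the data is needed. The sketch below assumes each data set records a timezone-aware last-update timestamp; the function name `is_timely` and the age limits are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def is_timely(last_updated, max_age, now=None):
    """Return True if the data was refreshed within max_age of now."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_updated <= max_age

# Fixed timestamps keep the example deterministic.
check_time = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
refreshed_at = datetime(2024, 1, 1, 11, 30, tzinfo=timezone.utc)
within_hour = is_timely(refreshed_at, timedelta(hours=1), now=check_time)
within_quarter_hour = is_timely(refreshed_at, timedelta(minutes=15), now=check_time)
```

The acceptable age depends on the use case: a real-time dashboard may tolerate seconds, while a monthly report may tolerate days.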

STEP1. Define standards beforehand

STEP2. Clean the data in process

STEP3. Monitor afterwards

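Good data saves the cost of trial and error: for a data set of very poor quality, we need not spend much effort on it, or we can at least estimate development time and work phases more accurately. It also shortens the data feedback loop, so problems found along the way reach upstream and downstream teams sooner and collaboration becomes more efficient. Finally, it prevents wrong analytical conclusions: if we detect errors and distortions in the data promptly, we avoid drawing wrong conclusions because of problems in the data itself.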

——

Data quality dimensions violations (Watson Knowledge Catalog)

https://www.ibm.com/docs/en/cloud-paks/cp-data/4.5.x?topic=ar-data-quality-violations