Definitions
Data quality refers to the state of qualitative or quantitative pieces of information. Data is generally considered high quality if it is “fit for [its] intended uses in operations, decision making and planning”. By a second, complementary criterion, data is of high quality if it correctly represents the real-world construct to which it refers.
Data quality can also be defined as the degree to which a set of inherent characteristics of data fulfills requirements. These characteristics vary in importance and priority from one stakeholder to another.
From an enterprise perspective, data quality is the capability of data to satisfy the stated business, system, and technical requirements of an enterprise. In this view, data is considered high quality if it is fit for its intended uses in operations, decision-making, planning, and management.
Common Dimensions of Data Quality
Data quality is typically characterized by multiple dimensions:
- Accuracy: The degree to which data correctly reflects the real-world entity or event it represents
- Completeness: The degree to which all required data is present
- Consistency: The degree to which data is free from contradictions and coherent with other data
- Timeliness: The degree to which data is up-to-date and available when required
- Validity: The degree to which data conforms to defined business rules or constraints
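- Uniqueness: The degree to which each real-world entity or event is recorded only once, with no duplicate records at any level where uniqueness is required
- Integrity: The degree to which relationships between data items remain valid, for example every reference from one record to another points to a record that exists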
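Several of these dimensions can be expressed as simple ratios over a dataset. The following is a minimal sketch, not a standard implementation: the record layout, the field names (`id`, `email`, `updated`, `customer_id`), the crude email rule, and the 30-day freshness threshold are all assumptions made for illustration. Accuracy and consistency are left out because they require comparison against an external reference or a second data source.

```python
from datetime import date

# Hypothetical sample records; all field names and values are illustrative.
customers = [
    {"id": 1, "email": "ana@example.com", "updated": date(2024, 5, 1)},
    {"id": 2, "email": None, "updated": date(2024, 5, 2)},
    {"id": 2, "email": "bo@example", "updated": date(2023, 1, 15)},
    {"id": 4, "email": "cy@example.com", "updated": date(2024, 4, 30)},
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 3},  # refers to a customer that does not exist
]


def ratio(hits, total):
    """Share of items that pass a check; an empty set counts as fully passing."""
    return hits / total if total else 1.0


def completeness(rows, field):
    """Share of rows in which the required field is present (not None)."""
    return ratio(sum(r[field] is not None for r in rows), len(rows))


def uniqueness(rows, key):
    """Share of rows that carry a distinct key value."""
    return ratio(len({r[key] for r in rows}), len(rows))


def validity(rows, field, rule):
    """Share of present values that satisfy a rule (missing values are skipped)."""
    values = [r[field] for r in rows if r[field] is not None]
    return ratio(sum(rule(v) for v in values), len(values))


def timeliness(rows, field, as_of, max_age_days):
    """Share of rows updated within max_age_days of the as_of date."""
    fresh = sum((as_of - r[field]).days <= max_age_days for r in rows)
    return ratio(fresh, len(rows))


def integrity(children, fk, parents, pk):
    """Share of child rows whose foreign key matches an existing parent key."""
    known = {p[pk] for p in parents}
    return ratio(sum(c[fk] in known for c in children), len(children))


def looks_like_email(value):
    """Deliberately crude rule (assumption): non-empty local part and a dotted domain."""
    local, sep, domain = value.partition("@")
    return bool(local) and sep == "@" and "." in domain


report = {
    "completeness(email)": completeness(customers, "email"),
    "uniqueness(id)": uniqueness(customers, "id"),
    "validity(email)": validity(customers, "email", looks_like_email),
    "timeliness(updated)": timeliness(customers, "updated", date(2024, 5, 3), 30),
    "integrity(orders -> customers)": integrity(orders, "customer_id", customers, "id"),
}

for name, score in report.items():
    print(f"{name}: {score:.2f}")
```

Each score falls between 0 and 1, which makes it possible to set a separate threshold per dimension. That matters because, as the definitions note, different stakeholders weight the dimensions differently.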
Relationship to Other Qualities
Data quality is foundational to many other system qualities:
- It directly impacts reliability, because systems cannot function correctly with poor-quality data
- It is essential for usability, because users need trustworthy data to make decisions
- It affects security, because data integrity is a key security concern
- It influences maintainability, because poor data quality increases maintenance costs
- It is critical for functional suitability, because functions depend on correct data