5 Steps to Improve Your Data Quality

In the age of big data, the quantity of data being generated each year is increasing exponentially. The opportunity to use data to transform how organizations operate and create value for their customers is growing at a similar rate.
The success of any industry lies in developing a proper infrastructure that can analyze and extract valuable details from vast amounts of data. Many decisions are based on the extracted data, and these decisions play a crucial role in planning important strategies. Data quality becomes essential in such cases: if the data is poor, the decisions made on it will be flawed, and this, in turn, can change strategies entirely. In healthcare, poor data quality is especially serious because a single decision can sometimes cost a life.
In many analytics projects, a major portion of the effort goes into correcting or compensating for poor data. With good vision and an early focus on data quality, organizations can overcome these hurdles and concentrate on creating better insights. In this article, we discuss the five steps an organization should adopt to ensure good data quality: data that is appropriate, accurate, and complete.
The following fundamental principles will help organizations harness the power of data without getting lost in the data maze:
1. Analytics as a Core Business Strategy
Data analytics must be identified as one of the core strategies of the organization. The senior management team must be able to clearly articulate the purpose of collecting, aggregating, and analyzing the data and then translate it into action. Data-driven organizations start by identifying their strategic priorities and then ask precise questions.
2. Create and Map a Data Dictionary
According to the IBM Dictionary of Computing, the data dictionary or metadata repository is defined as a “centralized repository of information about data such as meaning, relationships to other data, origin, usage, and format”. Improving data quality involves identifying which data needs to be collected and monitored. An accurate data dictionary helps people understand data elements, find information, use and reuse information efficiently, and manage data better.
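As a rough illustration, a data dictionary can start out as a small structured table that scripts can check records against. The sketch below is only one possible shape: the field names (`patient_id`, `wait_minutes`), the `DictionaryEntry` class, and both helper functions are hypothetical and not taken from any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryEntry:
    """One documented data element (hypothetical structure)."""
    name: str
    meaning: str
    origin: str
    data_type: type
    fmt: str


# Hypothetical entries; a real dictionary would be maintained by the organization.
DATA_DICTIONARY = {
    "patient_id": DictionaryEntry(
        "patient_id", "Unique patient identifier",
        "registration system", str, "P followed by 6 digits"),
    "wait_minutes": DictionaryEntry(
        "wait_minutes", "Minutes between check-in and being seen",
        "clinic time study", int, "non-negative whole number"),
}


def undocumented_fields(record):
    """Return the fields in a record that the dictionary does not describe."""
    return sorted(name for name in record if name not in DATA_DICTIONARY)


def type_mismatches(record):
    """Return documented fields whose value has the wrong Python type."""
    return sorted(
        name for name, value in record.items()
        if name in DATA_DICTIONARY
        and not isinstance(value, DATA_DICTIONARY[name].data_type)
    )
```

Running incoming records through checks like these makes it visible when a source starts sending fields nobody has documented, or values in an unexpected type.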
3. Data Profiling
Data profiling is a process of examining the available data from the data source and collecting statistics about the data. This process helps us understand the anomalies, completeness, correctness, uniqueness, consistency, and reasonability in the data. Different data mining tools can be utilized to gain a good understanding of the data quality in this process.
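To make the idea concrete, here is a minimal profiling sketch in plain Python that collects a few of the statistics mentioned above (completeness and uniqueness) for one column. The function name `profile_column`, the sample column names, and the choice to treat `None` and empty strings as missing are all assumptions for illustration; dedicated profiling tools report far more.

```python
from collections import Counter


def profile_column(rows, column):
    """Collect basic statistics for one column of a list of dict records."""
    values = [row.get(column) for row in rows]
    # Assumption: None and the empty string both count as missing.
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "completeness": len(present) / len(values) if values else 0.0,
        "distinct": len(counts),
        # Values that occur more than once (a uniqueness check).
        "duplicated_values": sorted(v for v, n in counts.items() if n > 1),
    }


# Hypothetical sample data from a clinic time study.
rows = [
    {"patient_id": "P1", "wait_minutes": 12},
    {"patient_id": "P2", "wait_minutes": None},
    {"patient_id": "P1", "wait_minutes": 45},
    {"patient_id": "", "wait_minutes": 30},
]
id_profile = profile_column(rows, "patient_id")
```

In this sample, `id_profile` reports one missing identifier and one duplicated value, which is the kind of baseline the cleansing step can then work from.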
4. Data Cleansing
Once the data profiling process is completed and a good baseline understanding of the existing data quality is established, the next step is cleansing the data. This process is very time consuming and labor intensive, and cleansing ALL the data is sometimes neither practical nor cost-effective. On the flip side, cleansing none of the data is undesirable. Therefore, the critical data elements should be identified based on the targeted Key Performance Indicators (KPIs), and a focused cleansing process should be applied to them. This can be achieved with a combination of extract/transform/load (ETL) tools, lookup tables, and scripts.
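A focused cleansing script built on a lookup table might look like the sketch below. Everything named here is hypothetical: `CRITICAL_FIELDS` stands in for whichever elements the KPIs depend on, and `GENDER_LOOKUP` is an invented example of a table that maps known variants to one standard code.

```python
# Hypothetical lookup table mapping known variants to a standard code.
GENDER_LOOKUP = {"m": "M", "male": "M", "f": "F", "female": "F"}

# Hypothetical list of the critical data elements chosen from the KPIs.
CRITICAL_FIELDS = ("patient_id", "gender")


def cleanse(record):
    """Return a cleaned copy of a record, touching only critical fields."""
    cleaned = dict(record)
    for field in CRITICAL_FIELDS:
        value = cleaned.get(field)
        if isinstance(value, str):
            # Remove stray whitespace from critical text fields.
            cleaned[field] = value.strip()
    gender = cleaned.get("gender")
    if isinstance(gender, str):
        # Standardize known variants; leave unknown values for manual review.
        cleaned["gender"] = GENDER_LOOKUP.get(gender.lower(), gender)
    return cleaned
```

Non-critical fields are deliberately left alone, which reflects the point that cleansing everything is rarely worth the cost.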
“Data cleansing is not a step, it’s a journey!”
5. Data Governance
The next step in the process is to prevent “dirty data” from entering the system. Using ETL tools and programming, known “dirty data” can be handled and corrected. However, a new kind of data defect or anomaly can always appear. Therefore, a data governance team should be established not only to monitor data quality and maintain standards, but also to facilitate the understanding of data and encourage the use of this valuable information. KPIs need to be established to measure data quality on an ongoing basis.
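One way a governance team could track quality over time is to run every incoming record through a set of agreed rules and report the pass rate as a KPI. The sketch below assumes two hypothetical rules and hypothetical field names; real rules and thresholds would be set by the governance team.

```python
def validate(record):
    """Return a list of rule violations for one record (empty if clean)."""
    problems = []
    if not record.get("patient_id"):
        problems.append("patient_id is missing")
    wait = record.get("wait_minutes")
    if not isinstance(wait, int) or wait < 0:
        problems.append("wait_minutes must be a non-negative integer")
    return problems


def quality_kpi(records):
    """Share of records that pass every rule, between 0.0 and 1.0."""
    if not records:
        # Assumption: an empty batch is reported as fully passing.
        return 1.0
    passed = sum(1 for record in records if not validate(record))
    return passed / len(records)


# Hypothetical batch of incoming records.
batch = [
    {"patient_id": "P1", "wait_minutes": 12},
    {"patient_id": "P2", "wait_minutes": 0},
    {"patient_id": "", "wait_minutes": 30},
    {"patient_id": "P4", "wait_minutes": 45},
]
kpi = quality_kpi(batch)
```

A falling pass rate is an early signal that a new kind of defect has started to appear, and the violation messages point to where to look.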
“Ongoing Data Quality Management is a key enabler for high performance”
Final Thoughts
Whether you are collecting time study data to understand wait times in your clinic or maintaining millions of patient records in a complex database, data quality is an essential part of your analytics journey. Making better decisions requires the right information and facts, which are derived from good data. Organizations should treat data quality assessment and management as a strategic priority.
About the Authors:
Jaya Shankar Parimi, M.Tech, MBA – A technology strategist with experience in healthcare, insurance, and manufacturing sectors digitalizing data-driven business processes.
Vineeth Yeddula, CLSSMBB, PMP, CMQ/OE – Co-Founder, KPI Ninja. An entrepreneur and engineer by training with significant healthcare data analytics and performance improvement experience in multiple healthcare settings.