Data Quality: The Most Important Factor in Your Data Science Success

Samuel Odifa
6 min read · Aug 7, 2023

Photo by Carlos Muza on Unsplash

Last year, I messed up on a huge project. Here’s the gist.

My job was to analyze the Q1-Q3 sales and requisition data for a growing pharmacy store. The manager needed insights to boost sales and strategize for the upcoming quarter.

While I did an “Okay” job with the dashboards, my initial insights were terrible.

What do you think led to the mistakes?

Please take a wild guess before you continue.

I’m still really ashamed to say this, but here’s what I later discovered:

The quality of the data I was working with sucked!

It had two major problems: it was incomplete and inconsistent, even though I had “cleaned” it.

I was only able to figure this out because I have a background in pharmacy and some knowledge of how the business works.

Here’s the thing: the quality of the data you’re working on is super important. For your work to be useful and worthwhile, good-quality data is NON-NEGOTIABLE.

In this article, I’ll walk you through what data quality is, why it is important, how to ensure it, why data cleaning alone isn’t enough, and where you can learn a whole lot more about it.

So, what is data quality?

Okay, let’s start with the basics. Data quality is a measure of how accurate, complete, and trustworthy your data is. In many cases, it means having every single piece of your “data puzzle” accounted for. Imagine having a treasure map that contains mistakes or missing parts; you might end up searching in the wrong place, right? That’s why you need to ensure that every piece of data you work with is correct, complete, and trustworthy. Good data quality is what lets you find the real treasure of insights hidden in the data!

Photo by Stephen Dawson on Unsplash

Why is Data Quality Super Important?

Here’s the deal: Your job as a data professional is to use data to help businesses and organizations make smart choices. But if the data you use is not top-notch, it can lead to big problems, for the company and for you too.
Imagine giving your boss the wrong information because of bad data — yikes! That could mean wrong decisions, wasted money, and unhappy customers.
Trust me when I say, “You do not want that.”

How do you ensure data quality, anyway?

You can’t clean data “just because” and think that’s all there is to ensure its highest quality. You must also make sure that your data exhibits some other characteristics.

REMEMBER: “Cleaning your data alone won’t get you there.”

Although I have to admit that cleaning your data well will help you significantly, there are four other things to check to make sure your data is truly high quality.

Data Accuracy

I want you to always remember that you are working with real data, which means that your insights and analysis might have real-life consequences. Even for a portfolio project with no real-life implications, you would benefit more from working with data of a respectable quality. Think of it this way: valuable insights rely heavily on accurate data.

Data Completeness

Remember the incident I had last year? It taught me the importance of checking a dataset for completeness. Think of it this way: complete data means no missing puzzle pieces. As a data professional, you will often encounter incomplete data, whether because it was pulled from multiple sources or because of errors in the collection pipeline. Whatever the reason, make sure your data is complete before proceeding with your analysis or visualization.
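To make the completeness check concrete, here is a minimal sketch in plain Python. The rows, the column names, and the January-to-September range are made-up assumptions for illustration, not the real pharmacy data. It counts empty values per column and lists any month in the expected period that has no records at all.

```python
from collections import Counter

# Hypothetical sales rows; the columns and values are invented for this example.
rows = [
    {"date": "2023-01-14", "product": "Paracetamol 500mg", "quantity": "12"},
    {"date": "2023-02-03", "product": "", "quantity": "4"},
    {"date": "2023-04-21", "product": "Vitamin C 1000mg", "quantity": None},
    {"date": "2023-06-09", "product": "Amoxicillin 250mg", "quantity": "7"},
]


def missing_counts(rows, columns):
    """Count empty or missing values in each column."""
    counts = Counter()
    for row in rows:
        for col in columns:
            value = row.get(col)
            if value is None or str(value).strip() == "":
                counts[col] += 1
    return {col: counts[col] for col in columns}


def missing_months(rows, year, first_month, last_month, date_col="date"):
    """Return the months in the expected range that have no rows at all."""
    # ISO dates start with "YYYY-MM", so the first 7 characters identify the month.
    seen = {row[date_col][:7] for row in rows if row.get(date_col)}
    expected = [f"{year}-{m:02d}" for m in range(first_month, last_month + 1)]
    return [month for month in expected if month not in seen]


print(missing_counts(rows, ["date", "product", "quantity"]))
print(missing_months(rows, 2023, 1, 9))
```

A month with zero rows is not always an error (the store may simply have been closed), but it is exactly the kind of gap worth asking the data owner about before you build a dashboard on top of it.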

Data Reliability

There are times when you have to work with data that neither you nor the client generated. Whatever the situation, always make sure the data you are using is data you can trust. Just like you wouldn’t trust medical advice from a doctor you don’t have faith in, don’t rely on data from a source you can’t vouch for. It’s better to make no data-driven decision at all than to make one informed by an unreliable dataset.

Data Currency

Another major way to ensure that your data is of the highest quality is to know how recent the dataset is. I mean, it wouldn’t be wise to inform a 2024 decision with a dataset from the 90s, would it? No, it wouldn’t.
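One simple way to check currency is to look at the newest record and ask how old it is. The sketch below assumes dates stored as ISO strings (YYYY-MM-DD), and the 90-day cutoff is an arbitrary threshold picked for illustration; the right limit depends on the decision you are supporting.

```python
from datetime import date


def days_since_latest(date_strings, today):
    """Days between the newest ISO date in the data and `today`."""
    latest = max(date.fromisoformat(d) for d in date_strings)
    return (today - latest).days


def is_current(date_strings, today, max_age_days=90):
    """True if the newest record is no older than `max_age_days`."""
    return days_since_latest(date_strings, today) <= max_age_days


# Hypothetical record dates and a fixed "today" so the result is reproducible.
record_dates = ["2023-01-14", "2023-06-09", "2023-03-30"]
today = date(2023, 8, 7)

print(days_since_latest(record_dates, today))  # 59
print(is_current(record_dates, today))         # True
print(is_current(record_dates, today, max_age_days=30))  # False
```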

3 Practical Ways to Ensure Data Quality

Since that incident last year, I have consistently adopted three practices to make sure I don’t mess up another analysis.

Data Cleaning

First off, if you’ve been doing this for a while, you are familiar with the term “cleaning.” If you aren’t familiar with the term, it basically involves checking for and removing any mistakes or duplicates in the data. It’s like giving the data a nice bath! To ensure your data is squeaky clean, use all the tools at your disposal, such as spreadsheet packages, SQL, Pandas, etc. However, you must also learn how to use these tools properly. Don’t worry; I have resources that you can use to learn more about this further down.
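Here is a small, hedged sketch of what a basic cleaning pass can look like, written in plain Python so it runs without extra packages (the same idea carries over to spreadsheets, SQL, or Pandas). The rows and column names are hypothetical. It tidies inconsistent spacing and casing, then drops rows that become exact duplicates.

```python
def normalize(row):
    """Trim whitespace, collapse repeated spaces, and lowercase product names."""
    return {
        "date": row["date"].strip(),
        "product": " ".join(row["product"].split()).lower(),
        "quantity": int(row["quantity"]),
    }


def clean(rows):
    """Normalize every row and drop exact duplicates, keeping the first one."""
    seen = set()
    cleaned = []
    for row in rows:
        norm = normalize(row)
        key = (norm["date"], norm["product"], norm["quantity"])
        if key not in seen:
            seen.add(key)
            cleaned.append(norm)
    return cleaned


# Hypothetical raw rows: the first two describe the same sale, typed differently.
raw = [
    {"date": "2023-01-14", "product": "paracetamol  500mg", "quantity": "12"},
    {"date": "2023-01-14 ", "product": "Paracetamol 500MG", "quantity": "12"},
    {"date": "2023-02-03", "product": " Vitamin C ", "quantity": "4"},
]

for row in clean(raw):
    print(row)
```

One caution: two genuine sales of the same product on the same day would also look like duplicates here. With real data you would want a transaction ID or timestamp in the key before dropping anything, which is part of why cleaning alone doesn’t guarantee quality.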

Double-Checking

Just like superheroes double-check their gear before saving the day (I assume they do), you must verify the data from your sources to be sure it’s accurate, consistent, relevant, and complete.
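One way to double-check is to reconcile what you loaded against figures the source itself reports, such as a row count or a total from the store’s own system. The sketch below assumes you have been given those two control figures; the function name, the columns, and the numbers are all hypothetical.

```python
def reconcile(rows, expected_row_count, expected_total_quantity):
    """Compare loaded data with control figures reported by the source.

    Returns a list of human-readable problems; an empty list means both match.
    """
    actual_rows = len(rows)
    actual_total = sum(row["quantity"] for row in rows)
    problems = []
    if actual_rows != expected_row_count:
        problems.append(
            f"row count: expected {expected_row_count}, got {actual_rows}"
        )
    if actual_total != expected_total_quantity:
        problems.append(
            f"total quantity: expected {expected_total_quantity}, got {actual_total}"
        )
    return problems


# Hypothetical loaded data and control figures.
loaded = [{"quantity": 12}, {"quantity": 4}, {"quantity": 7}]

print(reconcile(loaded, expected_row_count=3, expected_total_quantity=23))  # []
print(reconcile(loaded, expected_row_count=4, expected_total_quantity=30))
```

A mismatch doesn’t tell you which side is wrong, only that something needs a second look before you trust the numbers.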

Team Collaboration

Teamwork makes the dream work!

Work with your teammates, and make sure you get an extra pair of eyes on your analysis. Asking for advice, especially when you are just starting out, is crucial. Trust me; you will really need it. I had to learn this the hard way, but helping each other ensures data quality is at its best.

By following these three practices — data cleaning, double-checking, and team collaboration — you can significantly enhance the quality of your data analysis and ensure more accurate and reliable insights.

Resources

There are a number of books, articles, courses, and even YouTube videos that can help you learn more about data quality and practical approaches to ensuring it in your day-to-day as a data professional.

Books

If you like books, there are several available on data quality. Personally, I recommend you start with “Data Quality: The Accuracy Dimension” by Jack Olson.

Articles

One article that I would recommend you check out is “What is Data Quality” by IBM.

Online courses

There are a number of online courses available on data quality, such as “Data Quality Fundamentals” on Udemy and the “Total Data Quality Specialization” on Coursera.

Lastly, if you’re like me and you would rather watch someone talk about data quality, I would recommend that you go on YouTube and just search “data quality.” Come back and thank me later.

Data quality is essential for all data professionals. By taking steps to improve data quality, data analysts can make more informed decisions and deliver better insights.

Conclusion

So, there you have it — data quality is the superhero power that can make you awesome at your job! With accurate, complete, and reliable data, you can work wonders and help businesses succeed.

Remember, data quality is the secret sauce that makes data insights rock! Keep an eye on data quality, and you’ll be a data superhero!

I know I probably left some things out, or left you with a whole lot of questions. Please feel free to leave a comment, and I’ll answer as best I can.

Till next time,

Samuel.

If you enjoyed this article and want more from me:
Consider following me. Thanks for all your support.

Written by Samuel Odifa

Content Strategist. Pharmacy background. Passionate about Marketing, Data, Languages.
