This page, OIG Bulletin, February 2021: Data Visualization Techniques to Detect Fraud, Waste and Abuse, is offered by the Office of the Inspector General.

OIG Bulletin, February 2021: Data Visualization Techniques to Detect Fraud, Waste and Abuse

This article explores data visualization techniques that can help your jurisdiction achieve accuracy, transparency and accountability in government.

Introduction

The August 2020 installment of our data series discussed how to collect data and provided examples of how municipal employees can use that data to identify waste and possible fraud. In this article, we delve into two specific techniques for creating and using data visualizations, which can further aid a jurisdiction in detecting waste or fraud.

Outlier Analysis

Outlier analysis, in its simplest form, compares data points to identify those that stand out from the rest. For example, in the scatter plot below, each point represents one contract. The plot shows the number of change orders (orders that modify contract terms) against the contract length in months. There is a visible correlation between the two, which makes intuitive sense: generally, longer contracts have more change orders.

Scatter plot titled “Number of Change Orders by Contract Length.” Each of the 29 points on the scatter plot represents one contract over an 18-month period. The metrics plotted show the number of change orders from 0-8 (Y axis) per contract length in months (X axis). There is a correlation between the number of change orders and the contract length. One contract shows a large number of change orders (8) when compared to the short length of the contract (3 months) and is an outlier.

However, there is one contract, indicated by the red dot, which has a large number of change orders when compared to the length of the contract. This outlier may warrant further investigation. There may be an acceptable explanation for the high number of change orders for that particular contract, but only further review can determine whether the change orders are reasonable.
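The visual check described above can also be approximated in code. The following is a minimal sketch, not the OIG's method: it fits a straight trend line to contract length versus change orders and flags contracts whose distance from that line exceeds a chosen multiple of the typical spread. The contract records, the `flag_outliers` name and the threshold of 2 are all hypothetical; only the 3-month contract with 8 change orders mirrors the outlier in the scatter plot.

```python
from statistics import mean, pstdev

# Hypothetical contract records (assumed for illustration only).
contracts = [
    {"id": "C-01", "months": 3, "change_orders": 0},
    {"id": "C-02", "months": 6, "change_orders": 1},
    {"id": "C-03", "months": 9, "change_orders": 2},
    {"id": "C-04", "months": 12, "change_orders": 3},
    {"id": "C-05", "months": 15, "change_orders": 4},
    {"id": "C-06", "months": 18, "change_orders": 5},
    {"id": "C-07", "months": 6, "change_orders": 2},
    {"id": "C-08", "months": 12, "change_orders": 2},
    {"id": "C-09", "months": 3, "change_orders": 8},
]


def flag_outliers(records, threshold=2.0):
    """Return records whose change-order count lies far from the fitted trend line."""
    xs = [r["months"] for r in records]
    ys = [r["change_orders"] for r in records]
    x_bar, y_bar = mean(xs), mean(ys)

    # Ordinary least-squares slope and intercept for y = intercept + slope * x.
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar

    # Residual: how far each contract sits above or below the trend line.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    spread = pstdev(residuals)
    if spread == 0:
        return []  # every point sits exactly on the line
    return [r for r, res in zip(records, residuals) if abs(res) > threshold * spread]


flagged = flag_outliers(contracts)
print([r["id"] for r in flagged])
```

With this sample data, only contract C-09 is flagged. As with the scatter plot, a flag is a prompt for further review, not a finding.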

Benford’s Law

Benford’s Law is a mathematical observation about the frequency distribution of the first digits within a set of naturally occurring numbers. Mathematicians studied collections of numbers and found that the first digit will be a 1 about 30% of the time and a 2 about 18% of the time, with subsequent digits following a similar decreasing pattern, as shown in the data visualization below. Applying Benford’s Law is another way to find outliers in your data that should be flagged for further review.
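The expected frequencies come from a simple formula: under Benford’s Law, the probability that the first digit is d equals log10(1 + 1/d). The short sketch below computes that expected share for each digit from 1 to 9; the function and variable names are illustrative choices, not part of the original article.

```python
import math


def benford_expected(digit):
    """Expected share of numbers whose first digit is `digit` (1-9) under Benford's Law."""
    return math.log10(1 + 1 / digit)


# Expected share for each possible first digit.
expected = {d: benford_expected(d) for d in range(1, 10)}

for d, share in expected.items():
    print(f"digit {d}: {share:.1%}")
```

The shares decrease from roughly 30.1% for digit 1 and 17.6% for digit 2 down to about 4.6% for digit 9, and together they sum to 100%.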

Benford’s Law is best applied to large data sets (at least several hundred records) of naturally occurring numbers with some connection, such as population data, income tax data or scientific data. Benford’s Law should not be applied to data sets that have stated minimum and maximum values or that consist of assigned numbers, such as interest rates, telephone numbers or Social Security numbers.

As an example, we applied Benford’s Law to a data set that contains the price of all items purchased by a city between 2009 and 2020. The following bar chart, which can be made using Excel, shows the distribution of the first digit of the purchase price for each item in the dataset, with all vendors grouped together. The bar chart shows that the purchase prices, when grouped together, closely follow the expected distribution of numbers according to Benford’s Law.

The bar chart shows the distribution of the first digit of the purchase price for each item in the dataset (X axis), with all vendors grouped together. The percent of total count of purchase prices (Y axis), grouped by first digit, closely follows the expected distribution of numbers according to Benford’s Law. All vendors: digit 1, 30%; digit 2, 18%; digit 3, 13%; digit 4, 10%; digit 5, 8%; digit 6, 6%; digit 7, 6%; digit 8, 5%; digit 9, 4%.
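A first-digit distribution like the one charted here can also be tallied outside a spreadsheet. The sketch below is one possible approach, assuming the purchase prices are available as a list of finite numbers; the sample prices and the function names `first_digit` and `first_digit_shares` are hypothetical and do not come from the city data set described in the article.

```python
from collections import Counter


def first_digit(amount):
    """Return the first significant digit of a nonzero amount, or None for zero."""
    # Scientific notation puts the first significant digit at position 0,
    # so 1875.00 -> "1.875000e+03" and 0.85 -> "8.500000e-01".
    lead = f"{abs(amount):e}"[0]
    return int(lead) if lead != "0" else None


def first_digit_shares(amounts):
    """Return the share of amounts starting with each digit 1-9, ignoring zeros."""
    digits = [d for d in map(first_digit, amounts) if d is not None]
    if not digits:
        raise ValueError("no nonzero amounts to analyze")
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}


# Hypothetical purchase prices (assumed for illustration only).
prices = [12.50, 149.99, 1875.00, 23.10, 0.85, 310.00, 1020.00, 47.25, 0.0, 18.40]

observed = first_digit_shares(prices)
for d, share in observed.items():
    print(f"digit {d}: {share:.0%}")
```

A sample this small is only for demonstration; as noted above, the comparison is meaningful only for data sets with at least several hundred records.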

When the pricing data is grouped by individual vendor, as shown in the charts below, and the same analysis is run, Vendor 1 and Vendor 3 follow the expected distribution pattern. Vendor 2, however, deviates visibly at digits 4 and 9, which could suggest falsified purchase prices and should be flagged for further investigation.

Comparison of three individual vendors’ pricing using Benford’s Law.
Vendor 1: digit 1, 33%; digit 2, 15%; digit 3, 10%; digit 4, 9%; digit 5, 8%; digit 6, 7%; digit 7, 7%; digit 8, 5%; digit 9, 5%.
Vendor 2: digit 1, 26%; digit 2, 14%; digit 3, 15%; digit 4, 18%; digit 5, 8%; digit 6, 4%; digit 7, 4%; digit 8, 4%; digit 9, 7%.
Vendor 3: digit 1, 27%; digit 2, 18%; digit 3, 13%; digit 4, 10%; digit 5, 8%; digit 6, 7%; digit 7, 5%; digit 8, 6%; digit 9, 6%.
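
The per-vendor comparison can be sketched in code using the percentages from the vendor charts. This example flags any digit whose observed share differs from the Benford share by more than half of the expected value. That 50% relative threshold is an assumption chosen for illustration, not a rule from the article; a formal goodness-of-fit test would need the underlying record counts, which are not given here.

```python
import math

# First-digit shares (percent, digits 1 through 9) as read from the vendor charts.
vendor_shares = {
    "Vendor 1": [33, 15, 10, 9, 8, 7, 7, 5, 5],
    "Vendor 2": [26, 14, 15, 18, 8, 4, 4, 4, 7],
    "Vendor 3": [27, 18, 13, 10, 8, 7, 5, 6, 6],
}


def flag_digits(shares_pct, rel_threshold=0.5):
    """Return digits whose observed share deviates from the Benford share
    by more than rel_threshold, measured as a fraction of the expected share."""
    flagged = []
    for digit, pct in enumerate(shares_pct, start=1):
        expected = math.log10(1 + 1 / digit)  # Benford share for this digit
        observed = pct / 100
        if abs(observed - expected) / expected > rel_threshold:
            flagged.append(digit)
    return flagged


for vendor, shares in vendor_shares.items():
    print(vendor, flag_digits(shares))
```

With the chart values, this prints an empty list for Vendor 1 and Vendor 3 and `[4, 9]` for Vendor 2, matching the digits that stand out visually. A different threshold would flag more or fewer digits, so any cutoff should be treated as a screening aid rather than evidence of falsified prices.
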
Remember that “data” is a broad concept, and that analyzing the many different kinds of data your jurisdiction collects every day can enhance your decision-making processes.

We hope that this article, along with the other articles in our series on data analytics, helps you feel more confident in your ability to collect and analyze data. Now that you understand some specific techniques for visualizing and analyzing data, we encourage you to apply these techniques to procurements or other business decisions facing your jurisdiction. Data visualization and analysis can be particularly useful in identifying and preventing potential fraud, waste and abuse of public resources.

If data analysis leads you to believe that fraud has occurred, please contact the OIG’s Fraud Hotline at (800) 322-1323 or IGO-FightFraud@mass.gov, or fill out our online form.

Date published: February 25, 2021
Image credits:  Shutterstock
