The August 2020 installment of our data series discussed how to collect data and provided examples of how municipal employees can use that data to identify waste and possible fraud. In this article, we delve into two specific techniques for creating and using data visualizations, which can further aid a jurisdiction in detecting waste or fraud.
Outlier analysis, in its simplest form, compares different data points in order to identify data that stands out from the rest. For example, in the scatter plot below, each of the points represents one contract. The metrics plotted show the number of change orders (orders that modify contract terms) and the contract length in months. There is a visible correlation between the number of change orders and the contract length, which makes intuitive sense: generally, longer contracts have more change orders.
However, there is one contract, indicated by the red dot, which has a large number of change orders when compared to the length of the contract. This outlier may warrant further investigation. There may be an acceptable explanation for the high number of change orders for that particular contract, but only further review can determine whether the change orders are reasonable.
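For readers who would rather script this check than inspect a chart, the following Python sketch shows one possible approach. It is an illustration under stated assumptions, not the method used to produce the scatter plot. The contract IDs and values are hypothetical, and the cutoff of two standard deviations is an arbitrary starting point. The sketch fits a straight line to change orders versus contract length and flags contracts that sit unusually far from that line.

```python
from statistics import mean, pstdev

# Hypothetical sample data: (contract ID, length in months, number of change orders).
contracts = [
    ("C-01", 6, 1), ("C-02", 12, 2), ("C-03", 18, 3),
    ("C-04", 24, 4), ("C-05", 30, 5), ("C-06", 36, 6),
    ("C-07", 12, 9),  # short contract with many change orders
    ("C-08", 42, 7), ("C-09", 48, 8),
]

def flag_outliers(records, threshold=2.0):
    """Return IDs of contracts far from the fitted trend line.

    threshold is the number of standard deviations (of the residuals)
    beyond which a contract is flagged; 2.0 is an assumed default.
    """
    xs = [months for _, months, _ in records]
    ys = [orders for _, _, orders in records]
    x_bar, y_bar = mean(xs), mean(ys)

    # Ordinary least-squares fit of change orders on contract length.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        return []  # all contracts have the same length; no trend to fit
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar

    # Residual: actual change orders minus the number the trend predicts.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    spread = pstdev(residuals)
    if spread == 0:
        return []  # every contract sits exactly on the line

    return [cid for (cid, _, _), r in zip(records, residuals)
            if abs(r) / spread > threshold]

print(flag_outliers(contracts))  # flags C-07 with this sample data
```

A flagged contract is only a candidate for review. As with the red dot in the chart, the change orders may turn out to be reasonable.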
Benford’s Law is a mathematical observation about the frequency distribution of the first digits within a set of naturally occurring numbers. Mathematicians studied collections of numbers and found that the first digit will be a 1 about 30% of the time and a 2 about 18% of the time, with subsequent digits following a similar decreasing pattern, as shown in the data visualization below. Applying Benford’s Law is another way to find outliers in your data that should be flagged for further review.
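The expected percentages come from a simple formula: the probability that the first digit is d equals log10(1 + 1/d). The short Python sketch below computes the full distribution. The function name is our own choice for illustration.

```python
import math

def benford_expected():
    """Expected share of each first digit (1 through 9) under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for digit, share in benford_expected().items():
    print(f"{digit}: {share:.1%}")
# 1: 30.1%, 2: 17.6%, 3: 12.5%, 4: 9.7%, 5: 7.9%,
# 6: 6.7%, 7: 5.8%, 8: 5.1%, 9: 4.6%
```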
Benford’s Law is best applied to large data sets (at least several hundred records) of naturally occurring numbers with some connection, such as population data, income tax data or scientific data. Benford’s Law should not be applied to data sets that have stated minimum and maximum values or that consist of assigned numbers, such as interest rates, telephone numbers or Social Security numbers.
As an example, we applied Benford’s Law to a data set that contains the prices of all items purchased by a city between 2009 and 2020. The following bar chart, which can be made using Excel, shows the distribution of the first digit of the purchase price for each item in the data set, with all vendors grouped together. The bar chart shows that the purchase prices, when grouped together, closely follow the distribution expected under Benford’s Law.
When the pricing data is grouped by individual vendor, as shown in the charts below, and the same analysis is run, Vendor 1 and Vendor 3 follow the expected distribution pattern. Vendor 2, however, deviates visibly at digits 4 and 9. This deviation could suggest falsified purchase prices, so Vendor 2 should be flagged for further investigation.
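The charts in this article can be made in Excel, but the same per-vendor comparison can also be scripted. The Python sketch below is one possible approach, not the analysis behind the charts. It assumes the purchase records are available as (vendor, price) pairs, and the function names and sample rows are hypothetical. For each vendor it computes a chi-square statistic that summarizes how far the observed first-digit counts fall from the counts Benford’s Law predicts. A larger value means a larger deviation.

```python
import math
from collections import Counter, defaultdict

def first_digit(amount):
    """Return the leading nonzero digit of a number, or None if it is zero."""
    for ch in str(abs(amount)):
        if ch in "123456789":
            return int(ch)
    return None

def benford_chi_square(amounts):
    """Chi-square statistic comparing first-digit counts with Benford's Law."""
    digits = [d for d in map(first_digit, amounts) if d is not None]
    total = len(digits)
    if total == 0:
        return 0.0  # nothing to compare
    counts = Counter(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = total * math.log10(1 + 1 / d)  # expected count for digit d
        stat += (counts[d] - expected) ** 2 / expected
    return stat

def chi_square_by_vendor(purchases):
    """Group (vendor, price) pairs by vendor and score each vendor."""
    by_vendor = defaultdict(list)
    for vendor, price in purchases:
        by_vendor[vendor].append(price)
    return {vendor: benford_chi_square(prices)
            for vendor, prices in by_vendor.items()}

# Hypothetical rows; a real analysis needs several hundred records per vendor.
purchases = [("Vendor 1", 1250.00), ("Vendor 1", 310.75),
             ("Vendor 2", 4999.00), ("Vendor 2", 9450.00)]
for vendor, score in chi_square_by_vendor(purchases).items():
    print(vendor, round(score, 2))
```

With nine possible first digits, a statistic above roughly 15.5 is a common rule of thumb (the 5% critical value for eight degrees of freedom) for treating a vendor as worth a closer look. A high score alone does not establish that prices were falsified.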
Remember that “data” is a broad concept, and that analyzing the many different kinds of data your jurisdiction collects every day can enhance your decision-making processes.
We hope that this article, along with the other articles in our series on data analytics, helps you feel more confident in your ability to collect and analyze data. Now that you understand some specific techniques for visualizing and analyzing data, we encourage you to apply these techniques to procurements or other business decisions facing your jurisdiction. Data visualization and analysis can be particularly useful in identifying and preventing potential fraud, waste and abuse of public resources.
Date published: February 25, 2021