Big Data Analytics are Necessary to Fight Unique Fraud

Nov 15, 2017

By: Katherine Larilee Moore, J.D.

President, Vector Analytics, Inc.

When natural disasters, mass tragedies or class action settlements occur, victim compensation funds are often created contemporaneously. These funds are created to benefit victims, but they are also an attractive target for fraud. When large corporations, agencies or insurance companies establish relief funds, fraudsters rationalize their thefts because they believe that these large entities can absorb the costs. Since huge settlements involve numerous transactions and claims, fraudsters often hope that their actions will be lost in an enormous sea of data. The allure of a big payout can be a draw for fraudsters. Luckily, there is an answer: big data analytics.

Analytics streamline the review process for intake assessors, upper-level reviewers and investigative teams. These analytics are particularly useful in identifying two types of fraud: overstatement/exaggeration of loss and “ghost” claimants.

The BP oil spill is a prime example of how big data analytics is used as a tool to fight fraud. The oil spill settlement was the largest of its kind in history. The data analytics and statistical trend regressions that I created during my time investigating claimant fraud are some of the most advanced that I’ve ever had the opportunity to employ.

Lost tips and overstatement fraud due to the BP oil spill

Overstatement or exaggeration of loss is familiar to anyone who has ever conducted a fraud review or audit of any kind. Overclaiming loss is probably as old as the first insurance claim itself. A claimant-friendly review recognizes that everyone is different and, as such, injury can differ in type and magnitude across a population despite exposure to the same event. While reasonable variation is to be expected, extraordinary variation is not.

Typically, overstatement fraud is discovered when one juxtaposes similarly situated claimant files. Effective big data analytics can prescribe an objective, numerical standard of loss and define acceptable variation for a database. There is no replacement for a pair of knowledgeable human eyes conducting a thorough qualitative review, but an initial analytic review can detect anomalous submissions and isolate outlier populations before the reviewer even receives the file — thereby replacing a number of time-consuming assessments.  

One example of how overstatement fraud occurs in compensation funds is in settlements that provide compensation for loss of income. Inherent in these income claims are what I call “soft” claims. Soft claims are typically self-reported income that does not comport with general accounting principles and may lack any significant history of documentation. This type of claimant is often employed in a service industry where the bulk of income is derived from tips. It can also be a small business that does its own books and submits its own profit and loss statements. Both of these types of claimants are vital to any economy, and they are often the first and most severely damaged when a natural or man-made disaster befalls an area. Settlements recognize this fact and make it a point to pay these claims expeditiously. This situation is ripe for the claimant who wants to exaggerate loss and overstate income.

Generally speaking, waiters and waitresses working in the same restaurant make the same average income over time when adjusting for variance. The amount that the wait staff earns in tips compared to the average daily income of the restaurant itself is also a relatively consistent ratio over time. A claimant seeking compensation for income plus $3,000 a day in tips for a restaurant that earns $10,000 a day is probably exaggerating their income. Any reviewer would question this assertion and a diligent fraud reviewer might cross-reference the claimant’s income assertion with the restaurant’s financials to determine plausibility.

A claimant may submit an income statement reflecting $300 a day in tips for a restaurant that earns $1,000 a day. Three hundred dollars a day is not a terribly unreasonable tip income for a waiter/waitress in a moderately busy restaurant, and it probably wouldn’t raise suspicion with most reviewers. However, this assertion is just as egregious as, if not more egregious than, that of the claimant who claimed $3,000 a day in tips: three hundred dollars a day is 30% of the restaurant’s entire daily income, the same share as $3,000 out of $10,000. This fraud is more subtle and may not alarm a reviewer, but data analytics can recognize this pattern and flag the claimant for further fraud review.
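The ratio check described above can be sketched in a few lines of Python. This is only an illustration, not the method used in the settlement: the `TipClaim` record, the claimant labels and the 10% cutoff are all hypothetical, and a real review would derive the acceptable share from the claims data itself.

```python
from dataclasses import dataclass


@dataclass
class TipClaim:
    claimant_id: str                 # hypothetical identifier
    daily_tips: float                # tips the claimant says they earned per day
    restaurant_daily_revenue: float  # what the restaurant itself earned per day


def tip_share(claim: TipClaim) -> float:
    """Fraction of the restaurant's daily revenue that the claimed tips represent."""
    if claim.restaurant_daily_revenue <= 0:
        raise ValueError("restaurant revenue must be positive")
    return claim.daily_tips / claim.restaurant_daily_revenue


def flag_tip_claims(claims, max_share=0.10):
    """Return IDs of claims whose tip share exceeds max_share (an assumed cutoff)."""
    return [c.claimant_id for c in claims if tip_share(c) > max_share]


claims = [
    TipClaim("A", 3000, 10000),  # the obvious case from the text
    TipClaim("B", 300, 1000),    # the subtle case: same 30% share
    TipClaim("C", 80, 1000),     # a made-up, more plausible claim
]
flagged = flag_tip_claims(claims)
```

Both claimant A and claimant B are flagged, because the test looks at the share of restaurant revenue rather than the dollar amount that a human reviewer would notice first.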

When I conducted fraud review for the BP oil spill, I created an algorithm that was used to define reasonable amounts of variance among similarly situated business claimants. This algorithm highlighted those individual claimants whose stated incomes contained extreme, unjustifiable departures from the settlement-defined norm. The algorithm also compared data points such as employee-to-employer income ratios, business-to-business income ratios, and profit and loss statement versus tax return income ratios.
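The article does not describe the algorithm itself, so the following is only a sketch of one common way to define "reasonable variance" within a peer group: a modified z-score based on the median and the median absolute deviation, with the conventional 3.5 cutoff. The ratios and claimant labels are invented for illustration.

```python
from statistics import median


def flag_outliers(ratios, cutoff=3.5):
    """Flag claimants whose ratio departs sharply from their peer group.

    ratios: dict of claimant ID -> ratio (for example, profit and loss income
    divided by tax return income). Uses the modified z-score
    0.6745 * (x - median) / MAD, which is less distorted by the outliers
    themselves than a mean and standard deviation would be.
    """
    values = list(ratios.values())
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []  # no spread in the peer group; this sketch flags nothing
    return [
        cid for cid, v in ratios.items()
        if abs(0.6745 * (v - center) / mad) > cutoff
    ]


# Hypothetical P&L-to-tax-return income ratios for similar businesses.
peer_ratios = {
    "A": 1.02, "B": 0.98, "C": 1.05, "D": 1.00,
    "E": 0.97, "F": 1.03, "G": 2.40,
}
outliers = flag_outliers(peer_ratios)
```

Here only claimant G, whose profit and loss statement reports well over twice the income on the tax return, is set aside for a closer look; the small differences among the other six fall inside normal variation.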

Phantom deckhands and other ghost claimants

Perhaps the most insidious type of fraud is the “ghost” claimant. In one scheme, organized fraudsters had numerous employees working at the facility processing transactions for individuals who did not exist. The fraudsters conducted a multitude of fake, income-generating transactions. These fake transactions needed to be readily identifiable to the internal, co-conspirator employees. As I pored over volumes of data, I detected what I believed to be patterns in the transaction records. The name “Copernicus” appeared as a first, middle and sometimes last name among “client” transactions with unusual frequency. I created a basic regression to determine the probabilities of the name “Copernicus” occurring at such a high frequency in a mid-sized city where most of the population was Slavic-speaking. The results were exceptional. Fraudsters assume that reviewers are not connecting claimants to each other, but data analytics does that.
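The author's regression is not reproduced in the article. As a simpler stand-in for the same question (how likely is this name to show up this often by chance?), the sketch below computes a binomial tail probability. Every number in it is hypothetical: the count of transactions, the assumed background rate of the name, and the number of times it was observed.

```python
import math


def log_binom_pmf(k, n, p):
    """Log of the binomial probability of exactly k hits in n trials (0 < p < 1)."""
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_comb + k * math.log(p) + (n - k) * math.log1p(-p)


def binom_upper_tail(k, n, p):
    """Probability of k or more hits in n independent trials, each with rate p."""
    return sum(math.exp(log_binom_pmf(i, n, p)) for i in range(k, n + 1))


# Hypothetical figures: 5,000 client transactions, an assumed background rate
# of 1 in 10,000 for the name, and 40 transactions that actually carry it.
n_transactions = 5000
background_rate = 0.0001
observed = 40

p_value = binom_upper_tail(observed, n_transactions, background_rate)
```

With these made-up inputs the expected count is 0.5, so seeing 40 is vanishingly unlikely under chance alone. Working in log space avoids the overflow that very large binomial coefficients would otherwise cause.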

Another example of a “ghost” claimant is the fraudster who files a claim for a nonexistent claimant. Seafood claims in the BP oil spill were “soft” claims and simply required a note from a boat owner or captain verifying employment. As fraudsters began to realize this, they could not create enough people to file claims for monetary compensation. Individuals would file deckhand claims on several vessels at a time. Submissions indicated that deckhands who had never fished before 2010 were making exorbitant amounts of money in the four months prior to the oil spill.
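One way to surface deckhands who claim work on several vessels at the same time is to group employment claims by claimant and look for overlapping date ranges on different vessels. The sketch below assumes a simple tuple layout and uses invented vessel names and dates; an overlap is only a prompt for further review, since some overlaps may have innocent explanations.

```python
from collections import defaultdict
from datetime import date
from itertools import combinations


def overlapping_vessel_claims(claims):
    """Map each claimant to the vessels they claim to have crewed simultaneously.

    claims: iterable of (claimant_id, vessel, start_date, end_date) tuples.
    """
    by_claimant = defaultdict(list)
    for claimant, vessel, start, end in claims:
        by_claimant[claimant].append((vessel, start, end))

    flagged = {}
    for claimant, jobs in by_claimant.items():
        vessels = set()
        for (v1, s1, e1), (v2, s2, e2) in combinations(jobs, 2):
            # Two date ranges overlap when each one starts before the other ends.
            if v1 != v2 and s1 <= e2 and s2 <= e1:
                vessels.update((v1, v2))
        if vessels:
            flagged[claimant] = sorted(vessels)
    return flagged


# Hypothetical claims.
deckhand_claims = [
    ("D1", "Sea Star", date(2010, 1, 1), date(2010, 4, 19)),
    ("D1", "Blue Heron", date(2010, 2, 1), date(2010, 4, 19)),
    ("D2", "Sea Star", date(2009, 1, 1), date(2009, 6, 30)),
    ("D2", "Blue Heron", date(2010, 1, 1), date(2010, 4, 19)),
]
simultaneous = overlapping_vessel_claims(deckhand_claims)
```

Claimant D1 is flagged for crewing two vessels during the same months, while D2, who moved from one vessel to another, is not.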

Further data analysis revealed telling patterns. A vessel that originally had two deckhands was suddenly sold to a neighbor, a cousin, or a friend who then employed five deckhands on the same vessel. Six months later, this same vessel would be sold to a spouse who in turn employed seven deckhands on the vessel. Vessels that had fished before 2010 suddenly “found” deckhands with fishing tickets and incomes from 2008 that had not been filed with their respective state fisheries until 2012. Iterations of this pattern repeated across many kinds of claims. The fishing industry is often conducted with a handshake between old friends, family members or neighbors who have known each other for generations, so individually these might not have raised red flags. But when analyzed with other information in the database of new claims and retroactively filed fishing receipts, they began to take on a fraudulent connotation.
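The resale pattern can be expressed as a simple check: follow each vessel through its successive owners and flag it when the claimed crew grows with every transfer. This sketch assumes vessels are tracked by a stable identifier (here a hypothetical hull ID, since names and owners change) and that ownership records are already in chronological order.

```python
def crew_growth_after_transfers(ownership_history):
    """Flag vessels whose claimed deckhand count rises with every change of owner.

    ownership_history: dict of hull ID -> list of (owner, deckhand_count),
    ordered from earliest owner to latest.
    """
    flagged = []
    for hull_id, records in ownership_history.items():
        counts = [count for _owner, count in records]
        rising = all(later > earlier for earlier, later in zip(counts, counts[1:]))
        if len(counts) >= 2 and rising:
            flagged.append((hull_id, counts))
    return flagged


# Hypothetical records; HULL-001 mirrors the two, five, seven pattern in the text.
history = {
    "HULL-001": [("original owner", 2), ("neighbor", 5), ("spouse", 7)],
    "HULL-002": [("original owner", 3), ("buyer", 3)],
}
suspicious_vessels = crew_growth_after_transfers(history)
```

Keying on the hull rather than the vessel name or owner is the design choice that matters here: it lets the crew counts line up across sales that would otherwise look like unrelated businesses.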

As additional information was requested, the fraudsters became more sophisticated and organized. Eventually, documents like tax returns were provided preemptively with claims that did not require them. The more documentation claimants submitted, the more data points were provided for analysis and cross-reference. Analytics performed on these files confirmed something amazing: many of these claimants did not exist. Many vessels “sold” in non-arm’s-length transactions to spouses or family members were sold simply to create new boat captains’ crew files. Often vessel names were changed. These vessels became new businesses with claims. The captains then employed new deckhand crews who also claimed for loss of income. Many deckhand claimants even had the same name as several other claimants living at the same address. When this was questioned, fraudsters began using P.O. boxes and creating driver’s licenses to comport with the new addresses. This basic blueprint was replicated a number of times and ultimately formed a scheme that created more than 44,000 fake claimants. With the addition of so many new data points, big data analytics and thorough qualitative reviews identified these patterns almost immediately.
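Connecting claimants who share a name and an address is a grouping problem once the text is normalized. The sketch below is a minimal illustration with invented records; the `normalize` helper is deliberately crude, and real address matching would need more care (abbreviations, unit numbers, P.O. boxes).

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse punctuation and spacing so near-identical entries match."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def shared_identity_groups(claims):
    """Group claim IDs that share the same normalized name and address.

    claims: iterable of (claim_id, name, address) tuples.
    Returns only the groups that contain more than one claim.
    """
    groups = defaultdict(list)
    for claim_id, name, address in claims:
        groups[(normalize(name), normalize(address))].append(claim_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Hypothetical claim records.
records = [
    ("C1", "John Q. Doe", "12 Bayou Rd."),
    ("C2", "JOHN Q DOE", "12 bayou rd"),
    ("C3", "Jane Roe", "12 Bayou Rd."),
]
duplicates = shared_identity_groups(records)
```

Claims C1 and C2 fall into the same group despite the different capitalization and punctuation, while C3, a different name at the same address, does not.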

With the help of data analytics and many dedicated investigators, findings of fraud were referred for further review to several federal investigative agencies and to the Freeh Group, a global risk management firm founded by former FBI Director Louis Freeh. Fraudsters were then turned over to the U.S. Department of Justice for prosecution. In the aftermath of huge disasters like the BP oil spill, sorting through so many claims can seem impossible. Luckily, as more fraud examiners learn how to use big data analytics, they discover the mountains of data aren’t insurmountable.