
Chapter 6

Data-Driven Fraud Detection


Discussion Questions
1. The North Carolina case at the beginning of the chapter illustrates why proactive fraud
detection is important. This type of detection can be done early and without previous indications
of fraud, so it is able to catch frauds early. While it can cost more to complete, it can save
organizations significantly on frauds caught early. It allows an investigator to proactively
manage the detection process rather than reacting to cases that come in.
2. Anomalies are unintentional errors caused by control weaknesses. When they occur, they
are generally spread evenly throughout a data set. Fraud is the intentional circumventing of
controls; perpetrators cover their tracks and eliminate indicators. Fraud is often found in very
few transactions rather than in an entire data set, similar to a needle in a haystack.
3. Since fraud is often found in very few transactions, sampling at a 5 percent rate leaves
roughly a 95 percent chance that any single fraudulent transaction falls outside the sample and
is missed. Fraud detection requires full-population analysis whenever possible.
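The arithmetic behind the sampling argument can be sketched as follows. This is an illustration only: the population size and the number of fraudulent transactions are hypothetical, and the function name is not from the text. It assumes a simple random sample drawn without replacement, so the chance of missing every fraudulent item follows the hypergeometric distribution.

```python
from math import comb

def prob_sample_misses_fraud(population: int, fraudulent: int, sample_size: int) -> float:
    """Probability that a simple random sample (without replacement)
    contains none of the fraudulent transactions."""
    clean = population - fraudulent
    if sample_size > clean:
        return 0.0  # the sample is too large to avoid every fraudulent item
    return comb(clean, sample_size) / comb(population, sample_size)

# Hypothetical population: 10,000 transactions, 5 percent sample (500 items).
for k in (1, 5, 20):
    p = prob_sample_misses_fraud(10_000, k, 500)
    print(f"{k:>2} fraudulent transactions: {p:.1%} chance the sample misses all of them")
```

With a single fraudulent transaction the miss probability is exactly 95 percent; it falls as the number of fraudulent items grows, which is why the sampling argument is strongest for frauds hidden in very few transactions.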
4. Advantages include the following:
   Software can look at full populations almost as easily as samples.
   Software can conduct time-intensive analyses quickly.
   Software contains routines like stratification, summarization, and fuzzy matching that are specifically included for fraud detection purposes.
   Software allows repeatability: once an algorithm is coded in a macro, it can be improved and reused in future cases.

5. Statistical analyses, such as using Benford's distribution, can be performed on databases of
any size. They are most useful when a particular type of fraud, like phantom vendors, is
suspected.
6. The biggest disadvantage of statistical analysis is that without looking for specific types of
fraud, it tends to identify large numbers of symptoms. In addition, careful frauds can be lost in
the detail when significant amounts of data are involved.
7. Benford's law is a mathematical algorithm that accurately predicts that, for many data sets,
the first digit of each group of numbers in a random sample will begin with a 1 more than a 2, a 2
more than a 3, a 3 more than a 4, and so on. It will predict the percentage of times that each digit
will or should appear in a sequence of numbers.
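For instructors who want to show the predicted percentages, the standard Benford first-digit formula, P(d) = log10(1 + 1/d), can be computed directly. The function name below is illustrative and not taken from the text.

```python
from math import log10

def benford_expected(digit: int) -> float:
    """Expected share of numbers whose first digit is `digit` (1 through 9)."""
    if not 1 <= digit <= 9:
        raise ValueError("first digit must be between 1 and 9")
    return log10(1 + 1 / digit)

# Print the expected first-digit distribution.
for d in range(1, 10):
    print(f"{d}: {benford_expected(d):.1%}")
```

The shares decline from about 30.1 percent for a leading 1 to about 4.6 percent for a leading 9, and the nine shares sum to 100 percent.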
8. Data-driven fraud detection determines what kinds of frauds can occur in a particular
situation and then uses technology and other methods to determine if those frauds exist. It
includes the following six-step process:
   1. Understanding the business or operation to be studied.
   2. Understanding what kinds of fraud could occur (fraud exposures) in the operation.
   3. Determining the symptoms generated by the most likely frauds.
   4. Using databases and information systems to search for those symptoms.
   5. Following up on symptoms to determine if actual fraud or other factors are causing them.
   6. Investigating symptoms that are deemed representative of fraud.

9. To detect fraud through financial statements, investigators focus on unexplained changes.
They can do this in three ways. First, they can focus on the three primary financial statements:
the balance sheet, income statement, and statement of cash flows. Changes in these primary
financial statements can be analyzed to determine whether they make sense or represent
symptoms that should be investigated. The second method is to convert balance sheet and
income statements to change statements by using horizontal and/or vertical analysis. The third is
to compute key financial ratios, such as the quick ratio, current ratio, and accounts receivable
turnover, and compare these ratios from period to period.
10. By focusing on the unexplained changes in financial statements from period to period, we
can identify areas that could possibly contain fraud. We can then focus on these areas and save
time and money in fraud prevention and detection.
11. The biggest difficulty in trying to correlate customers, vendors, and employees with known
problem people is that many cross-correlations contain formatting inconsistencies. Finding ways
to combine these non-obvious relationships is a large problem.
12. A data warehouse can structure data in ways that allow efficient and effective fraud
detection. Since data warehouses are directly within the investigator's control, they can be
continually improved to support data-driven fraud detection without affecting production
systems.
13. ODBC is the standard way of connecting data analysis applications with databases. It is
useful because it provides a live connection and retrieves rich data, including data types, column
names, and relationships. Text file import is less effective because it operates in a batch format
and requires more preparation upon import.
14. The Soundex algorithm works well for some situations, but its scorings are best suited to
English names. It ignores vowels and scores only consonants. It loses effectiveness with
numerical values. N-grams are a more powerful approach because they simply measure the
number of matching runs of characters. They work best with larger sequences. Their largest
drawback is that the number of matches required grows rapidly as words become longer.
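A minimal sketch of n-gram matching is shown below. It assumes two-character runs (bigrams) and scores similarity with the Dice coefficient; both choices are assumptions made for illustration, since the text only describes counting matching runs of characters. The function names are hypothetical.

```python
def ngrams(text: str, n: int = 2) -> set[str]:
    """Set of overlapping n-character runs in a normalized string."""
    cleaned = "".join(ch for ch in text.lower() if ch.isalnum())
    return {cleaned[i:i + n] for i in range(len(cleaned) - n + 1)}

def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient: shared n-grams relative to the n-grams in both strings."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))

# Vendor names from Extensive Case 2 score as close but not identical.
print(ngram_similarity("Master Cleaning Inc.", "Master Cleaning Supply"))
```

Because every name must be compared with every other name, the number of comparisons grows quickly with the size of the vendor list, and longer strings produce more n-grams per comparison.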
True/False
1. False: Unusual patterns may be caused by non-fraud factors.
2. True

3. False: Data-driven analysis involves using the company's database to search for unusual or
unexpected relationships between numbers.
4. False: The first digit of random data sets begins with a 1 more often than a 2, a 2 more often
than a 3, and so on.
5. True
6. False: Understanding the kinds of frauds that can occur is one of the key elements when
using data-driven detection techniques.
7. True
8. True
9. False: Unexplained changes are uncommon and often signal that fraud is occurring.

10. True
11. False: Horizontal analysis is a more direct method of focusing on changes.
12. True
13. False: Z-scores are useful in identifying indicators. They are not normally associated with
stratification.
14. False: Normally, multiple indicators or red flags are required to determine that a specific
scheme is occurring.
Multiple Choice
1.
2.
3.
4.
5.
6.
7.
8.
9.

10. b
11. c
12. b
13. c
14. d
15. b
16. c
17. d
18. c
19. d
Short Cases
Case 1.
1. Advantages of the data-driven approach:
   Will not alert possible suspects
   Identifies possible red flags
   Can find fraud early before it gets too large and expensive
   Places control of fraud detection in the investigator's hands (rather than a reactive approach)
   Can be performed on any database
   Can analyze full populations of data
   Can be tailored to the exact needs and cultures of organizations
Disadvantages of the data-driven approach:
   Higher cost than reactive approaches
   Fraud can be lost in the detail of large data sets and analyses
   Is general purpose, and it is difficult to refine results
   May give mis-signals
   Very difficult to eliminate the mis-signals

2. The reactive approach is still useful and acceptable. This new approach simply provides
another technique for finding fraud. Students should offer commentary on the advantages and
disadvantages in (1) above.


Case 2.
1. Benford's law is a rule of data sets. It holds that the first digit of natural data sets will begin
with a 1 more often than a 2, a 2 more often than a 3, and so on. Benford's law accurately
predicts for many kinds of financial data sets that the digits of each group of numbers in a set of
naturally occurring numbers will conform to certain distribution patterns.
2. Benford's law works with numbers of similar items with no predetermined pattern, but it
does not work with numbers such as personal ID numbers or lists where the numbers have a
built-in minimum or predetermined pattern.
Case 3.
Financial statement fraud is best detected by focusing on unexplained changes. To be used as
effective fraud detection tools, financial statements (specifically the income statement and the
balance sheet) need to be converted into percentage change statements. You could perform the
following procedures to accomplish this conversion:
1. Compare balances from one period to another; this might show some significant and
unusual changes.
2. Calculate key financial statement ratios, for example, quick ratio, accounts receivable
turnover, debt-to-equity, profit margin, and others. By examining the changes in these
ratios and comparing them to previous periods, it is possible to determine if there are
any unexpected or unusual changes that might be indicators of possible fraud.
3. Perform vertical analysis; this analysis converts financial statement numbers to
percentages, thus making it easy to understand and identify possible fraudulent
activities. For example, if sales increased 20 percent, the logical expectation would be
that profits should also increase. If profits did not increase, something may be wrong
and an investigation should ensue.
4. Perform horizontal analysis, which also converts numbers into percentages. However,
horizontal analysis compares the percentage change from period to period, thus making
it easier to spot unusual and unexpected changes.
5. Analyze the cash flow statement; based on previous analysis of the income statement
and the balance sheet, certain expectations about cash flows can be developed. For
example, if sales increased and the accounts receivable turnover decreased, other things
constant, one would expect that cash inflows should increase. Discrepancies between
the balance sheet, the income statement, and the statement of cash flows could be red
flags indicating possible fraud.
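The horizontal and vertical calculations described in this case can be sketched in a few lines. The figures are taken from the ABC Company income statement in Extensive Case 4 of this chapter; the function and variable names are illustrative only.

```python
def horizontal_change(current: float, prior: float) -> float:
    """Percentage change from the prior period to the current period."""
    return (current - prior) / prior

def vertical_share(line_item: float, base: float) -> float:
    """Line item as a fraction of a base amount (sales or total assets)."""
    return line_item / base

# (2008, 2007) amounts from the ABC Company income statement.
statement = {
    "Sales": (1_572_134, 1_413_581),
    "Cost of goods sold": (601_215, 556_721),
    "Miscellaneous expense": (91_014, 31_214),
}
sales_2008 = statement["Sales"][0]
for name, (y2008, y2007) in statement.items():
    print(f"{name:<22} change {horizontal_change(y2008, y2007):7.1%}  "
          f"share of 2008 sales {vertical_share(y2008, sales_2008):6.1%}")
```

Miscellaneous expense rises about 191.6 percent while sales rise about 11.2 percent, which is the kind of unexplained change that merits follow-up.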


Case 4.
Commercial data-mining software. Two of the most popular data-mining packages are Audit
Command Language (ACL) and IDEA. Using ACL or IDEA, a fraud examiner can specify
queries to use in looking for abnormalities. For example, an examiner could look for fictitious
vendors by obtaining a list of vendors with post office box addresses and then investigate to
determine if those vendors really exist.
Statistical analysis on clients' databases. Statistical analysis uses predetermined expectations
such as Benford's law to search the clients' databases for abnormalities or unusual or unexpected
relationships between numbers. Benford's law predicts the percentages that certain digits in a
group of random numbers will appear. If numbers do not conform to these patterns, then further
investigation into the cause must be conducted.
Data-driven approach. This approach involves determining the kinds of fraud that can occur and
then using technology and other approaches to determine if fraud exists. This method is more
extensive and involves the following six-step process:
a. Understanding the client's business.
b. Understanding the types of fraud that can occur.
c. Listing the symptoms for each type of fraud that can occur.
d. Using technology to search for those symptoms.
e. Analyzing the search results.
f. Investigating those symptoms further and determining if they are caused by actual
fraud.
Case 5.
There are a couple of possibilities available to Bucket Corp.:
1. Commercial data-mining software: By running analyses of Bucket's purchasing trends
with the various buyers, it would quickly come to someone's attention that Harris
Lumber has high prices and high-volume purchases. While this could have something
to do with the quality of Harris's lumber, it merits further investigation.
2. Statistical analysis: According to Benford's law, a higher percentage of invoice totals
should start with the number 1 (about 30%), followed by the number 2 (about 17.6%),
etc. The numbers for Harris Lumber clearly do not follow this trend, while the other
three companies do follow this trend much more closely. This is a pretty good
indication that the invoices for Harris may not be totally genuine. Bucket should look
further to see if there are any abnormal relationships between Harris and Bucket's
purchasing agent.
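One way to run the comparison described in item 2 is sketched below. The invoice totals are hypothetical placeholders (the case data is not reproduced in this manual), and the function names are illustrative.

```python
from collections import Counter
from math import log10

def first_digit(amount: float) -> int:
    """Leading nonzero digit of a nonzero amount."""
    # Scientific notation always places the leading digit first.
    return int(f"{abs(amount):.15e}"[0])

def benford_deviation(amounts: list[float]) -> dict[int, float]:
    """Observed minus expected share for each first digit 1 through 9."""
    counts = Counter(first_digit(a) for a in amounts if a != 0)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no nonzero amounts to analyze")
    return {d: counts[d] / total - log10(1 + 1 / d) for d in range(1, 10)}

# Hypothetical invoice totals for one vendor.
invoices = [8_950.00, 9_120.50, 8_700.00, 9_480.25, 1_250.00, 8_999.99]
for digit, gap in benford_deviation(invoices).items():
    print(f"{digit}: {gap:+.1%}")
```

Large positive gaps on high digits such as 8 or 9 would point to the kind of pattern the case attributes to Harris Lumber. With only a handful of invoices the gaps are noisy, so the comparison is most meaningful on a vendor's full invoice history.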


Case 6.
You can detect fraud in the purchasing department by searching for employees with the same
address or telephone numbers as vendors. You can also sort purchases by vendor and purchase
volume and observe whether total purchases from any particular vendor increase or decrease at an
unexpected rate. Additionally, you could chart price increases of different products over time and
the number of units of each product purchased over time.
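The address and telephone comparison can be sketched as follows. The records, field names, and normalization rules are hypothetical; real data usually needs more cleanup (abbreviations, suite numbers, extensions) before matches are reliable.

```python
import re

def normalize_phone(phone: str) -> str:
    """Keep digits only so differently punctuated numbers compare equal."""
    return re.sub(r"\D", "", phone)

def normalize_address(address: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", address.lower())
    return " ".join(cleaned.split())

def shared_contact_details(employees, vendors):
    """Return (employee, vendor, field) triples where a phone or address matches."""
    matches = []
    for emp in employees:
        for ven in vendors:
            for field, normalize in (("phone", normalize_phone), ("address", normalize_address)):
                emp_value = normalize(emp[field])
                # Skip blank values so missing data does not create false matches.
                if emp_value and emp_value == normalize(ven[field]):
                    matches.append((emp["name"], ven["name"], field))
    return matches

# Hypothetical records for illustration only.
employees = [{"name": "J. Doe", "phone": "(555) 010-2000", "address": "12 Elm St., Apt 4"}]
vendors = [
    {"name": "Acme Supply", "phone": "555.010.2000", "address": "900 Industrial Way"},
    {"name": "Elm Traders", "phone": "555-010-9999", "address": "12 elm st apt 4"},
]
print(shared_contact_details(employees, vendors))
```

Any match is only a symptom to follow up on, since employees and vendors can share contact details for legitimate reasons.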
Case 7.
1. Although fraud could be occurring, one should not jump to conclusions. There may be other
reasons why inventory has risen much more than sales, such as poor buying policies, increased
prices, or obsolete inventory.
2. As CEO, you might take several different approaches to investigate. Certainly, you
would not want to make any accusations, as there could be several nonfraud explanations. You
could consider purchasing some software to help analyze the inventory for factors such as dollar
amount of purchases by buyer, amount of purchases from each buyer, and inventory turnover.
You might also want to consider the possibility of an overstatement of inventory. Examining
purchasing records or recounting inventory could help determine if fraud is being perpetrated.


Case 8.
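1. [Figure: "Benford's Law Test" chart comparing the first-digit frequencies of the sample data
("Data") with the Benford's law expectation ("Benford's"); vertical axis from 0% to 30%.]
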

2. Based on a comparison of the sample data with Benford's law, there could be a possible
fraud problem. A much larger sample is required to be certain that the troublesome patterns are
persistent, and additional tests such as querying addresses of vendors or analyzing vendor
volumes over the past few months should be undertaken.
Case 9.
1. The chapter did not go into extensive detail on the steps of time trend analysis; it is really
beyond the scope of a fraud textbook. Care should be taken on how much detail to expect on this
question. However, it is important because it makes students think through how a time trend
analysis should be done. The principles that were discussed in the text include the following:
a. Data preparation by standardizing the time axis (a summarization technique)
b. Some explanation that the investigator doesn't need to look at thousands of graphs;
summary numbers like regression slopes can be used to get a single number
describing the time trend of a specific case.
c. Some account must be taken for periods of inactivity (the zeros in the time trend
graph shown in the chapter).
d. Advanced students may detail the importance of selection of a long enough period
to ensure that frauds can be seen starting and/or ending within the investigation
time.


2. Many different types of frauds can be discovered, including rising costs of any product or
service, employees with increasing rates per hour, rising equipment usage, and rising charges to
accounts.
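The idea of reducing each time trend to a single regression slope can be sketched as below. The monthly figures are hypothetical, and the sketch makes one simplifying assumption that an instructor may want to discuss: periods of inactivity are left in the series as zeros rather than excluded.

```python
def trend_slope(values: list[float]) -> float:
    """Least-squares slope of values against equally spaced periods 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two periods")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    return sxy / sxx

# Hypothetical monthly charges per employee on a standardized time axis.
monthly = {
    "A": [100, 0, 110, 105, 0, 100],
    "B": [100, 120, 0, 160, 180, 200],
}
ranked = sorted(monthly, key=lambda name: trend_slope(monthly[name]), reverse=True)
for name in ranked:
    print(f"{name}: slope {trend_slope(monthly[name]):.2f} per month")
```

Sorting by slope lets the investigator review only the steepest trends instead of graphing every case.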
Case 10.
1. Some areas of concern are the rapid growth of receivables without a corresponding increase
in inventory. Adding to this concern is the increase in accounts payable. Normally, as accounts
payable increase, inventory should also increase.
2. Possible explanations for these trends exist. The company could have had a change in credit
policy, allowing for accounts receivable and accounts payable to increase. Also, the possibility
exists that the company has sold or discarded obsolete inventory to reduce the amount it carries
on the books.


Case 11.
[Figure: "Benford's Law" chart of first-digit counts; horizontal axis "First Digit", vertical
axis from 0 to 12.]


Even though the pattern does not exactly follow a normal Benford's law shape, it does not differ
from expectations by much. The only digit that looks out of the ordinary is 4, and testing could
be performed to determine if that is a problem. Most likely, however, fraud is not occurring and
no further investigation is necessary.
Case 12.
1. Advantages of ODBC:
a. Provides a real-time, live connection
b. Can connect to almost any database
c. Imports not only data but also metadata (column names, types, relationships)
d. Allows the use of the powerful SQL language
Disadvantages of ODBC:
a. Can affect runtime systems
b. May not be available because of security or privacy concerns
c. Not available on older mainframe systems
2. The time card data needs to be imported using text file import routines. The columns
should be typed correctly (numbers, text, dates), and control totals need to be printed to
ensure all records imported correctly.
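A text file import with typed columns and control totals might look like the sketch below. The column names (`employee_id`, `work_date`, `hours`), the comma-delimited layout, and the ISO date format are assumptions for illustration; the actual time card layout is not given here.

```python
import csv
import io
from datetime import date
from decimal import Decimal

def import_time_cards(text_file):
    """Read delimited time card rows, type each column, and return
    the rows plus control totals (record count and total hours)."""
    rows = []
    for raw in csv.DictReader(text_file):
        rows.append({
            "employee_id": raw["employee_id"].strip(),          # text
            "work_date": date.fromisoformat(raw["work_date"]),  # date
            "hours": Decimal(raw["hours"]),                     # number
        })
    totals = {
        "records": len(rows),
        "hours": sum((r["hours"] for r in rows), Decimal(0)),
    }
    return rows, totals

# Hypothetical two-record file held in memory.
sample = io.StringIO("employee_id,work_date,hours\nE01,2008-03-03,8.0\nE02,2008-03-03,7.5\n")
rows, totals = import_time_cards(sample)
print(totals)
```

The control totals would then be compared with totals reported by the source system to confirm that every record was imported.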
Extensive Cases
Extensive Case 1.
1. Student answers will vary based on their plan, but each plan should include the six steps to
the data-driven approach. Students should identify an investigative team that includes business
experts, technology experts, auditors, fraud investigators, and any others that might be useful
to performing the process.

2. The chapter listed ACL, IDEA, Picalo, and ActiveData. Students may also decide that Excel or
Access is good enough for the investigation. Some students may like ACL/IDEA because they are
standard and company-supported, and the companies offer training. Other students may like the
open-source and free nature of Picalo.
3. The primary technique useful to this type of analysis is stratification and/or summarization.
Stratifying purchases against inventory levels by employee will allow the investigator to see if product
levels (having accounted for purchases during the time frame) are decreasing more during any specific
employee's shift.
Extensive Case 2.
1. Use either the Soundex or Fuzzy Match functions to join the JanitorialPurchases to itself on the
Vendor column. In other words, use the software to join two tables, but use the JanitorialPurchases table
as both input tables. Join on the Vendor column. Record ID 1823 shows a purchase from Master
Cleaning Inc., a similar name to Master Cleaning Supply used in other purchases.
2. One way to do this analysis is to add a column for Benford's Law expectation for each record (on
the price column). Then summarize the records by vendor and calculate the average of the Benford's
column. Most companies show about a 15 percent probability level, but the single record from Master
Cleaning Inc. does not match.
3. One way to do this analysis is to stratify the data set by product, creating a number of subtables
(one for each product). Stratification is required so the means and standard deviations used in the
z-score calculation are specific to each product. Then calculate the z-score for each record and sort by this
new z-score column. While the familiar Master Cleaning Inc. record comes up, a more subtle trend
shows in Master Cleaning Supply with Jose. Students may see the potential kickback scheme going
on as prices are increasing substantially between Master Cleaning Supply and Jose's purchases.
4. A simple summarization by purchaser and product (calculating the average price paid) will
highlight problem combinations. A pivot table will also show this calculation in a nice two-dimensional
grid. The analysis doesn't show any specific problems.
5. Most of the analysis packages have a function for finding gaps in sequence. This shows that ID
1560 is missing from the list.
6. Use descriptives to get an overview of the dataset. This shows that one record in the quantity
column has a zero value. A sort or filter on this column will show that record ID 1677 has a quantity of
0.
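The stratified z-score calculation from item 3 can be sketched in plain Python. The records and field names below are hypothetical stand-ins for the JanitorialPurchases table, and the sketch assumes the population standard deviation; an analysis package may use the sample standard deviation instead.

```python
from collections import defaultdict
from statistics import mean, pstdev

def zscores_by_product(records):
    """Score each record against the mean and standard deviation of prices
    for its own product (stratum), then sort by absolute z-score."""
    strata = defaultdict(list)
    for rec in records:
        strata[rec["product"]].append(rec["price"])
    scored = []
    for rec in records:
        prices = strata[rec["product"]]
        sd = pstdev(prices)
        z = 0.0 if sd == 0 else (rec["price"] - mean(prices)) / sd
        scored.append({**rec, "z": z})
    return sorted(scored, key=lambda r: abs(r["z"]), reverse=True)

# Hypothetical purchase records.
purchases = [
    {"id": 1, "product": "mop", "price": 10.0},
    {"id": 2, "product": "mop", "price": 10.0},
    {"id": 3, "product": "mop", "price": 10.0},
    {"id": 4, "product": "mop", "price": 30.0},
    {"id": 5, "product": "soap", "price": 2.0},
    {"id": 6, "product": "soap", "price": 2.0},
]
for rec in zscores_by_product(purchases)[:3]:
    print(rec["id"], rec["product"], round(rec["z"], 2))
```

Scoring within each product keeps an expensive product from masking an overpriced purchase of a cheap one.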
Extensive Case 3.
The assignments in this section require that the students use data analysis software to answer the
questions. It may be useful to split the class and have some students use ACL, some use Picalo,
etc. Because each software package has its unique strengths, student presentations on the
differences may be a useful classroom activity.
Because the data sets need to stay small for class purposes, some of the questions can be
answered by manually looking through the data set. Be sure to grade students on their process
rather than just answers to ensure that their solutions can be used on much larger data sets as well
as smaller ones.
1. The simplest way to perform this analysis in any application (including a database like MS
Access) is to do a left join on the two tables by vendor code. This will join the two tables and
include all the records from the charges table and any matching records from the
approved_vendors table. Those records with an empty approved_vendors column were not
approved.
A more straightforward way to do this analysis is to use Picalo's Find Nonmatching function.
This function returns two tables containing non-matching records from both tables, similar to a
filter for records that do not match in the other table.
The analysis will show that invoices 2535, 1065, and 922 with vendors AC1 and AC2 do not
match. All three purchases were made by Suzie. Follow up analysis should focus on indicators
of phantom vendors or kickback relationships with these vendors.
2. Again, a left join will work in most applications (with the tables opposite of question 1).
This type of join will show all records from the approved_vendors table and matching records
from the charges table. The records with empty charges columns were not used.
If the student used Picalo's Find Nonmatching function in question 1, it also provided the answer
to this question.
The analysis will show that four vendors, FPI09, NBV22, PSK34, QMI57 were not used. This
does not indicate fraud, but it highlights vendors that may need to be removed from the approved
list because the company no longer purchases from them.
Note that while the primary focus of this problem is on joining, teachers can also assign students
to do other analyses to this dataset, including Benford's Law analysis, duplicate invoice numbers,
gaps in invoice number, and trending analysis.
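For classes working outside ACL or Picalo, the two-way nonmatching logic in questions 1 and 2 can be sketched with plain Python sets. This is not Picalo's implementation; the function name, the `vendor_code` field, and the sample rows are hypothetical, while the table names `charges` and `approved_vendors` follow the case.

```python
def find_nonmatching(charges, approved_vendors):
    """Return (charges whose vendor is not approved,
    approved vendor codes that never appear in charges)."""
    approved = {v["vendor_code"] for v in approved_vendors}
    used = {c["vendor_code"] for c in charges}
    unapproved_charges = [c for c in charges if c["vendor_code"] not in approved]
    unused_vendors = sorted(approved - used)
    return unapproved_charges, unused_vendors

# Hypothetical rows for illustration only.
charges = [
    {"invoice": 101, "vendor_code": "V01", "purchaser": "Pat"},
    {"invoice": 102, "vendor_code": "X99", "purchaser": "Lee"},
]
approved_vendors = [{"vendor_code": "V01"}, {"vendor_code": "V02"}]

unapproved, unused = find_nonmatching(charges, approved_vendors)
print(unapproved)
print(unused)
```

The first result corresponds to question 1 (charges to unapproved vendors) and the second to question 2 (approved vendors that were never used).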


Extensive Case 4.
ABC Company
Ratio Analysis
31-Dec-08

                                                     12/31/08  12/31/07    Change  % Change
LIQUIDITY RATIOS:
Current ratio                                            3.79      2.26      1.53    67.47%
  current assets/current liabilities
Quick ratio                                              2.57      1.49      1.09    73.05%
  (current assets - inventory)/current liabilities
Accounts receivable turnover                             4.93      5.19     -0.26    -5.09%
  sales/average accounts receivable
Days' sales in accounts receivable                      74.04     70.27      3.77     5.36%
  365/accounts receivable turnover
Inventory turnover                                       1.18      1.36     -0.19   -13.63%
  cost of goods sold/average inventory

PROFITABILITY/PERFORMANCE RATIOS:
Profit margin                                            0.21      0.22     -0.01    -4.35%
  net income/net sales
Gross profit margin (%)                                  0.62      0.61      0.01     1.88%
  gross profit/sales
Earnings per share                                       9.03     11.38     -2.35   -20.66%
  net income/number of shares of stock
Sales/total assets                                       0.73      0.70      0.03     4.01%
  sales/total assets
Sales/working capital                                    1.33      1.72     -0.39   -22.59%
  sales/(current assets - current liabilities)

EQUITY POSITION RATIOS:
Owners' equity/total assets                              0.61      0.48      0.13    27.08%
  total stockholders' equity/total assets
Current liabilities/owners' equity                       0.33      0.68     -0.35   -52.06%
  current liabilities/total stockholders' equity
Total liabilities/owners' equity                         0.65      1.10     -0.44   -40.49%
  total liabilities/total stockholders' equity


ABC Company
Balance Sheet
As of December 31, 2008

                                        2008  % Total        2007  % Total        2006  % Total
                                               Assets               Assets               Assets
ASSETS
Current Assets
  Cash                            $  501,992   23.33%  $  434,215   21.58%  $  375,141   23.54%
  Accounts receivable                335,272   15.59%     302,514   15.04%     241,764   15.17%
  Inventory                          515,174   23.95%     505,321   25.12%     310,885   19.50%
  Prepaid expenses                   251,874   11.71%     231,100   11.49%     136,388    8.56%
  Total Current Assets            $1,604,312           $1,473,150           $1,064,178
Property, Plant, and Equipment       765,215   35.57%     735,531   36.56%     705,132   44.24%
Accumulated Depreciation             218,284   10.15%     196,842    9.78%     175,400   11.00%
TOTAL ASSETS                      $2,151,243           $2,011,839           $1,593,910

LIABILITIES
Current Liabilities
  Accounts payable                $  248,494   11.55%  $  366,864   18.24%  $  322,156   20.21%
  Accrued liabilities                122,192    5.68%     216,533   10.76%     215,474   13.52%
  Income taxes payable                10,645    0.49%      25,698    1.28%      22,349    1.40%
  Current portion of LT debt          42,200    1.96%      42,200    2.10%      42,200    2.65%
  Total Current Liabilities       $  423,531           $  651,295           $  602,179
Long-term Liabilities
  Long-term debt                     425,311   19.77%     400,311   19.90%     375,100   23.53%
TOTAL LIABILITIES                 $  848,842           $1,051,606           $  977,279

STOCKHOLDERS' EQUITY
  Common stock                    $  370,124   17.21%  $  356,758   17.73%  $  320,841   20.13%
  Additional paid-in capital          29,546    1.37%      24,881    1.24%      21,910    1.37%
  Retained earnings                  902,731   41.96%     578,594   28.76%     273,880   17.18%
  Total Stockholders' Equity      $1,302,401           $  960,223           $  616,631
TOTAL LIABILITIES AND
STOCKHOLDERS' EQUITY              $2,151,243           $2,011,839           $1,593,910


ABC Company
Income Statement
For the Year Ended December 31, 2008

                                    2008        2007    $ Change  % Change
Sales                         $1,572,134  $1,413,581    $158,553     11.2%
Cost of Goods Sold               601,215     556,721      44,494      8.0%
Gross profit                  $  970,919  $  856,860    $114,059     13.3%
EXPENSES
  Advertising                 $   55,153  $   50,531    $  4,622      9.1%
  Depreciation                    21,442      21,442           0      0.0%
  Bad debts                       20,151      18,934       1,217      6.4%
  Legal                           17,261      10,207       7,054     69.1%
  Miscellaneous                   91,014      31,214      59,800    191.6%
  Rent                           148,321     142,078       6,243      4.4%
  Repairs and maintenance         14,315      13,642         673      4.9%
  Salaries and wages              47,121      45,312       1,809      4.0%
  Utilities                       15,912      15,643         269      1.7%
  Total Expenses              $  430,690  $  349,003    $ 81,687     23.4%
Net income before income tax  $  540,229  $  507,857    $ 32,372      6.4%
Income tax expense               216,092     203,143      12,949      6.4%
NET INCOME                    $  324,137  $  304,714    $ 19,423      6.4%
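As a check on the ratio analysis, the first two liquidity ratios can be recomputed from the balance sheet amounts above. The dictionary and function names are illustrative; the amounts are the 2008 and 2007 balances from the ABC Company balance sheet.

```python
# Balances from the ABC Company balance sheet (2008 and 2007).
current_assets = {2008: 1_604_312, 2007: 1_473_150}
current_liabilities = {2008: 423_531, 2007: 651_295}
inventory = {2008: 515_174, 2007: 505_321}

def current_ratio(year: int) -> float:
    """Current assets divided by current liabilities."""
    return current_assets[year] / current_liabilities[year]

def quick_ratio(year: int) -> float:
    """Current assets less inventory, divided by current liabilities."""
    return (current_assets[year] - inventory[year]) / current_liabilities[year]

for label, ratio in (("Current ratio", current_ratio), ("Quick ratio", quick_ratio)):
    now, before = ratio(2008), ratio(2007)
    print(f"{label:<14} {now:6.2f} {before:6.2f} {now - before:6.2f} "
          f"{(now - before) / before:8.2%}")
```

The rounded results agree with the 3.79 and 2.26 current ratios and the 2.57 and 1.49 quick ratios shown in the ratio analysis. The remaining ratios can be checked the same way using average balances where the formula calls for them.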

Internet Assignments
1. As discussed in the article, performing statistical analysis using Benford's law is a good
proactive method to detect fraud. The probability of a number beginning with 1 as the first digit
is not 1 in 9 as most people think. The law states that the probability of a number beginning with
1 as the first digit is about 30%, and the probability of the number beginning with 9 as the first
digit is actually 4.6%. Armed with this knowledge, a fraud examiner can detect fraud by looking
for exceptions to Benford's law. Perpetrators of fraud may not understand Benford's law and
incorrectly assume that the probability is 1 in 9. Benford's law helps identify people who
fake transactions.
2.
a. A neural network is a computer-based model that patterns after the human brain. It is used to
model data, similar to regression in statistics.
b. Neural networks can model the normal patterns in the data and find transactional trends that
do not match the norm.
c. Many industries can benefit from this technology, but in particular, the credit card industry has
used it significantly to model consumer behavior and guess when a credit card is being used by a
different person.
d. Students will name firms that they have found on the Internet.
e. Students will find different procedures and policies that help identify fraud. One example is a
representative calling the card owner to verify large transactions, overseas transactions, or other
risky transactions that seem different from that consumer's normal behavior.
Debate 1
The controller might start by telling the CFO that data mining is a great idea and that they should
go forward with his suggestion. He should, however, inform the CFO of possible severe
limitations in detecting the fraud that has been committed within the company. These limitations
would include the large size of the company and the immense size of the databases to be
analyzed. The controller might also explain the importance of considering implementation of an
automated, real-time fraud detection system due to the large number of transactions that are
occurring between buyers and hundreds of vendors. Additionally, he might tell him that by using
this method, they could analyze company databases while not alerting possible suspects, change
and refine the queries as needed, and then automate the queries so they could be run at any time
in the future.
Debate 2
The debate should center on when sampling is most effective and when full population analysis
is required. As a general rule, today's software packages make full population analysis possible
for almost every fraud investigation. Sampling will always be used for things like accounts
receivable confirmation letters.

Answers to Stop and Think


1. The six steps in the data-driven approach are different primarily because they are
proactive. They require no predication of fraud to be present. Instead, they provide a
hypothesis-testing approach to fraud detection.

2. ODBC connections have several advantages over text import: they are faster and more
robust, they include data formats and column names, they automatically handle
character encoding, and they provide the powerful SQL language.
