Reduce False Positives


False positive tags can and will occur in the system. They occur mainly in files that contain large sets of numbers, such as Excel, Google Sheets, Apple Numbers or database files. Random numeric data can pass validation by the Luhn algorithm purely by chance and therefore be tagged as a credit card number when it is not one.
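To illustrate why numeric data so often passes validation, the following is a minimal sketch of the standard Luhn check in Python. It is not the product's own code, and the function names are chosen here for illustration only. For any run of digits, exactly one of the ten possible final digits satisfies the checksum, so roughly one in ten arbitrary numbers of card-like length passes on chance alone.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


def passing_check_digits(prefix: str) -> list[str]:
    """List every final digit that makes `prefix` + digit pass the check."""
    return [d for d in "0123456789" if luhn_valid(prefix + d)]


if __name__ == "__main__":
    print(luhn_valid("4111111111111111"))           # well-known test number
    print(passing_check_digits("411111111111111"))  # always exactly one digit
```

A spreadsheet column holding thousands of long numeric IDs can therefore be expected to contain many values that pass the check even though none of them is a card number.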

There are several methods that can help reduce false positives in your search results. The first method is to combine keywords and/or queries with tags. This narrows your search to files that are both auto-classified as containing a credit card and contain your keyword or match your query. In Figure 1, endpoints are searched only for files that are classified (tagged) as containing a credit card AND contain the word(s) “visa”, “mastercard” or “full name”. Because a file must meet both conditions, the result set is smaller and a larger share of the results are true positives.
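The logic of combining a tag with keywords can be sketched as follows. This is an illustration of the AND condition only, not the product's search syntax or API; the ScannedFile type, the "credit_card" tag name and the is_likely_hit function are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ScannedFile:
    """Hypothetical record for a file that has already been classified."""
    path: str
    tags: set[str] = field(default_factory=set)
    text: str = ""


# Keywords taken from the Figure 1 example.
KEYWORDS = ("visa", "mastercard", "full name")


def is_likely_hit(f: ScannedFile, tag: str = "credit_card",
                  keywords: tuple[str, ...] = KEYWORDS) -> bool:
    """True only if the file carries the tag AND contains a keyword."""
    if tag not in f.tags:
        return False
    lowered = f.text.lower()
    return any(kw in lowered for kw in keywords)


if __name__ == "__main__":
    files = [
        ScannedFile("invoice.docx", {"credit_card"}, "Visa 4111111111111111"),
        ScannedFile("sensor_log.csv", {"credit_card"}, "4111111111111111"),
        ScannedFile("notes.txt", set(), "Pay by Mastercard"),
    ]
    for f in files:
        print(f.path, is_likely_hit(f))
```

In this sketch only invoice.docx is returned: the log file has the tag but no keyword, and the notes file has a keyword but no tag.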

The second method is to select specific file extensions instead of “all” file types. On the Search Criteria page, either select or create groups of extensions that cover only non-spreadsheet, non-database files. For example, you could create a group made up of file types such as Word, PowerPoint, PDF and text. In this scenario, you would simply not select Excel files or any other type of spreadsheet, which removes the files most likely to contain many false-positive numbers.
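The effect of an extension group can be sketched in the same way. The group contents and the in_group function below are assumptions made for illustration; the actual groups are defined on the Search Criteria page, not in code.

```python
from pathlib import PurePath

# Hypothetical "documents only" group: no spreadsheet or database extensions.
DOCUMENT_GROUP = {".doc", ".docx", ".ppt", ".pptx", ".pdf", ".txt"}


def in_group(path: str, group: set[str] = DOCUMENT_GROUP) -> bool:
    """True if the file's extension (case-insensitive) is in the group."""
    return PurePath(path).suffix.lower() in group


if __name__ == "__main__":
    candidates = ["report.PDF", "budget.xlsx", "slides.pptx", "customers.db"]
    print([p for p in candidates if in_group(p)])
```

Here budget.xlsx and customers.db never enter the search, so any chance matches they contain cannot appear in the results.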