Data entry is the most fundamental step in managing personal or business finances. Errors made at this step percolate through the entire process, setting off a chain reaction that takes significant effort to undo. For example, data entry with no verification step has an error rate as high as 4 per cent, or 400 errors per 10,000 entries, which is large enough to affect even small datasets.
The consequences can be dire: paying employees the wrong salary and then going back and forth with the bank to reverse the error, or dispatching the wrong quantity of items and then having to recall shipments.
This is where Intuit’s FEDS (financial error detection service) comes to the rescue. It alerts customers to errors made while entering financial data, whether they are creating invoices, preparing payroll stubs, or managing personal finances.
The team has developed a generalised solution to financial error detection problems, so that an error detection use case can be onboarded quickly for any new product feature, benefiting customers across multiple Intuit offerings.
Errors in business finance
Errors made during data entry are chiefly of two types:
- Typographical errors typically include entering an extra digit, hitting a digit on an adjacent key, or swapping adjacent digits in a number. For example, a customer filling in an invoice may intend to enter $1234.56 but accidentally enter $1324.56 (digit swap) or $1237.56 (adjacent-key error).
- Contextual errors usually occur when a value is copy-pasted from the wrong source. For example, copying the salary amount from one employee’s paycheck to another.
Intuit believes such errors, a subclass of anomalies, can be identified and, most importantly, prevented at the source.
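The typographical error types described above can be sketched in a few lines of Python. This is only an illustration and not Intuit's simulation code: the keypad neighbour map `ADJACENT` and the function names are assumptions made for the example.

```python
import random

# Neighbouring keys on a numeric keypad (7-8-9 / 4-5-6 / 1-2-3 / 0),
# horizontal and vertical only. The layout is an assumption for illustration.
ADJACENT = {
    "0": "12", "1": "024", "2": "0135", "3": "26",
    "4": "157", "5": "2468", "6": "359",
    "7": "48", "8": "579", "9": "68",
}

def swap_digits(amount, i):
    """Swap the characters at positions i and i+1 (transposition error)."""
    chars = list(amount)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def hit_adjacent_key(amount, i, rng):
    """Replace the digit at position i with a neighbouring key."""
    return amount[:i] + rng.choice(ADJACENT[amount[i]]) + amount[i + 1:]

def insert_extra_digit(amount, i, rng):
    """Insert a random extra digit before position i."""
    return amount[:i] + rng.choice("0123456789") + amount[i:]

def simulate_typo(amount, rng):
    """Apply one randomly chosen typographical error to an amount string."""
    digit_pos = [i for i, c in enumerate(amount) if c.isdigit()]
    # Only swap two neighbouring digits that differ, so the string changes.
    swap_pos = [i for i in digit_pos
                if i + 1 < len(amount)
                and amount[i + 1].isdigit()
                and amount[i] != amount[i + 1]]
    kind = rng.choice(["swap", "adjacent", "extra"])
    if kind == "swap" and swap_pos:
        return swap_digits(amount, rng.choice(swap_pos))
    if kind == "adjacent":
        return hit_adjacent_key(amount, rng.choice(digit_pos), rng)
    return insert_extra_digit(amount, rng.choice(digit_pos), rng)
```

For instance, `swap_digits("1234.56", 1)` reproduces the digit swap from the invoice example, and replacing the digit at index 3 with a neighbouring key can produce the adjacent-key error.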
Inside Intuit’s FEDS
To tackle data entry errors, the Intuit team has developed a generalised AI service called FEDS (financial error detection service), which allows product owners to build AI-driven error detection models automatically via configuration. This speeds up integration with products and lets the service cover many use cases, with less time and effort than building a bespoke model for each.
Intuit’s FEDS consists of two major components:
- Simulation Kernels: These kernels have been developed to simulate data entry errors according to real-world expectations for a particular use case. This helps in both training and evaluation of the final model.
- Error Detection Kernels: Several classification kernels have been built on modified Z-Test, likelihood ratio test and deep learning-based context embedding approaches to detect errors based on training data.
The Z-Test kernel uses a Z-score to detect large deviations of a customer’s input value from a computed central measure. When data is insufficient, an Empirical Bayes-based double shrinkage estimation approach is used to compute the central measure in a way that minimises the estimation variance.
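A minimal sketch of the idea follows, under simplifying assumptions. It is not the double shrinkage estimator mentioned above: it shrinks only the mean, using the textbook normal-normal weight `n / (n + within_var / between_var)`, and it treats the variances as known. All function and parameter names are hypothetical.

```python
from statistics import mean

def shrunk_mean(history, pop_mean, within_var, between_var):
    """Pull a customer's mean toward the population mean.

    With few observations the estimate leans on pop_mean; with many it
    approaches the customer's own mean. (Single shrinkage only; an
    illustrative simplification.)
    """
    n = len(history)
    if n == 0:
        return pop_mean
    weight = n / (n + within_var / between_var)
    return weight * mean(history) + (1 - weight) * pop_mean

def z_score(value, centre, std):
    """Number of standard deviations between value and centre."""
    return (value - centre) / std

def is_suspect(value, history, pop_mean, within_var, between_var,
               threshold=3.0):
    """Flag an entry whose absolute Z-score exceeds the threshold."""
    centre = shrunk_mean(history, pop_mean, within_var, between_var)
    return abs(z_score(value, centre, within_var ** 0.5)) > threshold
```

With a history of `[1000.0, 1020.0]`, a within-customer variance of 400 and a between-customer variance of 40000, an entry of 1010 sits close to the shrunk centre, while 10100 (an extra digit) lies far outside it and is flagged.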
By the Neyman-Pearson lemma, the likelihood ratio test (LRT) is the most powerful statistical test for a given false positive rate (alpha) when the null and alternative hypotheses are fully specified. The LRT kernel uses the ratio of the likelihoods under the null and alternative hypotheses (equivalently, the difference of their log-likelihoods) to arrive at an optimal threshold for classification.
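As a rough illustration of a likelihood ratio test, the sketch below assumes valid entries follow a normal distribution and erroneous entries follow a much wider normal distribution around the same centre. Those distributions, the simulation-based threshold, and every name in the code are assumptions for the example, not details of the FEDS kernel.

```python
import math
import random

def normal_logpdf(x, mu, sigma):
    """Log density of a normal distribution."""
    return (-0.5 * math.log(2 * math.pi * sigma ** 2)
            - (x - mu) ** 2 / (2 * sigma ** 2))

def log_likelihood_ratio(x, mu, sigma, error_sigma):
    """log p(x | error) - log p(x | valid); larger means more suspicious."""
    return normal_logpdf(x, mu, error_sigma) - normal_logpdf(x, mu, sigma)

def threshold_for_alpha(mu, sigma, error_sigma, alpha, n=20000, seed=0):
    """Pick the threshold so that roughly a fraction alpha of valid
    entries (simulated from the null) score above it."""
    rng = random.Random(seed)
    scores = sorted(
        log_likelihood_ratio(rng.gauss(mu, sigma), mu, sigma, error_sigma)
        for _ in range(n)
    )
    return scores[min(n - 1, int((1 - alpha) * n))]

def is_error(x, mu, sigma, error_sigma, threshold):
    """Classify x as an error when its log-likelihood ratio exceeds the threshold."""
    return log_likelihood_ratio(x, mu, sigma, error_sigma) > threshold
```

Working with the difference of log-likelihoods rather than the raw ratio avoids numerical underflow for extreme values and leaves the decision unchanged, since the logarithm is monotonic.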
The context embedding algorithm assumes that the data being entered in a given context isn’t i.i.d. (independent and identically distributed), i.e., for a given context, the data being entered depends on past values.
For instance, the salary or bonus of an employee is likely to be the same as, or similar to, that of the last few weeks. Thus, the algorithm can take advantage of the context of nearby data entry points to evaluate how likely it is that a new entry is valid. A 1D Convolutional Neural Network is used for the character-level embedding of the input value to capture any possible typographical error. This embedding is concatenated with the overall summary embedding to create a joint embedding, which is used to train a gradient boosting classifier for error detection.
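The shape of such a joint embedding can be sketched in plain Python. This is a toy under heavy assumptions: the filters are random rather than learned, the summary features (mean, spread, ratio to the mean) are chosen only for illustration, and the gradient boosting classifier that would consume the result is left out. `VOCAB`, `MAX_LEN` and all function names are hypothetical.

```python
import random
from statistics import mean, pstdev

VOCAB = "0123456789."   # characters expected in an amount string
MAX_LEN = 12            # amounts are padded or truncated to this length

def one_hot(text):
    """Encode text as MAX_LEN rows with one column per VOCAB character.
    Padding positions become all-zero rows."""
    return [[1.0 if ch == v else 0.0 for v in VOCAB]
            for ch in text[:MAX_LEN].ljust(MAX_LEN)]

def random_filters(n_filters, width, seed=0):
    """Stand-in for learned convolution filters (random weights)."""
    rng = random.Random(seed)
    return [[[rng.uniform(-1, 1) for _ in VOCAB] for _ in range(width)]
            for _ in range(n_filters)]

def conv1d_maxpool(rows, filters):
    """Slide each filter along the sequence, apply ReLU, keep the maximum."""
    pooled = []
    for f in filters:
        width = len(f)
        best = 0.0  # ReLU floor
        for start in range(len(rows) - width + 1):
            act = sum(f[k][j] * rows[start + k][j]
                      for k in range(width) for j in range(len(VOCAB)))
            best = max(best, act)
        pooled.append(best)
    return pooled

def summary_features(context, value):
    """Summarise recent entries and relate the new value to them."""
    centre = mean(context)
    ratio = value / centre if centre else 0.0
    return [centre, pstdev(context), ratio]

def joint_embedding(entry, context, filters):
    """Concatenate the character-level and context summary features."""
    return (conv1d_maxpool(one_hot(entry), filters)
            + summary_features(context, float(entry)))
```

In a real system the filters would be trained, and the concatenated vector would be passed, together with labels from simulated errors, to a classifier.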
Both these components are extensible, and newer kernel approaches can be integrated easily. This also helps decouple the data scientist’s efforts from the deploying efforts, thereby removing any blockers.
The outcome
Intuit said that, when evaluating the performance of such a system, the team generally looks at two parameters: false positive rate (false alarms) and true positive rate (recall). The team wants the system to have high recall while maintaining a low false alarm rate, which creates a smooth experience and keeps customers confident in using the app features.
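Both measures are straightforward to compute from a detector's scores and ground-truth labels. The sketch below assumes higher scores mean "more likely an error" and that the evaluation set contains both erroneous and valid entries; the function names are illustrative.

```python
def rates(scores, labels, threshold):
    """Return (true_positive_rate, false_positive_rate) when every entry
    scoring above the threshold is flagged. labels: 1 = error, 0 = valid."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s > threshold)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s > threshold)
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    return tp / pos, fp / neg

def recall_at_fpr(scores, labels, max_fpr):
    """Best recall over all thresholds whose false positive rate
    stays within max_fpr."""
    best = 0.0
    for t in sorted(set(scores)):
        tpr, fpr = rates(scores, labels, t)
        if fpr <= max_fpr:
            best = max(best, tpr)
    return best
```

Comparing two detectors by `recall_at_fpr` at the same `max_fpr` answers the question of which one catches more errors for the same number of interruptions.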
In use cases where the i.i.d. assumption is violated, Intuit found that the contextual embedding approach outperformed the other numerical methods in detecting data entry errors. As a result, the team could identify more errors at the same level of false alarms or, alternatively, minimise interruptions due to false alarms for a preset recall value, ensuring a smoother experience.
Furthermore, the generalised AI service reduced the time to productise models by nearly 15x. Developing this capability for a specific use case normally takes 12 or more weeks; with FEDS, it takes just a few days. This greatly reduced the turnaround time for the PD team to enable error detection for any new feature being developed.
Towards financial freedom
Intuit is a global technology platform that helps consumers and small businesses overcome their most important financial challenges. The company currently serves more than 100 million customers worldwide and looks to help them achieve financial freedom, from running their own businesses to managing their personal finances.
The team believes its offerings are developed with the entire gamut of customers’ needs in mind, supporting them even in areas where they least expect it. It believes this instils more confidence in customers’ ability to conduct their business and improves their overall experience and productivity.
About the Team
Vignesh Subrahmaniam, PhD, is a Principal Data Scientist at Intuit India. He has 12+ years of experience in ML research with a focus on building innovative products and services that leverage AI/ML.
Arkadeep Banerjee is a Senior Data Scientist at Intuit India and has over 7 years of experience in building AI/ML solutions for the Fintech, Retail, Pharma and Manufacturing domains.