Data science in financial services: Enhancing transaction monitoring through artificial intelligence

How Capco used machine learning to transform a Tier 1 bank’s transaction screening and monitoring process

In recent years, interest in data science and machine learning has increased significantly. The technology industry was the first to adopt these approaches and, now that they have entered the mainstream, companies in all industries, including financial services, are keen to understand how best to use their data. It is important, however, to distinguish potential use cases from actual applications. In our experience, large financial institutions underutilize data science and machine learning and rarely use them to deliver tangible results. Here at Capco, we strongly believe that data science adds significant value to financial services across multiple functions.


Transaction screening and monitoring is an essential aspect of anti-money laundering (AML) practices within banks and financial institutions. It has been estimated that more than $3 trillion is laundered globally, much of which is used to fund illegal activities. Financial institutions have strong incentives to identify fraudulent transactions: the consequences of regulatory fines for inadequate controls can be substantial, both financially and reputationally. Financial Crime (FC) units in financial institutions often have multiple layers and functions to monitor suspicious activity, ranging from Know Your Client (KYC) or Customer Due Diligence (CDD) to Name Screening (NS) and Transaction Screening.

The Challenges

For most financial institutions, the typical AML workflow is a linear pipeline that connects customer transaction data sources to a simple model or rule-based system. If the system finds a transaction suspicious, the transaction is flagged and passed through several screening levels, where human analysts decide whether it is indeed fraudulent. To ensure that financial institutions have robust AML controls, regulators require firms to implement very stringent standards for screening and reporting. These controls are often very expensive and require multiple teams and considerable working hours, especially if the account base is very large.

High volumes of flagged transactions can often lead to the following issues:

  • Poor customer service: most flagged transactions are false positives (genuine transactions). Approving them often takes time, and the delay leads to a poor customer experience. Financial institutions need to manage their quality of service: keeping customers waiting for a transaction to be processed lowers service quality and could breach the institution’s service-level agreement (SLA).
  • Increasing costs: reviewing a growing volume of alerts requires a larger workforce. For large financial institutions, the requirement for human validators is immense. In addition, validation usually involves two or more levels of reviewers and therefore requires a significant number of working hours to maintain a high quality of service. The problem is often exacerbated by seasonal peaks (e.g. towards the end of tax season).
  • Increasing errors: a human reviewer may mistakenly pass a transaction that should have been escalated to regulators. These false negatives allow fraudulent transactions to be processed undetected and can be a source of substantial regulatory risk for any firm.


Approach: addressing duplication and complexity

Capco was engaged by a Tier 1 investment bank to design, analyze, test, and implement a machine learning solution to optimize the transaction screening workflow. This solution would handle over 100,000 screening alerts monthly. Partly driven by regulatory requirements, the client had a complex process of approving these alerts, with multiple levels of human screening. Every additional screening requirement increased the volume of alerts for the teams conducting the reviews. Most of the alerts were false positives, and many of them were duplicative alerts that required the same type of review. The increasing alert volumes and time spent on reviews placed a strain on the bank’s resources. To address these challenges, the Capco team worked collaboratively across the Operations and Technology functions to define the following approach that would enable the bank to optimize resourcing and make significant cost savings. 

Model Workflow: automating reviews and escalations

Once the initial screening is completed and an alert is created, it is fed into the machine learning model. This provides a confidence level for the outcomes that are expected based on the historical data from the screening workflow. The decision to systematically perform the necessary action is then based on the confidence level calculated by the model as below:

  1. Suggested Action: Provides suggestions for the outcome of the initial review (excludes approval, escalation, and/or required comments) based on historical data; serves as an interim phase before auto-escalation and auto-approval.
  2. Auto-Escalation: Automatically escalates transactions to secondary review (bypassing initial review) if the confidence level is above the pre-determined threshold.
  3. Auto-Approval: Automatically approves and adds comments to close out false positives.
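
The three actions above can be read as a routing rule on the model's confidence level. The following is a minimal sketch of such a rule, not the bank's implementation: the threshold values, the Prediction class, and the route_alert function are all hypothetical, since the article does not state the confidence levels that were agreed.

```python
from dataclasses import dataclass

# Hypothetical thresholds: the article only says they were pre-determined.
ESCALATE_THRESHOLD = 0.95
APPROVE_THRESHOLD = 0.98  # assumed stricter, since auto-approval carries more risk


@dataclass
class Prediction:
    outcome: str       # predicted initial-review outcome: "escalate" or "approve"
    confidence: float  # model confidence in that outcome, between 0 and 1


def route_alert(pred: Prediction) -> str:
    """Return the action the workflow takes for one screening alert."""
    if pred.outcome == "escalate" and pred.confidence >= ESCALATE_THRESHOLD:
        return "auto_escalate"   # bypass initial review
    if pred.outcome == "approve" and pred.confidence >= APPROVE_THRESHOLD:
        return "auto_approve"    # close out as a false positive
    return "suggest_only"        # a human reviewer sees the suggestion and decides


if __name__ == "__main__":
    print(route_alert(Prediction("escalate", 0.97)))
    print(route_alert(Prediction("approve", 0.90)))
```

Any alert that does not clear a threshold falls back to the suggested-action path, so a human reviewer remains the default decision maker.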

Methodology and Outcomes: benefits of a dual approach

A key accomplishment of this model has been automating over 60% of the screening workflow. This is done in two phases: auto-escalation and auto-approval.

The purpose of auto-escalation is to automatically escalate screening alerts from initial to secondary review, only if the outcome can be predicted at a high confidence level. The type of model that is used to achieve this is a Distance Stacked Ensemble Model. This model is broken down into the following:

  • Historical Data: Historical screened wire transactions for the previous 8 weeks.
  • Features: Includes transactional information sourced from the firm’s web-based workflow tool used to showcase potential sanction matches and keywords.
  • Distance Calculation: Each transaction (with screening details) is ‘flattened’ into one row, and its distance is measured as the number of differences between the current transaction and previous alerts and auto-escalations.
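
As an illustration of the distance calculation described above, the sketch below flattens a nested alert into one row and counts the fields on which two rows differ. It is one plausible reading of "number of differences", not the model used in the engagement; the field names and the helper functions flatten, distance, and nearest are assumptions made for the example.

```python
from typing import Dict, List

Row = Dict[str, str]


def flatten(alert: dict) -> Row:
    """Flatten a nested alert into a single row of dotted-key string fields."""
    row: Row = {}

    def walk(prefix: str, value) -> None:
        if isinstance(value, dict):
            for key, child in value.items():
                walk(f"{prefix}.{key}" if prefix else str(key), child)
        else:
            row[prefix] = str(value)

    walk("", alert)
    return row


def distance(a: Row, b: Row) -> int:
    """Count the fields on which two flattened rows differ."""
    keys = set(a) | set(b)
    return sum(1 for key in keys if a.get(key) != b.get(key))


def nearest(current: Row, history: List[Row]) -> int:
    """Smallest distance between the current row and any historical row."""
    return min(distance(current, past) for past in history)


if __name__ == "__main__":
    # Hypothetical alerts; real screening records would carry many more fields.
    current = flatten({"amount": 100, "party": {"name": "ACME", "country": "GB"}})
    past = flatten({"amount": 100, "party": {"name": "ACME", "country": "FR"}})
    print(distance(current, past))
```

A small distance to a previously escalated alert suggests the new alert is a near-duplicate, which is the kind of signal an ensemble could use to predict the outcome with high confidence.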

During the pilot phase, machine learning was used to suggest actions and comments to the reviewers. After five months of tracking the accuracy of the predictions, the proposal was reviewed and approved by the firm’s Global Financial Crimes Unit and Operational Controls department. Using the metrics over the trial period, the projected benefits and savings were presented to stakeholders, and an appropriate confidence level was agreed upon.

The purpose of the auto-approval phase is to automatically approve and close out false positives at the initial review – only if the outcome is predicted at a high confidence level.

Since auto-approval has more associated risk, alerts must pass through two separate models to be auto-approved. The models used for this phase were a Distributed Random Forest Distance Classifier model (DRF) and a Natural Language Processing model (NLP). If the alert passes through the first DRF model with high confidence of approval, it is then passed on to the second NLP model for the final decision.

  • Distributed Random Forest Distance Classifier: This is a decision tree-based classifier, which uses the inputs of historical screened transactions of the previous 8 weeks, including information on the screening reason.
  • Natural Language Processing Classifier: This is the main classifier, as it makes the final decision. It converts the details of each transaction into human-readable text and classifies that text, imitating the process that a human reviewer would perform.
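
The two-model gate described above can be sketched as follows. This is an illustrative outline only: the two scorers are passed in as plain functions standing in for the DRF and NLP models, and the thresholds, the to_text rendering, and the auto_approve function are hypothetical names and values that do not come from the article.

```python
from typing import Callable, Dict

# Hypothetical thresholds; the article does not give the agreed values.
DRF_THRESHOLD = 0.95
NLP_THRESHOLD = 0.95

Alert = Dict[str, str]
AlertScorer = Callable[[Alert], float]  # confidence that the alert is a false positive
TextScorer = Callable[[str], float]     # same confidence, computed from text


def to_text(alert: Alert) -> str:
    """Render alert fields as human-readable text for a text classifier."""
    return "; ".join(f"{key} is {value}" for key, value in sorted(alert.items()))


def auto_approve(alert: Alert, drf_score: AlertScorer, nlp_score: TextScorer) -> bool:
    """Auto-approve only if both models are confident, checked in sequence."""
    if drf_score(alert) < DRF_THRESHOLD:
        return False  # first gate failed: leave the alert for human review
    # Second gate: the text model makes the final decision.
    return nlp_score(to_text(alert)) >= NLP_THRESHOLD


if __name__ == "__main__":
    alert = {"beneficiary": "ACME Ltd", "reason": "name match"}
    # Stub scorers standing in for trained models.
    print(auto_approve(alert, lambda a: 0.99, lambda t: 0.97))
```

Requiring both models to agree means either one can veto an approval, which fits the article's point that auto-approval carries more risk than auto-escalation.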


There are several applications of machine learning within financial services that can prove extremely beneficial to firms. Transaction monitoring models, like the one outlined in this publication, are one such use case. By deploying innovative machine learning techniques, the firm in question was able to significantly reduce costs, improve the accuracy of fraud detection and, more importantly, improve the overall customer experience. Our Data Science in Financial Services series aims to highlight the solutions Capco has provided to clients and to demonstrate how these solutions can apply in your organization. Capco combines cutting-edge machine learning techniques with financial services expertise to help you achieve your goals.

By Jacqueline Gheraldi, Senior Consultant at Capco
