
What Are the Challenges of Integrating Machine Learning in AML Programs?

Over the past five years, financial institutions (FIs) have assessed how they could implement and benefit from machine learning capabilities. Some FIs have already leveraged machine learning analytics to improve the risk-scoring of alerts, aggregate essential customer data, auto-close false positive alerts and hibernate low-risk alerts. Together, these applications promise more effective money laundering detection at lower overhead cost. The technology exists, the results are evident and FIs overwhelmed by their existing rule-based transaction monitoring (TM) programs are racing to adopt this modern solution. However, adoption has been slower than expected because multiple challenges must be addressed when establishing an AML program that relies on machine learning for essential functions.
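To make the triage pattern above concrete, the following minimal sketch (synthetic data; the features, thresholds and queue names are illustrative assumptions, not a production standard) scores alerts with a gradient-boosted classifier and routes them into auto-close, hibernate and investigate queues.

```python
# Hypothetical sketch: risk-scoring TM alerts and routing them into
# auto-close / hibernate / investigate queues. Features, thresholds and
# labels are illustrative assumptions on synthetic data.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([
    rng.lognormal(9, 1, n),        # transaction amount
    rng.integers(30, 3650, n),     # customer tenure in days
    rng.poisson(1.5, n),           # prior alerts on this customer
])
y = rng.binomial(1, 0.05, n)       # 1 = alert escalated (synthetic)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier().fit(X_tr, y_tr)

scores = model.predict_proba(X_te)[:, 1]
queue = np.where(scores < 0.01, "auto-close",
         np.where(scores < 0.10, "hibernate", "investigate"))
print(dict(zip(*np.unique(queue, return_counts=True))))
```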


The Comfort of Rule-Based Alerts

Over the last decade, FIs and regulators have developed a comfort zone with rule-based TM: the models are transparent, the scenarios are designed for widespread coverage (even when proven ineffective) and investigators understand the often simple trigger events. A significant part of the value proposition of rule-based alerts is trust, but how can trust be established in the ‘black box’ that is machine learning? Some types of machine learning, such as neural networks and deep learning, identify suspicious activity through methods that are almost entirely opaque. The machine may determine that a subset of products or customers presents only a nominal level of risk and stop alerting on it, which may be perceived as a deficiency in both the technology and the required product and customer coverage. Moreover, investigators will have to adjust from knowing the exact money laundering typology that triggered an alert to speculating on what the machine found suspicious. These issues make regulators cautious about allowing FIs to add machine learning logic to AML detection models.
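One common way to address the ‘black box’ concern is to attach explanations to a model's behaviour. The sketch below is a minimal illustration, assuming hypothetical feature names and synthetic data; it uses scikit-learn's permutation importance to show which inputs drive an otherwise opaque model, giving investigators at least a global view of what it finds suspicious.

```python
# Hypothetical sketch: attaching a global explanation to an otherwise
# opaque model so investigators can see which inputs drive its decisions.
# Feature names and data are assumed for illustration.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(1)
features = ["wire_volume", "cash_volume", "country_risk", "account_age"]
X = rng.normal(size=(2000, len(features)))
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=2000) > 1).astype(int)

model = RandomForestClassifier(random_state=1).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=1)
for name, imp in sorted(zip(features, result.importances_mean),
                        key=lambda t: -t[1]):
    print(f"{name:>12}: {imp:.3f}")
```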

The simplicity of rule-based alerts decreases the burden of aggregating data. For example, if a scenario is based on limited primary criteria, such as geography and value, the data points required to generate a single alert are relatively few compared with analytics involving big data. Furthermore, current rule-based alerts are often segmented by product, so a separate subset of data is usually required for each scenario: a customer may trigger one alert on a wire transfer from their business account and a separate alert on a cash deposit to their retail account. In contrast, machine learning programs require vast amounts of historical data across multiple product lines to perform predictive analytics, cluster customers holistically and identify suspicious patterns. The AML department, which is often not the gatekeeper of customer and product data, therefore needs practical solutions for aggregating siloed data into a holistic view of what is considered suspicious. A robust data governance program becomes a prerequisite for successful machine learning implementation.
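A minimal sketch of the aggregation problem, assuming hypothetical column names and two siloed product feeds: the wire and cash ledgers are merged into a single customer-level profile of the kind a machine learning model would consume.

```python
# Hypothetical sketch: merging siloed product feeds into one
# customer-level view for holistic monitoring. Column names are assumed.
import pandas as pd

wires = pd.DataFrame({"customer_id": [1, 1, 2],
                      "amount": [25000, 8000, 12000]})
cash = pd.DataFrame({"customer_id": [1, 2, 2],
                     "amount": [9000, 4000, 9500]})

wires["product"] = "wire"
cash["product"] = "cash"
txns = pd.concat([wires, cash], ignore_index=True)

# One row per customer, with activity across all products side by side
profile = txns.pivot_table(index="customer_id", columns="product",
                           values="amount", aggfunc=["sum", "count"],
                           fill_value=0)
print(profile)
```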

When a traditional AML TM scenario is turned on, it is challenging to turn it off, because the scenario was once deemed necessary to mitigate an identified area of risk. Even after a scenario proves ineffective at detecting suspicious activity, it is often retained because it is perceived as necessary to meet regulatory obligations. In these situations, machine learning programs may prove very useful in deprioritising and hibernating ineffective alerts. When an FI attempts to risk-score a traditional rule-based scenario with a machine learning model, the challenge is less about transparency, since the data can demonstrate that the scenario is weak, and more about integration: machine learning programs often require more data than the traditional rule-based alerting system can supply. A machine learning program cannot reliably risk-score an alert if the input from the legacy TM system is, for example, limited to individual customers who sent a wire over 20,000 US dollars. It will require supplementary data, such as a longer timeline of transactional data across multiple products. Whether an AML program plans to use machine learning to generate alerts or to triage existing rule-based alerts, a transition toward big data analytics is mandatory.
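The enrichment step described above might look something like the following sketch. The window length, field names and feature set are assumptions for illustration: a legacy alert on a single large wire is supplemented with a year of multi-product history before being handed to the risk-scoring model.

```python
# Hypothetical sketch: a legacy rule fires on a single wire over
# USD 20,000; before the ML model scores the alert, it is enriched with
# a longer window of multi-product history. Names and window are assumed.
import pandas as pd

txns = pd.DataFrame({
    "customer_id": [7, 7, 7, 7],
    "date": pd.to_datetime(["2020-01-10", "2020-04-02",
                            "2020-06-15", "2020-06-20"]),
    "product": ["cash", "wire", "cash", "wire"],
    "amount": [9500, 15000, 9000, 21000],
})

alert = {"customer_id": 7, "date": pd.Timestamp("2020-06-20")}
window = txns[(txns["customer_id"] == alert["customer_id"]) &
              (txns["date"] >= alert["date"] - pd.DateOffset(months=12))]

features = {
    "wire_sum_12m": window.loc[window["product"] == "wire", "amount"].sum(),
    "cash_sum_12m": window.loc[window["product"] == "cash", "amount"].sum(),
    "txn_count_12m": len(window),
}
print(features)  # input vector for the risk-scoring model
```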

Lessons from Healthcare

FIs can learn much from the healthcare industry, which faces similar challenges from monitoring systems that generate alerts based on limited parameters. Just as AML investigators develop a bias toward labelling TM alerts as false positives, hospital staff have become desensitised to patient monitoring alarms, which research has shown to be between 72% and 99% unimportant.1 These false alarms can be attributed to monitoring systems that rely on highly sensitive single parameters. Machine learning applications present a solution because they can use physiological data to take multiple vital signs into context when identifying significant events, thus reducing the level of false alarms. Similarly, FIs can better mitigate risk with a holistic understanding of a client’s profile and activity. However, machine learning is not a panacea, and in healthcare the cost of a machine learning application making the wrong decision remains the highest possible: human life. The potential loss of life makes machine learning a source of severe liability. Nevertheless, the Food and Drug Administration (FDA) recognised the benefits of artificial intelligence (AI) and machine learning in the healthcare industry, such as optimising medical software through continuous learning, as well as the associated risks and their mitigators.2
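The contrast between a single-parameter alarm and a model that reads several signals in context can be sketched as follows. The vital-sign values, thresholds and the choice of an isolation forest are illustrative assumptions only.

```python
# Hypothetical sketch of the contrast described above: a single-parameter
# threshold versus a multivariate model that reads several signals in
# context. All values are synthetic.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(2)
# Columns: heart rate, blood pressure, oxygen saturation (synthetic)
normal = rng.normal([80, 120, 97], [10, 10, 1], size=(1000, 3))

single_param_alarms = (normal[:, 0] > 95).sum()   # naive HR threshold

model = IsolationForest(random_state=2).fit(normal)
multivariate_alarms = (model.predict(normal) == -1).sum()

print(f"single-parameter alarms: {single_param_alarms}")
print(f"multivariate alarms:     {multivariate_alarms}")
```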

The FDA will require premarket submissions when a manufacturer’s software modifications pose risks to end users. The FDA considers factors such as the intended use of the device (ie informing clinical management about non-serious situations [lowest risk] versus treating critical situations [highest risk]). Furthermore, given that continuous learning allows an algorithm to change over time, the FDA deemed it necessary to implement a ‘Total Product Life Cycle’ regulatory approach. This allows the FDA to assess the excellence of the organisation developing the AI, including its procedures for software development, testing, monitoring and evaluation of post-market performance. The FDA’s proposal demonstrates a deep understanding of the measures necessary to curb machine learning risks, and the underlying fundamentals of this proposal could perhaps be adopted by regulators in the AML industry.

Machine Learning’s Unique Challenges

When it comes to using machine learning in the AML field, the approach encounters a number of unique challenges that do not apply to other fields that use this technique. These include the central role of human judgement in the process, uncertain data targeting techniques and a lack of complete information on which to base a full model.

One of the most fundamental principles behind supervised machine learning is having accurately labelled data: a large number of samples labelled as ‘known positives’ and ‘known negatives’ with which to train an algorithm to recognise patterns and correlations in future alerts. In AML, targets are based on human judgement, not on a verifiable answer. In credit risk, by contrast, the outcome can be verified: a client either defaults or does not. In AML, something is deemed suspicious and indicative of money laundering by the judgement of the person who initiates the alert; equally, genuine laundering may go unrecognised by the investigator. Unlike evaluating credit risk or detecting a certain type of mineral to mine, there is no objectively accurate conclusion, only one based on human judgement. The resulting subjectivity means a supervised algorithm is inevitably trained with data that labels some things positive when they are not, and teaches the algorithm that a number of things that should be deemed positive are not.
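The effect of subjective labels can be demonstrated directly. In the sketch below, an assumed 15% of training labels are flipped to mimic judgement error, and the model trained on noisy labels is compared with one trained on clean labels; in AML, of course, the ‘true’ labels are never available.

```python
# Hypothetical sketch: the effect of subjective, partly wrong labels.
# 15% of training labels are flipped (an assumed noise rate) and the
# result is compared against ground truth on held-out data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.normal(size=(4000, 5))
y_true = (X[:, 0] + X[:, 1] > 0).astype(int)   # ground truth (unknowable in AML)

X_tr, X_te, y_tr, y_te = train_test_split(X, y_true, random_state=3)

flip = rng.random(len(y_tr)) < 0.15            # 15% mislabelled by "judgement"
y_noisy = np.where(flip, 1 - y_tr, y_tr)

clean = LogisticRegression().fit(X_tr, y_tr).score(X_te, y_te)
noisy = LogisticRegression().fit(X_tr, y_noisy).score(X_te, y_te)
print(f"accuracy with clean labels: {clean:.3f}")
print(f"accuracy with noisy labels: {noisy:.3f}")
```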

Compounding this issue is an initial state of biased data selection. Most entities responsible for AML rely on some mix of rule-based alerting, customer risk scoring and first-line judgement to alert on potential instances of money laundering. This means customers and their transactions are only evaluated if they trigger one or more of these detection methods; the majority of the customer base is never actually examined, and all known target data is based on thresholds or criteria that already exist in the organisation. If a detection system has a rule, such as a minimum threshold of 20,000 US dollars for wires, then the algorithm may completely dismiss the possibility that a smaller amount could be suspicious, since it has never witnessed an example where that was the case. Many algorithms are vulnerable to picking up such features as absolutes, especially when the training data contains these artificial cut-off points, as the sketch below illustrates. Furthermore, time and scope play a vital role. For example, if a client is detected to be a bad actor in March 2020 but has been a client since 2015, a model must consider whether all prior behaviour, back to the beginning of the relationship, should be categorised as target behaviour, or whether some subset in which only certain elements are present is sufficient. This balance influences how strong or weak the targets are, as well as whether the model favours static client attributes or more specific transaction data that varies over time.
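The threshold effect can be reproduced in a few lines. In this sketch (synthetic data; the 20,000 US dollar cut-off follows the example above, while the velocity feature and rates are assumptions), cases below the legacy threshold were never reviewed and are therefore labelled clean, so the model learns the cut-off as if it were a real property of laundering.

```python
# Hypothetical sketch: selection bias from a legacy rule. Only wires
# above USD 20,000 were ever investigated; everything below was assumed
# clean. The tree then learns the 20,000 cut-off as if it were real.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(4)
n = 5000
amount = rng.uniform(1000, 100000, n)
velocity = rng.poisson(3, n)                   # txns per week, a second feature
truly_suspicious = velocity > 7                # ground truth, amount-independent

# Labels as the institution sees them: below-threshold cases never reviewed
label = truly_suspicious & (amount > 20000)

X = np.column_stack([amount, velocity])
model = DecisionTreeClassifier(max_depth=3, random_state=4).fit(X, label)

# High-velocity customer moving 15,000: truly risky, predicted clean
print(model.predict([[15000.0, 10]]))          # -> [False]
```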

Large Data Sets and Human Judgement

Another obstacle, an extension of human judgement, is that AML training sets are not as large as those in other fields, and evaluating output samples requires human intervention. Training machines to read text or recognise stop signs has relied on ‘captchas’, which serve a dual purpose: they help websites distinguish between human and robotic behaviour, and they help train algorithms to recognise text and photos, as billions of human answers are submitted each day. In AML it is rare to obtain a good training set exceeding tens of thousands of entries, let alone millions or billions. Thus, often quite complex behaviour must be learned from insufficient examples to recognise true patterns and associations. Once a model has been trained, it still requires human feedback to evaluate its success, which can be very limiting. When Google DeepMind was developing its chess AI AlphaZero, it played 44 million games against itself in only nine hours; every game had a definite end state of win, lose or draw, without the need for human confirmation.3 In AML, once an algorithm produces a potential final product for testing, its decisions must be evaluated by humans to determine the effectiveness of the results, particularly since a good portion of the output is expected to be new positives previously believed to be negatives. This restriction in data size also extends to problematic behaviour that is rare or has never been detected but does exist in the population; it may go uncaught for lack of strong examples in the training data.
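Given that every model output must ultimately be evaluated by humans, one practical response, sketched below under assumed data and an assumed review budget, is uncertainty sampling: spend scarce investigator hours on the cases about which the model is least certain.

```python
# Hypothetical sketch: with limited investigator hours, route the
# model's most uncertain decisions to human review first (a simple
# uncertainty-sampling loop). Budget and data are assumed.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
X = rng.normal(size=(3000, 4))
y = (X[:, 0] - X[:, 3] > 0.5).astype(int)

labelled = rng.choice(len(X), size=200, replace=False)   # small seed set
model = LogisticRegression().fit(X[labelled], y[labelled])

proba = model.predict_proba(X)[:, 1]
uncertainty = np.abs(proba - 0.5)              # 0 = model is 50/50
review_budget = 50                             # investigator hours available
to_review = np.argsort(uncertainty)[:review_budget]
print(f"queue for human review: {len(to_review)} cases, "
      f"max confidence gap {uncertainty[to_review].max():.3f}")
```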

In addition, money laundering presents an incompleteness of data that goes beyond that of many other machine learning applications. A chess player can see the whole board, and even a self-driving car is only concerned with objects within a certain proximity. Within AML, however, relevant information can lie entirely outside a specific institution’s view: if someone deposits cash at 15 different banks, no single bank can know it. In such cases, the only reliable indication of laundering activity may lie beyond the line of sight of any specific entity.

Some actors are very adept at evading detection, deliberately simulating ‘normal’ behaviours and adapting them over time, which presents a problem for both rule-based and dynamic systems of detection. It may be easy to teach an algorithm to recognise a dog or a cat, but it is much more difficult for an algorithm to recognise a chameleon that actively blends in, changing its colours to evade detection and disrupt recognised patterns.

These are certainly challenges to developing the application of this technology in the industry, but they are not insurmountable. New techniques and approaches must be developed over time to address uncertainty and adapt to these unique issues. Many of the problems machine learning encounters are ones humans encounter as well, and humans additionally work far more slowly and for a limited number of hours per day. Well-trained professionals who understand both machine learning and AML are therefore indispensable assets to FIs. Despite the many challenges, machine learning techniques have proven to be a large step in the right direction, allowing for more accurate and thorough detection of money laundering activity.

