ML Systems — Fraud Detection

Anirban Sen
2 min read · Dec 10, 2023

1. How we built it: Stripe Radar

  1. Radar is Stripe’s fraud prevention solution. It assesses more than 1,000 characteristics of a potential transaction to determine the likelihood that it is fraudulent. It does so accurately, in under 100 ms, with a false positive rate (legitimate payments incorrectly blocked) of just 0.1%. This is challenging because only 1 out of every 1,000 payments is fraudulent.
  2. They started with logistic regression; the current model is a ResNeXt-based deep neural network (DNN).
    a. Before this, they used an ensemble “Wide & Deep” model composed of an XGBoost model (the wide part, for memorization) and a DNN (the deep part, for generalization).
    b. XGBoost did not scale well because it is not very parallelizable. However, simply making the DNN deep enough to replace XGBoost at the same performance risked overfitting, where the model memorizes random noise in the features.
    c. ResNeXt’s architecture adopts a “Network-in-Neuron” strategy. It splits a computation into distinct branches, and the outputs of the branches are summed to produce the result. Aggregating branches adds a new dimension of feature representation (the number of branches, which the ResNeXt paper calls cardinality), and this is more effective than increasing depth or width.
    d. By removing the XGBoost component from the architecture, they reduced model training time by over 85% (to less than two hours).
  3. Because fraud is an ever-changing domain, the Radar team meets every week to discuss new fraud trends that emerge from research into activity on the dark web. They gather this information and ideate features that target the specific contours of each attack. They also ran experiments with additional synthetic transaction data generated by LLMs and got encouraging results.
  4. They also invest heavily in model explainability, because they have to explain why a transaction was marked fraudulent, especially when it was not. They have not clearly stated which techniques they use to find the most important features for each case, but most probably they use something like LIME or SHAP.
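
To see why the class imbalance in item 1 makes a 0.1% false positive rate so demanding, here is a back-of-the-envelope calculation. The 0.1% false positive rate and the 1-in-1,000 fraud rate come from the text above; the 90% recall figure and the function name are assumptions made purely for illustration, not numbers or code Stripe has published.

```python
def precision_at(prevalence: float, recall: float, fpr: float) -> float:
    """Fraction of blocked payments that are actually fraudulent (Bayes' rule)."""
    true_pos = prevalence * recall          # fraudulent payments that get blocked
    false_pos = (1.0 - prevalence) * fpr    # legitimate payments that get blocked
    return true_pos / (true_pos + false_pos)


PREVALENCE = 1 / 1000   # from the article: 1 in 1,000 payments is fraudulent
FPR = 0.001             # from the article: 0.1% false positive rate
ASSUMED_RECALL = 0.90   # hypothetical value, for illustration only

p = precision_at(PREVALENCE, ASSUMED_RECALL, FPR)
print(f"precision at 0.1% FPR: {p:.3f}")   # 0.474

# Same assumed recall, but with a ten times looser false positive rate.
p_loose = precision_at(PREVALENCE, ASSUMED_RECALL, 0.01)
print(f"precision at 1% FPR:   {p_loose:.3f}")   # 0.083
```

Under these assumptions, roughly half of all blocked payments would still be legitimate even at a 0.1% false positive rate, and at 1% more than nine in ten would be. When fraud is this rare, the false positive rate has to be tiny for blocks to be trustworthy.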

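The split, transform, and sum idea in item 2c can be sketched in a few lines of NumPy. This is a toy illustration on dense (tabular) features, not Stripe's model: the class name `AggregatedBlock`, the dimensions, and the weight initialization are all hypothetical, and the residual (skip) connection follows the ResNeXt paper rather than anything stated above.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


class AggregatedBlock:
    """Toy ResNeXt-style block: y = x + sum_i T_i(x).

    Each branch T_i is a small bottleneck: project down, ReLU, project back up.
    """

    def __init__(self, dim, cardinality, bottleneck, seed=0):
        rng = np.random.default_rng(seed)
        # One (down-projection, up-projection) weight pair per branch.
        self.w_down = rng.normal(0.0, 0.1, size=(cardinality, dim, bottleneck))
        self.w_up = rng.normal(0.0, 0.1, size=(cardinality, bottleneck, dim))

    def __call__(self, x):
        # x: (batch, dim) -> hidden: (cardinality, batch, bottleneck)
        hidden = relu(np.einsum("bd,cdk->cbk", x, self.w_down))
        # Each branch projects back to the input width: (cardinality, batch, dim)
        branch_out = np.einsum("cbk,ckd->cbd", hidden, self.w_up)
        # Sum over branches, then add the residual (skip) connection.
        return x + branch_out.sum(axis=0)


block = AggregatedBlock(dim=16, cardinality=8, bottleneck=4)
x = np.random.default_rng(1).normal(size=(32, 16))
y = block(x)
print(y.shape)  # (32, 16)
```

Every branch has the same shape and runs independently of the others, which is why the whole block reduces to two batched matrix multiplications. Increasing `cardinality` adds capacity without making the network deeper or any single layer wider.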