Neon Royale

Certified Math vs Game Math — How Regulators Test Your Slot's Mathematics

Neon Admin · Mar 15, 2026

Introduction

You have built your slot. The PAR Sheet is complete. The RTP calculation checks out analytically. Your 10-million-spin simulation confirms 96.01% — one basis point above target. The code is clean, tested, and documented. You are ready to ship.

Then you submit to GLI.

Six weeks later, the report arrives. Three findings. One critical. The critical finding: your RNG seeding strategy incorporates Environment.TickCount as a secondary entropy source, which constitutes a predictable seed component under GLI-11 Section 4.3.2. The other two findings relate to a discrepancy between your documented Wild substitution algorithm and its actual implementation on a specific edge case, and a missing field in your audit log that prevents complete spin reconstruction.

None of these were bugs in your business logic. The game played correctly. The RTP was correct. But the game could not be certified.

This scenario plays out hundreds of times a year across the industry. The gap between "mathematically correct game" and "certifiable game" is not about the mathematics — it is about documentation fidelity, implementation transparency, and compliance with standards that have been written not by mathematicians but by regulators and auditors who think about games as systems of claims that must be independently verifiable.

This article is the practitioner's guide to closing that gap. We will walk through what the major certification laboratories — GLI, BMM, and iTech Labs — actually do when they receive your submission, how they test the mathematics specifically, what they are looking for in each phase of review, and how to prepare your game and your documentation to pass on the first submission.


Part I. The Certification Landscape

1.1 Why Certification Exists

Slot certification serves a function that is distinct from quality assurance and fundamentally different from unit testing. QA verifies that the game does what the developer intended. Certification verifies that what the developer intended is what the regulator permits — and that the game actually does it.

The distinction matters because regulators are not primarily concerned with whether your Wild symbol looks correct on screen. They are concerned with three things:

Fairness: Does the game perform as described to the player? If the paytable says 5× Diamond pays 1000× and the game occasionally pays 900×, that is a fairness violation even if it happens on 0.001% of spins.

Predictability: Is the outcome of every spin determined by an auditable, documented process that cannot be influenced by external parties? If a server administrator could theoretically adjust a database flag to lower the RTP during peak hours, that game is not certifiable regardless of what the normal RTP is.

Accountability: Can any individual spin, at any point in the game's operational life, be fully reconstructed from stored records? If a player disputes the outcome of spin number 4,281,042 in session XYZ, the operator must be able to reproduce that exact spin and verify its outcome. If the audit system cannot do this, the game cannot be deployed in regulated markets.

Certification is the formal process of an independent laboratory verifying all three properties hold.

1.2 The Major Laboratories

GLI (Gaming Laboratories International) Founded 1989, headquartered in New Jersey with offices in Las Vegas, London, Melbourne, Singapore, and elsewhere. The most widely accepted laboratory globally — a GLI certificate is recognised in over 480 jurisdictions. Publishes open standards (GLI-11 for gaming devices, including RNG requirements, and GLI-19 for interactive/online gaming systems) that many other labs adopt or reference.

BMM Testlabs Founded 1981, headquartered in Las Vegas. Strong presence in North America, South America, Australia, and parts of Europe. Known for deep technical reviews and is often the lab required by specific state gaming commissions in the US.

iTech Labs Founded 2003, headquartered in Melbourne. Dominant in the European online gambling space and strong across Asia-Pacific. eCOGRA (a soft-certification body) often requires an iTech or GLI certificate as a prerequisite.

NMi Dutch laboratory, required for the Netherlands (KSA) market. Strict requirements around player protection features and responsible gambling mechanics — not just RTP and RNG.

Gaming Associates (GA) Strong in Australia and Asian markets. Required by Queensland, Victoria, and several Asian jurisdictions.

eCOGRA Technically a certification body rather than a testing lab. Issues a "Safe and Fair" seal based on ongoing audits rather than one-time certification. Works with operators rather than developers directly.

1.3 Standards Hierarchy

Understanding the hierarchy of standards is essential for knowing which rules apply to your game:

Level 1: Jurisdictional Law
  (e.g., UK Gambling Act, Malta Gaming Authority regulations)
  These have legal force. Non-compliance = loss of licence.
  Labs certify compliance with these.
            ↓
Level 2: Regulatory Technical Standards
  (e.g., UKGC Technical Standards, KSA Game Rules)
  Issued by the regulator. Specific, technical, often prescriptive.
  Labs test against these explicitly.
            ↓
Level 3: Lab-Published Standards
  (e.g., GLI-11, GLI-16, BMM Technical Standard)
  The lab's operationalisation of Level 2 requirements.
  Published, transparent, testable.
  This is where most developers focus their preparation.
            ↓
Level 4: Lab Internal Procedures
  How the lab actually runs tests in practice.
  Not published. Learned through experience and findings reports.
  This article addresses this level primarily.

Part II. The Submission Package

2.1 What You Submit

A complete certification submission contains far more than the game itself. The Technical Submission Package (TSP) typically comprises:

Technical Submission Package (TSP)
│
├── 1. Game Description Document (GDD for certification)
│       Full description of all game mechanics, rules, features
│       Every statement in here will be tested
│
├── 2. PAR Sheet
│       Complete mathematical documentation (see Articles 2–6)
│       Must include ALL bonus features, not just base game
│
├── 3. RNG Documentation
│       Algorithm specification (precise enough to reimplement)
│       Seeding strategy and entropy sources
│       Reseeding policy and schedule
│       Statistical test results (NIST SP800-22, Diehard)
│
├── 4. Pay Table Documentation
│       Every winning combination with exact payout
│       Wild substitution rules with all edge cases documented
│       Scatter rules with exact trigger conditions
│
├── 5. Audit Log Specification
│       Every field logged per spin
│       Enough to fully reconstruct any historical spin
│       Storage format and tamper-evidence mechanism
│
├── 6. Software Version Manifest
│       Exact version of every component being certified
│       Hash of each binary/source file
│       Change log from previous certified version (if applicable)
│
├── 7. Source Code or Binary Access
│       Lab accesses the actual running game via test environment
│       Some labs require source code; others accept black-box testing
│
└── 8. Test Environment Access
        Credentials and instructions to access the game
        Must be a stable, representative environment
        Must be identical to production intent

2.2 The Game Description Document: The Most Underestimated Deliverable

The GDD for certification is not your design document. It is a legal-style specification of the game's behaviour that the lab will treat as a binding claim. Every sentence in it will be tested.

The most common documentation failures:

Underspecified Wild rules. "Wild substitutes for all symbols" is insufficient. The lab will ask: does Wild substitute for Scatter? What happens when two Wilds appear on the same payline? Does Wild count toward a Wild-only combo? Does Wild evaluate as the highest-paying symbol it could substitute for, or the lowest? Every edge case must be explicitly documented.

Missing edge cases in Free Spins. "Free Spins play with enhanced reels" — but what happens if a Scatter appears during Free Spins? Is it a retrigger? Does it pay a cash prize? Is it even visible? What happens if the player disconnects mid-Free-Spins? What if the operator's wallet service times out on spin 7 of 10?

Ambiguous bonus trigger conditions. "3 or more Scatters trigger the bonus" — but 3 Scatters on which reels? On any reels? Only on reels 1, 3, 5? Only on a specific payline? Only when visible on the middle row? Labs have seen all of these variants; they will not assume the obvious.

RTP stated without configuration context. "RTP = 96%" — at what bet level? With which language/currency setting? In which jurisdiction configuration? If the game has Multi-RTP, which configuration is 96%?

Rule: If you can imagine the lab asking "but what happens when...",
      the answer must be in the GDD. If it's not written down,
      the lab will assume the worst-case interpretation and test for it.

2.3 Version Control and Hash Verification

Every file in your submission must have a documented SHA-256 hash. The lab will recompute these hashes on the files they receive and verify they match your manifest. If they don't match — even a single-byte difference — the submission is rejected.

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

/// <summary>
/// Generates a SHA-256 manifest for all files in a directory.
/// Run this immediately before packaging the submission.
/// The output becomes the "Software Version Manifest" in your TSP.
/// </summary>
public static void GenerateSubmissionManifest(
    string sourceDirectory,
    string outputManifestPath)
{
    var entries = new List<ManifestEntry>();

    foreach (var file in Directory.GetFiles(sourceDirectory, "*", 
                             SearchOption.AllDirectories)
                         .OrderBy(f => f))
    {
        byte[] fileBytes = File.ReadAllBytes(file);
        byte[] hash      = SHA256.HashData(fileBytes);
        string hashHex   = Convert.ToHexString(hash).ToLowerInvariant();
        string relative  = Path.GetRelativePath(sourceDirectory, file);

        entries.Add(new ManifestEntry(relative, hashHex, fileBytes.Length));
    }

    // Write manifest
    var lines = new List<string>
    {
        $"# Software Version Manifest",
        $"# Generated: {DateTime.UtcNow:yyyy-MM-dd HH:mm:ss} UTC",
        $"# Algorithm: SHA-256",
        $"# File count: {entries.Count}",
        $"",
        $"{"Hash",-64}  {"Size",10}  File",
        new string('-', 100)
    };

    foreach (var e in entries)
        lines.Add($"{e.Hash,-64}  {e.SizeBytes,10}  {e.RelativePath}");

    // Compute manifest's own hash (hash of the hash list)
    string manifestContent = string.Join("\n", lines);
    byte[] manifestHash    = SHA256.HashData(
        System.Text.Encoding.UTF8.GetBytes(manifestContent));

    lines.Add("");
    lines.Add($"# Manifest self-hash: {Convert.ToHexString(manifestHash).ToLowerInvariant()}");

    File.WriteAllLines(outputManifestPath, lines);
    Console.WriteLine($"Manifest written: {entries.Count} files, " +
                      $"manifest hash: {Convert.ToHexString(manifestHash)[..16].ToLowerInvariant()}...");
}

public record ManifestEntry(string RelativePath, string Hash, long SizeBytes);

Part III. The Mathematical Review Process

3.1 Phase 1: PAR Sheet Independent Verification

The first thing the mathematics team does is independently verify your PAR Sheet from scratch. They do not run your simulation — they build their own, from your documented reel strips, pay table, and game rules.

If their independently computed RTP matches yours within 0.1% (for a 10M-spin simulation), Phase 1 passes. If it doesn't, the discrepancy reveals one of three things:

Your PAR Sheet has an error (most common — this is why they do this)

Your game implementation differs from your PAR Sheet (critical finding)

Both are correct but one uses a different calculation methodology (e.g., you count scatter on lines, they count scatter anywhere)

The lab will tell you which discrepancy they found and request clarification or correction. This phase alone is the source of ~40% of first-submission mathematical findings.

What the lab builds:

# Pseudo-code for the lab's independent RTP calculation
# They write their own version of this from your documentation

def calculate_rtp_from_par_sheet(reel_strips, pay_table, scatter_pays,
                                 paylines, wild_id, scatter_id):
    cycle = product(len(r) for r in reel_strips)  # total stop combinations
    total_win = 0

    for stops in all_combinations(reel_strips):
        grid = build_visible_grid(reel_strips, stops)
        spin_win = 0

        for payline in paylines:
            symbols = [grid[col][payline[col]] for col in range(5)]
            combo, count = evaluate_combination(symbols, wild_id)
            if count >= 3:
                spin_win += pay_table.get((combo, count), 0)

        # Scatter: count visible anywhere on the grid, not per line
        scatters = sum(1 for col in range(5) for row in range(3)
                       if grid[col][row] == scatter_id)
        if scatters >= 3:
            spin_win += scatter_pays.get(scatters, 0)

        total_win += spin_win

    return total_win / cycle

# If this result differs from your PAR sheet by > 0.1%: FINDING

3.2 Phase 2: Source Code Review (Mathematical Logic)

Not all labs require source code — some operate as black-box testers. But GLI and BMM for most jurisdictions require source code access, at least for the mathematical components.

The reviewer reads the code looking for discrepancies between the documented behaviour and the actual implementation. Common findings at this phase:

Wild substitution mismatch:

// Your GDD says: "Wild evaluates as the highest-paying substitute"
// Your code says:
private int ResolveWild(int[] lineSymbols, int wildId)
{
    // Returns the FIRST non-Wild symbol found — not the highest-paying
    return lineSymbols.FirstOrDefault(s => s != wildId);
    // ← This is "first non-wild", not "highest-paying"
    //    On paylines where the order is [WILD, DIAMOND, QUEEN, ...],
    //    this returns DIAMOND. But if the order is [QUEEN, WILD, DIAMOND, ...],
    //    it returns QUEEN — which might not be highest-paying on a given paytable
}

// The lab will construct a test case where these differ and verify which
// behaviour the game actually exhibits. If it differs from documentation: FINDING.

Off-by-one in reel wrap-around:

// Your reel strip has 32 positions (indices 0–31)
// Stop = 31 means:
//   Row 0 (top):    index 30
//   Row 1 (middle): index 31
//   Row 2 (bottom): index 0  ← wraps around

// Correct:
int GetSymbol(int[] strip, int stop, int row, int visibleRows)
{
    int index = (stop + row - (visibleRows / 2) + strip.Length) % strip.Length;
    return strip[index];
}

// Common bug: forgetting the + strip.Length before % strip.Length
// Result: negative index throws exception or returns wrong symbol on stop = 0
// The lab WILL test stop position 0 on all reels. This will be found.

Scatter counted on paylines, not on grid:

// GDD says: "Scatter pays when 3 or more appear anywhere on the visible grid"
// Code does:
int CountScatters(int[,] grid, int[] payline, int scatterId)
{
    // ← This counts scatters on a specific payline, not the whole grid
    return payline.Select((row, col) => grid[col, row])
                  .Count(s => s == scatterId);
}

// Correct:
int CountScattersOnGrid(int[,] grid, int scatterId)
{
    int count = 0;
    for (int col = 0; col < 5; col++)
        for (int row = 0; row < 3; row++)
            if (grid[col, row] == scatterId) count++;
    return count;
}

3.3 Phase 3: Empirical RTP Testing

The lab runs the game directly — either through an API or by playing it in a headless/automated mode — and records actual spin results.

The standard procedure:

1. Set bet to a known value (e.g., £1.00 total bet)
2. Execute 1,000,000 spins in automated mode
3. Record every spin: bet, win, balance delta, bonus triggered
4. Compute empirical RTP = Σwins / Σbets
5. Compare to documented RTP

Acceptable tolerance: ±0.5% for 1M spins (statistical noise)
If discrepancy > 0.5%: repeat with 5M spins
If discrepancy persists > 0.2%: FINDING
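The decision procedure above can be sketched as a small helper (a hypothetical function; the thresholds are taken directly from the steps listed):

```python
def rtp_verdict(total_bet, total_win, documented_rtp, repeat_run=False):
    """Tolerance decision from the procedure above.

    First run (1M spins): pass within +/-0.5%, otherwise repeat at 5M.
    Repeat run: a persistent gap above +/-0.2% is a finding.
    """
    rtp = total_win / total_bet
    delta = abs(rtp - documented_rtp)
    if not repeat_run:
        return rtp, ("PASS" if delta <= 0.005 else "REPEAT_WITH_5M")
    return rtp, ("PASS" if delta <= 0.002 else "FINDING")
```

A run returning 96.01% against a documented 96% passes immediately; a run returning 95.0% triggers the 5M-spin rerun, and a rerun still 0.5% low becomes a finding.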

The statistical basis: for 1M spins with typical slot variance (σ ≈ 15×), the standard error of the RTP estimate is approximately:

SE(RTP) = σ / √N = 15 / √1,000,000 = 15 / 1000 = 0.015 = 1.5%

95% confidence interval: ±1.96 × 1.5% ≈ ±2.94%

This seems wide — 1M spins only narrows RTP to ±3% at 95% confidence? The key is that labs look for systematic bias, not just noise. If the empirical RTP is consistently below 95% across three independent 1M-spin runs, that's a 3σ signal of a real problem even if each individual run falls within the 95% confidence interval.

For the game to "pass" this phase:

No single 1M-spin run deviates by more than 2% from documented RTP

The mean across three 1M-spin runs deviates by no more than 0.5%

Specific combination hit frequencies (e.g., 5× Diamond) are verified against PAR Sheet values

3.4 Phase 4: Combination Frequency Verification

Beyond aggregate RTP, the lab verifies that individual combinations occur with the documented frequency. This catches bugs that cancel each other out in aggregate (e.g., Diamond 5× paying slightly too often while Diamond 3× pays slightly too rarely — the aggregate RTP looks fine but the combinations are wrong).

For each documented combination:
  Expected probability = value from PAR Sheet
  Observed frequency   = count / total_spins
  
  χ² contribution = (observed - expected)² / expected × total_spins
  
All contributions summed → overall χ² statistic
Compared to χ²(df, 0.01) critical value
If χ² > critical value: FINDING

In practice, the lab runs enough spins that even a 0.1% relative error in any significant combination probability will be detected. The verification target for each combination is:

For a combination with P = 1/46,600 (5× Diamond):
After 10M spins: expected = 215 occurrences
Standard error = √(10M × P × (1-P)) ≈ √215 ≈ 14.7

Lab flags if: |observed - 215| > 3 × 14.7 = 44  (3-sigma threshold)

This means 5× Diamond must occur between 171 and 259 times in 10M spins to pass this check. If it occurs 260 times, that's a finding — even if the payout is correct and the aggregate RTP is fine.
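The 3-sigma acceptance band above can be computed directly. A sketch of the per-combination check (hypothetical helper, same arithmetic as the worked example):

```python
from math import sqrt

def combo_frequency_check(observed, total_spins, p_expected, sigmas=3):
    """Flag a finding when an observed combination count falls outside
    expected +/- sigmas * SE, with SE = sqrt(N * p * (1 - p))."""
    expected = total_spins * p_expected
    se = sqrt(total_spins * p_expected * (1 - p_expected))
    lo, hi = expected - sigmas * se, expected + sigmas * se
    return lo <= observed <= hi, lo, hi
```

For 5× Diamond (p = 1/46,600) over 10M spins this reproduces, up to rounding of the standard error, the ~171–259 band quoted above.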

3.5 Phase 5: Boundary and Edge Case Testing

This phase is the one developers are least prepared for. The lab systematically tests every documented rule boundary:

Minimum and maximum win scenarios:

Test: minimum possible spin bet
Test: maximum possible spin bet
Test: win of exactly 0×
Test: win of exactly 1× bet (break-even)
Test: theoretical maximum win
Verify: payout at maximum does not exceed game's stated max win
Verify: all payouts scale correctly with bet

Reel boundary positions:

Test every reel at stop position 0 (boundary wrap-around)
Test every reel at stop position (length - 1)
Test: does reel 1 stop 0 × reel 2 stop 0 × ... produce a valid outcome?
Verify: no combination of stops causes a crash or invalid payout

Scatter boundary conditions:

Test: exactly 2 Scatters visible (should NOT trigger bonus)
Test: exactly 3 Scatters visible (SHOULD trigger bonus)
Test: all 5 reels show Scatter simultaneously
Verify: trigger fires exactly once per trigger event, not multiple times

Bonus retrigger during retrigger:

Test: 3+ Scatters appear on spin 3 of a 10-spin Free Spins block
Verify: 10 additional spins are correctly added
Verify: retrigger counter reset happens at the correct moment
Verify: multiplier (if any) correctly applies to retrigger spins

Session interruption recovery:

Test: player disconnects mid-Free-Spins
Reconnect: verify game resumes from correct spin within correct state
Verify: total payout matches what it would have been without interruption
Verify: no spins are double-counted or lost

Currency and bet precision:

Test: bet of exactly £0.01 (minimum)
Test: bet of exactly £500.00 (maximum if applicable)
Verify: all win calculations produce correct results at boundary bets
Verify: floating-point rounding does not cause ±£0.01 errors in payouts
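The last check exists because binary floats cannot represent most decimal stakes exactly. A minimal illustration (not part of any submission package) of why win arithmetic should use exact decimals or integer minor units:

```python
from decimal import Decimal

# Binary floating point: 0.1 and 0.2 have no exact representation,
# so a two-part win of 0.1 + 0.2 misses 0.3 by a tiny margin --
# enough to flip a rounded payout by a cent at a boundary bet.
assert 0.1 + 0.2 != 0.3          # actually 0.30000000000000004

# Exact decimal arithmetic has no such drift
assert Decimal("0.1") + Decimal("0.2") == Decimal("0.3")

# Integer minor units (pence) sidestep the problem entirely
win_pence = 10 + 20
assert win_pence == 30
```

The same reasoning is why the audit log examples later in this article store monetary amounts as DECIMAL, never as binary floats.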

Part IV. The RNG Audit

4.1 What the RNG Audit Covers

The RNG audit is a separate track from the mathematical review and is often conducted by a different team within the lab. It covers:

Algorithm documentation — is the algorithm precisely specified?

Seeding review — is the seed derived from a certified entropy source?

Output statistical testing — does the output pass the test battery?

Independence verification — is the RNG output independent of bet, player, session history?

Isolation verification — can external inputs influence the RNG output?

The RNG documentation must be precise enough to allow independent reimplementation. "We use a cryptographically secure RNG" is insufficient. The required level of detail:

ACCEPTABLE RNG DOCUMENTATION EXAMPLE:

Algorithm: OS CSPRNG-backed DRBG
Implementation: .NET 8 System.Security.Cryptography.RandomNumberGenerator
  - On Windows: BCryptGenRandom (Windows CNG)
  - On Linux: /dev/urandom (kernel CSPRNG)
  Both backed by hardware entropy sources (Intel RDRAND/RDSEED where available)

Range reduction: RandomNumberGenerator.GetInt32(n) method
  Uses rejection sampling to eliminate modular bias.
  Internal implementation: bitmask-and-reject — draws are masked to the
  smallest bit width covering the range and redrawn until one falls within it

Seeding: Not applicable — OS CSPRNG manages its own entropy pool,
  which is seeded from hardware entropy sources at OS boot and
  continuously refreshed during operation.

Per-spin usage:
  - 1 call to GetInt32(reelSize) per reel per spin
  - 5 calls total for a 5-reel game
  - Additional calls for bonus mechanics (as documented in section 4.3)

State isolation: Each call to GetInt32() is stateless from the
  caller's perspective. No game state influences RNG output.
  No caching of RNG output between spins.

Version: .NET 8.0.1, RandomNumberGenerator static class
  SHA-256 of System.Security.Cryptography.dll: [hash]
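The rejection-sampling range reduction named in the documentation example can be sketched like this — a simplified, illustrative version, not the actual .NET internals:

```python
import secrets

def unbiased_int(n: int) -> int:
    """Uniform integer in [0, n) with no modulo bias.

    Naive `randbits(k) % n` over-weights small residues whenever
    2^k is not a multiple of n. Drawing just enough bits and
    rejecting raw values >= n keeps the accepted draws uniform.
    """
    bits = n.bit_length()
    while True:
        raw = secrets.randbits(bits)   # uniform over [0, 2^bits)
        if raw < n:                    # reject the biased tail, retry
            return raw
```

For a 32-position reel strip (a power of two) no draw is ever rejected; for a 34-position strip, raw values 34–63 are discarded rather than wrapped around, which is exactly the bias the lab's statistical tests would otherwise detect.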

4.2 The Statistical Test Battery

For GLI-11 certification, the lab runs the NIST SP 800-22 test suite on your RNG output. They generate the test data themselves by calling your RNG in isolation — not by extracting output from game spins.

Test data generation:
1. Access the documented RNG function directly
2. Generate 10^8 bits (12.5 MB) of raw output (100 sequences × 10^6 bits)
3. Run all 15 NIST SP 800-22 tests at significance level α = 0.01
4. For each test, run 100 sequences of 10^6 bits each
5. For each test, verify:
   a. P-value distribution is uniform over [0, 1]
   b. Proportion of sequences passing at α = 0.01 is ≥ 0.96 (96%)
   c. No individual sequence fails the test with p-value < 0.0001

Acceptable result: all 15 tests pass
Finding: any test fails (even one)
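The floor of 0.96 in step 5b follows from the standard's confidence interval, (1 − α) − 3·√(α(1 − α)/m) ≈ 0.9602 for m = 100 sequences. A sketch of that check (hypothetical helper):

```python
from math import sqrt

def proportion_check(p_values, alpha=0.01):
    """NIST SP 800-22 proportion-of-passing check for one test.

    A sequence passes when its p-value >= alpha. The minimum
    acceptable pass proportion over m sequences is
    (1 - alpha) - 3 * sqrt(alpha * (1 - alpha) / m).
    """
    m = len(p_values)
    passed = sum(1 for p in p_values if p >= alpha)
    proportion = passed / m
    floor = (1 - alpha) - 3 * sqrt(alpha * (1 - alpha) / m)
    return proportion >= floor, proportion, floor
```

With 100 sequences, 97 passes clears the ~0.9602 floor; 95 passes does not, even though 95% sounds healthy.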

The lab also runs the Diehard battery independently. Some labs add their own proprietary tests. For high-value certifications (US state gaming commissions, Australian state regulators), the TestU01 BigCrush battery (106 tests) may be required.

4.3 The Independence Audit

This is the test that catches the most subtle bugs. The lab verifies that the RNG output is statistically independent of:

Bet size: No correlation between bet amount and symbol probabilities

Session length: No trend in symbol frequencies as session length increases

Previous outcomes: No serial correlation between consecutive spins

Player identity: No systematic differences between RNG output for different player IDs

Time of day/server load: No correlation between system metrics and symbol frequencies

// Test: serial correlation between consecutive reel stops
// If this correlation is non-zero, it means the RNG has memory
// (which is impossible for a correct CSPRNG but easy to introduce accidentally)

public static double ComputeSerialCorrelation(
    int[]  reelStops,          // Sequence of reel stop positions
    int    reelSize,
    int    lag = 1)            // Correlation at lag k
{
    int n = reelStops.Length - lag;
    
    double meanX = reelStops.Take(n).Average();
    double meanY = reelStops.Skip(lag).Take(n).Average();
    
    double numerator   = 0;
    double denomX      = 0;
    double denomY      = 0;
    
    for (int i = 0; i < n; i++)
    {
        double dx = reelStops[i]       - meanX;
        double dy = reelStops[i + lag] - meanY;
        numerator += dx * dy;
        denomX    += dx * dx;
        denomY    += dy * dy;
    }
    
    double correlation = numerator / Math.Sqrt(denomX * denomY);
    // For a good CSPRNG: |correlation| < 0.002 for n = 1,000,000
    // Finding threshold: |correlation| > 0.01
    
    return correlation;
}

A serial correlation greater than 0.01 at any lag from 1 to 100 is a finding. For a correct CSPRNG, this will never happen — but for a game that accidentally shares RNG state between spins (e.g., by caching the previous random value and using it as a seed for the next), it produces detectable correlation.


Part V. The Audit Log Review

5.1 Why Audit Logs Are as Important as the Math

Regulators need to be able to reconstruct any historical spin. This is not a theoretical requirement — it is exercised in practice:

Player disputes ("the game showed me a win but didn't pay")

Regulatory investigations ("show us all spins between 2:00 and 3:00 AM on this date")

Operator audits ("verify the RTP over the last 30 days")

Court cases (yes, these happen)

If your audit logs cannot support any of these scenarios, you will fail the audit review.

5.2 Minimum Required Fields Per Spin

GLI-19 (the interactive gaming systems standard) specifies minimum audit log requirements. The following fields are required for every spin:

Mandatory fields per spin record:
──────────────────────────────────────────────────────────────────
Field               Type        Description
──────────────────────────────────────────────────────────────────
spin_id             UUID        Globally unique identifier
session_id          UUID        Player session identifier
player_id           STRING      Operator's player identifier
timestamp_utc       DATETIME    UTC timestamp to millisecond precision
game_id             STRING      Game identifier (version-specific)
game_version        STRING      Exact software version string
bet_amount          DECIMAL     Total bet placed (currency units)
currency            STRING      ISO 4217 currency code
lines_played        INT         Number of active lines
coin_value          DECIMAL     Coin value per line
rng_version         STRING      RNG algorithm version tag
reel_stops          INT[]       Stop position for each reel [R1,R2,R3,R4,R5]
reel_sizes          INT[]       Size of each reel strip [L1,L2,L3,L4,L5]
visible_grid        INT[][]     All visible symbols [col][row]
win_amount          DECIMAL     Total win from this spin
win_details         JSON        Breakdown: which combinations won, amounts
scatter_count       INT         Number of Scatters visible
bonus_triggered     BOOL        Whether bonus was triggered this spin
bonus_type          STRING      Which bonus type (if applicable)
balance_before      DECIMAL     Player balance before debit
balance_after       DECIMAL     Player balance after credit
──────────────────────────────────────────────────────────────────
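A single spin record carrying the mandatory fields might serialise like this. All identifiers and amounts below are invented for illustration; note that monetary fields are strings (exact decimals), never binary floats:

```python
import json

# Hypothetical record -- every value here is made up for illustration
spin_record = {
    "spin_id":         "0b9e2c4a-1111-4abc-9def-000000000001",
    "session_id":      "0b9e2c4a-1111-4abc-9def-000000000002",
    "player_id":       "OP-884213",
    "timestamp_utc":   "2026-03-15T14:02:07.123",
    "game_id":         "neon-royale-classic",
    "game_version":    "1.4.2+build.91",
    "bet_amount":      "1.00",
    "currency":        "GBP",
    "lines_played":    20,
    "coin_value":      "0.05",
    "rng_version":     "os-csprng-v1",
    "reel_stops":      [17, 3, 28, 0, 11],
    "reel_sizes":      [32, 32, 32, 34, 34],
    "visible_grid":    [[5, 2, 9], [1, 1, 7], [0, 4, 4], [9, 3, 2], [6, 0, 8]],
    "win_amount":      "2.50",
    "win_details":     {"lines": [{"line": 4, "combo": "QUEEN x3", "win": "2.50"}]},
    "scatter_count":   1,
    "bonus_triggered": False,
    "bonus_type":      None,
    "balance_before":  "104.20",   # debit 1.00, credit 2.50 -> 105.70
    "balance_after":   "105.70",
}
```

The reel_stops, reel_sizes, and visible_grid fields are what make the reconstruction test in section 5.3 possible.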

For Free Spins, additional fields:

Additional fields for Free Spin records:
──────────────────────────────────────────────────────────────────
parent_spin_id      UUID        The base spin that triggered this FS block
fs_block_id         UUID        Unique ID for this Free Spins block
fs_spin_number      INT         Which spin within the block (1-based)
fs_total_spins      INT         Total spins in this block (incl. retriggers)
fs_multiplier       DECIMAL     Multiplier applied to this spin's wins
fs_reel_config      STRING      Which FS reel configuration used
fs_retrigger        BOOL        Whether this spin triggered a retrigger
──────────────────────────────────────────────────────────────────

5.3 The Reconstruction Test

The lab's standard audit test:

Take a random sample of 1,000 historical spins from your test environment

For each spin: read the reel_stops and reel_sizes from the audit log

Independently reconstruct the visible grid from these values

Independently evaluate all paylines against the documented pay table

Compare reconstructed win amount to win_amount in the audit log

If they differ by any amount: FINDING

/// <summary>
/// Audit reconstruction test: verifies every spin in the sample
/// can be independently reproduced from its audit log entry.
/// This is what the certification lab runs — run it yourself first.
/// </summary>
public class AuditReconstructionVerifier
{
    private readonly GameConfig     _config;
    private readonly WinCalculator  _calculator;

    public AuditReconstructionVerifier(GameConfig config, WinCalculator calculator)
    {
        _config     = config;
        _calculator = calculator;
    }

    public ReconstructionReport Verify(
        IEnumerable<SpinAuditRecord> auditRecords,
        bool stopOnFirstFailure = false)
    {
        var failures = new List<ReconstructionFailure>();
        long tested  = 0;
        long passed  = 0;

        foreach (var record in auditRecords)
        {
            tested++;

            // Reconstruct visible grid from stored reel stops
            var grid = ReconstructGrid(
                record.ReelStops,
                record.ReelSizes,
                record.ReelStrips);   // reel strip content, stored or referenced

            // Independently evaluate all paylines
            decimal reconstructedWin = _calculator.EvaluateAllLines(
                grid, _config.Paylines, _config.PayTable,
                record.BetPerLine);

            // Compare
            if (reconstructedWin != record.WinAmount)
            {
                var failure = new ReconstructionFailure(
                    SpinId:         record.SpinId,
                    AuditedWin:     record.WinAmount,
                    ReconstructedWin: reconstructedWin,
                    ReelStops:      record.ReelStops,
                    Delta:          reconstructedWin - record.WinAmount
                );

                failures.Add(failure);
                Console.WriteLine($"FAILURE: Spin {record.SpinId}  " +
                                  $"Audited={record.WinAmount:F4}  " +
                                  $"Reconstructed={reconstructedWin:F4}  " +
                                  $"Delta={failure.Delta:F4}");

                if (stopOnFirstFailure) break;
            }
            else
            {
                passed++;
            }
        }

        return new ReconstructionReport(
            TotalTested:  tested,
            TotalPassed:  passed,
            TotalFailed:  failures.Count,
            PassRate:     (double)passed / tested,
            Failures:     failures
        );
    }

    private int[,] ReconstructGrid(
        int[] stops, int[] sizes, int[][] reelStrips)
    {
        int cols = stops.Length;
        int rows = 3;  // standard visible rows
        var grid = new int[cols, rows];

        for (int col = 0; col < cols; col++)
        {
            int stop   = stops[col];
            int length = sizes[col];

            for (int row = 0; row < rows; row++)
            {
                // Standard 3-row visible window: stop-1, stop, stop+1
                int index      = (stop + row - 1 + length) % length;
                grid[col, row] = reelStrips[col][index];
            }
        }

        return grid;
    }
}

public record ReconstructionFailure(
    string SpinId, decimal AuditedWin, decimal ReconstructedWin,
    int[] ReelStops, decimal Delta);

public record ReconstructionReport(
    long TotalTested, long TotalPassed, long TotalFailed,
    double PassRate, List<ReconstructionFailure> Failures)
{
    public void Print()
    {
        Console.WriteLine($"Reconstruction Audit: {TotalTested:N0} spins");
        Console.WriteLine($"  Passed: {TotalPassed:N0} ({PassRate * 100:F2}%)");
        Console.WriteLine($"  Failed: {TotalFailed:N0}");

        if (TotalFailed > 0)
        {
            Console.WriteLine("\nFailure details:");
            foreach (var f in Failures.Take(10))
            {
                Console.WriteLine($"  [{f.SpinId}] " +
                                  $"stops=[{string.Join(",", f.ReelStops)}]  " +
                                  $"delta={f.Delta:+0.0000;-0.0000}");
            }
        }
        else
        {
            Console.WriteLine("  ALL SPINS RECONSTRUCTED CORRECTLY ✓");
        }
    }
}

5.4 Tamper Evidence

Audit logs must be tamper-evident. If someone modifies a log entry after the fact, it must be detectable. Common implementations:

Hash chaining (blockchain-style):

public class TamperEvidentAuditLogger
{
    // Persistence backend; the IAuditStorage interface name is assumed here.
    private readonly IAuditStorage _storage;
    private string _previousHash = "GENESIS";  // Starting anchor

    public TamperEvidentAuditLogger(IAuditStorage storage) => _storage = storage;

    public async Task LogSpinAsync(SpinAuditRecord record)
    {
        // Include previous hash in the new record's hash computation
        string recordJson = JsonSerializer.Serialize(record);
        string inputForHash = $"{_previousHash}:{recordJson}";

        byte[] hash    = SHA256.HashData(
            System.Text.Encoding.UTF8.GetBytes(inputForHash));
        string hashHex = Convert.ToHexString(hash).ToLowerInvariant();

        var chainedRecord = record with
        {
            PreviousHash = _previousHash,
            RecordHash   = hashHex
        };

        await _storage.WriteAsync(chainedRecord);
        _previousHash = hashHex;
    }

    /// <summary>
    /// Verifies integrity of the entire audit chain.
    /// Any modification to any record invalidates all subsequent hashes.
    /// </summary>
    public async Task<ChainIntegrityResult> VerifyChainAsync(
        IAsyncEnumerable<SpinAuditRecord> records)
    {
        string expectedPrevHash = "GENESIS";
        long   verified         = 0;
        long   tampered         = 0;

        await foreach (var record in records)
        {
            // Recompute expected hash
            string recordJson    = JsonSerializer.Serialize(record with
            {
                PreviousHash = null!,
                RecordHash   = null!
            });
            string inputForHash  = $"{expectedPrevHash}:{recordJson}";
            byte[] expectedHash  = SHA256.HashData(
                System.Text.Encoding.UTF8.GetBytes(inputForHash));
            string expectedHex   = Convert.ToHexString(expectedHash).ToLowerInvariant();

            if (record.PreviousHash != expectedPrevHash
                || record.RecordHash != expectedHex)
            {
                tampered++;
                Console.WriteLine($"TAMPER DETECTED: Spin {record.SpinId}");
            }
            else
            {
                verified++;
            }

            // Chain on the stored hash so a single tampered record is
            // reported once rather than cascading through every later record.
            expectedPrevHash = record.RecordHash;
        }

        return new ChainIntegrityResult(verified, tampered);
    }
}

Part VI. The Findings Report and How to Respond

6.1 Understanding Findings Classifications

Certification labs classify findings by severity:

Critical Finding (Blocking)
The game cannot be certified in its current state. The finding must be resolved and the affected components re-submitted and re-tested before a certificate can be issued.

Examples:

RNG uses predictable seeding

Payout calculation differs from documented pay table

RTP exceeds documented maximum or falls below documented minimum

Session interruption causes loss of player funds

Audit logs cannot reconstruct spins

Major Finding (Blocking)
Also blocking, but typically confined to a specific edge case rather than the general case.

Examples:

Specific combination of Wild + Scatter produces incorrect payout

Retrigger count incorrect when exactly 3 Scatters appear during last spin of Free Spins block

Balance shown to player before wallet service confirms credit

Minor Finding (Non-Blocking)
The game can be certified with the finding noted, but the finding must be resolved before the next re-certification or within a specified timeframe.

Examples:

Documentation omits an edge case that the implementation handles correctly

Audit log missing a non-mandatory field

Statistical test result marginal (passes but close to threshold)

Advisory (Non-Blocking)
A recommendation that does not prevent certification but represents best practice.

Examples:

Consider adding more detailed win breakdown to audit log

RNG documentation could more precisely describe the output function

6.2 The Findings Response Process

Finding received
      ↓
Classify: Is this a documentation issue or an implementation issue?
      ↓
Documentation issue:           Implementation issue:
Update GDD, PAR Sheet         Fix code, update documentation
Resubmit affected documents   Resubmit affected components + docs
Lab reviews documents         Lab retests affected area
      ↓                             ↓
Finding resolved? → No → Back to fix
      ↓ Yes
Certificate issued (or partial if other findings remain)

Response letter format that works with labs:

Don't just send a corrected file. Send a structured response that addresses each finding explicitly:

Finding Reference: GLI-XXXX-F001
Finding Classification: Critical
Finding Description: [exact text from lab's finding]

Root Cause Analysis:
[What caused this issue and why it wasn't caught in internal testing]

Remediation:
[Exactly what was changed, with before/after code comparison if applicable]

Files Changed:
  - SlotRng.cs (line 47: removed TickCount from seed)
  - SHA-256 of updated file: [hash]

Verification:
[How you verified the fix addresses the finding,
 what tests you ran, what results they produced]

Impact Assessment:
[Did this fix affect any other components? If so, which?
 Were any other spin outcomes affected by this bug?]

A well-structured response dramatically reduces the number of re-test cycles and shows the lab that you understand the regulatory framework.

6.3 Building a Pre-Submission Finding Checklist

Running this checklist internally before submitting reduces the finding rate dramatically:

RNG CHECKS
□ RNG algorithm is precisely documented (algorithm, seeding, version)
□ No System.Random used anywhere in the codebase
□ Seed is derived entirely from OS CSPRNG (no time, PID, or observable state)
□ RNG output has been tested with NIST SP 800-22 locally
□ RNG output is statistically independent of bet, player ID, prior outcomes
□ No caching of RNG output between spins
□ Reel stops are generated AFTER bet is deducted

MATHEMATICAL CHECKS
□ PAR Sheet RTP computed analytically via full enumeration (not simulation)
□ PAR Sheet values independently verified by a second person
□ Simulation of 10M spins confirms analytical RTP within ±0.1%
□ Wild substitution logic documented and tested for all edge cases:
    □ All-Wild combination
    □ Wild + Scatter on same payline
    □ Wild completing a combination that wasn't started by Wild
    □ Wild position: first, middle, last
□ Scatter counted on grid (not on payline) — tested for 2/3/4/5 Scatters
□ Free Spins RTP calculated from FS reels, not base game reels
□ Retrigger probability calculated using correct model (not simple approximation)
□ Bonus EV weighted by actual trigger distribution (by Scatter count)

PAY TABLE CHECKS
□ Every combination in the GDD has a corresponding entry in the code
□ Every entry in the code has corresponding documentation in the GDD
□ Payout at maximum bet does not exceed documented maximum win
□ Payout at minimum bet rounds correctly (no floating-point errors)
□ Multi-line win summation is correct (no double-counting)
□ Win display matches win calculation (no UI/logic mismatch)

AUDIT LOG CHECKS
□ Every mandatory field present in every spin record
□ Reconstruction test passes on 100% of sampled records
□ Tamper-evidence mechanism implemented and verified
□ Free Spins records include parent_spin_id, fs_block_id
□ Session resumption after disconnection logs correctly
□ Balance before/after is correct even when wallet service is slow

DOCUMENTATION CHECKS
□ GDD covers every mechanic that the code implements
□ No mechanic in the code is absent from the GDD
□ Wild rules: every edge case explicitly stated
□ Scatter rules: grid-based vs payline-based explicitly stated
□ Bonus trigger conditions: exact conditions, not vague descriptions
□ RTP configuration: which configuration is default, which are optional
□ Version manifest: SHA-256 of every file to be submitted
□ RNG documentation: algorithm, seeding, version, per-spin usage count
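Several of the checklist items above lend themselves to automation. A minimal sketch of one of them — scanning source text for APIs that commonly trigger RNG findings. The token list here is illustrative; extend it to match your codebase and your lab's standard:

```csharp
using System;
using System.Collections.Generic;

// Pre-submission static scan for APIs that commonly trigger RNG findings.
// The forbidden-token list is illustrative, not lab-mandated.
public static class ForbiddenApiScanner
{
    private static readonly string[] ForbiddenTokens =
    {
        "System.Random",          // non-cryptographic RNG
        "Environment.TickCount",  // observable seed component
        "DateTime.Now",           // time-based seeding risk
    };

    /// <summary>Returns (lineNumber, token) for every hit in the source text.</summary>
    public static List<(int Line, string Token)> Scan(string sourceText)
    {
        var hits = new List<(int, string)>();
        string[] lines = sourceText.Split('\n');

        for (int i = 0; i < lines.Length; i++)
            foreach (string token in ForbiddenTokens)
                if (lines[i].Contains(token))
                    hits.Add((i + 1, token));

        return hits;
    }
}
```

Running a scan like this in CI, over every file in the submission manifest, turns the RNG checklist items from a manual review step into a build gate.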

Part VII. Certified Math vs Game Math — The Core Distinction

7.1 Why They Can Diverge

"Certified math" is the mathematical model that was submitted to and approved by the certification laboratory. "Game math" is what the running code actually computes.

They can diverge in ways that are:

Immediately obvious — the game crashes on a specific combination because the certified version handled an edge case differently

Subtly wrong — the game pays 0.1% more RTP than certified because a Wild substitution rule was implemented slightly differently in a post-certification hotfix

Invisibly wrong — the game has the correct aggregate RTP but an individual combination's probability is off by a factor, compensated by another combination being off by the inverse factor

The third category is the most dangerous. It passes all aggregate checks but fails the per-combination frequency verification. Players who know the game's documented probabilities may notice that 5× Diamond "never seems to hit" even though the aggregate RTP looks right.
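The per-combination frequency verification that catches this third category can be sketched as follows; each combination's observed hit count is compared against its certified expectation under a binomial model. The z-score threshold of 4.0 is an illustrative default, not a lab-mandated value:

```csharp
using System;
using System.Collections.Generic;

// Per-combination frequency verification: aggregate RTP can mask two
// combinations whose probabilities are wrong in offsetting directions,
// so each combination's observed hit count is compared with its
// certified expectation individually.
public static class CombinationFrequencyCheck
{
    public static List<string> FindSuspects(
        Dictionary<string, long>   observedHits,
        Dictionary<string, double> certifiedProb,
        long totalSpins,
        double zThreshold = 4.0)   // illustrative threshold, not lab-mandated
    {
        var suspects = new List<string>();

        foreach (var (combo, p) in certifiedProb)
        {
            double expected = totalSpins * p;
            double stdDev   = Math.Sqrt(totalSpins * p * (1 - p));  // binomial σ
            long   observed = observedHits.GetValueOrDefault(combo);

            double z = stdDev > 0 ? Math.Abs(observed - expected) / stdDev : 0;
            if (z > zThreshold)
                suspects.Add($"{combo}: observed={observed}, expected={expected:F1}, z={z:F1}");
        }

        return suspects;
    }
}
```

A game whose 5× Diamond probability is silently halved while a low symbol's is doubled will pass an aggregate RTP check but surface immediately here.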

7.2 How to Keep Them in Sync

The only reliable way to keep certified math and game math in sync is to use the same computational code for both.

/// <summary>
/// The WinCalculator is the single source of truth for payout computation.
/// It is used by:
///   1. The PAR Sheet calculation tool (mathematical documentation)
///   2. The simulation verification tool (pre-certification testing)
///   3. The live game server (production spin evaluation)
///   4. The audit reconstruction tool (post-hoc verification)
///
/// Any divergence between these uses IS a certification finding.
/// </summary>
public sealed class WinCalculator
{
    private readonly GameConfig _config;

    public WinCalculator(GameConfig config) => _config = config;

    // The SAME instance is used in all four contexts above.
    // It is pure (no side effects, no external dependencies).
    // Given the same inputs, it always produces the same outputs.

    public decimal EvaluateAllLines(
        int[,]  visibleGrid,
        int[][] paylines,
        Dictionary<(int symbolId, int count), decimal> payTable,
        decimal betPerLine)
    {
        decimal totalWin = 0;

        foreach (var payline in paylines)
        {
            var (symbolId, matchCount) = EvaluatePayline(visibleGrid, payline);

            if (matchCount >= 3 && payTable.TryGetValue(
                    (symbolId, matchCount), out decimal payout))
            {
                totalWin += payout * betPerLine;
            }
        }

        return totalWin;
    }

    private (int symbolId, int matchCount) EvaluatePayline(
        int[,] grid, int[] payline)
    {
        int? baseSymbol = null;
        int  count      = 0;

        for (int col = 0; col < payline.Length; col++)
        {
            int row    = payline[col];
            int symbol = grid[col, row];

            if (symbol == _config.ScatterSymbolId) break;

            if (baseSymbol is null)
            {
                if (symbol == _config.WildSymbolId)
                    { count++; continue; }      // Leading Wild — count but don't set base
                baseSymbol = symbol;
                count++;
            }
            else if (symbol == baseSymbol || symbol == _config.WildSymbolId)
            {
                count++;
            }
            else
            {
                break;  // Chain broken
            }
        }

        // If only Wilds were seen: Wild-only combo
        if (baseSymbol is null && count > 0)
            baseSymbol = _config.WildSymbolId;

        return (baseSymbol ?? 0, count);
    }
}

The architecture principle: WinCalculator must be the same class, with the same logic, compiled from the same source file, used in every context. If you have a "PAR Sheet calculator" class and a "game engine" class that both implement payline evaluation independently, they will eventually diverge. One code path, used everywhere.
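The principle can be illustrated with a deliberately tiny, self-contained sketch — a hypothetical one-reel game, not the WinCalculator above — where one shared Evaluate function feeds both the analytic enumeration a PAR sheet tool would perform and a simulation run:

```csharp
using System;

// Miniature illustration of the one-code-path rule on a hypothetical
// one-reel game: the SAME Evaluate function produces both the analytic
// RTP (full enumeration, as a PAR sheet tool would) and the simulated
// RTP, so the two figures cannot diverge except by sampling noise.
public static class OneCodePathDemo
{
    // The single shared payout rule: symbol 7 pays 10x the bet, else 0.
    public static decimal Evaluate(int symbol) => symbol == 7 ? 10m : 0m;

    public static (decimal Analytic, decimal Simulated) Run()
    {
        int[] strip = { 7, 1, 2, 3, 7, 4, 5, 6, 1, 2 };   // two 7s in ten stops

        // Analytic path: enumerate every stop (what the PAR tool does).
        decimal total = 0;
        foreach (int s in strip) total += Evaluate(s);
        decimal analytic = total / strip.Length;          // 20 / 10 = 2.0 per unit bet

        // Simulation path: sample stops through the SAME Evaluate.
        // Inline xorshift32 so no System.Random appears anywhere.
        uint state = 12345;
        uint Next()
        {
            state ^= state << 13; state ^= state >> 17; state ^= state << 5;
            return state;
        }

        const int spins = 1_000_000;
        decimal won = 0;
        for (int i = 0; i < spins; i++)
            won += Evaluate(strip[(int)(Next() % (uint)strip.Length)]);

        return (analytic, won / spins);
    }
}
```

Because both figures flow through the same function, any change to the payout rule moves them together; only sampling noise can separate them.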

7.3 The Version Pinning Discipline

Every code change that affects game mathematics requires:

1. Update PAR Sheet analytically with new values
2. Re-run 10M simulation to verify new PAR Sheet
3. Update GDD to document the change
4. Submit changed components to certification lab
5. Lab re-tests affected areas
6. New certificate issued (or certificate addendum for minor changes)

Only after step 6 does the new code go to production.

Any code that affects: reel strips, pay table values, Wild rules, Scatter rules, Free Spins reel configuration, multiplier values, or any mathematical mechanic — is a certifiable change that goes through this process.

Code that does NOT require re-certification: UI changes, sound changes, performance optimisations that provably do not affect game outcome, infrastructure changes that are isolated from game logic.

When in doubt: ask the lab. "Is this change a certifiable change?" is a legitimate question to send to your lab account manager. They would rather answer that question than discover the uncertified change during the next periodic audit.
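The version manifest from the pre-submission checklist — a SHA-256 for every submitted file — is straightforward to generate. A sketch; the `<hash>  <filename>` layout is illustrative, so match whatever manifest format has been agreed with your lab:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Generates version-manifest lines: one SHA-256 per submitted file.
public static class ManifestGenerator
{
    public static string HashFile(string path)
    {
        byte[] hash = SHA256.HashData(File.ReadAllBytes(path));
        return Convert.ToHexString(hash).ToLowerInvariant();
    }

    public static void WriteManifest(string[] files, TextWriter output)
    {
        foreach (string file in files)
            output.WriteLine($"{HashFile(file)}  {Path.GetFileName(file)}");
    }
}
```

Regenerating the manifest in CI and diffing it against the certified one is a cheap way to catch an uncertified change before the lab's periodic audit does.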


Part VIII. Post-Certification Operations

8.1 Ongoing Compliance Obligations

Certification is not a one-time event. Ongoing obligations include:

Periodic Re-testing: GLI and BMM typically require annual re-certification of the RNG. Some jurisdictions require more frequent testing (quarterly). The re-test is usually faster than the initial certification because the baseline is established — but it must still be planned for.

Change Management: Any change to certified components requires either a full re-certification or a "change approval" process (sometimes called a "variant" or "minor change" submission). Most labs offer expedited change approval for small changes, but "small" must be formally agreed in advance.

Incident Reporting: If a bug is discovered post-certification that affects payout correctness — even if it was never triggered in production — it must be reported to the relevant regulators within the jurisdiction's required timeframe (typically 24–72 hours for player-impacting bugs).

RTP Monitoring: Most operators implement RTP monitoring dashboards. If the live game's RTP deviates significantly from certified values, the operator is required to investigate. As the developer, you should provide them with the tools to do this.

/// <summary>
/// Live RTP monitoring: computes rolling RTP and alerts when deviation
/// exceeds threshold. Operators run this continuously in production.
/// Developers provide it as part of the game package.
/// </summary>
public class RtpMonitor
{
    private readonly decimal _certifiedRtp;
    private readonly decimal _alertThreshold;  // e.g., 0.005 = 0.5%
    private readonly object  _sync = new();

    private long    _totalSpins;
    private decimal _totalBet;
    private decimal _totalWin;

    public RtpMonitor(decimal certifiedRtp, decimal alertThreshold)
    {
        _certifiedRtp   = certifiedRtp;
        _alertThreshold = alertThreshold;
    }

    public void RecordSpin(decimal bet, decimal win)
    {
        // One lock keeps the spin count and the monetary totals mutually
        // consistent (decimal has no Interlocked support).
        lock (_sync)
        {
            _totalSpins++;
            _totalBet += bet;
            _totalWin += win;
        }
    }

    public RtpMonitorSnapshot GetSnapshot()
    {
        long spins; decimal bet, win;
        lock (_sync) { spins = _totalSpins; bet = _totalBet; win = _totalWin; }

        decimal observedRtp = bet > 0 ? win / bet : 0;
        decimal delta       = observedRtp - _certifiedRtp;
        bool    alert       = Math.Abs(delta) > _alertThreshold
                              && spins >= 10_000;  // Only alert after sufficient sample

        // Approximate standard error of observed RTP: σ/√n, using an
        // illustrative per-spin standard deviation of 15 (in bet units).
        // Substitute the game's actual volatility index here.
        double se = spins > 0
            ? 15.0 / Math.Sqrt(spins)
            : 1.0;

        return new RtpMonitorSnapshot(
            TotalSpins:    spins,
            ObservedRtp:   observedRtp,
            CertifiedRtp:  _certifiedRtp,
            Delta:         delta,
            StdError:      (decimal)se,
            AlertActive:   alert,
            Significance:  delta != 0 ? Math.Abs((double)delta / se) : 0
        );
    }
}

public record RtpMonitorSnapshot(
    long    TotalSpins,
    decimal ObservedRtp,
    decimal CertifiedRtp,
    decimal Delta,
    decimal StdError,
    bool    AlertActive,
    double  Significance)
{
    public void Print()
    {
        string status = AlertActive ? "⚠ ALERT" : "✓ Normal";
        Console.WriteLine($"[{status}] RTP Monitor — {TotalSpins:N0} spins");
        Console.WriteLine($"  Observed: {ObservedRtp * 100:F3}%");
        Console.WriteLine($"  Certified: {CertifiedRtp * 100:F3}%");
        Console.WriteLine($"  Delta: {Delta * 100:+0.000;-0.000}%  ({Significance:F1}σ)");
    }
}

8.2 The Difference Between GLI, BMM, and iTech in Practice

While all three labs certify against similar standards, practitioners report differences in approach and emphasis:

GLI:

Most prescriptive — the GLI-11 and GLI-16 standards are detailed, specific, and publicly available

Tends to flag documentation gaps more aggressively than implementation bugs

Strong focus on audit log completeness

Fastest turnaround for straightforward submissions (large lab, dedicated teams)

The "safest" choice for a first-time certification due to clear standards

BMM:

More emphasis on source code review than GLI

Known for deep technical findings — they find bugs that other labs miss

US state commission work often requires BMM specifically

Slower turnaround but findings tend to be more precise and actionable

Recommended when the target market includes US states

iTech Labs:

Strong in UK and EU markets — close relationship with UKGC and MGA

More likely to flag responsible gambling-related issues (auto-play behaviour, session limits)

Thorough Free Spins and bonus round testing — known for retrigger edge cases

eCOGRA certification often requires an iTech base certificate

Good choice for games targeting UK/Malta/Curaçao markets

The practical advice: check which lab your target operators are contracted with and use the same lab. Operators have ongoing lab relationships, and a certificate from the "right" lab for their jurisdiction can accelerate commercial negotiations significantly.


Summary

The gap between "mathematically correct" and "certifiable" is bridged by documentation fidelity, implementation transparency, and audit completeness. A game can have perfect RTP, a correct RNG, and a beautiful user interface — and still fail certification because the documentation doesn't precisely describe an edge case, or the audit log is missing a field, or the seeding strategy includes an observable component.

The key principles from this article:

The GDD is a legal document. Every statement in it will be tested. Every omission will be treated as an unspecified case that may have an unexpected implementation. Write it like a contract, not a design document.

The PAR Sheet must be independently reproducible. The lab will build their own version from your documentation. If it doesn't match yours, there is an error somewhere. Find it before they do.

Source code and documentation must be in sync. The WinCalculator used in production must be the same class used in the PAR Sheet calculator, the simulation tool, and the audit reconstruction tool. One code path, everywhere.

Audit logs must enable complete spin reconstruction. Every field needed to recompute any spin outcome must be stored in tamper-evident form. Run the reconstruction test yourself before submitting — 100% pass rate, no exceptions.

Certification is not one-time. Any mathematical change requires re-certification. Build the discipline of change management into your development process from day one.

Choose your lab strategically. The right lab for your target market can make the commercial difference. GLI for global reach, BMM for US state markets, iTech for UK/EU-focused titles.
