
Multi-Agent Reinforcement Learning (MARL)

The MARL framework extends the foundational Q-Learning implementation to support multiple coordinated agents in complex environments featuring triggers, switches, and dynamic interactions.

What’s Beyond Basic Q-Learning?

While basic Q-Learning handles single agents in static environments, MARL adds:

Multi-Agent Coordination

Dynamic Environment Features

Enhanced DSL

The DSL evolved from simple single-agent configuration to sophisticated multi-agent scenario definition:

Option 1: Structured Wall Definition

simulation:
  grid:
    8 x 12

  walls:
    line:
      Direction >> "vertical"
      From >> (1, 6)
      To >> (6, 6)
    Block >> (2, 2)
    Block >> (3, 2)
    Block >> (5, 9)
    Block >> (6, 9)

  agent:
    Name >> "Opener"
    Start >> (1, 1)
    withLearner:
      Alpha >> 0.15
      Gamma >> 0.9
      Eps0 >> 0.8
      EpsMin >> 0.1
      Warm >> 1_500
      Optimistic >> 0.5
    Goal >> (6, 2)
    onGoal:
      Give >> 25.0
      OpenWall >> (4, 6)
      EndEpisode >> false

  agent:
    Name >> "Runner"
    Start >> (1, 2)
    withLearner:
      Alpha >> 0.15
      Gamma >> 0.9
      Eps0 >> 0.8
      EpsMin >> 0.1
      Warm >> 1_500
      Optimistic >> 0.5
    Goal >> (6, 10)
    onGoal:
      Give >> 55.0
      EndEpisode >> true
  Penalty >> -3.0
  Episodes >> 12_000
  Steps >> 300
  ShowAfter >> 10_000
  Delay >> 100
  WithGUI >> true

Option 2: ASCII Wall Definition

simulation:
  grid:
    10 x 8

  asciiWalls:
    """########
      |#..##..#
      |#.####.#
      |#.#.#..#
      |#.#.#..#
      |########
      |#......#
      |########
      |#......#
      |########"""

  agent:
    Name >> "WallOpener1"
    Start >> (1, 1)
    withLearner:
      Alpha >> 0.15
      Gamma >> 0.95
      Eps0 >> 0.9
      EpsMin >> 0.05
      Warm >> 3_000
      Optimistic >> 1.0
    Goal >> (4, 1)
    onGoal:
      Give >> 70.0
      OpenWall >> (7, 5)
      EndEpisode >> false

  agent:
    Name >> "Hunter"
    Start >> (8, 1)
    withLearner:
      Alpha >> 0.15
      Gamma >> 0.95
      Eps0 >> 0.9
      EpsMin >> 0.05
      Warm >> 3_000
      Optimistic >> 0.5
    Goal >> (4, 3)
    onGoal:
      Give >> 100.0
      EndEpisode >> true
  Penalty >> -3.0
  Episodes >> 20_000
  Steps >> 600
  ShowAfter >> 17_000
  Delay >> 150
  WithGUI >> true

These examples demonstrate two different approaches to defining walls in multi-agent scenarios: a structured definition built from line segments and individual blocks, and an ASCII map of the grid in which # characters mark wall cells.
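
As a minimal sketch of how both forms could be reduced to the same set of wall cells (illustrative only: the names WallSpecs, verticalLine and fromAscii are hypothetical, and the "# marks a wall" reading is an assumption, not the project's documented parser):

object WallSpecs:
  type Pos = (Int, Int)

  // Structured form: expand a vertical line segment into its cells.
  def verticalLine(from: Pos, to: Pos): Set[Pos] =
    (from._1 to to._1).map(r => (r, from._2)).toSet

  // ASCII form: every '#' becomes a wall cell, rows indexed top to bottom.
  def fromAscii(ascii: String): Set[Pos] =
    ascii.stripMargin.linesIterator.zipWithIndex.flatMap { (line, row) =>
      line.zipWithIndex.collect { case ('#', col) => (row, col) }
    }.toSet

@main def wallDemo(): Unit =
  // The Option 1 walls: the (1,6)-(6,6) line plus four single blocks.
  val walls = WallSpecs.verticalLine((1, 6), (6, 6)) ++
    Set((2, 2), (3, 2), (5, 9), (6, 9))
  println(walls.size) // 6 line cells + 4 blocks = 10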

Both showcase cooperative multi-agent scenarios built with the hierarchical DSL syntax, withLearner blocks, and coordination triggers used in the actual codebase: in each scenario, one agent's onGoal action opens a wall cell (OpenWall) and leaves the episode running (EndEpisode false), while the other agent's goal ends the episode (EndEpisode true).
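
One plausible reading of the withLearner parameters (an assumption, not the documented semantics) is a per-agent epsilon-greedy Q-learner: Alpha as learning rate, Gamma as discount factor, exploration decaying from Eps0 toward EpsMin after a warm-up of Warm episodes, and Optimistic as the initial Q-value. A minimal sketch under those assumptions:

import scala.collection.mutable
import scala.util.Random

// Sketch only: parameter semantics are assumed, not taken from the codebase.
final class QLearner(alpha: Double, gamma: Double,
                     eps0: Double, epsMin: Double, warm: Int,
                     optimistic: Double, actions: Int):
  private val q = mutable.Map.empty[((Int, Int), Int), Double].withDefaultValue(optimistic)

  // Assumed schedule: full exploration during warm-up, then linear decay to EpsMin.
  private def epsilon(episode: Int): Double =
    if episode < warm then eps0
    else math.max(epsMin, eps0 - (eps0 - epsMin) * (episode - warm).toDouble / warm)

  def choose(state: (Int, Int), episode: Int, rng: Random): Int =
    if rng.nextDouble() < epsilon(episode) then rng.nextInt(actions)
    else (0 until actions).maxBy(a => q((state, a)))

  // Standard Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
  def update(s: (Int, Int), a: Int, reward: Double, s1: (Int, Int)): Unit =
    val best = (0 until actions).map(a1 => q((s1, a1))).max
    q((s, a)) += alpha * (reward + gamma * best - q((s, a)))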

DSL Enhancements:

Implementation Architecture

EpisodeManager Architecture

The MARL framework introduces an architecture that extends the basic Q-Learning foundation with specialized components for multi-agent coordination. The system is built around a modular design that separates concerns while still enabling complex agent interactions.
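
A condensed sketch of the kind of coordination loop such a manager could run (the types, fields, and random policy below are illustrative assumptions, not the project's EpisodeManager API): each step, every active agent moves, OpenWall effects remove a wall cell, and an agent configured with EndEpisode true terminates the episode.

import scala.util.Random

// Illustrative sketch only; names and shapes are assumptions.
type Pos = (Int, Int)

final case class OnGoal(reward: Double, openWall: Option[Pos], endEpisode: Boolean)

final class SimAgent(val name: String, val start: Pos, val goal: Pos, val onGoal: OnGoal):
  var pos: Pos = start
  var reachedGoal = false
  // A real agent would ask its Q-learner for an action; here it moves randomly.
  def step(walls: Set[Pos], rows: Int, cols: Int, rng: Random): Unit =
    val moves = Seq((-1, 0), (1, 0), (0, -1), (0, 1))
    val (dr, dc) = moves(rng.nextInt(moves.size))
    val next = (pos._1 + dr, pos._2 + dc)
    val inside = next._1 >= 0 && next._1 < rows && next._2 >= 0 && next._2 < cols
    if inside && !walls(next) then pos = next

final class EpisodeManager(rows: Int, cols: Int, var walls: Set[Pos],
                           agents: Seq[SimAgent], maxSteps: Int):
  def runEpisode(rng: Random): Unit =
    agents.foreach { a => a.pos = a.start; a.reachedGoal = false }
    var done = false
    var step = 0
    while !done && step < maxSteps do
      for a <- agents if !a.reachedGoal do
        a.step(walls, rows, cols, rng)
        if a.pos == a.goal then
          a.reachedGoal = true
          a.onGoal.openWall.foreach(w => walls -= w) // coordination trigger
          if a.onGoal.endEpisode then done = true
      step += 1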

Key Architectural Components:

Manager Layer:

DSL Framework:

Builder Pattern:
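
As a rough illustration of how a builder behind the DSL keywords could work, collecting the >> assignments into an immutable configuration (AgentBuilder and AgentConfig are hypothetical names; the project's builder classes may differ):

// Hypothetical sketch of a fluent builder; not the project's actual classes.
final case class AgentConfig(name: String, start: (Int, Int), goal: (Int, Int))

final class AgentBuilder:
  private var name = ""
  private var start = (0, 0)
  private var goal = (0, 0)
  def withName(n: String): AgentBuilder = { name = n; this }
  def withStart(p: (Int, Int)): AgentBuilder = { start = p; this }
  def withGoal(p: (Int, Int)): AgentBuilder = { goal = p; this }
  def build(): AgentConfig = AgentConfig(name, start, goal)

// A clause such as Name >> "Opener" could simply forward to withName("Opener")
// on the builder currently in scope.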

Sequence Diagram

Testing Strategy

Unit Tests
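
As an example of the kind of check a unit test could make (sketched here with ScalaTest against the illustrative WallSpecs sketch from earlier; the repository's own suites target its real classes):

import org.scalatest.flatspec.AnyFlatSpec

// Hypothetical test; exercises the illustrative WallSpecs sketch, not the codebase.
class WallSpecsSpec extends AnyFlatSpec:
  "fromAscii" should "mark every '#' cell as a wall and leave '.' cells open" in {
    val walls = WallSpecs.fromAscii(
      """###
        |#.#
        |###""")
    assert(walls.size == 8)          // 3 + 2 + 3 wall cells
    assert(!walls.contains((1, 1)))  // the single free cell stays open
  }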