
Q-Learning: Foundation and Visualization

This section covers the foundational Q-Learning implementation that forms the core of the AgentCrafter framework, including both basic grid-based learning and advanced visualization capabilities.

Development Journey

The Q-Learning implementation evolved through several key phases:

  1. Grid Q-Learning - Core reinforcement learning algorithms and grid-based environments
  2. Visual Q-Learning - Real-time visualization capabilities
  3. First DSL Version - Initial domain-specific language for simulation configuration
  4. Unit Tests - Comprehensive testing to validate implementation correctness

Core Implementation

Grid-Based Learning

The foundation is built on a discrete grid world environment where agents learn optimal policies through Q-Learning:

Q-Learning Diagram

Key Components:

Learning Algorithm

The Q-Learning implementation follows the classical algorithm:

\[\text{newValue} = (1-\alpha)\,\text{currentValue} + \alpha\bigl(r + \gamma\,\text{maxNextValue}\bigr)\]

Parameters:

  * α (learning rate): how strongly each update overwrites the current estimate
  * γ (discount factor): how much the best estimated value of the next state (maxNextValue) contributes to the update target
  * r: the immediate reward received for the transition
  * currentValue: the Q-value currently stored for the state-action pair being updated
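As a rough sketch, the update rule maps directly onto a tabular implementation. The names below (QTable, State, Action) are illustrative assumptions, not AgentCrafter's actual API:

// Minimal sketch of the tabular Q-Learning update (illustrative names, not the framework's actual API).
type State  = (Int, Int)   // grid cell: (row, column)
type Action = Int          // 0 = up, 1 = down, 2 = left, 3 = right

final class QTable(alpha: Double, gamma: Double):
  private val values =
    scala.collection.mutable.Map.empty[(State, Action), Double].withDefaultValue(0.0)

  // Best value obtainable from `state` over the four grid actions (maxNextValue in the formula above).
  def maxValue(state: State): Double =
    (0 to 3).map(a => values((state, a))).max

  // newValue = (1 - alpha) * currentValue + alpha * (reward + gamma * maxNextValue)
  def update(state: State, action: Action, reward: Double, nextState: State): Unit =
    val currentValue = values((state, action))
    val target = reward + gamma * maxValue(nextState)
    values((state, action)) = (1 - alpha) * currentValue + alpha * target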

Environment Dynamics

The grid world provides:

  * a rectangular grid of discrete cells, with walls acting as blocked positions
  * a start position and a goal position for the agent
  * a reward granted when the goal cell is reached
  * episode and per-episode step limits that bound each training run
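A minimal sketch of such a transition function, assuming a hypothetical GridWorld case class rather than the framework's real environment type, could look like this:

// Illustrative sketch of one grid-world transition (not the framework's actual environment API).
final case class GridWorld(rows: Int, cols: Int, walls: Set[(Int, Int)], goal: (Int, Int), goalReward: Double):
  private val moves = Vector((-1, 0), (1, 0), (0, -1), (0, 1)) // up, down, left, right

  // Returns the next state and the reward obtained by taking `action` from `state`.
  def step(state: (Int, Int), action: Int): ((Int, Int), Double) =
    val (dr, dc)  = moves(action)
    val candidate = (state._1 + dr, state._2 + dc)
    val inBounds  =
      candidate._1 >= 0 && candidate._1 < rows && candidate._2 >= 0 && candidate._2 < cols
    val next   = if inBounds && !walls.contains(candidate) then candidate else state
    val reward = if next == goal then goalReward else 0.0
    (next, reward)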

Visual Learning Enhancement

Real-Time Visualization

The visual component adds real-time monitoring capabilities on top of the grid-based learner:

Q-Learning Visualization

Visualization Features:

Interactive Debugging

The visualization system enables the training run to be observed and inspected interactively while it is in progress, which helps when tuning rewards and learning parameters.

DSL Foundation

The initial DSL provided basic simulation configuration:

simulation:
  grid:
    5 x 5
  wall:
    Block >> (2, 2)
  agent:
    Start >> (0, 0)
    Goal >> (4, 4)
  Episodes >> 100
  Steps >> 50
  GoalReward >> 10
  WithGUI >> true
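The configuration reads as a Scala 3 significant-indentation DSL in which each keyword is bound to a value with the >> operator. As a purely hypothetical sketch of how such keywords could be wired to a builder (the names SimulationConfig, Episodes, and simulation below are illustrative assumptions, not the framework's actual implementation):

// Hypothetical sketch of ">>"-style configuration keywords (not AgentCrafter's actual builder API).
final case class SimulationConfig(
  var episodes: Int = 0,
  var goalReward: Double = 0.0,
  var withGUI: Boolean = false
)

object Episodes:
  def >>(value: Int)(using cfg: SimulationConfig): Unit = cfg.episodes = value

object GoalReward:
  def >>(value: Double)(using cfg: SimulationConfig): Unit = cfg.goalReward = value

object WithGUI:
  def >>(value: Boolean)(using cfg: SimulationConfig): Unit = cfg.withGUI = value

// The surrounding `simulation:` block provides the config instance as a given.
def simulation(body: SimulationConfig ?=> Unit): SimulationConfig =
  given cfg: SimulationConfig = SimulationConfig()
  body
  cfg

The sketch only illustrates the general keyword-to-builder binding idea; the configuration example above also covers grid, wall, agent, and step entries.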

Early DSL Features:

  * grid size declaration (5 x 5)
  * wall placement (Block >> (2, 2))
  * agent configuration with start and goal positions
  * episode count, per-episode step limit, and goal reward
  * an optional GUI toggle (WithGUI)

Testing Strategy

The implementation is validated through unit tests, behavior-driven tests, and property-based tests.
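As an example of what a property-based check over the update rule could look like, here is a small sketch that assumes ScalaCheck; the object and property names are illustrative, not the project's actual test suite:

// Illustrative property-based checks of the Q-Learning update rule (assumes the ScalaCheck library).
import org.scalacheck.{Gen, Properties}
import org.scalacheck.Prop.forAll

object QUpdateProperties extends Properties("Q-Learning update"):

  private def update(current: Double, alpha: Double, reward: Double, gamma: Double, maxNext: Double): Double =
    (1 - alpha) * current + alpha * (reward + gamma * maxNext)

  // With alpha = 0 the update must keep the current estimate untouched.
  property("alpha = 0 leaves the value unchanged") =
    forAll(Gen.choose(-10.0, 10.0)) { current =>
      update(current, alpha = 0.0, reward = 5.0, gamma = 0.9, maxNext = 3.0) == current
    }

  // With alpha = 1 the update must adopt the full target r + gamma * maxNext.
  property("alpha = 1 adopts the full target") =
    forAll(Gen.choose(-10.0, 10.0), Gen.choose(-10.0, 10.0)) { (reward, maxNext) =>
      update(current = 42.0, alpha = 1.0, reward = reward, gamma = 0.9, maxNext = maxNext) == reward + 0.9 * maxNext
    }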