Features Overview
Environment Features
Multi-Agent Simulation
- Boarding agents start in the platform area and navigate to the tram door
- Exiting agents start inside the tram and navigate to the exit
- Dynamic agent management with configurable agent counts
- Individual agent tracking with unique identifiers
Basic Collision Handling
- Same-cell prevention only: agents are not allowed to occupy the same grid cell
- No yielding or coordination: agents do not explicitly give way to each other; they can bump into and block one another
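The same-cell rule can be sketched as follows. This is a minimal illustration, not the environment's actual code: the action encoding in `MOVES`, the agent IDs, and the fixed processing order are all assumptions made for the example.

```python
from typing import Dict, Tuple

Cell = Tuple[int, int]

# Hypothetical action encoding; the environment's real mapping may differ.
MOVES: Dict[int, Cell] = {0: (0, 0), 1: (0, -1), 2: (0, 1), 3: (-1, 0), 4: (1, 0)}


def apply_moves(positions: Dict[str, Cell], actions: Dict[str, int],
                width: int, height: int) -> Dict[str, Cell]:
    """Move agents one at a time; a move into an occupied or off-grid cell is dropped."""
    new_positions = dict(positions)
    for agent_id in sorted(actions):  # fixed order, no yielding or negotiation
        dx, dy = MOVES[actions[agent_id]]
        x, y = new_positions[agent_id]
        target = (x + dx, y + dy)
        in_bounds = 0 <= target[0] < width and 0 <= target[1] < height
        occupied = any(pos == target
                       for other, pos in new_positions.items()
                       if other != agent_id)
        if in_bounds and not occupied:
            new_positions[agent_id] = target
    return new_positions


positions = {"boarding_0": (0, 0), "exiting_0": (1, 0)}
# boarding_0 tries to step right into exiting_0's cell while exiting_0 stays put.
blocked = apply_moves(positions, {"boarding_0": 4, "exiting_0": 0}, width=5, height=5)
# boarding_0 steps down into a free cell instead.
moved = apply_moves(positions, {"boarding_0": 2, "exiting_0": 0}, width=5, height=5)
```

In the first call the blocked agent simply stays where it is, which is the "bump and block" behavior described above.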
Configurable Geometry
- Customizable tram size: adjustable width, length, and position
- Flexible door positioning: configurable door location and width
- Environment scaling: variable grid dimensions
- Division line customization: configurable boundary between the tram and the waiting area
Ray RLlib Compatibility
- MultiAgentEnv API: compatible with the Ray RLlib interfaces used in the examples and tests
- Standard gym interface: follows OpenAI Gym conventions
- Action space support: discrete action spaces for all agents
- Observation space: structured observations for each agent
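RLlib's multi-agent convention passes everything as dictionaries keyed by agent ID, with a special `"__all__"` key in the terminated and truncated dictionaries. The sketch below mirrors that shape without importing Ray; `TinyMultiAgentStub`, its constructor arguments, and its placeholder observations are hypothetical and stand in for the real environment class.

```python
from typing import Any, Dict, Iterable, Tuple


class TinyMultiAgentStub:
    """Dependency-free stand-in that mirrors the per-agent dict shape RLlib expects."""

    def __init__(self, agent_ids: Iterable[str], max_steps: int = 3) -> None:
        self.agent_ids = list(agent_ids)
        self.max_steps = max_steps
        self.steps = 0

    def reset(self, *, seed=None, options=None) -> Tuple[Dict[str, Any], Dict[str, Any]]:
        self.steps = 0
        obs = {aid: 0 for aid in self.agent_ids}      # placeholder observations
        infos = {aid: {} for aid in self.agent_ids}
        return obs, infos

    def step(self, action_dict: Dict[str, int]):
        self.steps += 1
        obs = {aid: self.steps for aid in action_dict}
        rewards = {aid: 0.0 for aid in action_dict}
        terminateds = {aid: False for aid in action_dict}
        terminateds["__all__"] = False                 # episode-level flag
        truncated = self.steps >= self.max_steps
        truncateds = {aid: truncated for aid in action_dict}
        truncateds["__all__"] = truncated
        infos = {aid: {} for aid in action_dict}
        return obs, rewards, terminateds, truncateds, infos


env = TinyMultiAgentStub(["boarding_0", "exiting_0"], max_steps=2)
obs, infos = env.reset(seed=0)
first = env.step({"boarding_0": 0, "exiting_0": 0})
second = env.step({"boarding_0": 0, "exiting_0": 0})
```

The five-element return value assumes the newer terminated/truncated step signature; older Gym versions used a single `done` flag instead.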
Rendering Modes
- RGB visualization: grid-based rendering for images
- ASCII rendering: text-based visualization for terminals
- Simple coloring: different colors for different agent types
- Step-by-step updates: render after each environment step
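As a rough idea of what a text-based frame can look like, here is a minimal ASCII renderer. The symbols (`B`, `E`, `|`, `.`), the vertical orientation of the division line, and the function signature are assumptions chosen for the example, not the environment's actual output format.

```python
from typing import Set, Tuple

Cell = Tuple[int, int]


def render_ascii(width: int, height: int, division_x: int,
                 boarding: Set[Cell], exiting: Set[Cell]) -> str:
    """Draw one frame: B = boarding agent, E = exiting agent, | = division line."""
    rows = []
    for y in range(height):
        row = []
        for x in range(width):
            if (x, y) in boarding:
                row.append("B")
            elif (x, y) in exiting:
                row.append("E")
            elif x == division_x:
                row.append("|")
            else:
                row.append(".")
        rows.append("".join(row))
    return "\n".join(rows)


frame = render_ascii(5, 3, division_x=2, boarding={(0, 1)}, exiting={(4, 1)})
print(frame)
```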
Reward System Features
Flexible Reward Strategies
- Default, simple distance, binary, and custom strategies
- Configured via RewardConfig classes; parameters depend on the chosen strategy
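To make the strategy names concrete, the sketch below shows one plausible reading of the simple distance and binary strategies. The formulas, the `REWARD_STRATEGIES` registry, and the function names are assumptions; the project's RewardConfig classes may define them differently.

```python
from typing import Callable, Dict, Tuple

Cell = Tuple[int, int]


def manhattan(a: Cell, b: Cell) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def simple_distance_reward(position: Cell, target: Cell) -> float:
    """Assumed form: negative Manhattan distance, so closer is better."""
    return -float(manhattan(position, target))


def binary_reward(position: Cell, target: Cell) -> float:
    """Assumed form: 1.0 at the target, 0.0 everywhere else."""
    return 1.0 if position == target else 0.0


# Hypothetical registry mapping a strategy name to its reward function.
REWARD_STRATEGIES: Dict[str, Callable[[Cell, Cell], float]] = {
    "simple_distance": simple_distance_reward,
    "binary": binary_reward,
}


def compute_rewards(strategy: str, positions: Dict[str, Cell],
                    targets: Dict[str, Cell]) -> Dict[str, float]:
    reward_fn = REWARD_STRATEGIES[strategy]
    return {aid: reward_fn(positions[aid], targets[aid]) for aid in positions}


positions = {"boarding_0": (0, 0), "exiting_0": (4, 2)}
targets = {"boarding_0": (2, 3), "exiting_0": (4, 2)}
distance_rewards = compute_rewards("simple_distance", positions, targets)
binary_rewards = compute_rewards("binary", positions, targets)
```

A custom strategy would then be one more entry in the registry.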
Termination System Features
Configurable Termination Conditions
- All at destination, individual at destination, and custom policies
- Set via TerminatedConfig classes; keep it simple or extend as needed
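The difference between the two built-in policies can be sketched as below. The function names and the per-agent dictionary return type are assumptions for illustration, not the TerminatedConfig API.

```python
from typing import Dict, Tuple

Cell = Tuple[int, int]


def individual_at_destination(positions: Dict[str, Cell],
                              targets: Dict[str, Cell]) -> Dict[str, bool]:
    """Each agent terminates as soon as it reaches its own target."""
    return {aid: positions[aid] == targets[aid] for aid in positions}


def all_at_destination(positions: Dict[str, Cell],
                       targets: Dict[str, Cell]) -> Dict[str, bool]:
    """No agent terminates until every agent has reached its target."""
    done = all(positions[aid] == targets[aid] for aid in positions)
    return {aid: done for aid in positions}


positions = {"boarding_0": (2, 3), "exiting_0": (0, 0)}
targets = {"boarding_0": (2, 3), "exiting_0": (4, 2)}
per_agent = individual_at_destination(positions, targets)
joint = all_at_destination(positions, targets)
```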
Truncation System Features
Flexible Truncation Policies
- Max steps and custom truncation policies
- Controlled via TruncatedConfig classes
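A max-steps policy amounts to a single comparison against the step counter. The sketch below assumes the episode is cut off once the counter reaches the limit; `is_truncated` and the limit of 5 are hypothetical.

```python
def is_truncated(step_count: int, max_steps: int) -> bool:
    """Truncate once the step counter reaches the configured limit."""
    return step_count >= max_steps


# Minimal episode loop: nothing terminates here, so only the step limit ends it.
step_count = 0
while not is_truncated(step_count, max_steps=5):
    step_count += 1  # stands in for one environment step
```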
Observation System Features
Configurable Observation Functions
- Agent positions: each agent observes its own and other agents' positions
- Tram door information: door boundaries and division line
- Environment geometry: grid dimensions and tram parameters
- Gym-style spaces: observation spaces provided per agent
- Custom strategies can be implemented via new observation configs/functions
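One way to picture a structured per-agent observation is as a dictionary that bundles the items listed above. The keys, the function name, and the argument list below are assumptions for illustration; the actual observation layout is defined by the project's observation configs.

```python
from typing import Any, Dict, List, Tuple

Cell = Tuple[int, int]


def build_observation(agent_id: str, positions: Dict[str, Cell],
                      door_cells: List[Cell], division_x: int,
                      grid_size: Tuple[int, int]) -> Dict[str, Any]:
    """Bundle an agent's own position, the other agents, and static geometry."""
    others = [positions[aid] for aid in sorted(positions) if aid != agent_id]
    return {
        "self": positions[agent_id],
        "others": others,
        "door": sorted(door_cells),
        "division_x": division_x,
        "grid_size": grid_size,
    }


positions = {"boarding_0": (0, 1), "exiting_0": (4, 1)}
observation = build_observation("boarding_0", positions,
                                door_cells=[(2, 1), (2, 0)],
                                division_x=2, grid_size=(5, 3))
```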
Configuration Features
Type-Safe Configuration
- Pydantic v2 integration: runtime validation of configuration data
- Automatic validation: errors raised at model construction time
- Immutable configurations: frozen after creation
- IDE support: full autocomplete and type hints
Comprehensive Validation
- Tram parameter validation: ensures logically consistent tram dimensions
- Boundary checking: validates that all coordinates lie within the grid
- Agent count limits: caps agent counts to keep performance reasonable
- Render mode validation: ensures valid rendering options
- Parameter validation: validates reward, termination, truncation, and observation parameters
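The sketch below shows how frozen models, field bounds, and a cross-field boundary check fit together in Pydantic v2. `TramGridConfig`, its field names, and the agent limit of 50 are hypothetical; only the Pydantic calls themselves are real API.

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError, model_validator


class TramGridConfig(BaseModel):
    """Hypothetical config model; field names are illustrative, not the project's."""

    model_config = ConfigDict(frozen=True)  # instances are immutable after creation

    grid_width: int = Field(gt=0)
    grid_height: int = Field(gt=0)
    tram_x: int = Field(ge=0)
    tram_width: int = Field(gt=0)
    num_agents: int = Field(default=4, ge=1, le=50)  # assumed limit

    @model_validator(mode="after")
    def tram_fits_in_grid(self) -> "TramGridConfig":
        # Cross-field check: the tram must not extend past the grid.
        if self.tram_x + self.tram_width > self.grid_width:
            raise ValueError("tram extends past the right edge of the grid")
        return self


config = TramGridConfig(grid_width=20, grid_height=10, tram_x=5, tram_width=10)

try:
    TramGridConfig(grid_width=20, grid_height=10, tram_x=15, tram_width=10)
except ValidationError as exc:
    error_text = str(exc)  # raised at construction time, before the env exists
```

Because the model is frozen, assigning to a field such as `config.tram_x` after construction also raises a `ValidationError`.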
Clear Error Messages
- Descriptive validation failures: helpful error messages
- Context-aware errors: specific to the validation failure
- Debugging support: detailed error information
- User-friendly messages: easy to understand and fix
Flexible Configuration
- Default values: sensible defaults for common use cases
- Optional parameters: only specify what you need
- Configuration inheritance: extend existing configurations
- Environment-specific configs: different configs for different scenarios
- Modular configuration: separate configs for rewards, termination, truncation, and observations
RLlib Integration
- See the RLlib compatibility guide: RLlib MultiAgentEnv Compatibility
Architecture Features
Modular Design
- Separated concerns: distinct modules for different functionality
- Clean interfaces: well-defined public APIs
- Loose coupling: minimal dependencies between modules
- Extensible design: easy to add new features
Private Encapsulation
- Proper encapsulation: private members where appropriate
- Public properties: clean external interfaces
- Internal state management: controlled access to internal data
- API stability: stable public interfaces
Environment Extensions
- Extensible configuration system: modify environment behavior through configuration
- Custom functions: implement custom reward, termination, truncation, and observation logic
Performance Considerations
- Reasonable execution speed: suitable for small to medium grid sizes
- Straightforward Python: clarity prioritized over micro-optimizations
- No heavy vectorization: simple loops over agents and grid
Key Capabilities
Training Support
- Episode management: proper episode termination and truncation
- Step counting: track episode progress
- Seed management: reproducible environments
- Flexible systems: multiple reward, termination, truncation, and observation strategies
System Architecture
All systems (Reward, Termination, Truncation, Observation) feature:
- Type-safe configurations: Pydantic-based configs
- Automatic validation: parameter validation and bounds checking
- Extensible design: easy to add new strategies
- Performance optimized: efficient computation and checking