Drones have found extensive use in military applications owing to their low profile and ease of deployment. However, major challenges remain: maintaining seamless connectivity with the ground station to ensure command and control (C2) link continuity, and avoiding RF jammers that can saturate receivers and inhibit communication.
Training can be completed entirely before flight. The drone only needs to carry a lookup table (the Q-table) during the mission, dramatically reducing onboard computing requirements while preserving sophisticated decision-making capabilities.
The Challenge: Dual Threats to UAV Operations
Modern drone operations face two critical challenges that must be addressed simultaneously for mission success.
Maintaining Wireless Connectivity
Maintaining a good quality C2 link over the drone route is critical for sensitive missions and may take priority over shortest distance. For surveillance missions, continuous streaming capability requires the drone trajectory to always lie in regions of good coverage.
Avoiding RF Jammers
Drones face constant threat from jamming—focused high-power antenna beams that saturate the front-end receiver, making it impossible to communicate with ground stations or GPS satellites. Jammer locations are often unknown a priori, requiring adaptive response.
Why Traditional Approaches Fall Short
Classical anti-jamming techniques include adaptive filters that attenuate jammer signals or Frequency Hopping Spread Spectrum (FHSS) where frequency dynamically varies to avoid corruption. However, such hardware implementations may not conform to the Size, Weight, Power, and Cost (SWaP-C) constraints posed by drone form factors.
The fundamental problem is the dynamic nature of the wireless channel. Wireless channels are shaped by geography, weather, user traffic, and more, which makes it impractical to build standard channel models that compute optimal routes a priori. The challenge is further compounded by jammers whose locations may be unknown.
The PhoenixAI Solution: Reinforcement Learning
PhoenixAI proposes a reinforcement learning approach that allows drones to fly in regions of good connectivity while avoiding multiple jammers on their route. The algorithm deviates minimally from pre-planned paths in the vicinity of jammers—critical for sensitive missions.
Why Reinforcement Learning?
Model-Free Learning
RL doesn't require a priori knowledge about the environment, making it ideal for highly dynamic and unpredictable wireless environments.
Adaptive Intelligence
The drone learns key characteristics of the wireless channel, enabling it to fly in regions of good coverage while avoiding threats.
Pre-Flight Training
Training occurs on the ground where computation and power are less constrained. The drone carries only a lookup table during flight.
Minimal SWaP-C
Computational requirements for the deployed algorithm are minimal; it can run even on drones with modest computational capabilities.
Technical Approach: Wireless Link Modeling
To generate data for training and evaluating the RL algorithm, PhoenixAI developed a representative RF simulator that captures fundamental trends in signal power at drone flight altitudes.
Wireless Heatmap Generation
The link between drone and base station is modeled using the well-known Friis equation, which accounts for transmit power, antenna gains, wavelength, and distance. For the base station antenna, a 15-element linear array with 0.5λ spacing operating at 2.0 GHz provides peak directivity of 18 dBi with a 7-degree down-tilt.
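For intuition, a minimal Python sketch of this link-budget calculation is shown below; the function name, the 43 dBm sector transmit power in the usage comment, and the 0 dBi drone antenna gain are our own illustrative assumptions, not values taken from the PhoenixAI simulator.

```python
import numpy as np

def friis_rx_power_dbm(tx_power_dbm, tx_gain_dbi, rx_gain_dbi, distance_m, freq_hz=2.0e9):
    """Free-space received power (dBm) via the Friis transmission equation."""
    wavelength_m = 3.0e8 / freq_hz                                # carrier wavelength
    path_loss_db = 20.0 * np.log10(4.0 * np.pi * distance_m / wavelength_m)
    return tx_power_dbm + tx_gain_dbi + rx_gain_dbi - path_loss_db

# Example (illustrative numbers): a 43 dBm sector seen through a sidelobe
# 13 dB below the 18 dBi peak, at 2 km range:
# friis_rx_power_dbm(43.0, 18.0 - 13.0, 0.0, 2000.0)
```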
Key Insight: Sidelobe Operations
A drone flying in the sky never sees the main beam of the base station antenna, only the sidelobes, which are at least 13 dB lower than the main beam. This makes drones extremely sensitive to high-power jammer signals that can drown out the cellular signal.
For each drone position, the simulator evaluates power received from each sector of each base station. The maximum power becomes the received signal, with powers from all other base stations and sectors treated as interference. The Signal to Interference and Noise Ratio (SINR) is then calculated, accounting for receiver noise.
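A simplified sketch of that SINR bookkeeping might look as follows, assuming a roughly -114 dBm thermal noise floor for the stated 1 MHz bandwidth and ignoring any receiver noise figure (both assumptions are ours):

```python
import numpy as np

def sinr_db(rx_powers_dbm, noise_dbm=-114.0, jammer_dbm=None):
    """SINR at one drone position: the strongest base-station sector is the
    serving signal; every other sector (plus any jammer) is interference."""
    p_mw = 10.0 ** (np.asarray(rx_powers_dbm) / 10.0)            # dBm -> mW
    signal_mw = p_mw.max()
    interference_mw = p_mw.sum() - signal_mw
    if jammer_dbm is not None:
        interference_mw += 10.0 ** (jammer_dbm / 10.0)
    noise_mw = 10.0 ** (noise_dbm / 10.0)
    return 10.0 * np.log10(signal_mw / (interference_mw + noise_mw))
```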
Jammer Modeling
Jammers are modeled as highly directive antennas pointed directly upward toward the drone. A 10×10 planar array with 0.5λ spacing generates a main beam with 25 dBi directivity and 5-degree beamwidth in both principal planes.
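For illustration only, a crude two-level approximation of such a jammer (full 25 dBi gain inside the 5-degree main beam, an assumed 0 dBi sidelobe floor elsewhere, and a 0 dBi drone antenna) could be sketched as follows; the real model uses the full 10×10 array pattern.

```python
import numpy as np

def jammer_rx_power_dbm(tx_power_dbm, drone_xyz, jammer_xyz,
                        mainlobe_dbi=25.0, sidelobe_dbi=0.0,
                        beamwidth_deg=5.0, freq_hz=2.0e9):
    """Two-level jammer pattern: main-beam gain when the drone is inside the
    upward-pointing beam, a flat (assumed) sidelobe level otherwise."""
    dx, dy, dz = np.subtract(drone_xyz, jammer_xyz).astype(float)
    distance_m = float(np.sqrt(dx**2 + dy**2 + dz**2))
    off_boresight_deg = np.degrees(np.arccos(dz / distance_m))   # boresight = straight up
    gain_dbi = mainlobe_dbi if off_boresight_deg <= beamwidth_deg / 2 else sidelobe_dbi
    wavelength_m = 3.0e8 / freq_hz
    path_loss_db = 20.0 * np.log10(4.0 * np.pi * distance_m / wavelength_m)
    return tx_power_dbm + gain_dbi - path_loss_db                # 0 dBi drone antenna assumed
```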
Reinforcement Learning Implementation
PhoenixAI's algorithm utilizes Q-learning, a model-free RL technique. During training, an estimate Q of the optimal state-action value function is built as the drone explores the environment. The Q-function maps each state-action pair (s, a) to the expected return of taking action a in state s.
Q-Learning Fundamentals
The training process consists of iterative updates to the Q-learning equation, derived from the Bellman optimality condition. At each iteration, if the agent is in state s and takes action a, causing state change to s', the state-action estimate is updated based on the immediate reward and discounted future rewards.
After training, a policy that selects an action given a state is defined by choosing the action that maximizes the Q-function for that state. The output of training is a Q-table that serves as a lookup table for executing the policy on the drone during flight.
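A minimal sketch of this update and the resulting greedy policy is shown below; the learning rate and discount factor are illustrative values, not the hyperparameters used by PhoenixAI.

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9            # learning rate and discount factor (illustrative)
Q = defaultdict(float)             # Q-table: (state, action) -> value estimate

def q_update(s, a, reward, s_next, next_actions):
    """One Bellman-style update after taking action a in state s and landing in s'."""
    best_next = max((Q[(s_next, a2)] for a2 in next_actions), default=0.0)
    Q[(s, a)] += ALPHA * (reward + GAMMA * best_next - Q[(s, a)])

def greedy_policy(s, actions):
    """Post-training policy: pick the action that maximizes Q for the current state."""
    return max(actions, key=lambda a: Q[(s, a)])
```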
Constrained Q-Learning Innovation
Smart Action Space Restriction
Unlike classic Q-learning with a fixed action set, PhoenixAI employs constrained Q-learning that restricts the action space at every iteration to actions that will not increase the distance to the destination and that respect geofencing. This yields better policies and guarantees steady progress toward the destination, so the drone cannot fail to arrive because of poorly chosen actions.
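A sketch of such an action mask, using an illustrative grid action set and a caller-supplied geofence test (both our assumptions), might look like this:

```python
import numpy as np

# Illustrative grid moves; the actual action set is not specified here.
ACTIONS = {"N": (0, 1), "NE": (1, 1), "E": (1, 0), "SE": (1, -1), "S": (0, -1)}

def allowed_actions(position, destination, in_geofence, actions=ACTIONS):
    """Keep only moves that do not increase distance to the destination
    and that remain inside the geofenced flight area."""
    current_dist = np.linalg.norm(np.subtract(destination, position))
    valid = []
    for name, delta in actions.items():
        candidate = tuple(np.add(position, delta))
        if (np.linalg.norm(np.subtract(destination, candidate)) <= current_dist
                and in_geofence(candidate)):
            valid.append(name)
    return valid
```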
Epsilon-Greedy Exploration
During training, actions balance exploration and exploitation. With probability 1-ε, the action maximizes the Q function; otherwise, a random action is chosen. Starting with ε near 1 for initial exploration, the value decreases as training progresses to favor convergence.
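Combined with the constrained action set above, exploration could be sketched as follows; the decay schedule in the closing comment is our illustration, not the published one.

```python
import random

def select_action(Q, state, valid_actions, epsilon):
    """Epsilon-greedy over the constrained action set: explore with probability
    epsilon, otherwise exploit the current Q-table estimate."""
    if random.random() < epsilon:
        return random.choice(valid_actions)
    return max(valid_actions, key=lambda a: Q[(state, a)])

# Typical annealing (illustrative): start near 1 and decay toward a small floor,
# e.g. epsilon = max(0.05, epsilon * 0.999) after each episode.
```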
Intelligent State Space Design
The selection of state space, reward, and action space is the critical design decision. An intuitive approach might define state by position with SINR as reward—but this creates a fundamental problem.
The Generalization Challenge
If the state is simply the drone's position (with SINR as the reward), the policy effectively tells the drone what to do at specific locations. If the SINR map at evaluation time differs from the one seen during training, the drone takes actions optimized for the training environment, which are not necessarily optimal for deployment.
PhoenixAI's Enhanced State Space
To enable generalization, PhoenixAI's state space includes four parameters:
- CQI in current position: Channel Quality Indicator (quantized SINR) at current location
- CQI in previous position: Channel quality at the previous iteration
- Previous action: Action taken at the last iteration
- Distance to nearest base station: Proximity to infrastructure
This state space allows the drone to learn trends of antenna patterns and associated signal quality, making the algorithm applicable to a wide variety of environments. The use of CQI (quantized SINR) makes the approach more amenable to Q-learning by reducing dynamic range.
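A possible encoding of this state is sketched below; the CQI quantization bounds, 16 levels, and 250 m distance bins are our illustrative assumptions.

```python
import numpy as np

def quantize_cqi(sinr_db, sinr_min=-10.0, sinr_max=30.0, levels=16):
    """Map SINR (dB) onto a small integer CQI, shrinking the dynamic range
    the Q-table has to represent (bounds and level count are illustrative)."""
    clipped = float(np.clip(sinr_db, sinr_min, sinr_max))
    return int(round((clipped - sinr_min) / (sinr_max - sinr_min) * (levels - 1)))

def build_state(sinr_now_db, sinr_prev_db, prev_action, position, bs_positions, bin_m=250.0):
    """Four-component state: current CQI, previous CQI, previous action,
    and a binned distance to the nearest base station."""
    d_nearest = min(np.linalg.norm(np.subtract(position, bs)) for bs in bs_positions)
    return (quantize_cqi(sinr_now_db), quantize_cqi(sinr_prev_db),
            prev_action, int(d_nearest // bin_m))
```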
Performance Results: Dynamic Path Optimization
The algorithm was evaluated in a hypothetical urban environment with nine cellular base stations placed uniformly over a 5 km grid, each with three sectorized antennas serving 120-degree sectors. A drone was trained to fly from the bottom-left to the upper-right corner at 150 m altitude over 40,000 episodes.
Baseline Performance: Same Environment
When evaluated in the same environment used for training, the RL algorithm strives to keep drones in regions of good connectivity. The RL path achieved 0.2 dB better average SINR than the straight-line path. While modest, this demonstrates the algorithm's ability to optimize connectivity.
Dynamic Environment Performance
The true power of the RL algorithm emerges in dynamic environments. When the model trained on the original configuration was evaluated in a scenario where the center base station was removed (creating a low-coverage region), the results were dramatic: the drone rerouted around the resulting coverage hole rather than flying straight through it, despite never having seen this configuration during training.
Jammer Avoidance: Adaptive Response
To demonstrate jammer avoidance capabilities, three jammers (A, B, C) were introduced directly in the optimized path originally predicted by the RL algorithm. The jammers appear as deep blue spots in the SINR heatmap due to dramatically increased interference.
Intelligent Evasion Behavior
Initial Flight Path
The drone initially follows exactly the same path as if no jammer were present, making steady progress toward the destination.
Jammer Detection & Avoidance
As the drone senses SINR dropping near Jammer A, it immediately changes direction to avoid the main beam, passing through the jammer's much weaker sidelobes instead, where the interference is far lower.
Multiple Threat Navigation
The course change triggered by Jammer A, combined with the RL optimization, allows the drone to bypass Jammer B entirely. Throughout, the algorithm keeps the drone over regions of good SINR.
Path Convergence
Once past Jammer B, the drone almost immediately returns to the originally planned path, continuing until reaching Jammer C where it performs similar avoidance maneuvers.
Complex Scenario: Poor Coverage + Jammers
To fully demonstrate applicability to dynamically changing wireless environments, the same trained model was evaluated in an environment where the center base station was removed AND jammers were present.
Results showed the drone followed a path identical to the poor-coverage scenario until reaching jammer locations. Upon encountering jammers, it performed localized avoidance maneuvers while maintaining the overall strategy of avoiding poor coverage zones.
Comparison with Straight Path
The straight path in this scenario would avoid the jammers but fly directly through the poor-coverage zone, leading to connectivity drops. The RL-optimized path maintains wireless connectivity while avoiding jammers, a combination no simple heuristic can match.
Key Advantages and Strategic Value
- Dual Optimization: Simultaneously optimizes wireless connectivity and avoids jammers without requiring a priori knowledge of jammer locations.
- Environment Generalization: Intelligent state space design enables the trained model to perform well in environments different from training scenarios.
- Minimal Path Deviation: Jammer avoidance is highly localized—the drone quickly returns to planned paths after threat avoidance.
- Pre-Flight Training: All training occurs on the ground with unconstrained computation. Flight execution requires only a lookup table.
- SWaP-C Compliant: Minimal computational requirements during flight enable deployment on drones with modest processing capabilities.
- Mission-Appropriate: Suitable for highly sensitive operations where maintaining planned routes and continuous connectivity are critical.
- Adaptive Learning: The drone learns wireless channel characteristics and antenna patterns, enabling intelligent decisions in new environments.
Implementation Details
Training Configuration
| Parameter | Value |
|---|---|
| Training Platform | Apple MacBook Air with M1 CPU |
| Training Episodes | 40,000 episodes |
| Environment | 9 base stations in a 5 km grid |
| Drone Altitude | 150 meters |
| Bandwidth | 1 MHz |
| Base Station Frequency | 2.0 GHz |
| Antenna Array | 15-element linear array, 0.5λ spacing |
| Peak Directivity | 18 dBi with 7° down-tilt |
| Jammer Configuration | 10×10 planar array, 25 dBi, 5° beamwidth |
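For convenience, the same configuration can be collected into a single structure; the key names below are ours, not PhoenixAI's.

```python
# Illustrative consolidation of the configuration table above.
SIM_CONFIG = {
    "training_episodes": 40_000,
    "grid_size_km": 5.0,
    "num_base_stations": 9,
    "drone_altitude_m": 150.0,
    "bandwidth_hz": 1.0e6,
    "carrier_freq_hz": 2.0e9,
    "bs_array": {"elements": 15, "spacing_wavelengths": 0.5,
                 "peak_directivity_dbi": 18.0, "downtilt_deg": 7.0},
    "jammer_array": {"elements": (10, 10), "spacing_wavelengths": 0.5,
                     "directivity_dbi": 25.0, "beamwidth_deg": 5.0},
}
```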
Future Directions
While the current results clearly illustrate the potential of PhoenixAI's algorithm, several areas remain for future exploration:
- Comprehensive Environmental Testing: Evaluation across diverse environments with varying jamming powers, antenna patterns, and jammer locations
- Multi-Drone Coordination: Extending the approach to coordinate multiple drones with shared RL models and distributed decision-making
- Real-World Validation: Testing with actual flight hardware in operational environments to validate simulation predictions
- Advanced Jammer Types: Addressing more sophisticated jamming techniques including directional sweeping, frequency hopping, and coordinated multi-jammer attacks
- Integration with Mission Planning: Incorporating RL trajectory optimization into broader mission planning frameworks with multiple objectives
Conclusion
Drone jamming poses a serious threat to strategic missions. Concurrently, many drone applications require continuous connectivity for data streaming and C2 link maintenance. PhoenixAI's RL algorithm equips drones with intelligence to avoid jammers while simultaneously flying only in regions of good coverage.
Critically, this RL model allows drones to be trained completely before flight. The result is a lookup table easily loaded into the drone, adhering to SWaP-C constraints. This represents a significant advancement in autonomous UAV operations, providing military and civilian operators with intelligent, adaptive trajectory optimization that maintains mission effectiveness even in contested electromagnetic environments.
Advancing UAV Autonomy Through AI
PhoenixAI continues to push the boundaries of reinforcement learning for autonomous systems. Our trajectory optimization algorithm demonstrates how AI can solve complex, multi-objective problems in dynamic environments—enabling safer, more effective drone operations in contested scenarios.