Cuphead is a run-and-gun platformer known for its intense boss battles and stunning hand-drawn animation. Players control Cuphead and fight through waves of enemies using precise movement and fast-paced combat. The game’s challenge is amplified by intricate platforming and an overwhelming barrage of projectiles, demanding high precision and skill even from a human player. With its iconic cartoon art style, Cuphead is instantly recognizable and visually captivating, which we thought would make it well suited to an object detection model. From the start, we knew we wanted to take on a challenging video game, and Cuphead was the ideal choice.
Our project focuses on developing and efficiently training an AI agent capable of defeating a Cuphead boss (The Root Pack) using deep reinforcement learning. Our method uses a two-stage machine learning approach: a computer vision component using YOLO (You Only Look Once) for real-time object detection and game-state understanding, followed by a deep Q-network (DQN) for action selection. The game must be started and the level loaded manually, but the agent takes over from there. The agent processes raw gameplay images to identify critical elements such as the player character, boss, projectiles, and health indicators, then uses this information to decide how to move and dodge. After 2,650 training runs, our agent successfully defeated the boss with 2 HP remaining.
The challenge of defeating a Cuphead boss is non-trivial for several reasons:
To address these challenges, we implemented a two-stage machine learning approach:
Our system required no modification to the game, operating solely through screen capture and simulated keyboard inputs, making it applicable to other similar games without code access.
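In practice this interface layer only needs two primitives: grab a frame and press a key. The sketch below shows one plausible implementation; the libraries (`mss` for screen capture, `pydirectinput` for simulated key presses), the capture region, and the key bindings are illustrative assumptions rather than the project's confirmed stack.

```python
# Hypothetical game interface: screen capture in, simulated key presses out.
# Library choices (mss, pydirectinput) and key bindings are assumptions for illustration.
import time
import numpy as np
import mss
import pydirectinput

ACTIONS = ["left", "right", "z", "x"]  # assumed mapping: move left/right, jump, shoot


class GameInterface:
    def __init__(self, region=None):
        self.sct = mss.mss()
        # Region of the screen occupied by the game window (assumed 720p at the origin).
        self.region = region or {"top": 0, "left": 0, "width": 1280, "height": 720}

    def grab_frame(self):
        """Capture the current game frame as an RGB numpy array."""
        shot = np.array(self.sct.grab(self.region))  # BGRA pixels
        return shot[:, :, 2::-1]  # drop alpha and reorder to RGB

    def send_action(self, action_idx, hold_time=0.05):
        """Tap the key mapped to the chosen discrete action."""
        key = ACTIONS[action_idx]
        pydirectinput.keyDown(key)
        time.sleep(hold_time)
        pydirectinput.keyUp(key)
```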
As a baseline, we implemented a random agent that selected actions uniformly from the available action space.
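A minimal sketch of such an agent, assuming the four-action discrete space used later by the DQN, looks like this:

```python
import random


class RandomAgent:
    """Baseline agent that ignores the game state entirely."""

    def __init__(self, n_actions=4):
        self.n_actions = n_actions

    def select_action(self, state=None):
        # Uniform random choice over the discrete action space.
        return random.randrange(self.n_actions)
```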
This random agent performed poorly, typically depleting less than 25% of the boss’s health before losing all player health. It had no strategy for dodging projectiles or positioning for attacks, leading to quick defeats. The random agent served as a baseline to measure the improvement of our learning-based approaches.
For the first stage of our approach, we used object detection to interpret the game state:
```python
def _process_detections(self, results):
    state = {
        'player': None,
        'enemies': [],
        'projectiles': [],
        'boss': None
    }
    # Process detections and identify phase transitions
    for result in results:
        for box, cls_id in zip(result.boxes.xyxy, result.boxes.cls):
            class_name = result.names[int(cls_id)]
            x_center = (box[0] + box[2]) / 2
            y_center = (box[1] + box[3]) / 2
            # Detect phase transitions
            if class_name == 'onion_boss':
                self.second_phase_reached = True
            # Detect carrot for third phase
            if class_name == 'carrot':
                self.carrot_detected = True
            # Map detections to state representation
            if class_name == 'player':
                state['player'] = (x_center, y_center)
            elif class_name in self.hps:
                new_health = int(class_name[2])
                self.current_health = new_health
            # ... additional detection processing
    return self._vectorize_state(state)
```
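The `_vectorize_state` helper is referenced above but not shown. A plausible sketch, assuming the enemy and projectile lists hold (x, y) centers and that the nearest threat supplies the relative-offset terms of the state vector below, is:

```python
import numpy as np

def _vectorize_state(self, state, screen_w=1280, screen_h=720):
    # No player detected this frame: fall back to a zero state vector.
    if state['player'] is None:
        return np.zeros(4, dtype=np.float32)

    px, py = state['player']
    # Treat enemies and projectiles alike as threats and take the closest one.
    threats = state['enemies'] + state['projectiles']
    if threats:
        ex, ey = min(threats, key=lambda t: (t[0] - px) ** 2 + (t[1] - py) ** 2)
    else:
        ex, ey = px, py  # nothing detected: zero relative offset

    return np.array([
        px / screen_w,           # normalized player x
        py / screen_h,           # normalized player y
        (ex - px) / screen_w,    # normalized x offset to nearest threat
        (ey - py) / screen_h,    # normalized y offset to nearest threat
    ], dtype=np.float32)
```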
The detection pipeline produced a compact state representation for the reinforcement learning agent:
\[s_t = \begin{bmatrix} \frac{x_{player}}{w_{screen}} & \frac{y_{player}}{h_{screen}} & \frac{x_{nearest\_enemy} - x_{player}}{w_{screen}} & \frac{y_{nearest\_enemy} - y_{player}}{h_{screen}} \end{bmatrix}\]

For the reinforcement learning model, we decided to go with a Deep Q-Network (DQN): it is a natural fit for discrete action spaces, and we wanted an off-policy method, since experience replay makes better use of the limited samples produced by our slow, non-parallel training setup.
Input Layer (4 neurons) → Dense(128) + ReLU → Dense(64) + ReLU → Output Layer (4 actions)
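As a sketch, this architecture maps onto a small PyTorch module (PyTorch is listed in our references; the class and attribute names here are illustrative, not taken from the project code):

```python
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Maps the 4-dimensional state vector to one Q-value per action."""

    def __init__(self, state_dim=4, n_actions=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, n_actions),  # one Q-value per discrete action
        )

    def forward(self, state):
        return self.net(state)
```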
The network was optimized using the Q-learning objective:
\[L(\theta) = \mathbb{E}_{(s,a,r,s')\sim D} \left[(r + \gamma \max_{a'} Q(s', a'; \theta^-) - Q(s,a;\theta))^2\right]\]

where:

- \(D\) is the replay buffer of stored transitions \((s, a, r, s')\)
- \(\gamma\) is the discount factor
- \(\theta\) are the parameters of the online network and \(\theta^-\) the parameters of the periodically updated target network
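A single gradient step on a minibatch sampled from the replay buffer, following this objective, could be sketched as follows; the function signature and tensor layout are assumptions, not the project's exact training code:

```python
import torch
import torch.nn.functional as F

def dqn_update(online_net, target_net, optimizer, batch, gamma=0.99):
    # batch: tensors sampled from the replay buffer D
    # actions is a LongTensor of shape (batch_size,), dones is 0/1 floats
    states, actions, rewards, next_states, dones = batch

    # Q(s, a; theta) for the actions actually taken
    q_values = online_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # r + gamma * max_a' Q(s', a'; theta^-), with bootstrapping cut at episode end
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * next_q * (1 - dones)

    # Squared TD error averaged over the minibatch
    loss = F.mse_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```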
Adaptive Reward Structure: Our reward function evolved throughout development, with the final version incorporating a base survival reward, health-loss penalties, and phase-specific position shaping:
```python
def get_reward(self):
    reward = 0
    # Base survival reward
    reward += 0.08
    # Health change penalties
    if self.current_health < self.last_health:
        reward -= (self.last_health - self.current_health) * 10
    # Position-based rewards and penalties
    if self.second_phase_reached:
        # Phase-specific rewards
        if self.last_action == 2:  # Penalize jumping in later phases
            reward -= 1
        # Third phase specific rewards
        if self.carrot_detected:
            # Reward horizontal movement patterns
            if 3 <= self.consecutive_moves <= 15:
                reward += 0.01 * self.consecutive_moves
            # Reward for staying in middle of screen
            if 0.2 <= self.current_state[0] <= 0.8:
                reward += 0.1
            # Larger edge penalty for third phase
            if self.current_state[0] < 0.05 or self.current_state[0] > 0.95:
                reward -= 0.1
    return reward
```
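Tying the pieces together, one environment step in the training loop might look roughly like the sketch below. It reuses the hypothetical `GameInterface` and `_vectorize_state` sketches from earlier and assumes an Ultralytics `YOLO` model loaded from custom-trained weights (the weights path shown is a placeholder):

```python
from ultralytics import YOLO

model = YOLO("cuphead_best.pt")  # placeholder path to custom-trained weights

def step(env, interface, action_idx):
    # 1. Act in the game via simulated key presses.
    interface.send_action(action_idx)
    env.last_action = action_idx

    # 2. Capture the next frame and run object detection on it.
    frame = interface.grab_frame()
    results = model(frame, verbose=False)

    # 3. Convert detections into the 4-dimensional state vector.
    env.last_health = env.current_health
    next_state = env._process_detections(results)
    env.current_state = next_state

    # 4. Compute the shaped reward and check for episode termination.
    reward = env.get_reward()
    done = env.current_health <= 0
    return next_state, reward, done
```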

Deep Q-Learning Advantages:
Deep Q-Learning Disadvantages:
Computer Vision Advantages:
Computer Vision Disadvantages:
Our DQN agent showed significant improvement over the baseline random agent: after 2,650 training episodes, it successfully defeated the boss, finishing with 2 HP remaining.


Through qualitative analysis of the agent’s gameplay, we observed several emergent behaviors:
We identified several common failure scenarios that represent areas for improvement:
We compared our agent’s performance to novice and experienced human players:
| Metric | AI Agent | Novice Player | Experienced Player |
|---|---|---|---|
| Boss Health Depleted | ~60% | ~40% | 100% |
| Average Survival Time | 170s | 90s | 240s |
| Success Rate (Phase 2) | 60% | 40% | 100% |
| Success Rate (Phase 3) | 10% | 5% | 90% |
While our agent outperformed novice players, it still fell short of experienced human performance, indicating room for further improvement.
Mnih, V., Kavukcuoglu, K., Silver, D., et al. (2013). “Playing Atari with Deep Reinforcement Learning.” arXiv preprint arXiv:1312.5602.
Mnih, V., Kavukcuoglu, K., Silver, D., et al. (2015). “Human-level control through deep reinforcement learning.” Nature, 518(7540), 529-533.
Jocher, G., et al. (2023). Ultralytics YOLO (Version 8.0.0). https://github.com/ultralytics/ultralytics
StudioMDHR. (2017). Cuphead [Video game]. StudioMDHR.
Stable Baselines3 Documentation. https://stable-baselines3.readthedocs.io/
PyTorch Documentation. https://pytorch.org/docs/stable/index.html
Throughout this project, we utilized several AI tools to assist with development and documentation:
None of the core algorithms or project concepts were directly generated by AI tools. Rather, these tools served to accelerate implementation, assist with technical challenges, and help articulate our approach in documentation.