Hello,
While training and evaluating policies with BBC in this project, I ran into an issue with the continuous style latent variable latent_eps. I would like to ask how it is expected to work, and whether I have misunderstood it or used it incorrectly.
Background
I mainly referred to the bbc module in the repository. Based on my understanding of the code, the policy input includes:
- latent_c: a discrete gait category, such as walk / pace / trot / canter / jump
- latent_eps: a continuous style latent variable in the range [-1, 1]
During training, latent_eps is randomly sampled:
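```python
# Resample the style latent for the selected environments (env_ids):
# uniform in [-1, 1), one scalar per environment.
self.latent_eps[env_ids, :] = torch.tensor(
    np.random.rand(len(env_ids), 1) * 2. - 1.,
    dtype=torch.float32,
    device=self.device
)
```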
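To make the sampling behaviour concrete, here is a minimal pure-Python sketch of per-environment latent resampling. It assumes, based only on the `env_ids` argument, that the snippet runs when environments are reset, so each environment keeps its `latent_eps` until its next reset. The class and method names are hypothetical and not taken from the repository.

```python
import random


class LatentEpsBuffer:
    """Hypothetical stand-in for the per-environment latent_eps tensor."""

    def __init__(self, num_envs: int, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        # One scalar style latent per environment, initialised to 0.
        self.latent_eps = [0.0] * num_envs

    def resample(self, env_ids: list[int]) -> None:
        # Mirrors np.random.rand(...) * 2. - 1.: uniform in [-1, 1).
        for i in env_ids:
            self.latent_eps[i] = self.rng.random() * 2.0 - 1.0


buf = LatentEpsBuffer(num_envs=4)
buf.resample([0, 2])  # only environments 0 and 2 get new values
print(buf.latent_eps)
```

Environments not listed in `env_ids` keep their previous value, so under the reset assumption above the policy sees a constant `latent_eps` for the whole episode.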
At the same time, the discriminator contains an encoder_eps, which predicts latent_eps from motion clips:
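```python
# encoder_eps regresses latent_eps from a motion clip x;
# the negative L1 error is used as the style reward.
eps = self.encoder_eps(x)
reward_us = -self.L1Loss(eps, label_eps)
```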
The training loss also includes:
```python
us_loss = self.L1Loss(eps, policy_latent_eps)
loss = ... + self.us_coef * us_loss + ...
```
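One reading of these terms is an InfoGAN-style latent-recovery objective: the encoder learns to recover eps from motion, and the policy is rewarded when its motion makes eps recoverable. The pure-Python sketch below illustrates that interpretation with per-sample L1 errors. It is not the repository's code, and `recovery_reward` / `recovery_loss` are hypothetical names.

```python
def l1_per_sample(pred: list[float], target: list[float]) -> list[float]:
    # Absolute error for each environment (no reduction).
    return [abs(p - t) for p, t in zip(pred, target)]


def recovery_reward(pred_eps: list[float], label_eps: list[float]) -> list[float]:
    # Closer to 0 when the encoder recovers eps from the motion.
    return [-e for e in l1_per_sample(pred_eps, label_eps)]


def recovery_loss(pred_eps: list[float], policy_eps: list[float]) -> float:
    # Mean L1 error over the batch, like nn.L1Loss() with reduction='mean'.
    errors = l1_per_sample(pred_eps, policy_eps)
    return sum(errors) / len(errors)


true_eps = [-1.0, 0.0, 1.0]
informative = [-0.9, 0.1, 0.8]   # encoder can read eps from the motion
uninformative = [0.0, 0.0, 0.0]  # motion carries no eps information

print(recovery_loss(informative, true_eps))
print(recovery_loss(uninformative, true_eps))
```

If the encoder's loss stays near the "uninformative" level during training, the motions produced for different eps values are probably hard to tell apart. Note also that if `self.L1Loss` is `torch.nn.L1Loss()` with its default `reduction='mean'`, it returns one batch-averaged scalar rather than a per-environment reward. The excerpt does not show which reduction the repository uses.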
Therefore, my understanding is that latent_eps should control continuous style variations within the same gait category, such as stride length, step frequency, body posture amplitude, or the degree of bouncing.
Issue
After training a policy with bbc, I manually adjusted latent_eps in play.py, for example from -1 to +1. However, the resulting changes in the robot's motion were not visually obvious.
I also tested the provided policy:
but the style variation was still not very noticeable.
More specifically:
- Switching latent_c produces clear changes in gait category.
- Adjusting only latent_eps results in relatively weak style changes within the same gait.
- It is difficult to observe stable and interpretable differences between latent_eps = -1, latent_eps = 0, and latent_eps = 1.
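Visual inspection is hard to judge, so one way to check for stable differences is to fix `latent_c`, roll out the policy for each `latent_eps` value, and compare a simple gait statistic. The sketch below computes step frequency from a per-step binary foot-contact signal. The contact traces, `dt`, and function names are hypothetical placeholders for whatever you log in play.py. The example traces are synthetic, not results from this project.

```python
def step_frequency(contact: list[int], dt: float) -> float:
    """Touchdowns per second from a binary contact trace (1 = foot on ground)."""
    if not contact:
        return 0.0
    # A touchdown is a 0 -> 1 transition between consecutive control steps.
    touchdowns = sum(
        1 for prev, cur in zip(contact, contact[1:]) if prev == 0 and cur == 1
    )
    duration = len(contact) * dt
    return touchdowns / duration


def sweep_summary(traces: dict[float, list[int]], dt: float) -> dict[float, float]:
    # Map each latent_eps value to the step frequency measured for it.
    return {eps: step_frequency(trace, dt) for eps, trace in sorted(traces.items())}


# Synthetic traces for one foot at dt = 0.02 s (50 Hz control), 50 steps each.
dt = 0.02
traces = {
    -1.0: ([0] * 5 + [1] * 5) * 5,          # 10-step cycle
    0.0: ([0] * 4 + [1] * 4) * 6 + [0, 0],  # 8-step cycle
    1.0: ([0] * 3 + [1] * 2) * 10,          # 5-step cycle
}
print(sweep_summary(traces, dt))
```

Running each eps value over several episodes and seeds, then comparing the spread within one eps value against the differences between eps values, gives a more reliable picture than a single visual rollout.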
I would appreciate your help!