machine learning
resnet
residual block (skip connection)
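A minimal sketch of a residual block with a skip connection (channel counts and layer choices are illustrative, in the same PyTorch style as the CLIP code below):

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    # a minimal residual block; dimensions are illustrative
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # skip connection: add the input back onto the transformed features,
        # so the block only has to learn a residual on top of identity
        return self.relu(self.body(x) + x)
```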
transformer
attention
self attention
transformer encoder
cross attention
transformer decoder
causal attention
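A minimal sketch tying these together: scaled dot-product attention, where self-attention uses q, k, v all from one sequence, cross-attention takes k, v from another sequence (e.g. the encoder output), and causal attention masks out future positions (names and shapes are illustrative):

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, causal: bool = False):
    # q: (b, tq, d); k, v: (b, tk, d)
    scores = q @ k.mT / math.sqrt(q.shape[-1])  # (b, tq, tk)
    if causal:
        # causal attention: position i may only attend to positions <= i
        tq, tk = scores.shape[-2], scores.shape[-1]
        mask = torch.ones(tq, tk, dtype=torch.bool, device=scores.device).triu(1)
        scores = scores.masked_fill(mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

x = torch.randn(2, 5, 16)  # one sequence
y = torch.randn(2, 7, 16)  # another sequence (e.g. encoder output)
self_attn   = scaled_dot_product_attention(x, x, x)               # self-attention
cross_attn  = scaled_dot_product_attention(x, y, y)               # cross-attention
causal_attn = scaled_dot_product_attention(x, x, x, causal=True)  # causal attention
```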
CLIP
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import Tensor

class Clip(nn.Module):
    def __init__(self, motion_dim=75, music_dim=438, feature_dim=256):
        super().__init__()
        # MotionEncoder / MusicEncoder are assumed to be defined elsewhere in these notes
        self.motion_encoder = MotionEncoder(input_channels=motion_dim, feature_dim=feature_dim)
        self.music_encoder = MusicEncoder(input_channels=music_dim, feature_dim=feature_dim)
        self.motion_project = nn.Linear(feature_dim, feature_dim)
        self.music_project = nn.Linear(feature_dim, feature_dim)
        # learnable scale applied to the similarity logits
        self.temperature = nn.Parameter(torch.tensor(1.0))
        self.criterion = nn.CrossEntropyLoss()

    def forward(self, motion: Tensor, music: Tensor):
        # motion/music: (batch, seq_len, channels); the sequences must be time-aligned
        assert motion.shape[1] == music.shape[1]
        b, s, c = motion.shape
        motion_features = self.motion_encoder(motion)
        music_features = self.music_encoder(music)
        # project into the shared space and L2-normalize, so dot products are cosine similarities
        motion_features = F.normalize(self.motion_project(motion_features), p=2, dim=-1)
        music_features = F.normalize(self.music_project(music_features), p=2, dim=-1)
        # earlier variant: relation = (motion_features @ music_features.T) * (1.0 / math.sqrt(c))
        # batch matrix multiplication; .mT transposes the last two dims
        logits = torch.bmm(motion_features, music_features.mT) * self.temperature  # (b, s, s)
        # matching pairs lie on the diagonal, so the target class for position i is i
        labels = torch.arange(s, device=motion.device).repeat(b, 1)
        # symmetric cross-entropy over both axes of the similarity matrix
        loss_motion = self.criterion(logits, labels)
        loss_music = self.criterion(logits.mT, labels)
        loss = (loss_motion + loss_music) / 2
        return (motion_features, music_features), (loss, loss_motion, loss_music)
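Note on the temperature: the released CLIP implementation learns this scale in log space and applies `logits * logit_scale.exp()` (with clamping), which keeps the scale positive; a raw learnable scalar like `temperature` above can in principle drift negative.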
CAN (GAN-based)
AE

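A minimal autoencoder sketch (dimensions and names are illustrative): compress the input into a bottleneck, then reconstruct from it.

```python
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU(), nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, input_dim))

    def forward(self, x):
        z = self.encoder(x)        # compress to the bottleneck
        return self.decoder(z), z  # reconstruct from the bottleneck
```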
VAE

reparameterization trick
Assuming the output distribution is Gaussian, the model only predicts the mean ($\mu$) and std ($\sigma$). We then sample the latent variable as $z = \mu + \sigma \odot \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$, so the randomness lives in $\epsilon$ and gradients can flow through $\mu$ and $\sigma$. The predicted latent distribution should match the prior $p(z) = \mathcal{N}(0, I)$, which is enforced using the KL divergence.
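A minimal sketch of the trick (names are illustrative; predicting log-variance instead of std is a common numerical convenience):

```python
import torch

def reparameterize(mu: torch.Tensor, log_var: torch.Tensor) -> torch.Tensor:
    # z = mu + sigma * eps with eps ~ N(0, I); sampling is moved into eps,
    # so gradients can flow through mu and sigma back to the encoder
    sigma = torch.exp(0.5 * log_var)  # log-variance keeps sigma > 0
    eps = torch.randn_like(sigma)
    return mu + sigma * eps
```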
VAE loss
Evidence Lower Bound (ELBO)
We want to replace the true posterior $p(z|x)$ with an approximate posterior $q(z|x)$, since we don't have the ground truth of $p(z|x)$.
Based on different conditions, the KL divergence can be simplified into different terms (the identity after this list underlies all three routes):
- Variational Inference
- Importance Sampling to ELBO
- Variational EM
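For reference, this is the standard identity behind all three derivations (not specific to these notes):

$$
\log p(x) = \underbrace{\mathbb{E}_{q(z|x)}\!\left[\log \frac{p(x, z)}{q(z|x)}\right]}_{\mathrm{ELBO}} + \mathrm{KL}\big(q(z|x) \,\|\, p(z|x)\big)
$$

Since $\mathrm{KL} \ge 0$, the first term lower-bounds $\log p(x)$; maximizing it over $q$ tightens the bound, and it expands into the familiar VAE loss terms $\mathbb{E}_{q(z|x)}[\log p(x|z)] - \mathrm{KL}\big(q(z|x) \,\|\, p(z)\big)$ (reconstruction plus KL).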
refs
VQ-VAE (dVAE)
Variational Autoencoder (Kingma & Welling, 2014)
quantise bottleneck

- randomly initialize the centroids (codebook vectors)
- find the nearest centroid for each unquantised vector (the encoder output)
- if a centroid has low usage, randomly re-initialize it as a new centroid
- compute the average of the unquantised vectors assigned to each centroid
- apply an Exponential Moving Average (EMA) update between the new centroid and the old centroid (see the sketch after this list)
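A minimal sketch of the nearest-centroid assignment plus the EMA codebook update described above (decay, shapes, and names are illustrative; the low-usage re-init step is omitted for brevity):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_codebook_update(z_e, codebook, cluster_size, ema_embed, decay=0.99, eps=1e-5):
    # z_e: (n, d) unquantised encoder outputs; codebook: (k, d) centroids
    # cluster_size: (k,) EMA usage counts; ema_embed: (k, d) EMA of assigned vector sums
    dist = torch.cdist(z_e, codebook)                 # (n, k) pairwise L2 distances
    idx = dist.argmin(dim=1)                          # nearest centroid per vector
    onehot = F.one_hot(idx, codebook.shape[0]).to(z_e.dtype)  # (n, k) assignments
    # EMA of per-code usage counts and of the summed assigned vectors
    cluster_size.mul_(decay).add_(onehot.sum(0), alpha=1 - decay)
    ema_embed.mul_(decay).add_(onehot.T @ z_e, alpha=1 - decay)
    # new centroid = EMA average of its assigned vectors (eps avoids division by zero)
    codebook.copy_(ema_embed / (cluster_size.unsqueeze(1) + eps))
    return codebook[idx], idx                         # quantised vectors + code indices
```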
loss
- VQ loss: the L2 error between the embedding space and the encoder outputs, $\|\mathrm{sg}[z_e(x)] - e\|_2^2$.
- Commitment loss: a measure to encourage the encoder output to stay close to the embedding space and to prevent it from fluctuating too frequently from one code vector to another, $\beta \|z_e(x) - \mathrm{sg}[e]\|_2^2$.
- where $\mathrm{sg}[\cdot]$ is the stop-gradient operator.
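A minimal sketch of the two loss terms plus the straight-through estimator that lets gradients bypass the non-differentiable quantisation ($\beta$ value is illustrative; with EMA codebook updates, the VQ loss term is typically dropped):

```python
import torch.nn.functional as F

def vq_losses(z_e, z_q, beta=0.25):
    # z_e: encoder output; z_q: its nearest codebook vector (same shape)
    vq_loss = F.mse_loss(z_q, z_e.detach())                  # pull codes toward encoder outputs
    commitment_loss = beta * F.mse_loss(z_e, z_q.detach())   # pull encoder toward codes
    # straight-through estimator: forward pass uses z_q, gradient flows to z_e
    z_q_st = z_e + (z_q - z_e).detach()
    return z_q_st, vq_loss + commitment_loss
```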
CVAE
reinforcement learning
actor-critic
origin
- $s_t$: state at time $t$
- $V(s_t)$: value function; predicts the expected reward given $s_t$
- $\pi(a_t \mid s_t)$: action (policy) function; returns action probabilities based on $s_t$
- top1: select the action with the max probability
- $R$: reward function; takes in a sequence of actions and outputs a float
- advantage value $A_t$: positive if the action $a_t$ can get more reward than the value function predicts, negative otherwise
with Temporal Difference error (TD-error)
- Hope $V(s_t)$ can predict the reward that may be obtained in the future, discounted by a proportion $\gamma$: $\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$
- note: you should add stop-gradient to $V(s_{t+1})$ (aka detach in PyTorch); see the sketch below
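A minimal sketch of the TD-error actor-critic losses described above (all tensor names and shapes are illustrative, not from these notes):

```python
import torch.nn.functional as F

def actor_critic_losses(policy_logits, value_t, value_tp1, action, reward, gamma=0.99):
    # policy_logits: (b, num_actions); value_t, value_tp1, reward: (b,); action: (b,) long
    # TD target bootstraps from the next state's value; detach() is the
    # stop-gradient noted above, so the critic regresses toward a fixed target
    td_target = reward + gamma * value_tp1.detach()
    td_error = td_target - value_t                 # also serves as the advantage estimate
    critic_loss = td_error.pow(2).mean()
    # actor: raise the log-probability of actions with positive advantage;
    # detach the advantage so actor gradients do not flow into the critic
    log_prob = F.log_softmax(policy_logits, dim=-1).gather(1, action.unsqueeze(1)).squeeze(1)
    actor_loss = -(log_prob * td_error.detach()).mean()
    return actor_loss, critic_loss
```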