machine-learning

resnet

residual block (skip connection)
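
A minimal sketch of a residual block (the channel count and layer choices are my assumptions, not ResNet's exact configuration); the skip connection adds the block's input to its output, so gradients can flow around the transformation:

import torch
from torch import nn

class ResidualBlock(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        # skip connection: out = F(x) + x
        return torch.relu(self.body(x) + x)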

transformer

attention

self attention
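
A minimal sketch of scaled dot-product attention (the function name and shapes are my assumptions). With q, k, v all computed from the same sequence this is self attention; computing k and v from a different sequence gives cross attention, and adding a mask gives causal attention (see below). In practice q, k, v are separate linear projections of the input.

import math

import torch
import torch.nn.functional as F

def attention(q, k, v, mask=None):
    # q, k, v: (batch, seq, dim); scores[b, i, j] = how strongly position i attends to j
    scores = q @ k.mT / math.sqrt(q.shape[-1])
    if mask is not None:
        # masked positions get -inf so softmax assigns them zero weight
        scores = scores.masked_fill(mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

x = torch.randn(2, 8, 64)
out = attention(x, x, x)  # self attention: q, k, v all come from x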

transformer encoder

cross attention

transformer decoder

causal attention
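
Reusing the hypothetical attention function sketched above, causal attention masks out future positions so that step i can only attend to steps j <= i:

s = 8
# True above the diagonal = future positions that must be hidden
causal_mask = torch.triu(torch.ones(s, s, dtype=torch.bool), diagonal=1)
out = attention(x, x, x, mask=causal_mask)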

CLIP

import torch
import torch.nn.functional as F
from torch import Tensor, nn


class Clip(nn.Module):
    def __init__(self, motion_dim=75, music_dim=438, feature_dim=256):
        super().__init__()

        # MotionEncoder / MusicEncoder are assumed to be defined elsewhere in the project
        self.motion_encoder = MotionEncoder(input_channels=motion_dim, feature_dim=feature_dim)
        self.music_encoder = MusicEncoder(input_channels=music_dim, feature_dim=feature_dim)

        self.motion_project = nn.Linear(feature_dim, feature_dim)
        self.music_project = nn.Linear(feature_dim, feature_dim)

        # learnable temperature that scales the similarity logits
        self.temperature = nn.Parameter(torch.tensor(1.0))

        self.criterion = nn.CrossEntropyLoss()

    def forward(self, motion: Tensor, music: Tensor):
        # both modalities must share the same sequence length
        assert motion.shape[1] == music.shape[1]

        b, s, _ = motion.shape

        motion_features = self.motion_encoder(motion)
        music_features = self.music_encoder(music)

        # project, then L2-normalize so the dot product is a cosine similarity
        motion_features = F.normalize(self.motion_project(motion_features), p=2, dim=-1)
        music_features = F.normalize(self.music_project(music_features), p=2, dim=-1)

        # batched matrix multiplication; .mT transposes the last two dimensions,
        # so logits[b, i, j] = similarity between motion step i and music step j
        logits = torch.bmm(motion_features, music_features.mT) * self.temperature

        # matching pairs lie on the diagonal: the target class for position i is i
        labels = torch.arange(s, device=motion.device).repeat(b, 1)

        # symmetric contrastive (InfoNCE-style) loss over both axes of the similarity matrix
        loss_motion = self.criterion(logits, labels)
        loss_music = self.criterion(logits.mT, labels)

        loss = (loss_motion + loss_music) / 2

        return (motion_features, music_features), (loss, loss_motion, loss_music)

CAN (GAN-based)

AE

AE from https://lilianweng.github.io/posts/2018-08-12-vae/

VAE

VAE from https://lilianweng.github.io/posts/2018-08-12-vae/

reparameterization trick

Assuming the distribution of the output is a Gaussian distribution, the model only predicts the mean ($\mu$) and std ($\sigma$). We then sample the latent variable $z$ from this Gaussian as $z = \mu + \sigma \odot \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$, which keeps the sampling step differentiable. The sampled latent distribution should match the prior $p(z)$, which is enforced using the KL divergence.
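
A minimal sketch of the trick (tensor names are my assumptions); sampling is rewritten as a deterministic function of $\mu$, $\sigma$, and external noise $\epsilon$, so gradients can flow through $\mu$ and $\sigma$:

import torch

def reparameterize(mu, log_var):
    # z = mu + sigma * eps, with eps ~ N(0, I)
    std = torch.exp(0.5 * log_var)
    eps = torch.randn_like(std)
    return mu + std * eps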

VAE loss

Evidence Lower Bound (ELBO)

We want to replace $p(z \mid x)$ with an approximate posterior $q(z \mid x)$ since we don't have the ground truth of $p(z \mid x)$.

Depending on the conditions, the KL divergence can be simplified into different terms, so the same ELBO can be reached in several ways (a derivation follows this list):

  • Variational Inference
  • Importance Sampling to ELBO
  • Variational EM
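
For reference, two standard routes to the same bound. Via variational inference, since $D_{\mathrm{KL}} \ge 0$:

$$\log p(x) = \mathbb{E}_{q(z|x)}\left[\log \frac{p(x,z)}{q(z|x)}\right] + D_{\mathrm{KL}}\big(q(z|x)\,\|\,p(z|x)\big) \ge \mathbb{E}_{q(z|x)}\big[\log p(x|z)\big] - D_{\mathrm{KL}}\big(q(z|x)\,\|\,p(z)\big)$$

Via importance sampling and Jensen's inequality:

$$\log p(x) = \log \mathbb{E}_{q(z|x)}\left[\frac{p(x,z)}{q(z|x)}\right] \ge \mathbb{E}_{q(z|x)}\left[\log \frac{p(x,z)}{q(z|x)}\right] = \mathrm{ELBO}$$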

refs

VQ-VAE (d-vae)

Variational Autoencoder (Kingma & Welling, 2014)

quantise bottleneck

  • randomly initialize the centroids
  • find the nearest centroid for each unquantized vector
    • if a code has low usage, re-initialize it as a new random centroid
  • compute the average (center) of the unquantized vectors assigned to each centroid
  • update the old centroid toward this new center with an Exponential Moving Average (see the sketch after this list)
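
A minimal sketch of an EMA-updated codebook following those steps (the class and buffer names are my assumptions; the low-usage random re-init is omitted for brevity):

import torch
import torch.nn.functional as F
from torch import nn

class EMAQuantizer(nn.Module):
    def __init__(self, num_codes=512, dim=64, decay=0.99, eps=1e-5):
        super().__init__()
        self.decay, self.eps = decay, eps
        embed = torch.randn(num_codes, dim)                 # randomly initialized centroids
        self.register_buffer("embed", embed)
        self.register_buffer("cluster_size", torch.zeros(num_codes))
        self.register_buffer("embed_sum", embed.clone())

    def forward(self, z_e):                                 # z_e: (N, dim) unquantized vectors
        # find the nearest centroid for each unquantized vector
        idx = torch.cdist(z_e, self.embed).argmin(dim=1)
        z_q = self.embed[idx]
        if self.training:
            onehot = F.one_hot(idx, self.embed.shape[0]).type_as(z_e)
            # EMA update of per-code usage counts and running sums of assigned vectors
            self.cluster_size.mul_(self.decay).add_(onehot.sum(0), alpha=1 - self.decay)
            self.embed_sum.mul_(self.decay).add_(onehot.t() @ z_e, alpha=1 - self.decay)
            # Laplace smoothing keeps rarely used codes from dividing by ~zero
            n = self.cluster_size.sum()
            smoothed = (self.cluster_size + self.eps) / (n + self.eps * len(self.cluster_size)) * n
            # new centroid = EMA average of the vectors assigned to it
            self.embed.copy_(self.embed_sum / smoothed.unsqueeze(1))
        # straight-through estimator: the decoder sees z_q, gradients flow back to z_e
        return z_e + (z_q - z_e).detach(), idx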

loss

  • VQ loss: the L2 error between the embedding space and the encoder outputs, $\|\mathrm{sg}[z_e(x)] - e\|_2^2$.
  • Commitment loss: a measure to encourage the encoder output to stay close to the embedding space and to prevent it from fluctuating too frequently from one code vector to another, $\beta \|z_e(x) - \mathrm{sg}[e]\|_2^2$.
  • where $\mathrm{sg}[\cdot]$ is the stop-gradient operator (see the snippet after this list).
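
In code (reusing z_e / z_q from the quantizer sketch above; recon_loss and beta are assumptions):

vq_loss = F.mse_loss(z_q, z_e.detach())            # ||sg[z_e] - e||^2: moves the codebook
commitment_loss = F.mse_loss(z_e, z_q.detach())    # ||z_e - sg[e]||^2: moves the encoder
loss = recon_loss + vq_loss + beta * commitment_loss

When the codebook is trained with the EMA update sketched above, the VQ-loss term is usually dropped and only the commitment loss is kept.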

CVAE

reinforcement learning

actor-critic

origin

  • $s_t$: state at time t
  • $V(s_t)$: value function; predicts the expected reward given $s_t$
  • $\pi(a_t \mid s_t)$: action function; returns action probabilities based on $s_t$
  • top1: select the action with the highest probability
  • $R$: reward function; takes a sequence of actions and outputs a float
  • advantage value $R - V(s_t)$: positive if the actions get more reward than $V(s_t)$ predicted, otherwise negative

with Temporal Difference error (TD-error)

  • Hope $V(s_t)$ can predict the reward that may be obtained in the future, discounted by the proportion $\gamma$; the TD error is $\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$
  • note: you should add stop-gradient to $V(s_{t+1})$ (aka detach in PyTorch); a sketch follows this list
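
A minimal sketch of one TD-error actor-critic update (the network interfaces, gamma, and the loss weighting are my assumptions):

import torch

def actor_critic_step(actor, critic, optimizer, s_t, a_t, r_t, s_next, gamma=0.99):
    v_t = critic(s_t)
    # TD target: stop gradient (detach) on V(s_{t+1}), as noted above
    td_target = r_t + gamma * critic(s_next).detach()
    td_error = td_target - v_t                       # also serves as the advantage

    critic_loss = td_error.pow(2).mean()             # make V(s_t) predict the TD target
    # actor: raise the log-probability of actions with positive TD error
    log_prob = torch.distributions.Categorical(logits=actor(s_t)).log_prob(a_t)
    actor_loss = -(log_prob * td_error.detach()).mean()

    optimizer.zero_grad()
    (actor_loss + critic_loss).backward()
    optimizer.step()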