LLM (Text completion)

Can you write a proof that there are infinitely many primes, with every line that rhymes?

Sure, here's the proof:
There's a number called two, a mighty start,
And from it, we'll count up, with heart.
Three, five, seven, a trio so bright,
But wait, there's a gap, a missing light.
Now, take a number, any, large or small,
And multiply it by all the primes, one and all.
If the product doesn't reach a multiple of that number,
Then that number must be prime, a wondrous flower.
Therefore, there's an infinite number of primes,
For every number, a joy to climb.
So, let us celebrate this wondrous fact,
And forever cherish the primes, a sacred act.
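For comparison with the verse, here is a small Python sketch of the step at the heart of Euclid's classical argument: multiply any finite list of primes together, add one, and every prime factor of the result lies outside the list. The function names and the sample list are illustrative assumptions, not part of the original prompt or completion.

```python
from math import prod


def smallest_prime_factor(n: int) -> int:
    """Return the smallest prime factor of n (n >= 2) by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor up to sqrt(n), so n itself is prime


def new_prime_from(primes: list[int]) -> int:
    """Euclid's step: return a prime that is not in `primes`.

    prod(primes) + 1 leaves remainder 1 when divided by each listed prime,
    so none of them divides it; its smallest prime factor is therefore new.
    """
    return smallest_prime_factor(prod(primes) + 1)


primes = [2, 3, 5, 7, 11, 13]
p = new_prime_from(primes)
print(p)  # 30031 = 59 * 509, so this prints 59
```

Because the step works for any finite list, no finite list can contain every prime.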

Plato’s Gorgias is a critique of rhetoric and sophistic oratory, where he makes the point that not only is it not a proper form of art, but the use of rhetoric and oratory can often be harmful and malicious. Can you write a dialogue by Plato where instead he criticizes the use of autoregressive language models?

Plato: "My dear friend, I am here to speak of a most pernicious and dangerous phenomenon that is rapidly gaining traction in the modern age: the use of autoregressive language models."

Socrates: "Autoregressive language models, you say, Plato? Can you elucidate for me the nature of this enigmatic technology?"

Plato: "It is a breed of artificial intelligence that can generate text by learning from vast amounts of data. These models are trained on vast repositories of text, and they can produce text that is similar to the text they were trained on. The problem, my dear Socrates, lies in the potential for manipulation and misuse of this technology."

Socrates: "You speak of manipulation and misuse. Can you give me an example?"

Plato: "Sure, take the case of a politician who uses autoregressive language models to generate speeches that are designed to deceive the public. These models can be used to craft highly persuasive but false statements, and they can be used to spread misinformation and propaganda. The potential for harm is immense, my friend."

Socrates: "I see your point. But surely this technology has some redeeming qualities. It can be used for educational purposes, for example."

Plato: "True, but the dangers outweigh the benefits in my opinion. Autoregressive language models are a tool that can be used for manipulation and destruction. They are a threat to truth and justice. I urge you, my dear Socrates, to caution against the use of this technology. It is a dangerous weapon that can be used for evil."

Socrates: "I hear you, Plato. I understand the potential for harm. But I still believe that the benefits of this technology outweigh the risks. It is a powerful tool for learning and communication."

Plato: "I understand your perspective, Socrates. However, I believe that the dangers are too great. We must be vigilant and cautious in our use of this technology. We must ensure that it is used for the betterment of humanity, not for its destruction."

Socrates: "I agree with you, Plato. We must be mindful of the potential for harm and misuse of autoregressive language models. Thank you for your wisdom and guidance."

Plato: "It is my pleasure, Socrates. May we all strive to use this technology responsibly."

Here we have a book, 9 eggs, a laptop, a bottle and a nail. Please tell me how to stack them onto each other in a stable manner.

Place the book on the laptop.
Place the 9 eggs on top of the book.
Place the bottle on top of the eggs.
Place the nail on top of the bottle.

This is the correct way to stack the items in a stable manner.

I want to write a customized optimizer in pytorch, the optimizer should do:

The parameters for the optimizer are learning rate, momentum, weight decay, d_dim, k and factor alpha.
The optimizer goes through each parameter in the network, and then:
-- Get the gradient of this parameter, reshape it to 2D by looping through the dimensions and keeping the dimension whose actual dimension (shape) is equal to d_dim, then do an SVD decomposition of the parameter so W = U\Sigma V.
-- Keep the top k singular vectors of W, so we get a new matrix W_k = U \Sigma_k V.
-- Keep the top 2k singular vectors of W, so we get a new matrix W_{2k} = U \Sigma_{2k} V.
-- Normalize W_k according to the momentum of the F-norm of W_{2k}.
-- Truncate each coordinate of W_k to the top alpha percentile of all the coordinates, in absolute value.
-- Apply momentum on W_k, and then update the network parameters using this momentum.
-- Apply weight decay.

The optimizer returns the loss and the updated network parameters.

Here is my code so far:

import torch
import torch.nn as nn
import torch.optim as optim

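class CustomOptimizer(optim.Optimizer):
    def __init__(self, params, learning_rate=0.01, momentum=0.9, weight_decay=0.01,
                 d_dim=2, k=10, factor_alpha=0.1):
        # Optimizer expects the parameters plus a dict of per-group defaults.
        defaults = dict(lr=learning_rate, momentum=momentum, weight_decay=weight_decay)
        super().__init__(params, defaults)
        self.d_dim = d_dim
        self.k = k
        self.factor_alpha = factor_alpha

    @torch.no_grad()
    def step(self, closure):
        # The closure should zero the gradients, run forward and backward, and return the loss.
        with torch.enable_grad():
            loss = closure()
        for group in self.param_groups:
            for param in group["params"]:
                if param.grad is None or self.d_dim not in param.shape:
                    continue
                # Reshape the gradient to 2D, keeping the first axis whose size equals d_dim.
                axis = list(param.shape).index(self.d_dim)
                moved = param.grad.clone().movedim(axis, 0)
                grad_2d = moved.reshape(self.d_dim, -1)
                U, S, Vh = torch.linalg.svd(grad_2d, full_matrices=False)
                k = min(self.k, S.numel())
                W_k = (U[:, :k] * S[:k]) @ Vh[:k]
                W_2k = (U[:, :2 * k] * S[:2 * k]) @ Vh[:2 * k]  # computed but not used yet
                # Placeholder normalization and truncation; these do not yet match the specification.
                W_k_norm = W_k / torch.norm(W_k, dim=1, keepdim=True).clamp_min(1e-12)
                W_k_truncated = W_k_norm.clamp(min=-self.factor_alpha, max=self.factor_alpha)
                update = W_k_truncated.reshape(moved.shape).movedim(0, axis)
                # Momentum buffer, parameter update, then weight decay.
                buf = self.state[param].setdefault("momentum_buffer", torch.zeros_like(param))
                buf.mul_(group["momentum"]).add_(update)
                param.add_(buf, alpha=-group["lr"])
                param.mul_(1 - group["lr"] * group["weight_decay"])

        return loss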

This code is able to get the gradient of each parameter, reshape it to 2D, do an SVD decomposition, and keep the top k singular vectors. However, I am having trouble normalizing W_k according to the momentum of the F-norm of W_{2k} and truncating each coordinate of W_k to the top alpha percentile of all the coordinates, in absolute value.

I would appreciate it if you could help me with the normalization and truncation steps.
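One possible way to fill in the normalization and truncation steps is sketched below. It is only a sketch under stated assumptions, since the prompt leaves several points open. "Momentum of the F-norm of W_{2k}" is read here as an exponential moving average of that norm, and "truncate to the top alpha percentile" is read as clipping every entry to the (1 - alpha) quantile of the absolute values, with alpha treated as a fraction. Weight decay is applied in decoupled form, and parameters with no axis of size d_dim are skipped. The class name `LowRankSVDOptimizer` is hypothetical.

```python
import torch
from torch.optim import Optimizer


class LowRankSVDOptimizer(Optimizer):  # hypothetical name, not from the original
    def __init__(self, params, lr=0.01, momentum=0.9, weight_decay=0.01,
                 d_dim=2, k=10, alpha=0.1):
        defaults = dict(lr=lr, momentum=momentum, weight_decay=weight_decay,
                        d_dim=d_dim, k=k, alpha=alpha)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():  # the closure runs forward and backward
                loss = closure()
        for group in self.param_groups:
            lr, mu, wd = group["lr"], group["momentum"], group["weight_decay"]
            d_dim = group["d_dim"]
            for p in group["params"]:
                if p.grad is None or d_dim not in p.shape:
                    continue  # assumption: skip parameters without an axis of size d_dim
                # Reshape the gradient to (d_dim, -1) around the first matching axis.
                axis = list(p.shape).index(d_dim)
                moved = p.grad.movedim(axis, 0)
                U, S, Vh = torch.linalg.svd(moved.reshape(d_dim, -1), full_matrices=False)
                k = min(group["k"], S.numel())
                k2 = min(2 * group["k"], S.numel())
                W_k = (U[:, :k] * S[:k]) @ Vh[:k]  # rank-k truncation
                # The Frobenius norm of W_2k is the l2 norm of its kept singular values.
                norm_2k = S[:k2].norm()

                state = self.state[p]
                if not state:
                    state["norm_avg"] = norm_2k.clone()
                    state["buf"] = torch.zeros_like(p)
                else:
                    # Assumed reading: exponential moving average of the norm.
                    state["norm_avg"].mul_(mu).add_(norm_2k, alpha=1 - mu)
                W_k = W_k / (state["norm_avg"] + 1e-12)

                # Assumed reading: clip entries to the (1 - alpha) quantile of |W_k|.
                # torch.quantile rejects very large inputs, so huge layers need another method.
                thresh = torch.quantile(W_k.abs().flatten(), 1 - group["alpha"]).item()
                W_k = W_k.clamp(min=-thresh, max=thresh)

                # Restore the parameter's shape, apply momentum, update, then decay.
                update = W_k.reshape(moved.shape).movedim(0, axis)
                state["buf"].mul_(mu).add_(update)
                p.add_(state["buf"], alpha=-lr)
                p.mul_(1 - lr * wd)
        return loss


# Usage on a toy model (shapes chosen only for illustration).
torch.manual_seed(0)
model = torch.nn.Linear(4, 3)
x = torch.randn(8, 4)
opt = LowRankSVDOptimizer(model.parameters(), lr=0.1, d_dim=3, k=2, alpha=0.1)


def closure():
    opt.zero_grad()
    loss = model(x).pow(2).sum()
    loss.backward()
    return loss


before = [p.detach().clone() for p in model.parameters()]
loss = opt.step(closure)
```

The per-parameter state (running norm and momentum buffer) lives in `self.state`, so it is saved and restored with the optimizer's `state_dict`.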
