🫖 Prepare for inference

Inference is asking your model to "guess", or generate, something new based on a prompt.

Reupload your files

⚠️ ONLY IF YOU RESTARTED YOUR SESSION ⚠️

ONLY IF you exited your previous session, you must reupload the model and reinstall the packages. You can do so by running the hidden cells below.

If you are still in the same session, and your distilgpt2-finetuned folder is still there, then you can skip this part.
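
For reference, here is a minimal sketch of what those hidden cells do, assuming the setup used earlier in this tutorial: reupload your distilgpt2-finetuned folder first, then run something like this (your hidden cells may differ slightly):

# Reinstall the library, then reload the fine-tuned model and tokenizer
# from the saved folder (run only after a session restart)
!pip install transformers torch

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("distilgpt2-finetuned")
model = AutoModelForCausalLM.from_pretrained("distilgpt2-finetuned")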

Prepare your prompt

Remember how every sentence in your dataset started with I heard that...? Now we use it as a sentence starter, so the model is more likely to imitate your dataset.

Also, the model needs numbers, not words, so we need to tokenize the prompt.

This process creates input_ids (our text turned into numbers) and an attention_mask (a map indicating which tokens are real text and which are padding).

prompt = "I heard that"

inputs = tokenizer(prompt, return_tensors="pt", padding=True)
input_ids = inputs["input_ids"]
attention_mask = inputs["attention_mask"]
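
If you are curious what the tokenizer produced, you can print the tensors and map the ids back to their tokens using standard tokenizer methods:

print(input_ids)                                                # e.g. tensor([[...]])
print(tokenizer.convert_ids_to_tokens(input_ids[0].tolist()))   # the tokens behind those ids
print(attention_mask)                                           # all 1s here, since nothing was padded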

✒️ Generate Output!

Set the generation parameters for your model.

max_length → maximum number of tokens (roughly, words) generated. Keep it between 25 and 50.

temperature → the higher the value, the more random the output. Keep it between 0.7 and 0.9.

num_return_sequences → total number of generations. Increase it if you want to produce more outputs at once (see the decoding loop after the code below).

Decode your result from numbers back into words using tokenizer.decode(), then display it using print().

output = model.generate(
    input_ids,
    attention_mask=attention_mask,
    max_length=25,                        # stop after 25 tokens
    do_sample=True,                       # sample instead of always picking the most likely token
    top_p=0.95,                           # nucleus sampling: keep the smallest token set covering 95% probability
    temperature=0.8,                      # moderate randomness
    num_return_sequences=1,               # one generation at a time
    pad_token_id=tokenizer.eos_token_id   # GPT-2 has no pad token, so reuse the end-of-sequence token
)

# Decode the first (and here only) generated sequence from numbers back to words
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
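
Note that output[0] only decodes the first sequence. If you raised num_return_sequences above 1, loop over all of them instead; a minimal sketch:

# Decode and print every generated sequence
for i, seq in enumerate(output):
    print(f"--- Generation {i + 1} ---")
    print(tokenizer.decode(seq, skip_special_tokens=True))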