Paraphrasing is the process of expressing someone else's ideas in your own words. To paraphrase a text, you have to rewrite it without changing its meaning.
In this tutorial, we will explore different pre-trained transformer models for automatically paraphrasing text using the Hugging Face transformers library in Python.
Note that if you want to just paraphrase your text, then there are online tools for that, such as the QuestGenius text paraphraser.
To get started, let's install the required libraries first:
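A minimal sketch of the install command, assuming transformers for the models, sentencepiece for the Pegasus and T5 tokenizers, and torch as the backend (the exact package list may differ from the original setup):

```bash
$ pip install transformers sentencepiece torch
```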
Importing everything from transformers library:
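One way to do that (a star import is convenient for a tutorial, though explicit imports are better practice in real code):

```python
from transformers import *
```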
In this section, we'll use a Pegasus transformer model that was fine-tuned for paraphrasing instead of summarization. To instantiate the model, we need to use PegasusForConditionalGeneration, as paraphrasing is a form of text generation:
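A minimal sketch of loading the model and tokenizer, assuming the publicly available tuner007/pegasus_paraphrase checkpoint (any Pegasus model fine-tuned for paraphrasing works the same way):

```python
# load a Pegasus checkpoint fine-tuned for paraphrasing (assumed checkpoint name)
model = PegasusForConditionalGeneration.from_pretrained("tuner007/pegasus_paraphrase")
tokenizer = PegasusTokenizerFast.from_pretrained("tuner007/pegasus_paraphrase")
```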
Next, let's make a general function that takes a model, its tokenizer, the target sentence and returns the paraphrased text:
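Here is a sketch of such a function; the name get_paraphrased_sentences and the default argument values are illustrative:

```python
def get_paraphrased_sentences(model, tokenizer, sentence, num_return_sequences=5, num_beams=5):
    # tokenize the sentence and prepare PyTorch tensors as the model input
    inputs = tokenizer([sentence], truncation=True, padding="longest", return_tensors="pt")
    # generate the paraphrases using beam search
    outputs = model.generate(
        **inputs,
        num_beams=num_beams,
        num_return_sequences=num_return_sequences,
    )
    # decode the generated token IDs back into text, dropping special tokens
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```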
We also add the possibility of generating multiple paraphrased sentences by passing num_return_sequences to the model.generate() method.
We also set num_beams so we generate the paraphrases using beam search. Setting it to 5 allows the model to keep the five most likely hypotheses at each time step and ultimately choose the one with the highest overall probability.
I highly suggest you check this blog post to learn more about the parameters of the model.generate() method.
Let's use the function now:
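For example (the input sentence here is just an illustration):

```python
sentence = "Learning is the process of acquiring new understanding, knowledge, behaviors, skills, values, attitudes, and preferences."
paraphrases = get_paraphrased_sentences(model, tokenizer, sentence, num_beams=10, num_return_sequences=10)
for paraphrase in paraphrases:
    print(paraphrase)
```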
We set num_beams to 10 and prompt the model to generate ten different sentences; here is the output:
Outstanding results! Most of the generated sentences are accurate and usable. You can try sentences of your own and see the results for yourself.
You can check the model card here.
This section explores a T5 model that was fine-tuned on the PAWS dataset (Paraphrase Adversaries from Word Scrambling). PAWS consists of 108,463 human-labeled and 656k noisily labeled pairs. Let's load the model and the tokenizer:
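A sketch of loading them, assuming the Vamsi/T5_Paraphrase_Paws checkpoint from the Hugging Face Hub:

```python
# load a T5 checkpoint fine-tuned on the PAWS dataset (assumed checkpoint name)
tokenizer = AutoTokenizer.from_pretrained("Vamsi/T5_Paraphrase_Paws")
model = AutoModelForSeq2SeqLM.from_pretrained("Vamsi/T5_Paraphrase_Paws")
```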
Let's use our previously defined function:
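Something like this, with another illustrative sentence:

```python
sentence = "One of the best ways to learn is to teach what you've already learned."
get_paraphrased_sentences(model, tokenizer, sentence, num_beams=10, num_return_sequences=10)
```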
Output:
These are promising results too. However, if you get some not-so-good paraphrased text, you can prepend the input text with "paraphrase: ", as T5 was intended for multiple text-to-text NLP tasks such as machine translation, text summarization, and more; it was pre-trained and fine-tuned with such task prefixes.
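For instance, the call above would become (a sketch):

```python
# prepend the T5 task prefix so the model knows which task to perform
get_paraphrased_sentences(model, tokenizer, "paraphrase: " + sentence, num_beams=10, num_return_sequences=10)
```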
You can check the model card here.
Finally, let's use a fine-tuned T5 model called Parrot. It is a paraphrase-based utterance augmentation framework built to speed up training NLU models. The author of the fine-tuned model wrote a small library to perform paraphrasing. Let's install it:
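The library is installable straight from its GitHub repository:

```bash
$ pip install git+https://github.com/PrithivirajDamodaran/Parrot_Paraphraser.git
```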
Importing it and initializing the model:
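A minimal initialization, using the model tag from the Parrot README:

```python
from parrot import Parrot

# initialize the paraphraser; this triggers the model downloads
parrot = Parrot(model_tag="prithivida/parrot_paraphraser_on_T5")
```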
This will download the model weights and the tokenizer. Give it some time; it'll finish in a few seconds to several minutes, depending on your Internet connection.
This library uses more than one model under the hood: one for paraphrasing, one for calculating adequacy, another for calculating fluency, and the last for diversity.
Let's use the previous sentences and another one and see the results:
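A sketch of the loop (the phrases are the earlier illustrative sentences plus a new one):

```python
phrases = [
    "Learning is the process of acquiring new understanding, knowledge, behaviors, skills, values, attitudes, and preferences.",
    "One of the best ways to learn is to teach what you've already learned.",
    "Paraphrasing is the process of expressing someone else's ideas in your own words.",
]
for phrase in phrases:
    print("-" * 100)
    print("Input phrase:", phrase)
    print("-" * 100)
    # augment() returns (paraphrase, score) tuples, or None if nothing passes its filters
    paraphrases = parrot.augment(input_phrase=phrase)
    if paraphrases:
        for paraphrase in paraphrases:
            print(paraphrase)
```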
With this library, we simply use the parrot.augment() method and pass the sentence as text; it returns several candidate paraphrased texts. Check the output:
The number accompanying each sentence is the diversity score: the higher the value, the more the sentence diverges from the original.
You can check the Parrot Paraphraser repository here.
Alright! That's it for the tutorial. Hopefully, you have now explored the most valuable ways to perform automatic text paraphrasing using transformers.
You can get the complete code here or the Colab notebook here.
Happy learning ♥