In brief, a transformer is a neural network architecture designed to model sequences of data, making it well suited to tasks such as language translation, sentence completion, and automatic speech recognition. Transformers have become the dominant architecture for many of these sequence-modeling tasks because the underlying attention mechanism is easily parallelized, allowing for massive scale during both training and inference (see the sketch below).

Depending on the application, a transformer model may use an encoder, a decoder, or both. The encoder learns a vector representation of the input that can then be used for downstream tasks like classification and sentiment analysis. The decoder takes a vector, or latent representation, of the text or image and uses it to generate new text, making it useful for tasks like sentence completion and summarization. For this reason, many familiar state-of-the-art models, such as the GPT family, are decoder-only.
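To make the parallelism claim concrete, here is a minimal sketch of scaled dot-product attention, the core operation inside a transformer layer. This uses PyTorch, and the function name and tensor shapes are illustrative rather than taken from any particular library: every query token attends to every key token in a single batched matrix multiply, which is what lets the whole sequence be processed at once.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, seq_len, d_model). All pairwise query-key scores
    # are computed in one batched matmul, so no sequential loop over
    # tokens is needed -- this is the parallelism mentioned above.
    d_model = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_model ** 0.5  # (batch, seq, seq)
    weights = F.softmax(scores, dim=-1)                # attention weights
    return weights @ v                                 # (batch, seq, d_model)

# Toy example: a batch of 2 sequences, 5 tokens each, 16-dim embeddings.
x = torch.randn(2, 5, 16)
out = scaled_dot_product_attention(x, x, x)  # self-attention: q = k = v
print(out.shape)  # torch.Size([2, 5, 16])
```

A full transformer adds multiple heads, learned projections, and (in a decoder) a causal mask on the scores, but the parallel structure is the same.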