Amberscript is going to build an automatic summarization model for the VRT to condense Dutch news articles into short summaries. There are two broad approaches: extractive summarization, which selects the most relevant sentences from the text, and abstractive summarization, which generates entirely new sentences that summarize it. Amberscript has chosen to build an extractive model.
The first step in building this extractive summarization model will be to train an unsupervised extractive summarizer. Where high-quality human-written abstractive summaries are available, this approach can later be extended to train an abstractive model using any of the state-of-the-art sequence-to-sequence transformer models.
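To illustrate the unsupervised extractive idea, the sketch below scores sentences by the average corpus frequency of their words and keeps the top-scoring ones, in their original order. This is a minimal illustrative heuristic in plain Python, not Amberscript's actual model; the function name, the scoring rule, and the example article are all assumptions for demonstration (graph-based methods such as TextRank, or embedding-based similarity, are common stronger alternatives).

```python
import re
from collections import Counter

def extractive_summary(text, n_sentences=2):
    """Return the n_sentences highest-scoring sentences from text.

    Scoring here is a simple unsupervised heuristic: each sentence is
    scored by the average document-wide frequency of its words.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"\w+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]),
                    reverse=True)
    keep = sorted(ranked[:n_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)

# Hypothetical toy article standing in for a Dutch news item.
article = ("The council approved the new budget. The budget increases funding "
           "for schools. Local residents attended the meeting. The budget vote "
           "passed with a large majority.")
print(extractive_summary(article, 2))
```

Because the output is built only from sentences already present in the input, this is extractive by construction; an abstractive model would instead generate new text conditioned on the article.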
Usage of standards for data interoperability:
For the implementation, Amberscript uses Python, PyTorch, and the Hugging Face Transformers package, along with pretrained models from Hugging Face and other open-source platforms. The models will be fine-tuned on VRT data and other open-source datasets using GPUs to enable efficient large-scale training.
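The fine-tuning workflow can be sketched as a standard PyTorch training loop. In the real pipeline the backbone would be a pretrained Hugging Face transformer (loaded with `AutoModel.from_pretrained`); here a tiny linear encoder and random data stand in, so the loop structure is self-contained and runnable without downloading a model. All layer sizes, the task head, and the data are illustrative assumptions.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in for a pretrained encoder; in practice this would be a Hugging Face
# transformer whose weights are fine-tuned on VRT data.
encoder = nn.Linear(16, 8)
# Task head for extractive summarization: classify each sentence as
# keep / drop (binary labels).
head = nn.Linear(8, 2)
model = nn.Sequential(encoder, head)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Dummy batch standing in for sentence features and extractive labels.
x = torch.randn(32, 16)
y = torch.randint(0, 2, (32,))

losses = []
for step in range(50):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

On a GPU the same loop runs unchanged after moving the model and batches to the device with `.to("cuda")`, which is what enables the large-scale training mentioned above.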