CADChain has taken on the Auto-Summarization Challenge set by the Flemish Radio and Television Organization (VRT) and will build an automatic summarization model that produces extractive summaries of media articles. The main goal is a system that summarizes news articles automatically while leaving the key phrases and sentences unmodified. CADChain’s approach is based on linguistic knowledge and builds on existing AI models that can be trained further.
The first step is to use our linguistic knowledge to define which parts of a sentence can be removed when a short summary is requested, and which parts count as ‘additional detail’. We call this our Nearest Neighbor expansion.
The next step is to train AI models to score each derived sentence for importance, and to use that importance classification to add further sentences to the summary.
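The selection part of this step can be illustrated with a minimal sketch. It assumes each sentence already carries an importance score from some trained model; the `ScoredSentence` shape, the score scale, and the function names are hypothetical and not taken from CADChain's implementation. The sketch keeps the highest-scoring sentences and returns them unmodified, in their original article order, which is what makes the summary extractive.

```typescript
// Hypothetical shape: one article sentence plus a model-assigned importance score.
interface ScoredSentence {
  index: number; // position of the sentence in the original article
  text: string;  // the sentence text, left unmodified
  score: number; // importance; higher means more important (assumed scale)
}

// Keep the `maxSentences` highest-scoring sentences, then restore article order.
function selectSentences(
  sentences: ScoredSentence[],
  maxSentences: number
): ScoredSentence[] {
  // Sort a copy by descending score; ties fall back to article order.
  const ranked = [...sentences].sort(
    (a, b) => b.score - a.score || a.index - b.index
  );
  return ranked
    .slice(0, Math.max(0, maxSentences))
    .sort((a, b) => a.index - b.index);
}

// Join the selected sentences into a single summary string.
function buildSummary(sentences: ScoredSentence[], maxSentences: number): string {
  return selectSentences(sentences, maxSentences)
    .map((s) => s.text)
    .join(" ");
}
```

Raising `maxSentences` lets the next most important sentences into the summary without changing the wording of any sentence already selected.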
The third step is to wrap this in a user-friendly interface that makes the process understandable for writers and, at the same time, allows custom entries or changes to the algorithm’s result. Journalists can adjust more specific parameters for their article’s summarization and see the result directly. Summary length can be set variably based on parameters such as article length, requested duration, and the method used to increase the size of the summary.
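As an illustration of how such length parameters might combine, the sketch below derives a word budget for the summary from the article length and a requested duration. The parameter names, the optional `ratio` setting, and the default of 150 words per minute are assumptions made for this example, not values from the project.

```typescript
// Hypothetical parameters a journalist could adjust in the interface.
interface LengthParams {
  articleWordCount: number;  // length of the source article in words
  ratio?: number;            // optional fraction of the article to keep
  requestedSeconds?: number; // optional requested reading or listening time
  wordsPerMinute?: number;   // assumed pace; defaults to 150 in this sketch
}

// Return the maximum number of words the summary may contain.
function targetWordBudget(p: LengthParams): number {
  const wpm = p.wordsPerMinute ?? 150;
  // A requested duration caps the budget by how many words fit in that time.
  const byDuration =
    p.requestedSeconds !== undefined
      ? Math.floor((p.requestedSeconds / 60) * wpm)
      : Infinity;
  // A ratio caps the budget relative to the article length.
  const byRatio =
    p.ratio !== undefined
      ? Math.floor(p.articleWordCount * p.ratio)
      : Infinity;
  // The summary can never be longer than the article itself.
  return Math.max(0, Math.min(byDuration, byRatio, p.articleWordCount));
}
```

When several limits are set, the strictest one wins, so changing one parameter in the interface never silently overrides another.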
Use of standards for data interoperability:
The web applications, and the connectors that let VRT embed this in their own systems, are built with web frameworks based on JavaScript/TypeScript. The media samples use JSON, which is also the format of the standardized responses our connector returns to VRT.
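To make the idea of a standardized JSON response concrete, here is a minimal sketch of how a consumer could type and validate one in TypeScript. The actual response schema is not described here, so the `SummaryResponse` shape and its field names (`articleId`, `sentences`, `index`, `text`) are purely hypothetical.

```typescript
// Hypothetical response shape; the real connector schema may differ.
interface SummaryResponse {
  articleId: string;
  sentences: { index: number; text: string }[];
}

// Parse a JSON string and check that it matches the assumed shape.
function parseSummaryResponse(json: string): SummaryResponse {
  const data = JSON.parse(json) as
    | { articleId?: unknown; sentences?: unknown }
    | null;
  if (data === null || typeof data !== "object") {
    throw new Error("expected a JSON object");
  }
  const { articleId, sentences } = data;
  if (typeof articleId !== "string" || !Array.isArray(sentences)) {
    throw new Error("missing articleId or sentences");
  }
  return {
    articleId,
    sentences: sentences.map((s: { index?: unknown; text?: unknown } | null) => {
      if (
        s === null ||
        typeof s !== "object" ||
        typeof s.index !== "number" ||
        typeof s.text !== "string"
      ) {
        throw new Error("malformed sentence entry");
      }
      return { index: s.index, text: s.text };
    }),
  };
}
```

Validating at the boundary like this means the embedding system can rely on the typed object afterwards instead of re-checking fields wherever the summary is used.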
CADChain relies on pre-existing models that understand language, such as OpenAI’s GPT-3 (a model that understands the meaning of texts, can answer questions and can be taught what the desired response is), but also uses custom algorithms built with Amazon SageMaker for the parts that don’t require the intricate detail of the (more expensive) GPT-3 model.