Large Language Models (LLMs) have demonstrated remarkable capabilities across numerous natural language processing tasks, including text summarization. Processing large amounts of textual data is ever more critical in a highly digitized setting with increasingly easy access to vast amounts of information. However, identifying the critical aspects of long texts without adequate text preparation is challenging and time-consuming for humans. This is where LLMs can significantly support human text understanding by automating text summarization. Models from companies such as OpenAI, Google, and Meta have proven to be highly effective information retrievers. We will focus on how the open-source model Llama 3.1 performs in long-text summarization.
Model benchmarks are a valuable tool for evaluating the performance of LLMs, ensuring accurate and consistent representation of the information provided in the input text. Many such benchmarks apply metrics like ROUGE, F1, or BLEU scores, which measure how closely model output imitates human-written summaries by testing for lexical alignment with a reference. These metrics, however, struggle to capture how well LLMs actually utilize the information distributed across their input. This thesis explores alternative approaches to model evaluation for long-context-window summarization. We examine in more detail how the model uses the information inside the context window, which information it draws on, and which information it potentially neglects. Our approach focuses on testing three context window extension techniques that help LLMs process more input data: ALiBi, YaRN, and LongRoPE. We integrate automated evaluation metrics with human evaluation to achieve more nuanced scoring. Ultimately, our evaluation pipeline will test whether the model, combined with these techniques, can handle large amounts of data, and examine whether more information yields better summaries.
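To make the notion of "lexical alignment" concrete, the following is a minimal sketch of ROUGE-1 (unigram overlap) precision, recall, and F1 in pure Python. The function name and the naive whitespace tokenization are illustrative; practical evaluations typically use a dedicated package with proper tokenization and stemming.

```python
from collections import Counter

def rouge1_scores(reference: str, candidate: str) -> dict:
    """Simplified ROUGE-1: unigram-overlap precision, recall, and F1.

    Illustrative sketch only; naive lowercased whitespace tokenization,
    no stemming or stopword handling.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each shared unigram counts at most min(ref, cand) times.
    overlap = sum((ref_counts & cand_counts).values())
    precision = overlap / max(sum(cand_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}

scores = rouge1_scores("the cat sat on the mat", "the cat lay on the mat")
```

As the example suggests, a summary can score highly here purely by reusing the reference's wording, which is exactly the limitation for judging how a model exploits information spread over a long context.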
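Of the three extension techniques, ALiBi is the simplest to illustrate: instead of positional embeddings, it adds a per-head linear penalty to the attention logits that grows with query-key distance. A minimal sketch of that bias matrix, assuming the geometric slope schedule from the ALiBi paper for power-of-two head counts (the helper name is ours):

```python
import numpy as np

def alibi_bias(num_heads: int, seq_len: int) -> np.ndarray:
    """Build ALiBi's per-head linear distance penalties.

    Head h gets slope m_h = 2**(-8 * (h + 1) / num_heads); the bias
    added to the attention logit of query i attending to key j <= i
    is -m_h * (i - j). Returns an array of shape (heads, seq, seq).
    """
    slopes = np.array([2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])
    positions = np.arange(seq_len)
    distance = positions[:, None] - positions[None, :]  # i - j
    distance = np.maximum(distance, 0)                  # keep causal part only
    return -slopes[:, None, None] * distance

bias = alibi_bias(num_heads=8, seq_len=4)
```

Because the penalty is a fixed function of distance rather than a learned embedding, a model trained this way can be evaluated on sequences longer than those seen during training, which is why ALiBi is a natural candidate for our long-input summarization tests.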
Attachment: Clemens Magg Bachelor's Thesis Kickoff.pdf (1.57 MB, last modified 18.08.2024)