Abstract
Legal texts are now produced, stored digitally, and later retrieved in such quantities that manually classifying these documents and processing their content has become infeasible. This study addresses that need by implementing LexML, a microservice for classifying legal documents and norms that applies active machine learning. Following an evaluation of approaches to (legal) text classification and (active) machine learning in the existing literature, LexML was implemented with Apache Spark MLlib as the machine learning framework, reusing existing functionality of the legal data-science environment Lexia. Various classifiers and query strategies were implemented and evaluated on German legal data. Overall, the active learning strategies outperformed traditional machine learning in both speed of learning and maximum accuracy. The results of the document and norm classification experiments differ considerably: for document classification, Naïve Bayes and Multi-Layer Perceptron outperform Logistic Regression, whereas for norm classification Logistic Regression is clearly superior to the other two.
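To illustrate what a query strategy does in pool-based active learning, the following is a minimal sketch in plain Python. It is not taken from LexML: the abstract does not name the strategies that were implemented, so the three uncertainty measures shown here (least confidence, margin, entropy) are assumed as common textbook examples, and all function names, document ids, and probability values are hypothetical. Given the class probabilities a classifier predicts for each unlabelled document, the strategy picks the documents the model is least sure about, which would then be passed to a human annotator for labelling.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical type alias: predicted class probabilities for one document.
Probabilities = Sequence[float]


def least_confidence(probs: Probabilities) -> float:
    """Higher score = the most likely class has a lower probability."""
    return 1.0 - max(probs)


def margin(probs: Probabilities) -> float:
    """Higher score = smaller gap between the two most likely classes.

    Requires at least two classes.
    """
    top = sorted(probs, reverse=True)
    return -(top[0] - top[1])


def entropy(probs: Probabilities) -> float:
    """Higher score = probability mass is spread more evenly over classes."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_queries(
    pool_probs: Dict[str, Probabilities],
    score: Callable[[Probabilities], float],
    batch_size: int = 1,
) -> List[str]:
    """Return the ids of the `batch_size` most uncertain unlabelled documents."""
    ranked = sorted(
        pool_probs,
        key=lambda doc_id: score(pool_probs[doc_id]),
        reverse=True,
    )
    return ranked[:batch_size]


# Illustrative, made-up predictions for three unlabelled documents.
pool = {
    "doc_a": [0.95, 0.03, 0.02],
    "doc_b": [0.40, 0.35, 0.25],
    "doc_c": [0.70, 0.20, 0.10],
}

# Documents to send to the annotator in the next labelling round.
queried = select_queries(pool, entropy, batch_size=2)
```

In a full active learning loop, the queried documents would be labelled, moved from the pool into the training set, and the classifier retrained before the next round of selection.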
Name | Size | Last Modification |
---|---|---|
2017-02-13_Johannes_Muhr_kick-off_presentation.pdf | 2.91 MB | 07.07.2017 |
2017-07-10_Johannes_Muhr_final_presentation.pdf | 3.38 MB | 07.07.2017 |
2017-07-15_Johannes_Muhr_Master_Thesis.pdf | 3.52 MB | 07.07.2017 |