Text and data mining (TDM) is an umbrella term for a range of research methods. Data mining focuses on data that is usually already available in structured form. Text mining focuses on textual data, e.g. full texts from scientific journals or all the novels published in a century.
These data sets and text collections are first systematically prepared in a machine-readable format. Computer-aided analyses can then automatically identify patterns or correlations or, for example, summarise the central statements of large numbers of documents.
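The preparation and analysis steps described above can be illustrated with a minimal Python sketch. It is only one possible approach, not a prescribed method: the three-sentence corpus and the helper names `tokenize` and `document_frequencies` are made up for illustration, and real projects usually need more careful normalisation.

```python
import re
from collections import Counter


def tokenize(text):
    """Normalise a text to lower case and split it into word tokens."""
    return re.findall(r"[a-zäöüß]+", text.lower())


def document_frequencies(corpus):
    """Count in how many documents of the corpus each term occurs."""
    counts = Counter()
    for text in corpus:
        # A set ensures each term is counted at most once per document.
        counts.update(set(tokenize(text)))
    return counts


# Hypothetical mini corpus, for illustration only.
corpus = [
    "Text mining analyses large text collections.",
    "Data mining looks for patterns in structured data.",
    "Patterns in text can be found automatically.",
]

frequencies = document_frequencies(corpus)
print(frequencies.most_common(5))
```

Terms that occur in many documents (here, for instance, "text" and "mining") are a first, very simple kind of pattern. The same structure scales to larger corpora once the texts are available in machine-readable form.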
Since the 2018 amendment to the German Copyright Act (UrhG), § 60d UrhG has legally permitted researchers to carry out text and data mining. However, legal and licensing requirements must still be observed.
The right to TDM also includes the storage and processing of data and texts for analysis as well as the necessary digitisation, normalisation, structuring, categorisation, annotation, combination, etc. After completion of the research, the underlying corpus may be handed over for permanent storage for preservation and quality control (see also research data management).
Although TDM is generally permitted, there are certain limits:
Because such a mass download can lead the publisher to block access to its content for the entire university, please find out in advance about alternative interfaces and contact the publisher or us at ub-publizieren@uni-passau.de.
The DOI registry Crossref and some publishers offer special interfaces where you can obtain full texts for your TDM projects:
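For Crossref, a request for full-text links might look like the following Python sketch. It assumes that the Crossref REST API returns the metadata of a DOI at `https://api.crossref.org/works/<DOI>` and that full-text links appear in a `link` list with an `intended-application` field. Please check the current Crossref documentation before relying on these details. The DOI, the URLs and the sample response are hypothetical, and the sketch deliberately makes no network request. Whether a linked full text may actually be downloaded still depends on the applicable licence.

```python
from urllib.parse import quote

CROSSREF_API = "https://api.crossref.org/works/"


def works_url(doi):
    """Build the Crossref REST API URL for the metadata of one DOI."""
    return CROSSREF_API + quote(doi, safe="/")


def text_mining_links(record):
    """Return the full-text URLs that a record marks as meant for text mining."""
    links = record.get("message", {}).get("link", [])
    return [
        link["URL"]
        for link in links
        if link.get("intended-application") == "text-mining" and "URL" in link
    ]


# Hypothetical, shortened response, for illustration only.
sample = {
    "message": {
        "DOI": "10.1234/example",
        "link": [
            {
                "URL": "https://publisher.example/fulltext.xml",
                "content-type": "application/xml",
                "intended-application": "text-mining",
            },
            {
                "URL": "https://publisher.example/article.pdf",
                "content-type": "application/pdf",
                "intended-application": "similarity-checking",
            },
        ],
    }
}

print(works_url("10.1234/example"))
print(text_mining_links(sample))
```

In a real project, the JSON document fetched from the URL built by `works_url` would take the place of `sample`. Requests should be spaced out so that they do not resemble a mass download.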
In addition to content that requires a licence, there are also freely accessible databases that permit TDM, including: