Appendix A: Voyant Tools Dedicated OER Policy Corpus Results

Voyant Tools processed a dedicated OER policy corpus of 28 plain text files (Sinclair & Rockwell, 2022a). The Voyant Tools results were compiled into a LibreOffice Calc spreadsheet within a downloadable package for review and sharing at:

This study used the Voyant Tools default settings and default stopwords list for text analysis; the stopwords list is included with the spreadsheet file. The exception to the application defaults was a custom stopwords list, created because the default list contained keywords that the analysis needed to retain. The Voyant Tools corpus can be scaled to import additional dedicated OER policy documents in future research, although document qualities (e.g., content, file format) may need to be considered to keep the data inputs consistent. Voyant Tools is interactive: terms, phrases, and documents are interlinked, which allows the reader to zoom between micro and macro scales of the text. Individual document results could be read within the same multi-document corpus results without restarting the text analysis for a single-document view. See Appendix D for an example of scalability in which a recently published OER policy document was added to the baseline dedicated OER policy corpus.


The Voyant Tools text analysis output was made freely accessible online at the following URL for the purposes of open research:

Table 11 identified the document characteristics for each post-secondary institution in alphabetical order of citation. In Table 11, the Copyright column represents copyright information clearly indicated within the dedicated OER policy document. Where no copyright information was found in the document metadata or text, the Copyright column records it as “Not Indicated.” A research assumption was that a dedicated OER policy document without a copyright statement could carry a more restrictive Creative Commons license, such as one with NonCommercial and ShareAlike conditions. However, the literature review found no standard copyright choice or document format for dedicated OER policies. Extracting OER policy text embedded in a broader institutional policy was problematic and required the researcher to choose the relevant content. For example, Central Virginia Community College and Washington State University embedded their OER policy text as sections within institutional policies, whereas the Kwame Nkrumah University of Science and Technology (KNUST) published a stand-alone document.

Table 11
Internet Collection of Post-Secondary Institutional Dedicated OER Policies

Institution Citation | Copyright | Text Source
(Africa Nazarene University, 2015) | CC-BY 4.0 | PDF
(African Virtual University, 2011) | Not Indicated | PDF
(Central Virginia Community College, 2019) | Not Indicated | HTML
(Delft University of Technology, 2021) | CC-BY-NC-SA 4.0 | PDF
(Glasgow Caledonian University, 2020) | CC-BY-NC-SA 4.0 | PDF
(Heinrich Heine University Düsseldorf, 2021) | CC-BY 4.0 | PDF
(Kwame Nkrumah University of Science and Technology (KNUST), 2010) | CC-BY | PDF
(Netaji Subhas Open University, 2017/2018) | CC-BY-SA | PDF
(Northern Virginia Community College, 2018) | Not Indicated | PDF
(Odisha State Open University, 2016) | CC-BY 4.0 | PDF
(Open University of Sri Lanka, 2020) | CC-BY-SA 4.0 | PDF
(Open University of Tanzania, 2016) | CC-BY 4.0 | PDF
(Queensland University of Technology, 2021) | All rights reserved | HTML
(Reutlingen University, 2019) | Not Indicated | HTML
(Southern Alberta Institute of Technology, 2018) | CC-BY 4.0 | PDF
(SRI Ramachandra Institute of higher education and research, 2019) | Not Indicated | PDF
(Tamil Nadu Open University, 2020) | CC-BY-SA 4.0 | PDF
(Technical University of Graz, 2020) | CC-BY 4.0 | PDF
(University of Edinburgh, 2016) | CC-BY-NC-SA 4.0 | PDF
(University of Graz, 2020) | CC-BY-NC-SA 4.0 | PDF
(University of Kelaniya, 2020) | Not Indicated | PDF
(University of Leeds, 2017) | CC-BY-NC-SA 4.0 | PDF
(University of Passau, 2020) | CC-BY 4.0 | PDF
(University of the South Pacific, 2017) | Not Indicated | PDF
(Uttarakhand Open University, 2016) | Not Indicated | PDF
(Washington State University, 2018) | Not Indicated | PDF section
(Wawasan Open University, 2012) | CC-BY-NC-SA | HTML
(ZHAW University, 2020) | CC-BY 4.0 | PDF

Note. Source is the original file format before plain text conversions or translations.

Hermeneutic analysis was conducted in English, and documents in other languages were translated into English for consistency and to support the researcher’s comprehension. Table 12 identified the institutions whose OER policy documents were in German and required translation. Translation may have introduced errors relative to the original authors’ intended meaning; the analysis assumes that the English translations are coherent with the original texts.

Table 12
Institutions With Non-English Dedicated OER Policy Documents

Institution | Language
Heinrich Heine University Düsseldorf | German
Technical University of Graz | German
University of Graz | German


According to Rockwell and Sinclair (2016), “insofar as occurrence indicates importance, frequency counts can provide a sense of what a text is about” (p. 58). The corpus of 28 documents contained 48,114 total words and 4,208 unique word forms. The Voyant Tools summary panel listed the five most frequent words in the corpus, which were (with counts): oer (954), open (571), university (532), resources (419), and policy (380) (Sinclair & Rockwell, 2022u). The longest OER policy was the OUT Policy on Open Educational Resources (OER) from the Open University of Tanzania (5,517 words), and the shortest was the Open Educational Resources Policy from Central Virginia Community College (98 words) (Sinclair & Rockwell, 2022u). The document with the highest average words per sentence was the Open University of Sri Lanka’s Open Educational Resources (OER) Policy (revised 2020) (50.1), and the document with the lowest was the OER Policy of Hochschule Reutlingen from Reutlingen University (20.8) (Sinclair & Rockwell, 2022u).
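The summary statistics above can be approximated outside Voyant Tools. The following Python sketch is illustrative only: it assumes a simple regular-expression tokenizer and a naive sentence splitter, neither of which is Voyant Tools’ own, and it runs on two hypothetical mini-documents rather than the 28-document corpus, so its counts do not reproduce the figures reported here.

```python
import re
from collections import Counter

# Hypothetical mini-corpus standing in for the dedicated OER policy files.
docs = {
    "policy_a.txt": "OER policy text. Open resources support open teaching.",
    "policy_b.txt": "The university adopts an OER policy. OER are open educational resources.",
}

WORD = re.compile(r"[\w']+")  # assumed tokenizer, not Voyant Tools' own


def tokens(text):
    """Return lowercase word tokens."""
    return [t.lower() for t in WORD.findall(text)]


def sentences(text):
    """Naively split on ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def summarize(docs, top_n=5):
    """Corpus-level word counts plus per-document length and sentence statistics."""
    all_tokens = [t for text in docs.values() for t in tokens(text)]
    lengths = {name: len(tokens(text)) for name, text in docs.items()}
    avg_wps = {name: lengths[name] / max(len(sentences(text)), 1)
               for name, text in docs.items()}
    return {
        "total_words": len(all_tokens),
        "unique_word_forms": len(set(all_tokens)),
        "most_frequent": Counter(all_tokens).most_common(top_n),
        "longest": max(lengths, key=lengths.get),
        "shortest": min(lengths, key=lengths.get),
        "highest_avg_words_per_sentence": max(avg_wps, key=avg_wps.get),
        "lowest_avg_words_per_sentence": min(avg_wps, key=avg_wps.get),
    }


summary = summarize(docs)
print(summary["total_words"], summary["unique_word_forms"])
print(summary["most_frequent"])
```

Replacing the hypothetical `docs` dictionary with the contents of the plain text files would give comparable, though not identical, summary figures, because tokenization rules differ between tools.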

Corpus Terms

The terms panel listed 3,925 terms for the entire corpus, with frequency counts (i.e., raw frequencies) ranging from 1 to 954. The corpus terms count was taken after stopwords processing, whereas the unique word forms count in the summary panel was taken before default stopwords processing (Swettenham & MacDonald, 2022a). Table 13 is a sample of the most frequent terms, in descending order of frequency.
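The gap between the 4,208 unique word forms and the 3,925 corpus terms reflects stopword removal. A minimal Python sketch of that step follows, computing each term’s raw frequency and the number of documents it appears in. The abbreviated stopword list and the two documents are hypothetical; the Voyant Tools default list is far longer.

```python
import re
from collections import Counter

# Hypothetical documents and an abbreviated, hypothetical stopword list.
docs = {
    "policy_a.txt": "OER policy text. Open resources support open teaching.",
    "policy_b.txt": "The university adopts an OER policy. OER are open educational resources.",
}
STOPWORDS = {"the", "a", "an", "and", "are", "of", "to"}


def corpus_terms(docs, stopwords=STOPWORDS):
    """Map each non-stopword term to (raw_frequency, in_documents)."""
    raw = Counter()
    in_docs = Counter()
    for text in docs.values():
        kept = [t.lower() for t in re.findall(r"[\w']+", text)
                if t.lower() not in stopwords]
        raw.update(kept)           # every occurrence counts toward raw frequency
        in_docs.update(set(kept))  # each term counts at most once per document
    return {term: (raw[term], in_docs[term]) for term in raw}


terms = corpus_terms(docs)
# List terms in descending raw frequency.
ranked = sorted(terms.items(), key=lambda kv: kv[1][0], reverse=True)
for term, (count, n_docs) in ranked[:5]:
    print(term, count, n_docs)
```

In this toy example the two documents contain 13 unique word forms but only 10 terms remain after stopword removal, the same kind of reduction seen between the summary panel and the terms panel.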

Table 13
Sample of the Most Frequent Terms in the Dedicated OER Policy Corpus (Sinclair & Rockwell, 2022y)

In Documents | Count (Raw Frequency)

Sinclair and Rockwell (2016b) stated that the Cirrus, or word cloud, panel “visualizes the top frequency words of a corpus.” Voyant Tools reads the word list and renders the most frequent words centrally, sized in proportion to their frequency (Sinclair & Rockwell, 2016c). The Cirrus panel placed smaller words (i.e., less frequent terms) in the spaces left between larger words that did not fit tightly together. According to Sinclair and Rockwell (2016b), “the colour of words and their absolute position are not significant.” However, each term’s colour in the Cirrus panel corresponded directly to its colour in the Trends panel. The Figure 6 word cloud emphasised the highest-frequency terms. Notably, terms in the word cloud changed position when the browser was reloaded or the panels were resized, whereas term colours and sizes remained the same between the previous and current displays.
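The frequency-based sizing described above can be illustrated with a linear mapping from raw frequency to font size. This Python sketch is an assumption and not the Cirrus algorithm itself: the pixel bounds are hypothetical and word placement is not modelled. It uses the five term counts reported for this corpus.

```python
def font_sizes(freqs, min_px=10, max_px=60):
    """Linearly map raw frequencies onto a hypothetical font-size range in pixels."""
    lo, hi = min(freqs.values()), max(freqs.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {term: round(min_px + (count - lo) / span * (max_px - min_px))
            for term, count in freqs.items()}


# Raw frequencies of the five most frequent terms reported for this corpus.
top_terms = {"oer": 954, "open": 571, "university": 532, "resources": 419, "policy": 380}
for term, size in sorted(font_sizes(top_terms).items(), key=lambda kv: -kv[1]):
    print(f"{term}: {size}px")
```

A linear scale keeps the visual ranking faithful to the counts; a word cloud tool could equally use a logarithmic or square-root scale to stop one dominant term from crowding out the rest.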

Figure 6
Word Cloud of the 50 Most Frequent Terms in the Corpus (Sinclair & Rockwell, 2022c)



The Contexts (keywords-in-context) tool provided a way to explore how terms are used in different contexts across the corpus, with a default context of five words on each side (Sinclair & Rockwell, 2016d). Context outputs were collected for the five most frequent corpus terms: 954 items for OER, 571 items for open, 532 items for university, 419 items for resources, and 380 items for policy (Sinclair & Rockwell, 2022u). Table 14 is a sample of five items for the term OER from the index 0 document (Africa Nazarene University Policy on OER), in the default order of “the most frequent in the term corpus” (Sinclair & Rockwell, 2016d).
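A keyword-in-context view can be sketched in a few lines of Python. The function below is hypothetical and assumes whitespace tokenization with a five-token window on each side, mirroring the default context described above; Voyant Tools handles punctuation differently, so output on other passages may not match its Contexts panel exactly. The sample sentence is the first context line shown in Table 14.

```python
import re


def kwic(text, keyword, width=5):
    """Return (left, term, right) tuples with `width` tokens on each side of each match."""
    toks = re.findall(r"\S+", text)  # assumed: whitespace tokens, punctuation kept attached
    rows = []
    for i, tok in enumerate(toks):
        # Compare case-insensitively, ignoring surrounding brackets and punctuation.
        if tok.strip("().,;:").lower() == keyword.lower():
            left = " ".join(toks[max(0, i - width):i])
            right = " ".join(toks[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows


sample = "AFRICA NAZARENE UNIVERSITY Policy on OER integration into ODeL and campus"
for left, term, right in kwic(sample, "OER"):
    print(f"{left:>40} | {term} | {right}")
```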

Table 14
Sample of Contexts for OER in Sequential Order From the Beginning of the Africa Nazarene University Policy on OER (Sinclair & Rockwell, 2022q)

Left | Term | Right
AFRICA NAZARENE UNIVERSITY Policy on | OER | integration into ODeL and campus
role that Open Educational Resources ( | OER | ) can play in supporting this
Nazarene University (ANU) Policy on | OER | integration into ODeL and campus
any practical misuse of the | OER | materials or their content. Citation
University (ANU). 2015. Policy on | OER | integration into ODeL and campus


Corpus collocates are the terms that “appear more frequently in proximity to keywords across the entire corpus” (Sinclair & Rockwell, 2016e). According to Seale (2018), “in statistical terms, collocates are therefore words that occur together with a higher frequency than would be expected by chance alone” (p. 418). The dedicated OER policy corpus produced 6,939 collocates, with contextual frequencies ranging from 1 to 284. Table 15 is a sample of the five collocates with the highest contextual frequencies.
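Collocate counting can be sketched by tallying every term that falls within a fixed window around each occurrence of a keyword. In this hypothetical Python example the five-token window is an assumption, the keyword is excluded from its own context, and the sample text is invented. The tally corresponds only to a raw contextual frequency; it applies no statistical test of the kind Seale (2018) describes for deciding whether co-occurrence exceeds chance.

```python
import re
from collections import Counter


def collocates(text, keyword, window=5, stopwords=frozenset()):
    """Tally terms within `window` tokens either side of each occurrence of `keyword`."""
    toks = [t.lower() for t in re.findall(r"[\w']+", text)]
    keyword = keyword.lower()
    counts = Counter()
    for i, tok in enumerate(toks):
        if tok == keyword:
            context = toks[max(0, i - window):i] + toks[i + 1:i + 1 + window]
            # Assumption: the keyword is not counted as its own collocate.
            counts.update(t for t in context if t != keyword and t not in stopwords)
    return counts


text = "Open educational resources are open. Open educational resources support learning."
print(collocates(text, "educational").most_common(3))
```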

Table 15
Sample of Corpus Collocates in Descending Contextual Frequency (Sinclair & Rockwell, 2022g)

Term | Context | Contextual Frequency
educational | resources | 284
open | educational | 276
resources | open | 268
open | resources | 264
oer | open | 190


Rockwell and Sinclair (2016, p. 150) asserted that repeated phrases indicate text with greater emphasis. In Voyant Tools, “the Phrases tool shows repeating sequences of words organized by frequency of repetition or number of words in each repeated phrase” (Sinclair & Rockwell, 2016k). Voyant Tools found 1,539 phrases, ranging in length from 2 to 46 words (Sinclair & Rockwell, 2022t). Table 16 is a sample of five phrases in descending order of phrase length (i.e., the number of words in each phrase), each occurring twice in the corpus.
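Repeated phrases can be found by counting every word n-gram and keeping those that occur at least twice. The Python sketch below assumes a simple tokenizer and invented sample text; it keeps every repeated sub-phrase, whereas Voyant Tools may filter overlapping phrases, so counts from this approach would not match the 1,539 phrases reported here.

```python
import re
from collections import Counter


def repeated_phrases(text, min_len=2, max_len=50, min_count=2):
    """Return {phrase: count} for word n-grams occurring at least `min_count` times."""
    toks = [t.lower() for t in re.findall(r"[\w']+", text)]
    found = {}
    for n in range(min_len, min(max_len, len(toks)) + 1):
        grams = Counter(" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1))
        found.update({g: c for g, c in grams.items() if c >= min_count})
    return found


def longest_first(phrases):
    """Order phrases by word count, longest first."""
    return sorted(phrases.items(), key=lambda kv: len(kv[0].split()), reverse=True)


text = "OER policy applies to staff. The OER policy applies to students."
for phrase, count in longest_first(repeated_phrases(text))[:3]:
    print(len(phrase.split()), count, phrase)
```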

Table 16
Sample of the Longest Corpus Phrases in Descending Order of Length (Sinclair & Rockwell, 2022t)

Phrase

of 7 this work is licensed under the creative commons attribution 4.0 international license and is a derivative work by the southern alberta institute of technology of bcit’s open education best practices and guidelines by british columbia institute of technology used under cc by 4.0 d

as teaching learning and research materials in any medium digital or otherwise that reside in the public domain or have been released under an open license that permits no cost access use adaptation and redistribution by others with no or limited restrictions open

use creation and publication of oers are consistent with the university’s reputation values and mission to make a significant sustainable and socially responsible contribution to scotland the uk and the world promoting health and economic and cultural wellbeing

the oer policy of heinrich heine university düsseldorf is licensed under cc by 4.0 see https licenses by 4.0 legalcode except for the hhu düsseldorf logo hhu official notices no 58 2021 page

where students create oers as part of their programme of study or within a staff directed project staff supervising the creation of such material must ensure compliance with these guidelines before external publication


The Trends tool provided a line graph of the relative frequencies of terms across the documents in the OER policy corpus. In Figure 7, each term is shown in a distinct colour that matches its colour in the Cirrus panel.
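Relative frequency normalizes a term’s raw count by document length, which matters in this corpus because document lengths ranged from 98 to 5,517 words. The following Python sketch assumes a per-10,000-token scaling and a simple tokenizer; both are illustrative choices, and the scaling Voyant Tools applies may differ. The documents are hypothetical.

```python
import re

# Hypothetical documents of different lengths.
docs = {
    "policy_a.txt": "OER policy text. Open resources support open teaching.",
    "policy_b.txt": "The university adopts an OER policy. OER are open educational resources.",
}


def relative_frequencies(docs, terms, per=10_000):
    """Per-document frequency of each term, scaled per `per` tokens (assumed scaling)."""
    table = {}
    for name, text in docs.items():
        toks = [t.lower() for t in re.findall(r"[\w']+", text)]
        total = len(toks) or 1  # guard against empty documents
        table[name] = {term: toks.count(term) / total * per for term in terms}
    return table


for name, row in relative_frequencies(docs, ["oer", "open"]).items():
    print(name, {t: round(v, 1) for t, v in row.items()})
```

Plotting each term’s values across the documents as one line per term gives a display comparable in spirit to a trends graph, where a short document with a few mentions is not swamped by a long one.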

Figure 7
Screen Capture of Trends for the Five Most Frequent Corpus Terms (Sinclair & Rockwell, 2022aa)


Note. Figure 7 is the default Trends panel display, with a legend and Display dropdown options to show labels and graph types.

Figure 7 is the exported screen display of the Trends panel from the initial display illustrated in Figure 3. The Trends output was interactive: corpus terms could be filtered from the field at the lower left of Figure 7 or by selecting a term in another panel.
