Chapter Three: Methodology

My study adapted a hermeneutic theoretical framework for the methodology of collecting and analyzing the data (Gillo, 2021, p. 44). Within this framework, the literature review established a forestructure of understanding for distinguishing dedicated OER policy texts; this was followed by decisions on the data representation of digital texts, a worldwide compilation of one dedicated OER policy document from each post-secondary institution into a dedicated OER policy corpus, and subsequent data analysis with close and distant readings (Gillo, 2021). Hermeneutic cycles of text analysis furthered the exploration and understanding of texts within a dedicated OER policy corpus of official online published documents from 28 post-secondary institutions. The dedicated OER policy source formats were 24 PDF documents and 4 HTML pages (Table 11). Although institutions such as Glasgow Caledonian University (2020) offered both PDF and DOCX file versions, the PDF was used for consistency in close reading and in file conversion to plain text, preserving header and footer text. Additionally, policy text embedded in the body of HTML pages was extracted as plain text for distant reading and converted to PDF for close reading. The cycles of close and distant readings contributed to interpretations, reflections, discovery of patterns, and development of insights towards understanding how OER was articulated in the dedicated OER policy corpus.


My research methodology involved a hermeneutic approach with text analysis of a dedicated OER policy corpus in English, via cycles of close readings by the researcher intertwined with distant readings using a specific CAQDAS, Voyant Tools (Sinclair & Rockwell, 2022a). According to Boyles and Scherer (2012), “close reading means reading to uncover layers of meaning that lead to deep comprehension” (p. 37). PARCC (2012, p. 7) further clarified that “close, analytic reading stresses engaging with a text of sufficient complexity directly and examining its meaning thoroughly and methodically,” through deliberate reading and rereading. Harter (2022) summarised close reading as “an activity that keeps you focused on and within a text—appraising individual words, shapes of thought, rhetorical devices, patterns of description and characterization, and so forth, in order to understand the text’s artistic achievement” (sec. About Close Reading). In the context of the researcher’s position in this study, close reading was the face-to-text reading of dedicated OER policy texts: direct eye-to-text viewing of words, syntax, order, and how the content was being stated (Wikipedia contributors, 2022a). Close readings adapted the cyclic process of a hermeneutic circle, examining the dedicated OER policy context towards understanding the text, which in turn changed understanding of the context; a process that continually informed the interpretation of the dedicated OER policy corpus (Cunff, 2020). In distant reading, by contrast, the intermediary reading by computer of a single- or multi-document corpus placed a virtual distance between the researcher and the original dedicated OER policy text, rather than direct human-to-text reading and analysis.
Inspired by distance education and the positions of learner and teacher, this inquiry reconciled close and distant reading concepts from the perspective of the researcher’s actual or virtual distance from the texts: the virtual distance between researcher and texts was mediated by CAQDAS in the reading, decomposing, and recomposing of dedicated OER policy texts for interpretation by the researcher.

Moretti (2000) described distant reading as “a condition of knowledge: it allows you to focus on units that are much smaller or much larger than the text: devices, themes, tropes—or genres and systems” (p. 57). In conceptualizing scale in close and distant reading, Jin (2017) asserted that distant reading was analogous to an aerial view or macroanalysis, whereas close reading focused on the text in a microanalysis. However, depending on the CAQDAS, applications such as Voyant Tools can bridge close and distant readings. For example, Voyant Tools can interactively select the whole or parts of a corpus and simultaneously view a document within the corpus (e.g., zoom into a multi-document corpus from a visual overview to a document, sentence, and word). Thus, for the purposes of this study, the researcher’s actual or virtual position relative to the text delineated the terms “close” and “distant” reading in the hermeneutic exploration of a dedicated OER policy corpus.

Jin (2017) asserted that circular and iterative processes between close and distant readings promoted the discovery of meaning and interpretation. Furthermore, Jin (2017) suggested that “reconciliations between ‘close reading’ and ‘distant reading’ must occur not only on methodological terms, but rhetorical ones as well” (sec. What’s “distant” about “distant reading”?).

Furthering the hermeneutic circle of understanding, regarding the cyclic development of interpretations from meanings in close and distant readings (i.e., interplay of co-created insights), there was a shift in horizons of understanding for the researcher (Gadamer, 1975, p. 273) that led to the emergence of patterns in the text and notions of the meaning of OER in dedicated OER policy, as discussed in Chapter 4.

In the process of readings, distant reading of the dedicated OER corpus with Voyant Tools preceded the close reading of each policy because of the convenience of rapidly generating results, whereas close reading analysis was a labour-intensive and time-consuming process that involved four complete iterations. Voyant Tools was selected primarily on the following criteria: current, free, open source, stable, multi-platform, user-friendly (e.g., internal and external help), in active development, and a recognised text analysis tool set that can generate consistent textual and visualisation results.

Rockwell and Sinclair (2016, p. 143) distinguished text analysis tools for hermeneutics as an “interactive” modelling of the original that invites exploration and checking of the interpretation of the model against the digital text. Rockwell and Sinclair (2016, p. 23) argued that hermeneutica is part of a movement that integrates method and interrogation. According to Vieira and De Queiroz (2017, p. 9), “hermeneutic is an art of interpretation of understanding.” Hermeneutics is advocated as a credible research approach that is open to exploration and interpretation of text (Addeo, 2013; Gillo, 2021; Paterson & Higgs, 2015). This study adapted Gillo’s (2021, p. 44) hermeneutic methods model of collection and data analysis with close reading (including annotation and inline notes on meanings), distant reading with Voyant Tools, and development of interpretations and insights for the dedicated OER policy corpus.

The distant reading portion of the study collected textual and visualisation results for concordances, word clouds, term frequencies, contexts, collocates, phrases, and trends. Other tools in Voyant Tools, such as Correlations and Bubblelines, could be useful in future text analysis but were not employed, for the following considerations: time management, the quantity of corpus data, consistency in generating results across the corpus, and alignment of concordance tools to support close readings while limiting issues such as acontextual counting (see the Scope and Limitations section). Additionally, a visualisation tool dependent on the order of documents, such as TermsRadio, was not used for the corpus analysis (Sinclair & Rockwell, 2016j).

Rockwell and Sinclair (2016) asserted that computer text analysis:

  • allows “formalizing claims, or parts of claims, so they can be shared and verified” (p. 260)
  • facilitates “interpretive negotiation in new ways” and can “enlarge a dialogue by providing formalizations for negotiation” (p. 261)
  • “is an imaginative practice that makes use [of] information technology” (pp. 261–262)

The hermeneutic system of interpreter, text, and meaning (Demeterio, 2001, p. 3) was applied to close readings of each dedicated OER policy document, whereas distant reading employed Voyant Tools software on the dedicated OER policy corpus, followed by interpretation. Thus, distant readings of the OER corpus resulted in disassembled texts, whereas close readings followed a linear left-to-right viewing of assembled words in sentences and paragraphs to understand contexts and texts towards revealing patterns. Close and distant readings informed each other in the interpretation of the texts, leading to a greater understanding of OER in dedicated OER policy.

Scope and Limitations

The hermeneutic approach focused on a worldwide dedicated OER policy corpus from online published documents of post-secondary institutions. An aim of my study was an interpretation of these policy texts to reveal insights and patterns from a holistic perspective, rather than a dissection of text into categories and themes to create new interpretations beyond the original macro interpretation of the text itself (i.e., an exclusive hermeneutics of the policy texts). Thus, close and distant readings were designed to explore a combined rich ‘aerial view’ and granular view of texts for the dedicated OER policy corpus interpretations. This study considered content analysis (coding, categories, and theme development that embed text into models of communication) to be outside the scope of exploring a hermeneutics of the original dedicated OER policy texts (Mayring, 2004). Furthermore, Mayring (2004) concluded that qualitative content analysis was less appropriate to holistic, explorative, and open-ended investigations. Additionally, Seale (2018, p. 404) stated that content analysis was subject to criticism for focussing on the what (i.e., description) of text rather than the how (i.e., interpretation of meaning). This study adopted a holistic position regarding the exploration of an interpretation of meaning from dedicated OER policies rather than an inquiry criticising the texts. Consequently, close and distant readings were appropriate to this exploratory hermeneutic study in the generation of broad interpretations and insights towards understanding OER in a dedicated OER policy.

A discourse analysis approach was considered unsuitable for a preliminary open-ended investigation of a dedicated OER policy corpus where there could be multiple understandings and interpretations. In addition, a hermeneutic approach aligned with the interpretivist paradigm adopted by this research. Discourse analysis diverges from hermeneutics in rejecting “the idea that texts are open to any number of different, and equally plausible, readings” (Seale, 2018, p. 485). Furthermore, the hermeneutical situation is determined by the preconceptions or prejudices that constitute a horizon (Gadamer, 1975, p. 272). According to Gadamer (1975), “the horizon is the range of vision that includes everything that can be seen from a particular vantage point” (p. 269). Thus, a holistic view with a forestructure of understanding and preconceptions was congruent with a hermeneutic approach.

Distant readings adapted Rockwell’s (2003, p. 213) hermeneutic tenets of text analysis to reveal patterns of coherence and meaning in the dedicated OER policy corpus. Rockwell (2003, p. 214) identified three underlying hermeneutic principles of concordances and text analysis tools applicable to distant reading of the dedicated OER policy texts:

  • Use of a concordance presumes unity of the text and a consistent use of words.
  • A concordance is a new combination of the parts of the original text.
  • A concordance is generated by procedures in software, initiated by word or pattern queries.

An assumption was that institutional dedicated OER policy documents articulated one or more constituents of the definition of OER (e.g., copyright) and policy aspects such as guidance on the creation and use of OER for the academic community.

Limitations of the dedicated OER policy documents included their formats and decisions about which parts of HTML text to include in the corpus. Since the documents were predominantly PDF and convertible to plain text, PDF was selected as the baseline file format for close readings, with plain text for distant reading. Four policy documents were embedded in HTML pages, which created a challenge in deciding which parts of each webpage were relevant to the study. Hence, a limitation of the HTML format was the exclusion of the top, bottom, and sides of pages containing content not exclusively part of the post-secondary institutional dedicated OER policy texts. Future research may include the surrounding webpage content as relevant data for text analysis; however, for my exploratory study, the body of the dedicated OER policy webpage was converted to PDF and then to plain text format. Further limitations are discussed in the Corpus Preparation section.

Hetenyi et al. (2019, p. 393) stated that although Voyant Tools yielded valuable information in the form of visualisations, negative aspects were the possibility of arriving too quickly at conclusions from false interpretations of visual data, and that different input parameters could affect the results. A further limitation of the numbers Voyant Tools generates is Sandelowski’s (2001) assertion of complications with counting, such as overcounting and acontextual counting. Sandelowski (2001, pp. 237–239) cautioned against representational overcounting, whereby numbers detract from findings, and analytic overcounting, whereby adding numbers does not add further understanding of the findings. However, numbers leading to visualisations in Voyant Tools are important for identifying patterns (J. A. Maxwell, 2010, p. 479). Thus, a limited set of tools in Voyant Tools was used to support understanding in close readings in the discovery of patterns and keywords, such as the term liability.

Close and Distant Reading

Close and distant readings used different software tools for analysis of the dedicated OER policy texts. Close reading used PDF for annotation in Skim (A. Maxwell et al., 2023), whereas distant reading used the plain text format as a simplified structure for document processing by Voyant Tools (Sinclair & Rockwell, 2022b). Standardising the policy documents in a digital format for text analysis required conversions when the source files were in a different file format (e.g., HTML). Furthermore, non-English texts were translated into English for ease of interpretation and consistency of language within the dedicated OER policy corpus.

Close Reading With Skim

Close reading used a compiled PDF corpus of the dedicated OER policy documents for annotation using PDF reader annotation software. The criteria for selecting the PDF software included: annotation (e.g., highlights and notes), annotation cataloguing and export, open source, free, stable, supported, and compatible with existing operating systems. Furthermore, the PDF reader annotation software had to organise the annotations and notes inline with the text during close reading of the dedicated OER policies. Various free online and offline applications for PDF reading and annotation exist, such as Google Drive (Friedman, 2018), eMargin (Kehoe & Gee, 2012), and tools for annotating locally saved PDFs (“Annotating Locally-Saved PDFs,” n.d.). Offline PDF reading and annotation functionality is built into Zotero (Zotero Contributors, 2011/2022). However, Zotero currently only supports imported or manually created annotations in a PDF and lacks a function to export shareable mark-up notes across applications (Dstillman, 2021). Hence, for close readings, the free and open source standalone application Skim for Mac OS X (A. Maxwell et al., 2022) was used for annotating and generating an exportable catalogue of shareable mark-up notes linked to their respective positions within a PDF compilation of dedicated OER policy texts. According to the Skim website (Skim, n.d.), this PDF application is dedicated to reading and annotating scientific papers, and was therefore considered an applicable tool for this study. Limitations of the Skim application include its exclusively Mac-based operating system and the limited shareability of its native PDF annotations for use in other PDF readers across different platforms. Currently, Zotero has minimal annotation tools, such as text highlighting, whereas Skim includes advanced tools to add boxes, circles, freehand drawings, underlines, strike-outs, and lines to a PDF. Furthermore, Skim has an advanced notes organiser that is exportable in different formats, such as the “.fdf” format included with the close readings file on the research website (Swettenham, 2023), to mitigate the limited shareability of Skim annotations in other free and open source PDF readers such as Zotero.

Preparation of the PDF compilation of dedicated OER policy involved converting the HTML sources to an intermediary LibreOffice text document followed by export to PDF, then combining the individual PDF files into a PDF corpus for a centralised and consistent approach to close readings. File naming of the source documents used the post-secondary institutional names followed by the dedicated OER policy titles in descending order for both the PDF compilation and plain text files.

Distant Reading With Voyant Tools

Voyant Tools, a free and open source CAQDAS, “is a web-based text analysis, reading, and visualization environment” (Sinclair & Rockwell, 2016a) that facilitated distant reading, such as term counts and phrases to support close readings, and online sharing of results. In the context of this hermeneutic inquiry, distant reading was conducted on a dedicated OER policy corpus to generate text, values, and visualisations using the following tools in Voyant Tools: Cirrus (i.e., word cloud), Corpus Terms, Contexts, Corpus Collocates, Phrases, and Trends. The selected tools provided a deconstructed view of the dedicated OER policy with numeric and visual outputs that were used to support understandings about words, phrases, and patterns from close readings.


Cirrus

The Cirrus tool provided a visualisation of the words having the highest frequency in the dedicated OER policy corpus (Sinclair & Rockwell, 2016c). According to Sinclair and Rockwell (2016c), “word clouds can be effective at very quickly drawing attention to high frequency terms” (sec. Additional Information). DePaolo and Wilkinson (2014) asserted that “the effectiveness of the word cloud is theoretically grounded in the learning model of graphical organizers” (p. 38). A graphic organizer is a tool for visualising “symbols to express ideas and concepts, to convey meaning” (Saskatoon Public Schools, 2004). According to DePaolo and Wilkinson (2014), “a graphical organizer can provide an assessment picture of individual concepts along with a ‘big picture’ assessment of the interrelationships of the individual concepts” (p. 38). DePaolo and Wilkinson (2014) maintained that word clouds were useful for assessing keyword usage and comparisons between related data. Mackintosh (2017) employed a corpus word cloud to visualise terms from distance education literature to confirm word frequency queries. Similarly, the Cirrus tool was useful as a visual representation of the most frequently used terms in the dedicated OER policy corpus, highlighting its dominant terms. However, gaining further understanding necessitated the use of other tools in distant reading, intertwined with close reading, for a richer picture of dedicated OER policy.

Corpus Terms

The Cirrus tool aided in a visual overview of the term occurrences with the greatest frequencies, whereas the Corpus Terms tool presented a table view of all terms sorted in descending order of frequency in the corpus, which aligned with the hermeneutic principles of concordances for the interpretation of text discussed in the Scope and Limitations section (Sinclair & Rockwell, 2016f). According to Rockwell (2001, p. 6), a concordance is a hybrid text that is meaningful in the context of the dialogical process between the researcher and the original text. Rockwell (2003) noted that concordances are a way “to discover patterns of coherence in a text or textual corpus” (pp. 5–6) and to engage in a dialogic process with the text. Furthermore, Wynne (2007) asserted that “it is through concordancing that the patterns of usage and the paradigms are revealed” (sec. Introduction). Concordances that identify terms and count their occurrences thereby identify the frequent and distinctive terms that characterised the dedicated OER policy corpus (Rockwell & Sinclair, 2016, p. 55). According to Rockwell and Sinclair (2016), “insofar as occurrence indicates importance, frequency counts can provide a sense of what a text is about” (p. 58). Thus, patterns in the corpus texts and keywords that emerged from close readings were supported by the distant readings.
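To make this concordance-style frequency counting concrete, the following minimal Python sketch (an illustration only, not Voyant Tools' implementation; the file names and texts are invented) tallies every term in a tiny two-document corpus and lists terms in descending order of frequency, as the Corpus Terms tool does:

```python
# Sketch of a Corpus Terms-style frequency table: tokenize a small
# corpus and list every term in descending order of frequency.
import re
from collections import Counter

# Invented two-document corpus for illustration.
corpus = {
    "policy_a.txt": "Open educational resources support open licensing.",
    "policy_b.txt": "The policy encourages open resources and open practice.",
}

counts = Counter()
for text in corpus.values():
    # A deliberately simple tokenizer: lowercase alphabetic runs only.
    counts.update(re.findall(r"[a-z]+", text.lower()))

# Top terms across the whole corpus, most frequent first.
for term, n in counts.most_common(5):
    print(term, n)
```

In this toy corpus the term "open" dominates the table, which is the kind of at-a-glance signal the Corpus Terms view provided for the dedicated OER policy corpus.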


Contexts

A concordance that listed and counted all words appearing in the dedicated OER policy corpus enabled the exploration of term contexts (Seale, 2018, p. 407). Furthermore, although the Cirrus tool visually summarised the most frequent terms and aided pattern recognition by the researcher, the Contexts tool complemented concordances by grounding the “analysis in the manifest content of the texts” (Seale, 2018, p. 408), rather than in the latent content from the researcher’s close reading. The Contexts (or Keywords in Context) tool lists each term with a default setting of five words before and after the term (Sinclair & Rockwell, 2016d). According to Sinclair and Rockwell (2016d), the Contexts tool “can be useful for studying more closely how terms are used in different contexts.” The context of terms aided in building meaning that furthered the researcher’s horizon of understanding within the hermeneutic circle of context and text. The Contexts tool was critical to listing, organising, and furthering understanding of the terms in the context of dedicated OER policy texts.
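The keywords-in-context listing can be sketched as follows; this is a simplified illustration with an invented sample sentence, assuming the default window of five words on either side of the query term:

```python
# Hypothetical keywords-in-context (KWIC) listing: for each hit of a
# query term, capture up to `window` words before and after it.
import re

def kwic(text, term, window=5):
    tokens = re.findall(r"\w+", text.lower())
    hits = []
    for i, tok in enumerate(tokens):
        if tok == term:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits

# Invented policy-like sentence for illustration.
sample = ("The institution shall publish open educational resources under an "
          "open licence to maximise reuse of resources")
for left, term, right in kwic(sample, "resources"):
    print(f"{left} [{term}] {right}")
```

Each printed line centres one occurrence of the term, which is the same left-context / term / right-context layout a concordance view presents.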

Corpus Collocates

Collocates are two or more terms that occur near each other more frequently than by chance (Essberger, n.d.). Firth (1968) stated that “the collocation of a word or ‘piece’ is not to be regarded as mere juxtaposition, it is an order of mutual expectancy. The words are mutually expectant and mutually prehended” (p. 181). The Collocates tool listed corpus terms that “appear more frequently in proximity to keywords across the entire corpus” (Sinclair & Rockwell, 2016e). According to Anagnostou and Weir (2011, p. 13), collocations have application in word sense disambiguation and in identifying linguistic habits of using the same term in different contexts. Furthermore, as noted by McEnery and Wilson (n.d.), “information about the delicate differences in collocation between two words has a potentially important role,” which was applicable to developing new understanding and insights into the characteristics of dedicated OER policy texts. Although Essberger (n.d.) asserted that there is predictability in strong and weak collocations, this notion could be challenging in a dedicated OER policy where terms such as ‘Creative’ and ‘Commons’ appear regularly together. Terms that are fixed in association (e.g., Creative Commons) could be considered a non-collocation, as noted by Essberger (n.d.): “when a sequence of words is 100% predictable, and allows absolutely no change except possibly in tense, it is not helpful to treat it as a collocation” (sec. When is a collocation NOT a collocation?). Moreover, the Contexts tool was complementary in understanding the meaning of a collocate, or non-collocate, such as an organisation or a licence (e.g., Creative Commons).
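A simple way to see how proximity-based collocate counting works is the sketch below; the windowing logic is an assumption for illustration, not Voyant Tools' actual collocation algorithm, and the sample text is invented:

```python
# A minimal collocate count: tally terms that occur within a +/- `window`
# token distance of a keyword (illustrative logic only).
import re
from collections import Counter

def collocates(text, keyword, window=3):
    tokens = re.findall(r"\w+", text.lower())
    near = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            # Count every token inside the window, excluding the hit itself.
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    near[tokens[j]] += 1
    return near

# Invented policy-like text for illustration.
text = ("Materials are released under a creative commons licence and each "
        "creative commons licence permits reuse")
print(collocates(text, "creative").most_common(3))
```

Here "commons" and "licence" surface as the strongest neighbours of "creative", mirroring the fixed Creative Commons association discussed above; a production tool would additionally compare these counts against chance co-occurrence.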

According to Evert (2008), collocations most often measure positive associations between terms, although negative associations, or “anti-collocations” (Pearce, 2001, p. 41), may also exist. Evert (2008) concluded that new association measures with novel properties were an important consideration for investigation; however, such work (e.g., improvements to collocation algorithms for Voyant Tools) was beyond the scope of this dedicated OER policy text analysis.


Phrases

Research on phrases in education (Granger, 1998; Oakey, 2020) and data mining (Liu et al., 2015) has revealed the benefits and complexities of discerning phrases in a corpus. My text analysis employed the Phrases tool, which listed “repeating sequences of words organized by frequency of repetition or number of words in each repeated phrase” (Sinclair & Rockwell, 2016k). Sinclair and Rockwell (2016i) noted that the Phrases tool was limited by the exclusion of stopwords, and that repeating phrases were counted within documents, not across the corpus (i.e., a phrase occurring once per document but recurring across the corpus will not be included in the phrase list). Although the Phrases tool has limitations, repetitions of variable-length phrases, such as concepts and products, reduce semantic ambiguity, delineate important phrases, and aid in the discovery of emerging patterns (Liu et al., 2015; Sinclair & Rockwell, 2016k).
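The idea of counting repeating word sequences within a document can be illustrated with the short sketch below (fixed-length n-grams for simplicity, whereas the Phrases tool handles variable-length sequences; the sample text is invented):

```python
# Sketch of repeated-phrase detection within a single document:
# count fixed-length word n-grams and keep those that repeat.
import re
from collections import Counter

def repeated_ngrams(text, n=2, min_count=2):
    tokens = re.findall(r"\w+", text.lower())
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {" ".join(g): c for g, c in grams.items() if c >= min_count}

# Invented policy-like sentence for illustration.
doc = ("Open educational resources policy promotes open educational "
       "resources for teaching and learning")
print(repeated_ngrams(doc, n=3))
```

Only "open educational resources" repeats in this toy document, which is exactly the kind of multi-word unit that the Phrases tool surfaces and that single-term frequency counts would split apart.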

An additional application of phrases and terms is the research by the University of Helsinki (2017), declaring that “short repetitive exposure to novel words induced a rapid neural response increase that is suggested to manifest memory-trace formation.” According to Kimppa (2017), “words with high frequency of occurrence elicited greater neural responses than low frequency words or meaningless pseudo-words” (p. 4). Although outside the scope of my research, phrases and terms occurring with different frequencies in dedicated OER policies could be an avenue of future research from a neural perspective.


Trends

The Trends tool is similar to the Cirrus tool in providing an organisational visualisation and overview, with the additional ability to export both graphical and numeric data. The Trends tool’s default display is a line graph with stacked bars depicting “frequencies of terms across documents in a corpus or across segments in a document, depending on the mode” (Sinclair & Rockwell, 2016m). The default relative-frequency line and stacked bar graph represent each term relative to other terms in that document; the calculation is the term count in the document divided by the total tokens in the document, multiplied by one million tokens (MacDonald, 2022b; Smith, 2014). The relative-frequency Trends display has an expanded view of each term in the graph, whereas the raw frequency option displayed a compressed view of each term, which was particularly evident for documents with a low total token count. Relative frequencies were useful for comparing documents in the corpus with different total token sizes (Mihaescu, 2010; Smith, 2014). Thus, the default relative-frequency trend graph was used to visualise the spread of terms in a document compared with the corpus.
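The relative-frequency calculation described above can be checked with a short worked example (the counts are illustrative, not taken from the corpus):

```python
# Worked example of the relative-frequency calculation:
# count / total tokens * 1,000,000 (occurrences per million tokens).
def relative_frequency(term_count, total_tokens, per=1_000_000):
    return term_count / total_tokens * per

# A term occurring 12 times in a 3,000-token policy document
# works out to roughly 4,000 occurrences per million tokens.
print(relative_frequency(12, 3000))
```

Normalising to a per-million rate is what makes a 3,000-token policy directly comparable with a 30,000-token one, since raw counts alone would favour the longer document.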

The Cirrus tool provided an interactive aerial snapshot of the corpus terms, whereas the Trends tool offered both a broad and detailed visualisation of the relative frequencies of terms within corpus documents. The Trends tool provides a rapid visualisation of the relative or raw frequency of the default five highest-frequency terms across a dedicated OER corpus, with the ability to interactively drill down to a document term (Sinclair & Rockwell, 2016m). Trends were useful in visualising the state of a term (i.e., frequency of occurrence) in relation to other terms within each institutional document and the dedicated OER policy corpus. Results from the Contexts and Collocates tools were complementary in providing a richer picture of the terms and graphs generated by the Trends tool.

Voyant Tools Application

The concordance tools used in the distant reading of the dedicated OER policy corpus were consistent in producing the same results with the same corpus, both online and offline (Sinclair & Rockwell, 2022ab). However, certain tools in the Voyant Tools suite were not used, such as Topics (i.e., term clusters), which can generate different results on each run of topic modelling (Sinclair & Rockwell, 2022ac). Appendix A presents the Voyant Tools results from using the selected text analysis tools on the dedicated OER policy corpus.


Voyant Tools is an established research application for reading textual data to display and export dynamic and static information, as texts, graphics, and hyperlinked content in peer-reviewed publications (Daines III et al., 2018; Hendrigan, 2019; Hetenyi et al., 2019; Philbin, 2018; Sinclair, 2020; Steiner et al., 2014). Voyant Tools has been used in dissertations such as Pegoda (2016), Ryan (2021), and Sinclair (2001). Notably, Sinclair’s (2001) dissertation on the creation of HyperPro, was the precursor to Voyant Tools (Sinclair & Rockwell, 2016b).

Voyant Tools is agile in providing interactive text analysis results, with both corpus and individual document views that provide the ability to zoom into a document and a term within the dedicated OER policy corpus. Currently, Voyant Tools is available as a stand-alone desktop application (Voyant Server) for offline use (Sinclair & Rockwell, 2022ab) and as an online version (Sinclair & Rockwell, 2022a). The online version of Voyant Tools was used for shareability of the data and results, with the stand-alone Voyant Server used for offline work and application testing, including verifying that offline output was identical to the online output. Notably, the offline stand-alone desktop version of Voyant Tools required Java (Lana, 2022; Swettenham & MacDonald, 2022b), thereby giving the online version the advantage of convenience and shareability.

Alternative text analysis applications can be found on the Text Analysis Portal for Research Version 3 (TAPoR 3) portal site and on sites such as Concordancers (Weisser, 2021). Although Orange 3 was promising open source data mining software, it required manual entry of query terms to produce a concordance, whereas Voyant Tools automatically provided selectable queried terms. AntConc (Anthony, 2020) concordance software was similar to Orange 3 with respect to manual entry of queries for known terms in the corpus. However, AntConc is freeware, proprietary, and platform specific, whereas Voyant Tools is free, open source, multi-platform, and runs in any web browser.

The Voyant Server is a stand-alone desktop application and web-based launcher for a local instance of Voyant Tools (Sinclair & Rockwell, 2022ab; Wikipedia contributors, 2021f). The stand-alone version was useful for offline text analysis and portability in situations without Internet access. The online version of Voyant Tools was used for convenience in sharing data and information via the Voyant Tools citation URL on the Internet (see Appendix A). According to MacDonald (2022a), the online version of Voyant Tools “aims to have corpa persist for as long as possible,” and thus the URL for the dedicated OER policy corpus output is open to current external review, experimentation, and opportunities for future research.

Corpus Preparation

Preparation of the dedicated OER policy files for Voyant Tools involved the conversion of source file formats to a common plain text format for consistency of data input, as recommended by Voyant Tools (Sinclair & Rockwell, 2016i, sec. 2.0 Preparing an Electronic Text, 2016g). Converting PDF to text files may use free and open source stand-alone software or free online file conversion services. Different online and offline conversion applications were tested for plain text output, such as the free online PDF to Text website ( and the free offline open source Xpdf command line tools for multiple platforms at (Noonburg, 2022a). The AntConv Macintosh version 2.0.0 (Anthony, 2021), a freeware multi-platform software with batch processing and a user-friendly graphical interface, was tested but not used for my study because it is a proprietary application. Additionally, AntConv had difficulty arranging text that were in columns within a PDF, thereby potentially creating confusion in contexts for Voyant Tools. Xpdftotext version 4.04 (Noonburg, 2022a) was chosen for the conversion of PDF to text with the aim of obtaining accuracy from a free, open source, stable, and multi-platform application in active development. Xpdftotext runs in a terminal application with a batch command line (i.e., for file in *.pdf; do ./pdftotext -layout -enc UTF-8 “$file”; done) set to the original layout of text and encoding to UTF-8 (Logix, 2019; Noonburg, 2022b), which produced a Voyant Tools compliant plain text file that needed further review for extraneous characters (Swettenham & MacDonald, 2022c). The conversions of PDF to UTF-8 encoded plain text files were checked for accuracy in conversion and extraneous characters, such as unrecognised invisible and visible characters. 
Invisible form feed characters (i.e., "\f") were removed (Wikipedia contributors, 2021d), and unrecognised bullets (e.g., visible as an inverted question mark or a question mark in a box) were replaced with the Unicode bullet character, U+2022 (• – Bullet, 2022). The preparation of the corpus through conversion and cleaning of the plain text files removed extraneous characters that Voyant Tools could otherwise misinterpret during tokenization of the uploaded files (Sinclair & Rockwell, 2016g).
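The cleaning step can be sketched in a few lines of Python. Representing an unrecognised bullet as the Unicode replacement character (U+FFFD) is an assumption for illustration only; the actual stray characters varied by source PDF.

```python
def clean_policy_text(text: str) -> str:
    """Remove form feeds and normalise unrecognised bullets to U+2022."""
    text = text.replace("\f", "")            # strip invisible form feed characters
    # U+FFFD is an assumed stand-in for an unrecognised bullet character;
    # real sources may require additional replacements.
    text = text.replace("\ufffd", "\u2022")
    return text
```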

The body text of dedicated OER policies embedded in HTML sources was extracted into PDF and plain text formats. Furthermore, dedicated OER policies embedded within an institutional policy PDF compilation were extracted into separate PDF files and then converted to plain text for consistency of the workflow. However, the German PDFs were translated into English using the LibreOffice OpenDocument Text (ODT) format (The Document Foundation, n.d.) and then exported to plain text if there were no headers and footers in the German source file; otherwise, the English-translated ODT file was converted to PDF and then to text to preserve the header and footer content.

The mix of multi-format policy text files for the corpus could benefit from a consistent open digital document standard, such as publishing institutional dedicated OER policy documents in LibreOffice ODT format, produced by free and open source multi-platform office software (The Document Foundation, n.d.). A policy document in ODT with an open copyright licence would be freely accessible, modifiable, of archival quality, and in a non-proprietary format suitable for digital preservation (Library of Congress, 2021), thereby eliminating the time and effort of employing file conversion tools in studies such as this one.

Corpus Text Analysis

The plain text conversion from the PDF and web page sources provided a consistent baseline input format for Voyant Tools. However, graphics such as institutional logos and Creative Commons icons could not be converted to text and were therefore excluded from the analysis. The collection of dedicated OER policy plain text files, in English, was uploaded together into Voyant Tools to become a multi-document corpus. Figure 3 is the default display output of the dedicated OER policy corpus from the online version of Voyant Tools (the offline version produced identical display and results). Notably, the default Stopwords list filters the output and may remove words useful to the study; therefore, it was necessary to review and customise the Stopwords list when appropriate. For the purposes of this exploratory research, the default configuration of Voyant Tools was used to produce the initial output for interpretation, followed by custom Stopwords lists that produced extended filtered-terms output based on the initial distant reading analysis. The process of close and distant readings, framed in hermeneutic cyclic processes, necessitated modification of the Stopwords lists to produce output for specific terms in order to further understand legal and actionable meanings in the dedicated OER policy corpus. The extended outputs from filtered terms were collated with the initial results in a LibreOffice spreadsheet compilation archive. Future text analysis of the dedicated OER policy corpus may consider a customised Stopwords list to filter Voyant Tools output.
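The effect of a custom Stopwords list on distant reading output can be illustrated with a minimal term-frequency sketch in Python. This is not Voyant Tools itself, and the stopword entries below are a hypothetical subset chosen for demonstration.

```python
from collections import Counter

# Hypothetical subset of a Stopwords list; Voyant Tools ships a much
# larger default list that can be reviewed and customised.
CUSTOM_STOPWORDS = {"the", "of", "and", "a", "to", "in"}

def term_frequencies(text, stopwords):
    """Count alphabetic tokens, excluding any that appear in stopwords."""
    tokens = [t for t in text.lower().split() if t.isalpha()]
    return Counter(t for t in tokens if t not in stopwords)
```

Filtering the same text with different stopword sets changes which terms dominate the frequency output, which is why the Stopwords list had to be reviewed before interpreting the distant reading results.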

Figure 3

Screen Capture of the Voyant Tools Default View for the Post-Secondary Institutional Dedicated OER Policy Corpus Initial Output (Sinclair & Rockwell, 2022a)


Each panel in Figure 3 was expandable to a web browser window and included features, such as term filters, referencing, and data export, that were used for archiving results to a LibreOffice spreadsheet for online distribution with the research website. Depending on the panel of the webpage, selecting a term changed the output in the other panel displays, thereby providing a reading experience of zooming in and out across corpus, document, and word levels. Reloading the web page or resizing a window rearranged the location of words in the Cirrus tool panel of the default Voyant Tools display, but the colour and weighting of term frequencies remained unchanged (Sinclair & Rockwell, 2016c). With the default palette options, term colour assignments were consistent across tool panels, such as the Cirrus and Trends tools. The dedicated OER policy corpus results took the form of text, quantities, and visualisations that aided the examination of terms within each document (Silge & Robinson, 2021). Text results from each tool in the study were exported as "all available data in tab separated values (text)" (Sinclair & Rockwell, 2022a) and imported into a LibreOffice Calc spreadsheet for review, archiving, and online sharing. Visualisations were exported in the default image formats (i.e., PNG and SVG) to the worksheets within the spreadsheet of text results, centralising and archiving the offline data collection from the online Voyant Tools webpage output. Voyant Tools is scalable to larger volumes of texts for future research, such as the inclusion of newer post-secondary institutional dedicated OER policy documents, as per Appendix D (Sinclair & Rockwell, 2016b). Furthermore, experimentation on the texts with a wider selection of tools than was used in the study is possible with my research data and the related Voyant Tools URL (see Appendix A).
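Exports in the "tab separated values (text)" format can also be read back programmatically, not only in LibreOffice Calc. In this sketch, the column names and values are hypothetical placeholders, not actual Voyant Tools headers or results.

```python
import csv
import io

# Hypothetical sample of a tab-separated export; real Voyant Tools
# column headers differ by tool and are not reproduced here.
sample_export = "Term\tCount\npolicy\t412\nopen\t389\n"

# csv.DictReader with a tab delimiter parses each row into a dictionary
# keyed by the header line, mirroring a spreadsheet import.
rows = list(csv.DictReader(io.StringIO(sample_export), delimiter="\t"))
```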


In the hermeneutic exploration of dedicated OER policies, understanding progressed through gathering the meaning of terms, phrases, and contexts in the process of close and distant readings. According to George (2020), understanding is the successful outcome of interpretation that "is not measured by norms and methods typical of the modern natural sciences and quantitative social sciences" (sec. 1.1 Understanding as Educative). George (2020) described the success of understanding as "educative in that we learn from our interpretive experience, perhaps not only about a matter, but thereby also about ourselves, the world, and others" (sec. 1.1 Understanding as Educative). The aims of this research employing an exploratory design included, but were not limited to, gaining a richer picture of dedicated OER policy, direction on techniques, generating insights, and developing future inquiries that contribute to the OER policy landscape.

According to Gadamer (1975), understanding is the fusion of past and present horizons; each horizon is the "range of vision that can be seen from a particular vantage point" (p. 269). Zweck et al. (2008) asserted that "Gadamer suggested that the interpretation of a phenomenon reflects the intersection of the vision of the researcher (past horizon) with the view of the text (present horizon)" (p. 120). Thus, the understanding of OER in dedicated OER policy in the context of my research was a fusion of horizons.


Three criteria, persuasiveness, insightfulness, and practical utility, were adapted for the examination of my hermeneutic research (Patterson, 1993; Patterson et al., 1998; Patterson & Williams, 2002). According to Giorgi (1975, as cited in Patterson, 1998), "persuasiveness deals with whether the reader can make a reasonable judgment about the researcher's claims" (p. 96). Patterson and Williams (2002) cautioned "that multiple interpretations exist and we should not necessarily expect inter-rater agreement" (p. 33). However, Patterson and Williams (2002) asserted that "the concept of persuasiveness encourages a focus on the product or outcome of interpretation and the empirical warrants for the interpretations presented to justify the interpretation" (p. 33). Towards persuasiveness, the research made freely accessible online, under Creative Commons copyright, the Voyant Tools results for the dedicated OER corpus, along with the close reading annotations, created with Skim, in multiple file formats for other researchers (A. Maxwell et al., 2022).

Thompson (1990, as cited in Patterson, 1998) described insightfulness as research that increases understanding of a phenomenon such that “the reader is guided through data in a way that produces an understanding of the phenomenon reflecting greater insight than was held prior to reading the research” (p. 28). This investigation extends the discourse on OER policy and offers the gained insights towards dedicated OER policy opportunities for post-secondary educational institutions.

The practical utility criterion evaluated whether the inquiry uncovered an answer to the research question that addressed the problem statement and offered applications for other researchers (Patterson & Williams, 2002). The practical utility criteria developed by Patterson et al. (1998, p. 173) encapsulate the hermeneutic evaluation of practical implications and use in future research. A foundational commitment of the research was openness, such that anyone with a computer and access to the Internet could obtain the software tools, data, and documents without cost (i.e., free). According to Patterson and Williams (2002), an important consideration was the usefulness of the gained "knowledge in enhancing understanding, promoting communication, or resolving conflict" (p. 35). Hence, the close and distant reading analysis, interpretations, and insights towards understanding OER in dedicated OER policy could lead to improvements in current institutional OER policy and greater adoption of dedicated OER policy in post-secondary education. Furthermore, the open research was made freely available online for future researchers.
