Frequently Asked Questions of the Oxford Text Archive

1. Most Frequent Question

1.1. Why can't my computer open the files that I have downloaded?

The resources in the OTA do not rely on file suffixes to indicate which application should be used with them. You shouldn't expect your computer's operating system to identify the type of a file and suggest an appropriate program to process it. This would require your computer to know all about the various practices that have been followed in the past 30 years of text encoding in the Humanities, and to know how you wish to process the file. Resources in the OTA are made available for the purposes of scholarly research, so we assume some familiarity with electronic text, but we don't assume that we know what you want to do with the files.

Having said that, the files that you have downloaded are probably plain text files, and you could try using a text editor or a web browser to see what is in them. Unfortunately, the OTA does not have the resources to deal with queries about problems opening files.
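
If you prefer to check a download from a script rather than a text editor, the following is a minimal sketch of one way to do it, not an official OTA tool. The function name preview_text and the choice of UTF-8 with a Latin-1 fallback are assumptions made for illustration; the OTA does not promise any particular character encoding.

```python
import codecs


def preview_text(path, max_bytes=2000):
    """Return the start of a file decoded as text.

    Assumption: the file is UTF-8 or, failing that, Latin-1. Latin-1 can
    decode any byte sequence, so the fallback never fails, but it will
    show wrong characters if the file uses some other encoding.
    """
    with open(path, "rb") as handle:
        raw = handle.read(max_bytes)
    # If we stopped at max_bytes we may have cut a multi-byte character
    # in half, so only treat the data as complete when the read was short.
    complete = len(raw) < max_bytes
    try:
        decoder = codecs.getincrementaldecoder("utf-8")()
        return decoder.decode(raw, final=complete)
    except UnicodeDecodeError:
        return raw.decode("latin-1")
```

Calling print(preview_text("downloaded-file")) on a file you have saved (the file name here is a placeholder) shows roughly the first two thousand bytes, which is usually enough to tell whether you are looking at plain text, XML, or something else.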

2. About the OTA

2.1. What is the OTA?

The University of Oxford Text Archive (OTA) is a repository of digital literary and linguistic resources for research and teaching. We also offer advice to resource creators about best practice for creating digital resources, and to users of digital resources on how to benefit from existing resources.

The OTA used to host AHDS Literature, Languages and Linguistics.

2.2. Who funds the OTA?

The OTA is supported by Bodleian Libraries and IT Services, University of Oxford, and additional funding is sometimes acquired through project work. The archiving and repository services of the OTA are provided pro bono to the academic community.

The OTA received funding for its AHDS activities from the JISC and AHRC for around ten years until March 2008.

2.3. Where is the OTA?

The staff responsible for the OTA are physically located at Osney One Building in Oxford. Please see our Contact Information page for our addresses.

2.4. How much do your services cost?

Access to our catalogue is free. You can search the collection and download resources for free. Until the end of March 2008, we offered free advisory and archival services to all UK higher and further education institutions as part of our AHDS remit. We are still willing to provide these and other services, undertake project work, and provide consultation, and we have a pricing policy for such activities. If you have any questions about our services, please do not hesitate to get in touch with us.

2.5. What projects are you involved in?

The OTA is involved in numerous projects, initiatives, services, special interest groups and associations. A list can be found on the projects page.

2.6. What plans do you have for future developments?

The OTA will continue to develop its current services but also wants to respond to changes in the needs of our user community. If you have any suggestions for developments that you would like to see, please let us know.

2.7. Who works at the OTA?

We maintain a list of current staff on our Contact Information page.

2.8. What was the AHDS?

The Arts and Humanities Data Service (AHDS) was a national service in the UK aiding the discovery, creation and preservation of digital resources in and for research, teaching and learning in the arts and humanities. The AHDS covered five subject areas, and was organised via an Executive at King's College London and five AHDS Centres, hosted by various Higher Education Institutions. The AHDS was funded by the Joint Information Systems Committee and the Arts and Humanities Research Council. The AHDS website is no longer available.

2.9. When did the OTA start?

The OTA was founded by Lou Burnard in 1976. We celebrated our 30th birthday with a number of events in 2006, and our 40th in 2016.

2.10. Is the OTA part of the University of Oxford?

Yes. The OTA is part of the University of Oxford.


2.11. Is the OTA part of the Oxford University Press?


2.12. Is the OTA part of CLARIN?

Yes. The OTA collections can be found via the Virtual Language Observatory, and the OTA is involved in a number of initiatives to share resources via CLARIN. The OTA is a registered CLARIN C Centre, and a migration to the CLARIN DSpace platform is under way, with a launch planned in 2018.

3. Searching and Downloading

3.1. Do you only have texts?

Traditionally, most of our deposits have been in textual form. We do, however, have a number of other resource types, for example databases and some spoken resources in digital audio and video files.

3.2. What's the best way to find what I'm looking for?

You can search the website and resources on our search page. You can also browse the catalogue, sorted by author, title, or language: click on the Catalogue link at the top of any page.

3.3. Why do some texts seem to appear more than once?

The OTA collection consists of resources deposited with us over a long period of time. Some resources may exist in more than one variant. These can be different editions of the same text, different electronic versions of the same edition, or the same resource in different formats, for example in plain text as well as XML.

3.4. What formats are the resources in?

The resources are nearly all in text formats, but the way in which they have been marked up follows various conventions. Follow the link to 'more info' for each resource to learn about the format.

3.5. What is the TEI?

The Text Encoding Initiative (TEI) issues guidelines for the mark-up of text. To learn more about the TEI, visit the TEI website. We have a great deal of expertise in TEI encoding and are able to provide detailed advice on preparing TEI XML resources.

3.6. Are all your resources in TEI XML?

No, the format of the resources varies. Each resource does, however, have a TEI header, containing information about the resource. The static pages that make up the website, such as this FAQ, are all also stored as TEI XML.
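
As an illustration of what a TEI header can hold, here is a minimal sketch that reads the title and author from the header of a TEI XML document using Python's standard library. The sample document, the function name header_fields, and the choice of fields are invented for this example and do not describe any real OTA record. It assumes a namespaced TEI XML header; older resources whose headers are in SGML or in an earlier, non-namespaced form of TEI would need different handling.

```python
import xml.etree.ElementTree as ET

# Namespace used by current TEI XML documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample, invented for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>An example title</title>
        <author>An example author</author>
      </titleStmt>
      <publicationStmt><p>Not a real OTA record.</p></publicationStmt>
      <sourceDesc><p>Invented for illustration.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text><body><p>The text itself goes here.</p></body></text>
</TEI>"""


def header_fields(xml_text):
    """Return the title and author found in a TEI header, if present."""
    root = ET.fromstring(xml_text)
    stmt = root.find("tei:teiHeader/tei:fileDesc/tei:titleStmt", TEI_NS)
    if stmt is None:
        return {}
    fields = {}
    for name in ("title", "author"):
        element = stmt.find(f"tei:{name}", TEI_NS)
        if element is not None and element.text:
            fields[name] = element.text.strip()
    return fields


print(header_fields(SAMPLE))
```

The header sits apart from the text itself, which is why it can describe a resource even when the body of that resource is in some other format.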

3.7. What is SGML?

Standard Generalized Markup Language is an international standard used for annotating documents with information about structure and semantics in a way that both computers and humans can understand. HTML and XML are based on the earlier SGML standard.

3.8. What is XML?

XML stands for eXtensible Markup Language. It is a standardised way of tagging texts, in order to represent information about the structure of documents, and can also be used to add annotations and interpretative information. It is a simplified subset of the Standard Generalized Markup Language (SGML).
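
To make the idea of tagging structure concrete, here is a small sketch in Python. The element names (poem, stanza, line) and the function lines_per_stanza are made up for this example and do not follow TEI or any other scheme used in the OTA. Because the stanzas and lines are tagged explicitly, a program can count them without guessing from blank lines or indentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, invented for illustration only.
POEM = """<poem>
  <title>A hypothetical poem</title>
  <stanza n="1">
    <line>First line of the first stanza</line>
    <line>Second line of the first stanza</line>
  </stanza>
  <stanza n="2">
    <line>Only line of the second stanza</line>
  </stanza>
</poem>"""


def lines_per_stanza(xml_text):
    """Map each stanza's n attribute to the number of lines it contains."""
    root = ET.fromstring(xml_text)
    return {
        stanza.get("n"): len(stanza.findall("line"))
        for stanza in root.iter("stanza")
    }


print(lines_per_stanza(POEM))
```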

3.9. Why does it say 'unknown format' in the description of some resources?

The OTA started archiving texts before there were generally accepted standards for text formats. Some of our older resources were deposited in a format that is unknown or poorly documented, perhaps with annotation that does not follow any standards. Such texts are given the label 'unknown format'.

3.10. What is the difference between a freely available text and a restricted one?

The resources in the OTA collection have been deposited with the Archive under different licences. Some depositors require that you register, and sometimes also contact them, before you are allowed to download their resource. You must first request these resources by filling out a form. Other resources can be downloaded freely, but this still involves providing your email address so that we can send you a link from which you can download the text.

3.11. Why do some resources require asking for permission?

Some of our depositors want us to identify the users before they can access the resources. It may be that they want to know who is using the resource or that they are working on improving or expanding the resource and may have a later version available. The OTA encourages all depositors to make their works freely available if at all possible.

3.12. Why do I have to give you my email before downloading?

The resources in the OTA collection have been deposited by different individuals. Some are happy to make the resources freely available while others impose certain restrictions. One such restriction is that interested users have to register before they can download the resource. To simplify the maintenance of the OTA website and the delivery of its resources, we use the same process for both restricted and freely available resources. For freely available resources, we only ask for your email address and send you a link to download the text fairly quickly. Those requesting restricted resources may have to wait longer.

3.13. What will you do with my personal information?

The OTA is part of the University of Oxford, which is registered under the Data Protection Act 1998. Personal information submitted via forms within the domain will be stored securely. This information may be used for a number of activities, such as statistical analysis to benefit the OTA user community, assisting with any queries you have regarding a resource you have downloaded, and, where required, keeping depositors informed of the users and uses of their material. We will not otherwise distribute, sell, trade or rent your personal information to third parties. Please also see our data protection statement.

3.14. I requested a freely available text. Where is it?

A link to the resource you requested should have been emailed to the address you provided when requesting the text. If you entered the email address incorrectly, you will not receive your notification. If an hour or two has passed and you have not received your notification, and it hasn't ended up in your junk or spam folder, then try requesting the text again. If you still have no luck, email us and we will try to find out what has gone wrong.

3.15. I requested a restricted text. Where is it?

When you request a restricted text, a notification goes into a queue awaiting the receipt of your signed request form. We will then fulfil the requirements of the depositor (e.g. recording your information, contacting them for permission) and, when we are able to, we will email you a link from which you can download the resource. This may take days or weeks, depending upon the conditions set by the depositor and their willingness to respond. If you are concerned that your request is taking a long time, email us and we will try to find out its status.

3.16. Do you distribute the BNC?

Yes. Take a look at the corpora in the catalogue. In past years, the British National Corpus (BNC) was curated by Oxford University Computing Services separately from the Oxford Text Archive collections. It still has its own website.

3.17. Is there a printed catalogue available?

No. We provided a printed catalogue many years ago, but we no longer publish one. We recommend the search and catalogue pages.

3.18. What languages are the resources in?

The OTA holds resources in a large number of languages, although English is the most popular. You can find out what languages there are in the collection by sorting the catalogue by language.

4. Depositing

4.1. Do deposits have to be in TEI XML?

No, we can accept deposits in other formats as well, as long as they are of sufficient quality and come with good documentation. Some formats are less suitable for preservation, and we may not be able to guarantee that these remain usable in the future. We may refuse deposits that are in an unsuitable format, or for a variety of other reasons.

4.2. How do I deposit resources?

Please get in touch with us by email and we'll be happy to explain the options. Please also see our deposits page.

4.3. Why does my deposit not appear in the catalogue?

Your deposit (or another one you believe us to have) may not appear in our catalogue for a number of reasons. It may still be being accessioned, or (if previously available) it may have been taken offline, for reasons ranging from migration or website maintenance to someone having expressed a copyright concern. Such removals are sometimes temporary; if one turns out to be permanent, we will make a best-effort attempt to contact the original depositor. If the deposit is still not there when you check back after a reasonable length of time, please get in touch with us by email and we'll be happy to investigate the matter.