Search the Gothic Bible (BETA)

You can enter a string, word or clause, or search the text using regular expressions.

The engine does not do full-text search. It simply scans Streitberg's readings verse by verse and tries to match strings, across word boundaries. In other words, if you look for etun ‘they ate’, you'll also get fretun ‘they devoured’ and praufetuns ‘prophets’ – unless you select one of the word-based matching modes, or use regular expression syntax to anchor search patterns manually.
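The difference between substring matching and word-anchored matching can be tried out locally. The following is a minimal sketch in Python, using the three words from the paragraph above. It only illustrates the principle and makes no claim about how the engine itself is implemented.

```python
import re

# The three words mentioned in the paragraph above.
words = ["etun", "fretun", "praufetuns"]

# Default behaviour: the pattern may match anywhere, also inside a longer word.
substring_hits = [w for w in words if re.search(r"etun", w)]

# Anchored with \b on both sides: only the free-standing word matches.
whole_word_hits = [w for w in words if re.search(r"\betun\b", w)]

print(substring_hits)   # ['etun', 'fretun', 'praufetuns']
print(whole_word_hits)  # ['etun']
```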

You may enter the thorn (þ) and hwair (ƕ) characters directly, if you can, or substitute c and v for them, respectively. How hwair is shown in the results depends on the display configuration. You can select interlinear translations, if you want.
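If you prepare queries in a script, the substitution can be mimicked as follows. This is a sketch that assumes a plain one-to-one replacement of lowercase c and v; the function name `normalize_query` is made up for illustration, and the engine may treat uppercase letters or regular expression escapes differently.

```python
# Mapping taken from the sentence above: c stands for þ, v stands for ƕ.
SUBSTITUTES = str.maketrans({"c": "þ", "v": "ƕ"})

def normalize_query(query: str) -> str:
    """Replace the ASCII stand-ins c and v with thorn and hwair."""
    return query.translate(SUBSTITUTES)

print(normalize_query("cata"))  # þata
print(normalize_query("vas"))   # ƕas
```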

This is a first version. There are some limitations.

Pattern matching

Regular expressions provide a powerful and precise method for searching text. The syntax may seem complicated, but writing expressions is actually quite easy once you know the basics. (They tend to be harder to read.)

[aeuio]	matches one character out of a set of characters (in this case a vowel)
[a-zþ]	idem, with a range of characters and one additional letter
\w	matches any word character
\W	matches any non-word character, i.e. punctuation, whitespace etc.
\b	matches a word boundary
.	matches any character
a|bc	matches a OR bc
*	repeats the preceding character or group 0, 1 or more times
+	repeats the preceding character or group 1 or more times
?	repeats the preceding character or group 0 or 1 time, i.e. makes it optional
{n,m}	repeats the preceding character or group at least `n` and at most `m` times

You can create complex expressions by grouping basic expressions in parentheses, just like you would build arithmetic expressions using numbers, parentheses and the + and × operators:

(a|b)c	matches ac and bc
(a|bc)(x|y)	matches ax, ay, bcx and bcy

The operators *, ? and + are quantifiers. They indicate how many times the preceding character (or group) may be repeated:

suna?us?	matches sunus, sunu, sunau, sunaus
(a|b)+	matches a or b one or more times, i.e. any combination a, b, aa, bb, ab, abba, etc. ad infinitum.
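To check what a quantified expression accepts, you can test it against a few candidates. The sketch below uses Python's `re.fullmatch`, which requires the whole string to match; that is a choice made for this illustration, since the search engine itself looks for matches anywhere in a verse. The non-matching candidates are made-up strings.

```python
import re

# a? and s? each make one letter optional, giving four possible forms.
pattern = re.compile(r"suna?us?")
candidates = ["sunus", "sunu", "sunau", "sunaus", "sun", "sunaaus"]
matched = [w for w in candidates if pattern.fullmatch(w)]
print(matched)  # ['sunus', 'sunu', 'sunau', 'sunaus']

# + requires at least one repetition, so the empty string is rejected.
ab = re.compile(r"(a|b)+")
ab_matched = [s for s in ["a", "abba", "", "abc"] if ab.fullmatch(s)]
print(ab_matched)  # ['a', 'abba']
```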

Quantifiers are not wildcards. Be sure to understand the difference if you normally use * and ? as wildcards. The corresponding regular expressions are:

.	any single character (equivalent to wildcard ? in other systems)
.*	zero, one or more occurrences of any character (equivalent to wildcard *)

In practice, you'll often want to use \w (any word character) rather than . to match unknown characters:

[A-Z]\w*us

Matches proper names ending in us. (Unlike full-text systems, regular expressions are case-sensitive by default!)
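The sketch below shows, in Python, why \w is usually the safer choice and what case sensitivity means in practice. The sample string is put together for illustration and is not a quoted verse; the case-insensitive flag is a Python feature and says nothing about the options the search form offers.

```python
import re

sample = "Iesus jah Paitrus"  # illustrative string, not a quoted verse

# \w* stops at the space, so each name is found separately.
names_w = re.findall(r"[A-Z]\w*us", sample)
# .* also matches spaces and is greedy, so it runs on to the last "us".
names_dot = re.findall(r"[A-Z].*us", sample)
print(names_w)    # ['Iesus', 'Paitrus']
print(names_dot)  # ['Iesus jah Paitrus']

# Case sensitivity: [A-Z] does not match a lowercase initial by default.
strict = re.findall(r"[A-Z]\w*us", "sunus")
relaxed = re.findall(r"[A-Z]\w*us", "sunus", re.IGNORECASE)
print(strict)   # []
print(relaxed)  # ['sunus']
```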

We also defined the following non-standard character placeholders:

\V	any Gothic vowel
\C	any Gothic consonant
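Outside this search form, \V and \C are not understood by regular expression libraries, so they have to be expanded into ordinary character classes first. Here is a rough sketch of such an expansion in Python. The consonant class is the one given with the \C{4,} example further down; the vowel class is an assumption based on the vowels listed under Characters, and the helper name `expand_placeholders` is made up. The sketch does not handle placeholders used inside a bracketed class.

```python
import re

# Consonants as listed with the \C{4,} example; vowels are an assumption.
GOTHIC_CONSONANTS = "bdfghjklmnpqrstwxzþƕ"
GOTHIC_VOWELS = "aeuio"

_CLASSES = {"C": f"[{GOTHIC_CONSONANTS}]", "V": f"[{GOTHIC_VOWELS}]"}

def expand_placeholders(pattern: str) -> str:
    """Rewrite \\C and \\V into ordinary character classes."""
    def replace(match):
        # Unknown escapes such as \w or \b are returned unchanged.
        return _CLASSES.get(match.group(1), match.group(0))
    # Escapes are consumed pairwise, so an escaped backslash followed by
    # a literal C is not mistaken for the placeholder.
    return re.sub(r"\\(.)", replace, pattern)

expanded = expand_placeholders(r"\C{4,}")
print(expanded)  # [bdfghjklmnpqrstwxzþƕ]{4,}
clusters = re.findall(expanded, "swnagogei triggws")
print(clusters)  # ['ggws']
```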

There are many more things you can do. Have a look at this comprehensive overview if you want to know the details. Below are a few examples to get you started.

Examples

Most of the examples operate in default matching mode and use \b or \w to simulate word-based searches. Alternatively, you can select a word-based matching mode and simply enter the word pattern you want to find in words, or at the start or end of words.
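One way to think about the word-based matching modes is as a wrapper that adds the anchors for you. The following Python sketch is an assumption about the principle, not a description of the engine; the function `apply_mode` and the mode names "exact", "start" and "end" are invented for this illustration.

```python
import re

def apply_mode(pattern: str, mode: str = "default") -> str:
    """Wrap a pattern in word-boundary anchors according to the mode."""
    # The non-capturing group keeps alternatives like sa|so|þata together,
    # so the anchors apply to every alternative.
    grouped = f"(?:{pattern})"
    if mode == "exact":
        return rf"\b{grouped}\b"
    if mode == "start":
        return rf"\b{grouped}"
    if mode == "end":
        return rf"{grouped}\b"
    return pattern  # default: match anywhere, across word boundaries

anywhere = re.findall(apply_mode("hlai[bf]"), "hlaifs gahlaibaim")
at_start = re.findall(apply_mode("hlai[bf]", "start"), "hlaifs gahlaibaim")
exact = re.findall(apply_mode("etun", "exact"), "etun fretun praufetuns")

print(anywhere)  # ['hlaif', 'hlaib']
print(at_start)  # ['hlaif']
print(exact)     # ['etun']
print(apply_mode("sa|so|þata", "exact"))  # \b(?:sa|so|þata)\b
```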

att(an?|ins?): Matches atta, attan, attin and attins.
liuha[þd](a|is)?: Matches singular case forms of liuhaþ.
hlai[bf]\w*: Matches hlaifs, hlaif, hlaiba, hlaibis, ...
\bhlai[bf]\w*: The same, but with initial word boundary, excluding gahlaibaim. (= match at start of word + hlai[bf])
\betun\b: Matches the word etun. (= match exact word + etun)
aza\b: Matches words that end in aza, e.g. passive verb forms. (= match at end of word + aza)
\b(sa|so|þata)\b: Demonstrative pronoun (or article) nominative singular. (= match exact word + sa|so|þata)
\C{4,}: Matches clusters of 4 consonants or more. (= [bdfghjklmnpqrstwxzþƕ]{4,})
Kre[kt]\w*: Timeo Danaos et dona ferentes... (Particularly when they bring paradoxes, as they seem to do in Titus.) (= match words + Kre[kt])
[A-Z]\w+(a?us?|jus|uns|um|iwe)\b: Finds proper names with u-declension. (Remember that regular expressions are case-sensitive by default. Conveniently, Streitberg's text only uses capitals for names and chapter initials. (There seem to be some exceptions though.))
\bin þ(amma|izai|aim) \w{3,}: Finds in + dative pronoun + any word that is at least 3 characters long.
·[\w·]+·: Finds numbers like ·ib·.
\[[^\]]+\]: Finds text deleted by Streitberg.
<[^>]+>: Finds text added by Streitberg.
\w+(~\w+)+: Finds words that show enclisis and/or assimilation (marked with ~).
\b(\w+)\b( \b\1\b)+: Finds repeated words. Literally, the expression reads ‘one or more word characters between word boundaries, captured as a group, followed by one or more occurrences of space followed by the first captured string between word boundaries’.
\b(\C)\w+( \1\w+){3,}: Alliteration? (Probably accidental...)
\bmiþ[^þ\W]\w*: Matches words starting with miþ excluding miþþ-. Note the use of [^þ\W] (literally: any character that is not þ and not a not-word-char) as a somewhat quirky way to say any word character except þ. (= match at start of word + miþ[^þ\W])
\b(at|af|ana|and|bi|du|ga|in|us)?(\C)ai\2\w+: Matches reduplicating verb forms with optional prefixes (resulting in some false positives, e.g. taitrarkes). (= match at start of word + (at|af|ana|and|bi|du|ga|in|us)?(\C)ai\2\w+)
\w*[bdfhjklmnpqrstwxzþ]w[bdfghklmnpqrstwxzþ]\w*: Finds words that have ⟨w⟩ between consonants, i.e. when it represents Greek ⟨υ⟩, as in swnagogei, but excluding -ggw- in triggws, -wj- in manwjan, etc. (= match words + [bdfhjklmnpqrstwxzþ]w[bdfghklmnpqrstwxzþ])
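Two of the examples above are easier to follow when you run them: the backreference in the repeated-words pattern and the [^þ\W] class. The Python sketch below uses a handful of sample strings chosen for illustration (they are not search results from the engine). The lookahead variant `(?!þ)` is an alternative spelling added here for comparison; whether the search form accepts lookaheads is not stated on this page.

```python
import re

# Backreference: \1 must repeat exactly what group 1 captured.
repeated = re.compile(r"\b(\w+)\b( \b\1\b)+")
doubled = repeated.search("amen amen qiþa izwis")
print(doubled.group(0))  # amen amen
print(doubled.group(1))  # amen

# The word boundaries prevent "in" from pairing with the start of "inn".
near_miss = repeated.search("in inn")
print(near_miss)  # None

# [^þ\W] accepts any word character except þ.
no_thorn = re.compile(r"\bmiþ[^þ\W]\w*")
# The same condition written with a negative lookahead.
lookahead = re.compile(r"\bmiþ(?!þ)\w+")

words = ["miþ", "miþþanei", "miþwissei"]
class_hits = [w for w in words if no_thorn.search(w)]
lookahead_hits = [w for w in words if lookahead.search(w)]
print(class_hits)      # ['miþwissei']
print(lookahead_hits)  # ['miþwissei']
```

Note that `re.findall` would return the captured groups rather than the full match for the first pattern, which is why the sketch uses `search` and `group(0)`.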

Characters

Regular expressions operate on strings over an alphabet. Knowing the alphabet is helpful when writing complex expressions. The text contains these Unicode characters:

aeuio	vowels (lower- and uppercase)
bdfghjklmnpqrstwxzþƕ	consonants (lower- and uppercase)
ï	Esaïan, gaïddja, ...
û	þû is sa qimanda
.,:;?!	punctuation
“ ” ‘ ’	quotation marks (in Skeireins)
~	assimilation: jan~ni (= jah ni)
·	midpoint delimits numbers: du Kaurinþium ·b· ustauh
—	em dash
< > [ ]	additions and deletions by Streitberg
_	underscore marks a gap in a word, e.g. . . . . _teins þis balsanis warþ?

As mentioned above, you may substitute c for þ and v for ƕ (hwair).

Known limitations

The search engine only scans Gothic text. You won't get results for John 1:1, Wulfila, dative, Christ, Streitberg, Codex Argenteus etc. (Use Google or another search engine for these.)
It scans the text as edited by Streitberg, including marks for additions <...> and deletions [...]. As these may occur within words, you will occasionally miss results that you would expect to find. (We are working on a more sophisticated engine that looks in different views on the text.)
It is not yet possible to look for headwords or to filter on grammatical tags in the database. (However, thanks to Gothic's rich inflectional morphology, you can get quite far by filtering on endings and other morphological features, as shown in the examples.)
The engine will abort when a query takes too long. We have to set a limit to protect the server against denial-of-service attacks with maliciously crafted expressions. (Because of the way regular expression matching works, foul expressions could literally run for days.) If you get a warning, try a simpler or alternative expression.
Similarly, queries that yield many results (e.g. looking for a single letter or any word) may be refused or truncated. Given the small size of the corpus, we prefer to have all results at once, without paging. (This is convenient when you want to copy/paste results.) But there are better ways to download the entire text than entering an expression that returns every line in the Gothic Bible.
Last but not least: this is a beta version. There may be unexpected glitches. Drop a note when something goes wrong or can be improved, or when you come up with better examples.
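To illustrate the limitation about editorial marks mentioned above: a word that contains <...> or [...] will not match a plain query. If you work with copied results offline, one possible workaround is to strip the marks before searching. The Python sketch below reuses the two patterns from the Examples section; the function name `plain_reading` and the marked-up sample strings are made up for illustration and are not actual readings from the text.

```python
import re

def plain_reading(text: str) -> str:
    """Drop [deleted] text and keep <added> text without its brackets."""
    text = re.sub(r"\[[^\]]+\]", "", text)    # remove deletions entirely
    text = re.sub(r"<([^>]+)>", r"\1", text)  # unwrap additions
    return text

marked = "ga<h>laibaim"  # made-up example of an addition inside a word
missed = re.search("gahlaibaim", marked)
found = re.search("gahlaibaim", plain_reading(marked))

print(missed)                      # None
print(found.group(0))              # gahlaibaim
print(plain_reading("hlai[i]fs"))  # hlaifs
```

Removing a deleted word in this way can leave two spaces in a row, so you may want to normalize whitespace afterwards.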