Apertium linguistic data for English

English: apertium-eng

This is an Apertium monolingual language package for English. You can use this language package for:

  • Morphological analysis of English
  • Morphological generation of English
  • Part-of-speech tagging of English

Requirements

You will need the following software installed:

  • lttoolbox (>= 3.5.0)
  • apertium (>= 3.6.0)
  • vislcg3 (>= 1.3.0)

If this does not make any sense, we recommend you look at: https://apertium.org
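
As a quick sanity check before compiling, the sketch below tests whether the tools listed above are on your PATH. It assumes the program names lt-comp (from lttoolbox), apertium (from apertium) and cg-comp (from vislcg3); the helper name check_tools is only illustrative, and the script does not verify version numbers.

```shell
#!/bin/sh
# check_tools: report each named program that is not on PATH.
# Returns 0 when every program is found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Assumed program names, one per required package:
#   lt-comp -> lttoolbox, apertium -> apertium, cg-comp -> vislcg3
check_tools lt-comp apertium cg-comp \
    || echo "install the missing tools before running ./autogen.sh"
```

If anything is reported as missing, the installation pages linked from https://apertium.org are the place to start.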

Compiling

Once the requirements are installed, you should be able to just run:

$ ./autogen.sh
$ make

If you're doing development, you don't have to install the data; you can use it directly from this directory.

If you are installing this language package as a prerequisite for an Apertium translation pair, then run (typically as root or with sudo):

# make install

You can give a --prefix to ./autogen.sh to install as a non-root user, but make sure to use the same prefix when installing the translation pair and any other language packages.

If any of this doesn't make sense or doesn't work, see https://wiki.apertium.org/wiki/Install_language_data_by_compiling

Testing

If you are in the source directory after running make, the following commands should work:

$ echo "the blue house" | apertium -d . eng-morph
^the/the ^blue/blue<n><sg>/blue<adj><sint> ^house/house/house/house/house$

$ echo "the blue house" | apertium -d . eng-tagger
^the ^blue<adj><sint> ^house$
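
The analyser output above uses the Apertium stream format, where each lexical unit is written as ^surface/analysis1/analysis2$. To get a rough idea of how much ambiguity the tagger has to resolve, you can count the units that carry more than one analysis. The following is a minimal sketch, not part of this package: the function name count_ambiguous is hypothetical, it relies on grep -o (available in GNU, BSD and busybox grep), and it does not handle escaped characters such as \/ inside a unit.

```shell
#!/bin/sh
# count_ambiguous: read Apertium stream-format text on stdin and print
# "<total> <ambiguous>". A lexical unit ^surface/a1/a2$ counts as
# ambiguous when it has more than one analysis after the surface form.
# Assumption: no escaped "/" or "$" occurs inside a unit.
count_ambiguous() {
    # Put each ^...$ unit on its own line, then split it on "/":
    # field 1 is the surface form, every further field is one analysis.
    grep -o '\^[^$]*\$' | awk -F'/' '
        { total++; if (NF > 2) ambiguous++ }
        END { printf "%d %d\n", total, ambiguous }
    '
}
```

From the source directory it could be used like this:

$ echo "the blue house" | apertium -d . eng-morph | count_ambiguous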

Tagger model training

To train the tagger model, do one of the following:

Supervised training:

$ make -f tagger.supervised.make

Unsupervised training:

$ make -f tagger.unsupervised.make

For details on the corpora used in training, check the corpora information.

For more information, see https://wiki.apertium.org/wiki/Tagger_training

A perceptron tagger model is also included as eng.perceptron.prob. If you want to use it, apertium-tagger needs to be called with -gx:

$ apertium-tagger -gx eng.perceptron.prob

Files and data

For more information

Help and support

If you need help using this language pair or data, you can contact:

See also the file AUTHORS, included in this distribution.

License

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.