Implement in-memory dictionary

Description

The dictionary in Matterhorn stores words and stop words extracted from several Wikipedia language exports in a relational database table. This is a fairly heavyweight solution. Given enough memory and a more selective word list (Wikipedia contains many oddities and typos), these dictionaries could be held in memory instead.

Implement a simple in-memory dictionary containing only valid words obtained from more reliable sources. Stop words are not (yet) used in the system, so they do not need to be included.
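
A minimal sketch of what such a dictionary might look like, assuming one set of valid words per language code. The class name InMemoryDictionary and its methods are hypothetical and are not taken from Matterhorn's existing dictionary API; lower-casing words before storing them is also an assumption about how lookups should behave.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Matterhorn's actual dictionary service.
// Not thread-safe: populate it once at startup or guard it externally.
final class InMemoryDictionary {

  // Language code (for example "en") mapped to its set of normalized valid words.
  private final Map<String, Set<String>> wordsByLanguage = new HashMap<>();

  // Adds words for a language and returns how many were new.
  // Null and blank entries are skipped; duplicates are counted once.
  int addWords(String language, Iterable<String> words) {
    Set<String> known = wordsByLanguage.computeIfAbsent(language, k -> new HashSet<>());
    int added = 0;
    for (String word : words) {
      if (word == null) {
        continue;
      }
      String normalized = normalize(word);
      if (!normalized.isEmpty() && known.add(normalized)) {
        added++;
      }
    }
    return added;
  }

  // Returns true if the word is listed as valid for the given language.
  boolean contains(String language, String word) {
    Set<String> known = wordsByLanguage.get(language);
    return known != null && word != null && known.contains(normalize(word));
  }

  // Number of distinct words stored for a language (0 if the language is unknown).
  int size(String language) {
    Set<String> known = wordsByLanguage.get(language);
    return known == null ? 0 : known.size();
  }

  // Assumed normalization: trim whitespace and lower-case independently of the default locale.
  private static String normalize(String word) {
    return word.trim().toLowerCase(Locale.ROOT);
  }
}
```

A hash set per language keeps lookups at constant expected time, with the whole word list resident in memory. Stop words are left out, in line with the description above.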

Steps to reproduce

None

Status

Assignee

Unassigned

Reporter

Josh Holtzman

Criticality

None

Tags (folksonomy)

None

Components

Fix versions

Affects versions

None

Priority

Major