Implement in-memory dictionary

Description

The dictionary in Matterhorn stores words and stop words extracted from multiple Wikipedia language exports in a relational database table. This is a fairly heavyweight solution. Given enough memory and a more selective word list (Wikipedia contains a lot of oddities and typos), these dictionaries can be stored in memory.

Implement a simple in-memory dictionary containing only valid words obtained from more reliable sources. Stop words are not (yet) used in the system, so they do not need to be included.
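
As a rough illustration of the idea, the sketch below keeps a word list in a hash set and answers membership queries from memory. It is only a minimal sketch under stated assumptions: the class name InMemoryDictionary and the methods fromWords, load, contains, and size are hypothetical and are not taken from the Matterhorn code base or from the attached patch. It also assumes one word per line in the source list and case-insensitive English lookups.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Collection;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical in-memory word list; an illustration, not the Matterhorn dictionary API. */
final class InMemoryDictionary {

  /** Normalized (trimmed, lower-cased) words held entirely in memory. */
  private final Set<String> words;

  private InMemoryDictionary(Set<String> words) {
    this.words = words;
  }

  /** Builds a dictionary from a collection of words, skipping null and blank entries. */
  static InMemoryDictionary fromWords(Collection<String> source) {
    Set<String> normalized = new HashSet<>();
    for (String word : source) {
      String w = normalize(word);
      if (!w.isEmpty()) {
        normalized.add(w);
      }
    }
    return new InMemoryDictionary(normalized);
  }

  /** Reads one word per line from the reader (assumed file layout), skipping blank lines. */
  static InMemoryDictionary load(Reader reader) throws IOException {
    Set<String> normalized = new HashSet<>();
    BufferedReader in = new BufferedReader(reader);
    String line;
    while ((line = in.readLine()) != null) {
      String w = normalize(line);
      if (!w.isEmpty()) {
        normalized.add(w);
      }
    }
    return new InMemoryDictionary(normalized);
  }

  /** Returns true if the word, after normalization, is in the list. */
  boolean contains(String word) {
    return words.contains(normalize(word));
  }

  /** Number of distinct words held in memory. */
  int size() {
    return words.size();
  }

  /** Assumption: lookups are case-insensitive and ignore surrounding whitespace. */
  private static String normalize(String word) {
    return word == null ? "" : word.trim().toLowerCase(Locale.ENGLISH);
  }
}
```

A caller could build the dictionary once at startup, for example with InMemoryDictionary.load(new FileReader(path)) for some word-list file, and then call contains(word) for each candidate word. A hash set gives constant-time lookups on average at the cost of holding every word in memory, which is the trade-off the description above accepts.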

Activity

Josh Holtzman
March 7, 2011, 7:16 AM

Attached a patch that replaces the JPA dictionary implementation with an English-only in-memory implementation. The word list was generated from a combination of:

Greg Logan
April 24, 2012, 4:00 AM

Per http://opencast.3480289.n2.nabble.com/JIRA-Ticket-Cleanup-proposal-td7475080.html, this has been bulk resolved as won't fix. If this is still important to you, please reopen and we can triage as appropriate.

Assignee

Unassigned

Reporter

Josh Holtzman

Tags (folksonomy)

None

Components

Fix versions

Affects versions

Priority

Major