Abstract

Syntactic and semantic segmentation and labelling of the words in a large text corpus is one of the basic research activities needed to build a linguistic database for language modelling. In this paper, the author explains the difficulties encountered in carrying out this activity within the project "A Feasibility Study for Farsi Language Modelling".
Several linguistic criteria and one engineering criterion were used to handle these difficulties. Finally, based on an n-state Markov process (n = 0, 1, 2, 3), a software package was written to extract the conditional probability distributions of Farsi words for both the label-dependent and label-independent cases.
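
As an illustration only, the kind of estimate such a package might produce can be sketched in Python. This is a minimal relative-frequency estimator under assumed conventions, not the author's software: the function name `conditional_distributions`, the reading of "n-state" as conditioning on the previous n tokens, and the toy token lists are all hypothetical. For the label-dependent case, each token is assumed to be a (word, label) pair; for the label-independent case, a bare word.

```python
from collections import Counter, defaultdict


def conditional_distributions(tokens, n):
    """Estimate P(token | previous n tokens) by relative frequency.

    Hypothetical helper: `tokens` is a sequence of hashable items, either
    bare words (label-independent) or (word, label) pairs (label-dependent).
    With n = 0 the context is empty and the result is the unigram distribution.
    """
    if n < 0:
        raise ValueError("n must be non-negative")

    # counts[context][token] = number of times `token` follows `context`
    counts = defaultdict(Counter)
    for i in range(n, len(tokens)):
        context = tuple(tokens[i - n:i])
        counts[context][tokens[i]] += 1

    # Normalise each context's counts into a probability distribution.
    distributions = {}
    for context, followers in counts.items():
        total = sum(followers.values())
        distributions[context] = {tok: c / total for tok, c in followers.items()}
    return distributions


# Label-independent toy sequence (placeholder tokens, not real corpus data).
words = ["a", "b", "a", "b", "a", "c"]
unigram = conditional_distributions(words, 0)
bigram = conditional_distributions(words, 1)

# Label-dependent toy sequence: each token is a (word, label) pair.
tagged = [("a", "N"), ("b", "V"), ("a", "N"), ("b", "V")]
tagged_bigram = conditional_distributions(tagged, 1)
```

The same function covers n = 0, 1, 2, 3 by changing the context length; a real system would also need smoothing for unseen contexts, which this sketch omits.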

Keywords

Language Modelling
Markov Process
Segmentation and Labelling

163-162, Issue 0 - Serial Number 1009
September 2002
