Khmer Unicode text data that can be used to pretrain or fine-tune a Khmer language model. The goal of this repo is to serve as a collection of Khmer corpora that can be used to train a Khmer language ...