
test_mmseg.py 983 B

6 years ago
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
 * Copyright (C) 2018.
 *
 * Name : test_mmseg.py
 * Author : Leo <1162441289@qq.com>
 * Version : 0.01
 * Description : Tests for the mmseg word segmentation method
"""

import unittest

import jiagu


class TestTextRank(unittest.TestCase):

    def setUp(self):
        pass

    def tearDown(self):
        pass

    def test_seg_one(self):
        # A sentence with overlapping word boundaries.
        sentence = "人要是行干一行行一行"
        words = jiagu.seg(sentence, model="mmseg")
        self.assertEqual(list(words), ['人', '要是', '行', '干一行', '行', '一行'])

    def test_seg_two(self):
        # A sentence containing place names and punctuation.
        sentence = "武汉市长江大桥上的日落非常好看,很喜欢看日出日落。"
        words = jiagu.seg(sentence, model="mmseg")
        self.assertEqual(list(words), ['武汉市', '长江大桥', '上', '的', '日落', '非常', '好看', ',', '很', '喜欢', '看', '日出日落', '。'])


if __name__ == '__main__':
    unittest.main()
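
For readers unfamiliar with dictionary-based segmentation, the sketch below shows plain forward maximum matching, the greedy idea that MMSEG builds on. It is not Jiagu's implementation: the full MMSEG algorithm adds further disambiguation rules on top of maximum matching. The function name `forward_max_match`, the `max_len` parameter, and the toy dictionary `TOY_VOCAB` are all hypothetical and exist only for this illustration.

```python
def forward_max_match(text, vocab, max_len=4):
    """Greedy forward maximum matching over a word set.

    Simplified illustration only; not Jiagu's mmseg implementation.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a match is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in vocab:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy dictionary, assumed for this example only.
TOY_VOCAB = {"武汉市", "长江大桥", "市长", "日落", "非常", "好看"}

print(forward_max_match("武汉市长江大桥上的日落非常好看", TOY_VOCAB))
```

With this toy dictionary the greedy scan picks "武汉市" before it can consider "市长", which is why the place name stays intact. A dictionary that lacked "武汉市" would segment the same text differently, so results depend heavily on the word list.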

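Jiagu is trained on large-scale corpora. It will provide common natural language processing functions, including Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, knowledge graph relation extraction, keyword extraction, text summarization, new word discovery, and text clustering. It was built with reference to the strengths and weaknesses of the major existing tools, and Jiagu is offered back to the community.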
