test_findword.py 662 B

6 years ago
# -*- encoding:utf-8 -*-
"""
 * Copyright (C) 2017 OwnThink.
 *
 * Name        : test_findword.py
 * Author      : zengbin93 <zeng_bin8888@163.com>
 * Version     : 0.01
 * Description : Unit test for the new-word discovery algorithm
"""
import os
import jiagu
import unittest


class TestFindWord(unittest.TestCase):
    def setUp(self):
        # Local test corpus; the word list is written next to it.
        self.input_file = r"C:\迅雷下载\test_msr.txt"
        self.output_file = self.input_file.replace(".txt", '_words.txt')

    def tearDown(self):
        # Clean up the generated word list, if the run produced one.
        if os.path.exists(self.output_file):
            os.remove(self.output_file)

    def test_findword(self):
        # findword reads the corpus and writes discovered words to output_file.
        jiagu.findword(self.input_file, self.output_file)
        self.assertTrue(os.path.exists(self.output_file))
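
The test above depends on a corpus at a fixed Windows path, so it only runs on the author's machine. One possible portable variant is sketched below: it writes a tiny corpus into a temporary directory and cleans up automatically. Note that `fake_findword`, `TestFindWordPortable`, and `run_suite` are hypothetical names introduced for illustration only; `fake_findword` is a stand-in that is assumed to share the `(input_file, output_file)` calling convention of `jiagu.findword` used in the test, and it does not implement Jiagu's algorithm.

```python
import os
import tempfile
import unittest


def fake_findword(input_file, output_file):
    """Hypothetical stand-in for jiagu.findword (NOT the real algorithm).

    Writes each distinct whitespace-separated token to output_file, one per line.
    """
    with open(input_file, encoding="utf-8") as src:
        words = sorted(set(src.read().split()))
    with open(output_file, "w", encoding="utf-8") as dst:
        dst.write("\n".join(words))


class TestFindWordPortable(unittest.TestCase):
    # Assumption: replace with staticmethod(jiagu.findword) to test the real library.
    findword = staticmethod(fake_findword)

    def setUp(self):
        # The temporary directory and everything in it is removed after each test.
        self.tmpdir = tempfile.TemporaryDirectory()
        self.addCleanup(self.tmpdir.cleanup)
        self.input_file = os.path.join(self.tmpdir.name, "corpus.txt")
        self.output_file = os.path.join(self.tmpdir.name, "corpus_words.txt")
        with open(self.input_file, "w", encoding="utf-8") as f:
            f.write("自然 语言 处理 自然 语言\n")

    def test_findword(self):
        self.findword(self.input_file, self.output_file)
        self.assertTrue(os.path.exists(self.output_file))


def run_suite():
    """Run the portable test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestFindWordPortable)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_suite()` executes the single test; because cleanup is registered with `addCleanup`, no `tearDown` is needed and no file is left behind even if the assertion fails.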

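Jiagu is trained on large-scale corpora. It will provide common natural language processing functions, including Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, knowledge graph relation extraction, keyword extraction, text summarization, new word discovery, and text clustering. It was built with reference to the strengths and weaknesses of the major existing tools, and Jiagu is offered back to the community.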
