FastNLP 1-Minute Tutorial
=========================

step 1
------

Load the dataset.

.. code:: ipython3

    from fastNLP import DataSet

    # linux_path = "../test/data_for_tests/tutorial_sample_dataset.csv"
    # A raw string keeps every backslash in the Windows path literal.
    win_path = r"C:\Users\zyfeng\Desktop\FudanNLP\fastNLP\test\data_for_tests\tutorial_sample_dataset.csv"
    # The file is tab-separated; name its two columns explicitly.
    ds = DataSet.read_csv(win_path, headers=('raw_sentence', 'label'), sep='\t')
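
To make the role of ``headers`` and ``sep`` concrete, here is a minimal standard-library sketch of reading tab-separated lines into records. It is not fastNLP's implementation; ``read_tsv`` and the two sample lines are hypothetical and exist only for illustration.

```python
import csv
import io


def read_tsv(handle, headers, sep="\t"):
    """Read delimiter-separated lines into a list of dicts keyed by `headers`."""
    reader = csv.reader(handle, delimiter=sep, quoting=csv.QUOTE_NONE)
    rows = []
    for fields in reader:
        if len(fields) != len(headers):
            continue  # skip lines that do not have exactly one value per column
        rows.append(dict(zip(headers, fields)))
    return rows


# Hypothetical sample content standing in for the tutorial's CSV file.
sample = io.StringIO("A series of escapades .\t1\nGood movie !\t3\n")
records = read_tsv(sample, headers=("raw_sentence", "label"))
print(records[0]["raw_sentence"], records[0]["label"])
```

Every value comes back as a string, which is why the next step converts ``label`` to ``int``.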

step 2
------

Preprocess the data in three stages: (1) type conversion, (2) splitting off a validation set, and (3) building the vocabulary.

.. code:: ipython3

    # Convert every sentence to lowercase
    ds.apply(lambda x: x['raw_sentence'].lower(), new_field_name='raw_sentence')
    # Convert each label to int
    ds.apply(lambda x: int(x['label']), new_field_name='label_seq', is_target=True)

    def split_sent(ins):
        # Tokenize on whitespace
        return ins['raw_sentence'].split()

    ds.apply(split_sent, new_field_name='words', is_input=True)
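
The pattern behind each ``apply`` call is a map over instances whose result is stored in a named field. The sketch below mimics that pattern on a plain list of dicts. It is an assumption about the general idea, not fastNLP's code, and it ignores the ``is_input`` and ``is_target`` flags.

```python
def apply(instances, func, new_field_name=None):
    """Call `func` on every instance; optionally store each result in a field."""
    results = [func(ins) for ins in instances]
    if new_field_name is not None:
        for ins, value in zip(instances, results):
            ins[new_field_name] = value  # overwrites the field if it exists
    return results


# Hypothetical instance, for illustration only.
data = [{"raw_sentence": "A Good Movie !", "label": "3"}]

apply(data, lambda x: x["raw_sentence"].lower(), new_field_name="raw_sentence")
apply(data, lambda x: int(x["label"]), new_field_name="label_seq")
apply(data, lambda x: x["raw_sentence"].split(), new_field_name="words")
print(data[0])
```

Reusing an existing field name, as the lowercase step does with ``raw_sentence``, replaces the old value in place.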

.. code:: ipython3

    # Split into a training set and a validation set
    train_data, dev_data = ds.split(0.3)
    print("Train size: ", len(train_data))
    print("Test size: ", len(dev_data))

.. parsed-literal::

    Train size: 54
    Test size: 23
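
The sizes above are consistent with holding out ``int(0.3 * 77) = 23`` of the 77 instances. A minimal sketch of such a ratio split follows, assuming a shuffle followed by truncation; the ``split`` helper and its ``seed`` parameter are hypothetical, and fastNLP may partition differently in detail.

```python
import random


def split(instances, dev_ratio, seed=None):
    """Shuffle indices, then hold out int(dev_ratio * n) instances as dev."""
    rng = random.Random(seed)
    indices = list(range(len(instances)))
    rng.shuffle(indices)
    n_dev = int(dev_ratio * len(instances))
    dev = [instances[i] for i in indices[:n_dev]]
    train = [instances[i] for i in indices[n_dev:]]
    return train, dev


# 77 placeholder items reproduce the 54/23 sizes reported above.
train, dev = split(list(range(77)), 0.3, seed=0)
print(len(train), len(dev))  # 54 23
```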

.. code:: ipython3

    from fastNLP import Vocabulary

    vocab = Vocabulary(min_freq=2)
    # Count words from the training set only
    train_data.apply(lambda x: [vocab.add(word) for word in x['words']])

    # Index the sentences with Vocabulary.to_index(word)
    train_data.apply(lambda x: [vocab.to_index(word) for word in x['words']], new_field_name='word_seq', is_input=True)
    dev_data.apply(lambda x: [vocab.to_index(word) for word in x['words']], new_field_name='word_seq', is_input=True)
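
The sketch below illustrates what a frequency-thresholded vocabulary does: words seen fewer than ``min_freq`` times, and words never seen at all, map to a shared unknown index. ``TinyVocab`` is hypothetical, and the reserved ``<pad>`` and ``<unk>`` tokens at indices 0 and 1 are assumptions, not a statement about fastNLP's internals.

```python
from collections import Counter


class TinyVocab:
    """Toy vocabulary with a minimum-frequency cutoff (illustration only)."""

    def __init__(self, min_freq=1, padding="<pad>", unknown="<unk>"):
        self.min_freq = min_freq
        self.padding = padding
        self.unknown = unknown
        self.counts = Counter()
        self.word2idx = None  # built lazily on first lookup

    def add(self, word):
        self.counts[word] += 1
        self.word2idx = None  # invalidate any previously built index

    def build(self):
        self.word2idx = {self.padding: 0, self.unknown: 1}
        for word, freq in self.counts.most_common():
            if freq >= self.min_freq and word not in self.word2idx:
                self.word2idx[word] = len(self.word2idx)

    def to_index(self, word):
        if self.word2idx is None:
            self.build()
        return self.word2idx.get(word, self.word2idx[self.unknown])

    def __len__(self):
        if self.word2idx is None:
            self.build()
        return len(self.word2idx)


vocab = TinyVocab(min_freq=2)
for sentence in (["a", "good", "movie"], ["a", "bad", "movie"]):
    for word in sentence:
        vocab.add(word)

print([vocab.to_index(w) for w in ["a", "good", "movie", "unseen"]])
```

Because only training words are counted, any validation word outside the vocabulary falls back to the unknown index rather than raising an error.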

step 3
------

Define the model.

.. code:: ipython3

    from fastNLP.models import CNNText

    # embed_num is the vocabulary size; num_classes is the number of labels
    model = CNNText(embed_num=len(vocab), embed_dim=50, num_classes=5, padding=2, dropout=0.1)
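
One plausible reason to set ``padding=2`` is that very short sentences would otherwise leave a convolution with no valid window. The sketch below applies the standard output-length formula for a 1-D convolution. It assumes ``padding`` is the convolution's zero-padding on each side, and the kernel sizes are hypothetical values chosen for illustration.

```python
def conv1d_output_length(seq_len, kernel_size, padding=0, stride=1):
    """Number of positions a 1-D convolution produces along the sequence."""
    return (seq_len + 2 * padding - kernel_size) // stride + 1


# Hypothetical kernel sizes, not taken from the tutorial.
kernel_sizes = (3, 4, 5)

# A two-word sentence: without padding no kernel fits; with padding=2 all do.
no_pad = [conv1d_output_length(2, k, padding=0) for k in kernel_sizes]
with_pad = [conv1d_output_length(2, k, padding=2) for k in kernel_sizes]
print(no_pad)    # [0, -1, -2]
print(with_pad)  # [4, 3, 2]
```

A non-positive length means the kernel is wider than the (padded) input, so no output position exists.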

step 4
------

Start training.

.. code:: ipython3

    from fastNLP import Trainer, CrossEntropyLoss, AccuracyMetric

    trainer = Trainer(model=model,
                      train_data=train_data,
                      dev_data=dev_data,
                      loss=CrossEntropyLoss(),
                      metrics=AccuracyMetric()
                      )
    trainer.train()
    print('Train finished!')

.. parsed-literal::

    training epochs started 2018-12-07 14:03:41

.. parsed-literal::

    HBox(children=(IntProgress(value=0, layout=Layout(flex='2'), max=6), HTML(value='')), layout=Layout(display='i…

.. parsed-literal::

    Epoch 1/3. Step:2/6. AccuracyMetric: acc=0.26087
    Epoch 2/3. Step:4/6. AccuracyMetric: acc=0.347826
    Epoch 3/3. Step:6/6. AccuracyMetric: acc=0.608696
    Train finished!
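
The reported accuracies can be read as fractions of the 23 validation instances: 0.26087, 0.347826, and 0.608696 match 6/23, 8/23, and 14/23 after rounding. The counts of correct predictions are inferred from those values, not printed by the trainer. A minimal sketch of the metric, with a hypothetical ``accuracy`` helper and made-up predictions:

```python
def accuracy(predictions, targets):
    """Fraction of predictions that equal their targets."""
    if not targets or len(predictions) != len(targets):
        raise ValueError("predictions and targets must be non-empty and equal in length")
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


# Made-up labels: 6 of 23 predictions are correct.
targets = [0] * 23
predictions = [0] * 6 + [1] * 17
print(round(accuracy(predictions, targets), 6))  # 0.26087
```

With only 23 validation instances, each additional correct prediction moves accuracy by about 4.3 percentage points, so the per-epoch figures are coarse.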

This concludes the tutorial. See the advanced tutorial for more operations.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~