FastNLP 1-Minute Quick-Start Tutorial
=====================================

The original tutorial notebook is at
https://github.com/fastnlp/fastNLP/blob/master/tutorials/fastnlp_1min_tutorial.ipynb

Step 1
------

Load the dataset.

.. code:: ipython3

    from fastNLP import DataSet
    # linux_path = "../test/data_for_tests/tutorial_sample_dataset.csv"
    # Raw string, so the backslashes in the Windows path are not treated as escapes.
    win_path = r"C:\Users\zyfeng\Desktop\FudanNLP\fastNLP\test\data_for_tests\tutorial_sample_dataset.csv"
    ds = DataSet.read_csv(win_path, headers=('raw_sentence', 'label'), sep='\t')

Step 2
------

Preprocess the data:

1. Type conversion
2. Split off a validation set
3. Build the vocabulary

.. code:: ipython3

    # Convert all letters to lowercase
    ds.apply(lambda x: x['raw_sentence'].lower(), new_field_name='raw_sentence')
    # Convert the label to int and mark it as the target field
    ds.apply(lambda x: int(x['label']), new_field_name='label_seq', is_target=True)

    def split_sent(ins):
        # Tokenize on whitespace
        return ins['raw_sentence'].split()
    ds.apply(split_sent, new_field_name='words', is_input=True)

.. code:: ipython3

    # Split into training and validation sets
    train_data, dev_data = ds.split(0.3)
    print("Train size: ", len(train_data))
    print("Test size: ", len(dev_data))

.. parsed-literal::

    Train size: 54
    Test size: 23

.. code:: ipython3

    from fastNLP import Vocabulary
    vocab = Vocabulary(min_freq=2)
    # Count words on the training set only
    train_data.apply(lambda x: [vocab.add(word) for word in x['words']])

    # Index the sentences with Vocabulary.to_index(word)
    train_data.apply(lambda x: [vocab.to_index(word) for word in x['words']], new_field_name='word_seq', is_input=True)
    dev_data.apply(lambda x: [vocab.to_index(word) for word in x['words']], new_field_name='word_seq', is_input=True)

Step 3
------

Define the model.

.. code:: ipython3

    from fastNLP.models import CNNText
    model = CNNText(embed_num=len(vocab), embed_dim=50, num_classes=5, padding=2, dropout=0.1)

Step 4
------

Start training.

.. code:: ipython3

    from fastNLP import Trainer, CrossEntropyLoss, AccuracyMetric
    trainer = Trainer(model=model,
                      train_data=train_data,
                      dev_data=dev_data,
                      loss=CrossEntropyLoss(),
                      metrics=AccuracyMetric()
                      )
    trainer.train()
    print('Train finished!')

.. parsed-literal::

    training epochs started 2018-12-07 14:03:41

.. parsed-literal::

    Epoch 1/3. Step:2/6. AccuracyMetric: acc=0.26087
    Epoch 2/3. Step:4/6. AccuracyMetric: acc=0.347826
    Epoch 3/3. Step:6/6. AccuracyMetric: acc=0.608696
    Train finished!

This concludes the tutorial. For more operations, see the advanced tutorials.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
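As a supplement to step 2, the vocabulary-building idea (count words on the training split, keep those that occur at least ``min_freq`` times, and map everything else to an unknown token) can be sketched in plain Python. This is a simplified stand-in, not fastNLP's ``Vocabulary`` implementation: the helper names ``build_vocab`` and ``to_index``, the special tokens ``<pad>`` and ``<unk>``, their indices, and the toy sentences are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical special tokens and indices chosen for this sketch only.
PAD, UNK = "<pad>", "<unk>"


def build_vocab(sentences, min_freq=2):
    """Map each word seen at least min_freq times to an integer index."""
    counts = Counter(word for sent in sentences for word in sent)
    word2idx = {PAD: 0, UNK: 1}
    # Sort by descending frequency, then alphabetically, for a deterministic order.
    for word, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq:
            word2idx[word] = len(word2idx)
    return word2idx


def to_index(word2idx, word):
    """Return the word's index, falling back to the unknown-token index."""
    return word2idx.get(word, word2idx[UNK])


# Toy training sentences (made up), lowercased and split on whitespace.
train_sents = [s.lower().split() for s in ["A good movie", "A bad movie", "Good fun"]]
word2idx = build_vocab(train_sents, min_freq=2)

# A validation sentence is indexed with the training vocabulary only.
dev_sent = "a great movie".split()
indexed = [to_index(word2idx, w) for w in dev_sent]

print(word2idx)
print(indexed)
```

In this sketch, words that appear only once in the training sentences ("bad", "fun") and words never seen in training ("great") all share the unknown index, which is why the vocabulary is built from the training split alone and then reused to index the validation split.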