Commit 4d31484

change readme
nlpzhezhao committed Mar 9, 2024
1 parent 69f4413 commit 4d31484
Showing 2 changed files with 16 additions and 15 deletions.
21 changes: 11 additions & 10 deletions README.md
@@ -7,7 +7,7 @@

<img src="logo.jpg" width="390" height="390" align=left />

-Pre-training has become an essential part of NLP tasks. UER-py (Universal Encoder Representations) is a toolkit for pre-training on general-domain corpora and fine-tuning on downstream tasks. UER-py maintains model modularity and supports research extensibility. It facilitates the use of existing pre-training models and provides interfaces for users to extend it further. With UER-py, we build a model zoo which contains pre-trained models of different properties. **See the Wiki for [Full Documentation](https://github.com/dbiir/UER-py/wiki)**.
+Pre-training has become an essential part of NLP tasks. UER-py (Universal Encoder Representations) is a toolkit for pre-training on general-domain corpora and fine-tuning on downstream tasks. UER-py maintains model modularity and supports research extensibility. It facilitates the use of existing pre-training models and provides interfaces for users to extend it further. With UER-py, we build a model zoo which contains pre-trained models of different properties. **See the [UER-py project Wiki](https://github.com/dbiir/UER-py/wiki) for full documentation**.

<br/>
<br/>
@@ -160,19 +160,20 @@ UER-py is organized as follows:
```
UER-py/
|--uer/
-| |--embeddings/ # contains embeddings
-| |--encoders/ # contains encoders such as RNN, CNN, Transformer
-| |--decoders/ # contains decoders
-| |--targets/ # contains targets such as language modeling, masked language modeling
-| |--layers/ # contains frequently-used NN layers, such as embedding layer, normalization layer
-| |--models/ # contains model.py, which combines embedding, encoder, and target modules
+| |--embeddings/ # contains modules of the embedding component
+| |--encoders/ # contains modules of the encoder component, such as RNN, CNN, Transformer
+| |--decoders/ # contains modules of the decoder component
+| |--targets/ # contains modules of the target component, such as language modeling, masked language modeling
+| |--layers/ # contains frequently-used NN layers
+| |--models/ # contains model.py, which combines modules of different components
| |--utils/ # contains frequently-used utilities
| |--model_builder.py
| |--model_loader.py
| |--model_saver.py
| |--opts.py
| |--trainer.py
|
-|--corpora/ # contains corpora for pre-training
+|--corpora/ # contains pre-training data
|--datasets/ # contains downstream tasks
|--models/ # contains pre-trained models, vocabularies, and configuration files
|--scripts/ # contains useful scripts for pre-training models
@@ -184,7 +185,7 @@
|--README.md
|--README_ZH.md
|--requirements.txt
-|--logo.jpg
+|--LICENSE
```
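The *models/* entry above notes that *model.py* combines the embedding, encoder, and target modules into a single network. Below is a minimal PyTorch sketch of that composition pattern; the class name, constructor arguments, and forward signature are illustrative assumptions rather than the toolkit's actual API.

```python
import torch.nn as nn

class ComposedModel(nn.Module):
    """Sketch of a model assembled from the three components described above."""

    def __init__(self, embedding, encoder, target):
        super().__init__()
        self.embedding = embedding  # maps token ids (and segments) to vectors
        self.encoder = encoder      # e.g., an RNN, CNN, or Transformer stack
        self.target = target        # pre-training objective, e.g., masked LM head

    def forward(self, src, tgt, seg):
        emb = self.embedding(src, seg)    # [batch, seq_len, hidden]
        hidden = self.encoder(emb, seg)   # contextualized representations
        loss = self.target(hidden, tgt)   # loss of the pre-training objective
        return loss
```

Swapping one component for another (say, an RNN encoder for a Transformer) leaves the rest of this pattern untouched, which is the modularity the README describes.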

@@ -214,7 +215,7 @@ UER-py has been used in winning solutions of many NLP competitions. In this sect
<br/>

## Contact information
-For communication related to this project, please contact Zhe Zhao (helloworld@ruc.edu.cn; nlpzhezhao@tencent.com), Yudong Li (liyudong123@hotmail.com), Cheng Hou (chenghoubupt@bupt.edu.cn), or Wenhang Shi (wenhangshi@ruc.edu.cn).
+For communication related to this project, please contact Zhe Zhao (helloworld@alu.ruc.edu.cn; nlpzhezhao@tencent.com), Yudong Li (liyudong123@hotmail.com), Cheng Hou (chenghoubupt@bupt.edu.cn), or Wenhang Shi (wenhangshi@ruc.edu.cn).

This work is supervised by my enterprise mentors __Qi Ju__, __Xuefeng Yang__, __Haotang Deng__ and school mentors __Tao Liu__, __Xiaoyong Du__.

10 changes: 5 additions & 5 deletions README_ZH.md
@@ -7,7 +7,7 @@

<img src="logo.jpg" width="390" height="390" align=left />

-Pre-training has become an essential part of NLP tasks and has brought significant improvements to a wide range of them. UER-py (Universal Encoder Representations) is a toolkit for pre-training on general-domain corpora and fine-tuning on downstream tasks. UER-py follows a modular design. By combining modules, users can quickly and precisely reproduce existing pre-trained models and use the provided interfaces to develop new ones. With UER-py, we have built a model zoo containing pre-trained models of different properties (e.g., based on different encoders and target tasks), from which users can choose suitable models according to the requirements of their tasks. **See the project Wiki for the [full documentation](https://github.com/dbiir/UER-py/wiki/主页)**
+Pre-training has become an essential part of NLP tasks and has brought significant improvements to a wide range of them. UER-py (Universal Encoder Representations) is a toolkit for pre-training on general-domain corpora and fine-tuning on downstream tasks. UER-py follows a modular design. By combining modules, users can quickly and precisely reproduce existing pre-trained models and use the provided interfaces to develop new ones. With UER-py, we have built a model zoo containing pre-trained models of different properties (e.g., based on different corpora, encoders, and target tasks), from which users can choose suitable models according to the requirements of their tasks. **See the [project Wiki](https://github.com/dbiir/UER-py/wiki/主页) for full documentation**


<br>
@@ -36,10 +36,10 @@
UER-py has the following advantages:
- __Reproducibility__ UER-py has been tested on many datasets and matches the performance of the original implementations of pre-trained models such as BERT, GPT-2, ELMo, and T5
- __Modularity__ UER-py uses a decoupled modular design. The framework is divided into components such as Embedding, Encoder, and Target. The components have clear interfaces, each contains a rich set of modules, and modules can be combined to build pre-trained models with different properties
-- __Model training__ UER-py supports CPU, single-machine single-GPU, single-machine multi-GPU, and multi-machine multi-GPU training modes
-- __Model zoo__ We maintain and continuously release pre-trained models, from which users can choose suitable ones according to the requirements of their tasks
+- __Model training__ UER-py supports single-machine CPU, single-machine GPU, and multi-machine multi-GPU training modes
+- __Model zoo__ We maintain and release pre-trained models, from which users can choose suitable ones according to the requirements of their tasks
- __SOTA results__ UER-py supports a comprehensive range of downstream tasks, including text classification, text-pair classification, sequence labeling, and machine reading comprehension, and provides several competition-winning solutions
-- __Pre-training-related functions__ UER-py provides abundant pre-training-related functions and optimizations, including feature extraction, similar-word retrieval, pre-trained model conversion, model ensembling, and text generation
+- __Pre-training-related functions__ UER-py provides abundant pre-training-related functions, including feature extraction, similar-word retrieval, pre-trained model conversion, model ensembling, and text generation


<br/>
@@ -75,7 +75,7 @@ doc2-sent1
doc3-sent1
doc3-sent2
```
-The book review corpus is obtained by removing the labels from the book review classification dataset. We split each review in the middle so that it forms a document of two sentences; see *book_review_bert.txt* in the *corpora* folder for details.
+The book review corpus is obtained by removing the labels from the book review sentiment classification dataset. We split each review in the middle so that it forms a document of two sentences; see *book_review_bert.txt* in the *corpora* folder for details.
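As a rough illustration of the construction just described, the following Python sketch strips the labels from a labeled review file and splits each review in the middle into a two-sentence document. The input format, file names, and character-halving rule are assumptions for illustration, not the toolkit's actual preprocessing code.

```python
# Sketch: build a BERT-style pre-training corpus from labeled reviews.
# Assumed input format: one "label<TAB>review text" pair per line.
def review_to_document(review: str) -> str:
    """Split a review in the middle to form a two-sentence document."""
    mid = len(review) // 2
    return review[:mid] + "\n" + review[mid:]

with open("book_review.txt", encoding="utf-8") as fin, \
     open("book_review_bert.txt", "w", encoding="utf-8") as fout:
    for line in fin:
        _, text = line.rstrip("\n").split("\t", 1)  # drop the label column
        fout.write(review_to_document(text) + "\n\n")  # blank line separates documents
```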

The format of the classification dataset is as follows:
