# ARAModelForReadability

## Requirements

```
torch==1.11.0
transformers==4.5.0
```
## Pretrained Models

- MacBERT: https://huggingface.co/hfl/chinese-macbert-large
- BERT: https://huggingface.co/bert-base-chinese
- BERT-wwm: https://huggingface.co/hfl/chinese-bert-wwm
- RoBERTa: https://huggingface.co/hfl/chinese-roberta-wwm-ext
We use a learning rate of 2e-5 for all pretrained models.

# How to Run

Most of the code is based on https://github.com/yjang43/pushingonreadability_transformers

1. Go to the pushingonreadability_transformers-master folder.
2. Create a 5-fold split of the dataset for training.
```bash
python kfold.py --corpus_path mainland.csv --corpus_name mainland
```
   - Stratified folds of the data will be saved under the file name _"data/mainland.{k}.{type}.csv"_, where _k_ is the _k_-th fold of the K-Fold and _type_ is train, valid, or test.
3. Fine-tune a pretrained model on the dataset.
```bash
python train.py --corpus_name mainland --model chinese-macbert-large --learning_rate 2e-5
```
4. Collect output probabilities with the trained model.
```bash
python inference.py --checkpoint_path checkpoint/mainland.chinese-macbert-large.0.14 --data_path data/mainland.0.test.csv
```
5. Collect handcrafted features and combine them with the output probabilities.
6. Go to the pushingonreadability_traditional_ML-master folder.
7. Create a result folder and put the combined output-probability and feature files into it, for example: mainland.0.train.combined.csv, mainland.0.test.combined.csv.
8. Feed the combined files into the classifiers.
```bash
python nonneural-classification.py -r
```
   - -r selects the random forest classifier
   - -s selects the SVM classifier
   - -g selects the XGBoost classifier
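The stratified split in step 2 can be sketched as follows. This is a minimal illustration of the idea behind `kfold.py`, not its actual code: the `label` column name, the `make_folds` helper, and the rotation that picks each fold's validation set are all assumptions.

```python
import pandas as pd
from sklearn.model_selection import StratifiedKFold

def make_folds(df, label_col="label", k=5, seed=42):
    """Yield (fold_index, train, valid, test) stratified splits.

    Fold i's part is the test set, part (i + 1) mod k is the validation
    set, and the remaining k - 2 parts form the training set.
    """
    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=seed)
    # Each element of `parts` is one stratified fold of the rows.
    parts = [df.iloc[idx] for _, idx in skf.split(df, df[label_col])]
    for i in range(k):
        valid_j = (i + 1) % k
        train = pd.concat(
            [p for j, p in enumerate(parts) if j not in (i, valid_j)]
        )
        yield i, train, parts[valid_j], parts[i]
```

Each fold could then be written out as data/mainland.{k}.{type}.csv to match the naming scheme above.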
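Step 5's combination can be sketched as a column-wise concatenation of the model's per-class probabilities with the handcrafted feature table. The column names and the assumption that the two files share row order are illustrative, not taken from the tools themselves:

```python
import pandas as pd

def combine(prob_df, feat_df):
    """Join output probabilities and handcrafted features column-wise.

    Assumes both frames describe the same examples in the same row order
    (an assumption; the real pipeline may align on an explicit id column).
    """
    assert len(prob_df) == len(feat_df), "row counts must match"
    return pd.concat(
        [prob_df.reset_index(drop=True), feat_df.reset_index(drop=True)],
        axis=1,
    )
```

The result would then be saved as, e.g., mainland.0.test.combined.csv in the result folder for step 7.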
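The -r/-s/-g dispatch in step 8 amounts to mapping a flag to a scikit-learn-style model. A minimal sketch of that mapping is below; `build_classifier` and its hyperparameters are illustrative, and scikit-learn's GradientBoostingClassifier stands in for the actual XGBoost model to avoid the extra dependency:

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.svm import SVC

def build_classifier(flag):
    """Map a CLI flag to a classifier (GradientBoosting stands in for XGB)."""
    return {
        "-r": RandomForestClassifier(n_estimators=100, random_state=0),
        "-s": SVC(kernel="rbf", probability=True),
        "-g": GradientBoostingClassifier(random_state=0),
    }[flag]
```

A classifier built this way is then fit on the combined train file and evaluated on the combined test file, e.g. `build_classifier("-r").fit(X_train, y_train)`.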
## References

Pushing on Text Readability Assessment: A Transformer Meets Handcrafted Linguistic Features
https://aclanthology.org/2021.emnlp-main.834v2.pdf

Tools:
- https://github.com/brucewlee/pushingonreadability_traditional_ML
- https://github.com/yjang43/pushingonreadability_transformers

Most of our code is modified from the above tools.