[Python] Fixing from_pretrained() Errors When Loading a BERT Model with Transformers

星拱北辰, 2023-07-28

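Setting Up the Development Environment

Install Miniconda on an Ubuntu server and connect to it for remote development through VSCode, PyCharm, or Gateway.

Recommended reading: Running Python Programs in a Virtual Environment with VSCode

Install PyTorch, Scikit-Learn, Transformers, and the other required libraries.

Recommended reading: Installing GPU Support Packages for TensorFlow and PyTorch with Conda

Note: to install Scikit-Learn, do not run pip install sklearn; run pip install scikit-learn instead.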

OSError: Can't load config for 'xxxxxx'. If you were trying

Error encountered:
OSError: Can't load config for 'xxxxxx'. If you were trying

Following a blog post on this error, I tried downloading the bert-base-uncased files manually, but loading still failed.

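UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

Error encountered:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

According to articles online, this error generally arises when a binary file is read with the wrong encoding and read mode. I could not resolve it within this project.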
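The failure mode can be reproduced without Transformers. The following is a minimal sketch, assuming the binary file is a pickle (an assumption; the post does not say which file was being read). Pickle protocol 2 and later begin with the byte 0x80, which is not a valid UTF-8 start byte, so opening such a file in text mode fails at position 0, while binary mode reads it without trouble. The file name demo.bin is hypothetical.

```python
import os
import pickle
import tempfile

# Write a small pickle; protocols 2 and later start with the opcode byte 0x80.
path = os.path.join(tempfile.mkdtemp(), "demo.bin")  # hypothetical file name
with open(path, "wb") as f:
    pickle.dump({"weights": [1, 2, 3]}, f, protocol=2)

# Reading the binary file as UTF-8 text reproduces the error.
try:
    with open(path, "r", encoding="utf-8") as f:
        f.read()
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

# Reading in binary mode and unpickling works as expected.
with open(path, "rb") as f:
    first_byte = f.read(1)
    f.seek(0)
    restored = pickle.load(f)

print(error_message)
print(first_byte, restored)
```

If this is what happened here, the practical check is whether some step opens a weights or vocabulary file as text when it should be opened with mode "rb".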

Can't load the configuration of 'xxxxxx'.

Can't load the configuration of 'xxxxxx'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'xxxxxx' is the correct path to a directory containing a config.json file

Use the following script to first download the bert-base-uncased model from the Hugging Face repository to a local directory:

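from transformers import AutoTokenizer, AutoModel

# Download the tokenizer and model from the Hugging Face repository (needs network access)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
# Save both to ./bert/ so they can later be loaded from the local path
tokenizer.save_pretrained('./bert/')
model.save_pretrained('./bert/')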
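Once the files are saved, the same from_pretrained() calls can point at the local directory instead of the model name. The sketch below is one way to do that, assuming the ./bert/ directory from the script above. The helper names has_config and load_local_bert are hypothetical; they first check for config.json, the file the error message asks for, so that a wrong path fails with a clearer message.

```python
from pathlib import Path


def has_config(model_dir):
    """Return True if model_dir contains a config.json file."""
    return (Path(model_dir) / "config.json").is_file()


def load_local_bert(model_dir="./bert/"):
    """Load a tokenizer and model from a directory written by save_pretrained()."""
    if not has_config(model_dir):
        raise FileNotFoundError(f"no config.json found in {str(model_dir)!r}")
    # Imported lazily so the directory check above works without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModel.from_pretrained(model_dir)
    return tokenizer, model
```

Calling load_local_bert() with no arguments would then load from ./bert/ without contacting the Hugging Face repository, provided the earlier download succeeded.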

Loading model from pytorch_pretrained_bert into transformers library

The official Hugging Face discussion thread "Loading model from pytorch_pretrained_bert into transformers library" contains this reply:

Hi. This is probably caused by the transformer verison. You might downgrade your transformer version from 4.4 to 2.8 with pip install transformers==2.8.0

So I tried downgrading transformers to 2.8.0.

First, check the installed transformers version:
pip show transformers

The output shows version 4.26.1:
Name: transformers
Version: 4.26.1
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: transformers@huggingface.co
License: Apache
Location: xxxxxxxxxxxxxxxxxxxxxxx
Requires: filelock, huggingface-hub, importlib-metadata, numpy, packaging, pyyaml, regex, requests, tokenizers, tqdm
Required-by:
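The same check can be made from inside Python, which is useful when several environments exist and it is unclear which one the program actually runs in. This is a minimal sketch using the standard library's importlib.metadata (Python 3.8 or later); the helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("transformers", installed_version("transformers"))
```

The function returns None instead of raising, so it works for any package name, installed or not.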

Install transformers 2.8.0 directly:
pip install transformers==2.8.0

This produced a series of errors, one line of which was:
During handling of the above exception, another exception occurred:

Uninstall transformers:
pip uninstall transformers

Then install it again:
pip install transformers==2.8.0

Error encountered:
ERROR: Could not find a version that satisfies the requirement boto3 (from transformers) (from versions: none)
ERROR: No matching distribution found for boto3

ERROR: No matching distribution found for boto3

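To resolve this, following a StackOverflow answer, install the boto3 library.

First, check whether boto3 is already installed:
pip show boto3

Output:
WARNING: Package(s) not found: boto3

Clearly, it has not been installed.

Then install boto3:
pip install boto3

Then install transformers 2.8.0:
pip install transformers==2.8.0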

Missing key(s) in state_dict: "bert.embeddings.position_ids".

After installing transformers 2.8.0, running the program raised:
Missing key(s) in state_dict: "bert.embeddings.position_ids".

Following a blog post, with slight modifications, I added the following code:

from torch.backends import cudnn  # import assumed; the original snippet omitted it
cudnn.benchmark = True

It still raised an error:
TypeError: 'BertTokenizer' object is not callable

A search turned up a related GitHub issue, "TypeError: 'BertTokenizer' object is not callable #53", whose reply states:

Transformers fails "TypeError: 'BertTokenizer' object is not callable" if the installed version is <v3.0.0 . In the requirements file, transformers should be "transformers>=3.0.0"
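The error means the code calls the tokenizer object directly, as in tokenizer(text), which older releases do not support. Where pinning the version is not an option, a small compatibility helper is one possible workaround. The sketch below assumes that pre-3.0.0 tokenizers expose encode_plus(); the helper encode_text and both tokenizer classes are hypothetical stand-ins that return fake ids, so the example runs without Transformers installed.

```python
def encode_text(tokenizer, text):
    """Tokenize with the callable API if available, else fall back to encode_plus()."""
    if callable(tokenizer):
        return tokenizer(text)
    return tokenizer.encode_plus(text)


class OldStyleTokenizer:
    """Stand-in for a pre-3.0.0 tokenizer: has encode_plus() but no __call__."""

    def encode_plus(self, text):
        # Fake ids (word lengths), only to make the example self-contained.
        return {"input_ids": [len(word) for word in text.split()]}


class NewStyleTokenizer(OldStyleTokenizer):
    """Stand-in for a 3.0.0+ tokenizer: the object itself is callable."""

    def __call__(self, text):
        return self.encode_plus(text)


print(encode_text(OldStyleTokenizer(), "hello world"))
print(encode_text(NewStyleTokenizer(), "hello world"))
```

Both calls go through the same helper, so the calling code no longer depends on which style of tokenizer is installed.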

So I decided to upgrade to version 3.0.0:
pip install transformers==3.0.0

This succeeded; part of the output:
Installing collected packages: tokenizers, transformers
Attempting uninstall: tokenizers
Found existing installation: tokenizers 0.5.2
Uninstalling tokenizers-0.5.2:
Successfully uninstalled tokenizers-0.5.2
Attempting uninstall: transformers
Found existing installation: transformers 2.8.0
Uninstalling transformers-2.8.0:
Successfully uninstalled transformers-2.8.0
Successfully installed tokenizers-0.8.0rc4 transformers-3.0.0
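Because the working version was found by trial and error, a program could verify the minimum version at startup rather than fail later with the TypeError. The following is a rough sketch with hypothetical helpers version_tuple and meets_minimum. It compares only the leading numeric components and ignores pre-release suffixes such as rc4; packaging.version.Version from the third-party packaging library is the more robust choice when it is available.

```python
import re


def version_tuple(version):
    """Turn '3.0.0' or '0.8.0rc4' into a (major, minor, patch) tuple of ints."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # Missing components (as in '3.0') are treated as zero.
    return tuple(int(part or 0) for part in match.groups())


def meets_minimum(version, minimum="3.0.0"):
    """Return True if version is at least minimum, ignoring pre-release suffixes."""
    return version_tuple(version) >= version_tuple(minimum)


for candidate in ("2.8.0", "3.0.0", "4.26.1"):
    print(candidate, meets_minimum(candidate))
```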

Running the program now produces results, accompanied by the following output:
Some weights of the model checkpoint at ./bert/ were not used when initializing BertModel: ['embeddings.position_ids']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
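Both this warning and the earlier "Missing key(s) in state_dict" error come from comparing the parameter names the model expects with the names stored in the checkpoint. The comparison can be sketched in plain Python. The helper diff_keys is hypothetical and the key lists are illustrative, not read from a real checkpoint.

```python
def diff_keys(model_keys, checkpoint_keys):
    """Compare the names a model expects with the names found in a checkpoint."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)     # expected by the model, absent from the checkpoint
    unexpected = sorted(checkpoint_keys - model_keys)  # stored in the checkpoint, unknown to the model
    return missing, unexpected


# Illustrative key names only.
model_keys = [
    "embeddings.word_embeddings.weight",
    "embeddings.position_embeddings.weight",
]
checkpoint_keys = model_keys + ["embeddings.position_ids"]

missing, unexpected = diff_keys(model_keys, checkpoint_keys)
print("missing:", missing)
print("unexpected:", unexpected)
```

In this example the checkpoint carries one entry the model does not know about, which corresponds to the "were not used" warning. With the two lists swapped, the same entry would show up as missing instead. PyTorch's load_state_dict() accepts strict=False to report such mismatches instead of raising; whether ignoring a given key is safe depends on what that key holds.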
