我有一个 Flume 代理将推文写入 HBase 接收器。
几秒钟后,到接收器的事务失败,每隔 8-10 秒我就会在 Flume 代理日志中收到这些错误消息,告诉我到 HBase 的事务失败。
奇怪的是,一些推文仍然通过并进入 HBase 表。是什么原因造成的? 这是在单节点 Cloudera Quickstart VM 上运行,会不会是资源问题?
这是代理日志
9:20:44.618 PM ERROR org.apache.flume.SinkRunner
Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: Could not write events to Hbase. Transaction failed, and rolled back.
at org.apache.flume.sink.hbase.AsyncHBaseSink.process(AsyncHBaseSink.java:245)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:662)
9:20:53.883 PM ERROR org.apache.flume.SinkRunner
Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: Could not write events to Hbase. Transaction failed, and rolled back.
at org.apache.flume.sink.hbase.AsyncHBaseSink.process(AsyncHBaseSink.java:245)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:662)
这些是调试日志中的一些奇怪的东西,也许相关?
2014-03-06 09:39:12,069 DEBUG org.apache.zookeeper.client.ZooKeeperSaslClient: Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration
2014-03-06 09:39:12,298 DEBUG org.apache.zookeeper.ClientCnxn: An exception was thrown while closing send thread for session 0x144965080900029 : Unable to read additional data from server sessionid 0x144965080900029, likely server has closed socket
这是我的代理配置
TwitterAgent.sinks.HBaseTweet.channel = MemChannel
TwitterAgent.sinks.HBaseTweet.type = org.apache.flume.sink.hbase.AsyncHBaseSink
TwitterAgent.sinks.HBaseTweet.table = tweets
TwitterAgent.sinks.HBaseTweet.columnFamily = tweet
TwitterAgent.sinks.HBaseTweet.batchSize = 100
TwitterAgent.sinks.HBaseTweet.serializer = flume_hdfs.hbase.util.AsyncHbaseTwitterEventSerializer
TwitterAgent.sinks.HBaseTweet.serializer.columns = tweet:id,tweet:created_at,tweet:source,tweet:favourited,tweet:text
TwitterAgent.sinks.HBaseTweet.serializer.delimiter = \\t
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 200
TwitterAgent.channels.MemChannel.transactionCapacity = 100
停止代理时日志中的一些指标可能很有趣
Component type: CHANNEL, name: MemChannel stopped
Shutdown Metric for type: CHANNEL, name: MemChannel. channel.start.time == 1394093630078
Shutdown Metric for type: CHANNEL, name: MemChannel. channel.stop.time == 1394093894804
Shutdown Metric for type: CHANNEL, name: MemChannel. channel.capacity == 200
Shutdown Metric for type: CHANNEL, name: MemChannel. channel.current.size == 125
Shutdown Metric for type: CHANNEL, name: MemChannel. channel.event.put.attempt == 220
Shutdown Metric for type: CHANNEL, name: MemChannel. channel.event.put.success == 209
Shutdown Metric for type: CHANNEL, name: MemChannel. channel.event.take.attempt == 3059
Shutdown Metric for type: CHANNEL, name: MemChannel. channel.event.take.success == 9
Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: Could not write events to Hbase. Transaction failed, and rolled back.
at org.apache.flume.sink.hbase.AsyncHBaseSink.process(AsyncHBaseSink.java:245)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:662)
Component type: SINK, name: HBaseTweet stopped
Shutdown Metric for type: SINK, name: HBaseTweet. sink.start.time == 1394093630407
Shutdown Metric for type: SINK, name: HBaseTweet. sink.stop.time == 1394093894833
Shutdown Metric for type: SINK, name: HBaseTweet. sink.batch.complete == 27
Shutdown Metric for type: SINK, name: HBaseTweet. sink.batch.empty == 0
Shutdown Metric for type: SINK, name: HBaseTweet. sink.batch.underflow == 7
Shutdown Metric for type: SINK, name: HBaseTweet. sink.connection.closed.count == 1
Shutdown Metric for type: SINK, name: HBaseTweet. sink.connection.creation.count == 1
Shutdown Metric for type: SINK, name: HBaseTweet. sink.connection.failed.count == 0
Shutdown Metric for type: SINK, name: HBaseTweet. sink.event.drain.attempt == 3053
Shutdown Metric for type: SINK, name: HBaseTweet. sink.event.drain.sucess == 9
HBase 区域服务器错误
2014-03-08 09:37:44,371 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: Column family retweet does not exist in region tweets,,1394029330397.953f602dd0790637df8106720396f219. in table 'tweets', {NAME => 'entities', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'retweeted_status', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'tweet', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'user', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
at org.apache.hadoop.hbase.regionserver.HRegion.checkFamily(HRegion.java:5475)
at org.apache.hadoop.hbase.regionserver.HRegion.checkFamilies(HRegion.java:3022)
at org.apache.hadoop.hbase.regionserver.HRegion.internalPut(HRegion.java:2900)
at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:2083)
at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:2239)
at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:323)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1428)
最佳答案
HBase 日志中的错误消息表明存在模式不匹配,特别是代理期望有一个名为 retweet 的列族,而模式实际上指定了 retweeted_status。
解决方案是重新编译代理以使用正确的列族名称,或者更改架构以使用代理所需的名称。我不知道什么修复更正确;如果您自己定义了这个架构,那么您很可能只需更改列族名称即可。但是,如果模式是在外部定义的(即:通过某些脚本或遵循来自某处的特定指令),重命名列族可能会破坏其他内容,这取决于名称是 retweeted_status。在这种情况下,应修复 Twitter_HBase_Impala 的源代码以使用正确的名称。
关于hadoop - 到 HBase 的 Flume 交易失败,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/22208912/
我已经构建了一些serverspec代码来在多个主机上运行一组测试。问题是当任何测试失败时,测试会在当前主机停止。即使测试失败,我也希望它继续在所有主机上运行。Rakefile:namespace:specdotask:all=>hosts.map{|h|'spec:'+h.split('.')[0]}hosts.eachdo|host|begindesc"Runserverspecto#{host}"RSpec::Core::RakeTask.new(host)do|t|ENV['TARGET_HOST']=hostt.pattern="spec/cfengine3/*_spec.r
我正在尝试在Rails上安装ruby,到目前为止一切都已安装,但是当我尝试使用rakedb:create创建数据库时,我收到一个奇怪的错误:dyld:lazysymbolbindingfailed:Symbolnotfound:_mysql_get_client_infoReferencedfrom:/Library/Ruby/Gems/1.8/gems/mysql2-0.3.11/lib/mysql2/mysql2.bundleExpectedin:flatnamespacedyld:Symbolnotfound:_mysql_get_client_infoReferencedf
1.1.1 YARN的介绍 为克服Hadoop1.0中HDFS和MapReduce存在的各种问题⽽提出的,针对Hadoop1.0中的MapReduce在扩展性和多框架⽀持⽅⾯的不⾜,提出了全新的资源管理框架YARN. ApacheYARN(YetanotherResourceNegotiator的缩写)是Hadoop集群的资源管理系统,负责为计算程序提供服务器计算资源,相当于⼀个分布式的操作系统平台,⽽MapReduce等计算程序则相当于运⾏于操作系统之上的应⽤程序。 YARN被引⼊Hadoop2,最初是为了改善MapReduce的实现,但是因为具有⾜够的通⽤性,同样可以⽀持其他的分布式计算模
Region是HBase数据管理的基本单位,region有一点像关系型数据的分区。region中存储这用户的真实数据,而为了管理这些数据,HBase使用了RegionSever来管理region。Region的结构hbaseregion的大小设置默认情况下,每个Table起初只有一个Region,随着数据的不断写入,Region会自动进行拆分。刚拆分时,两个子Region都位于当前的RegionServer,但处于负载均衡的考虑,HMaster有可能会将某个Region转移给其他的RegionServer。RegionSplit时机:当1个region中的某个Store下所有StoreFile
我需要一个非常简单的字符串验证器来显示第一个符号与所需格式不对应的位置。我想使用正则表达式,但在这种情况下,我必须找到与表达式相对应的字符串停止的位置,但我找不到可以做到这一点的方法。(这一定是一种相当简单的方法……也许没有?)例如,如果我有正则表达式:/^Q+E+R+$/带字符串:"QQQQEEE2ER"期望的结果应该是7 最佳答案 一个想法:你可以做的是标记你的模式并用可选的嵌套捕获组编写它:^(Q+(E+(R+($)?)?)?)?然后你只需要计算你获得的捕获组的数量就可以知道正则表达式引擎在模式中停止的位置,你可以确定匹配结束
我正在尝试在配备ARMv7处理器的SynologyDS215j上安装ruby2.2.4或2.3.0。我用了optware-ng安装gcc、make、openssl、openssl-dev和zlib。我根据README中的说明安装了rbenv(版本1.0.0-19-g29b4da7)和ruby-build插件。.这些是随optware-ng安装的软件包及其版本binutils-2.25.1-1gcc-5.3.0-6gconv-modules-2.21-3glibc-opt-2.21-4libc-dev-2.21-1libgmp-6.0.0a-1libmpc-1.0.2-1libm
一段时间以来,我一直在使用open_uri下拉ftp路径作为数据源,但突然发现我几乎连续不断地收到“530抱歉,允许的最大客户端数(95)已经连接。”我不确定我的代码是否有问题,或者是否是其他人在访问服务器,不幸的是,我无法真正确定谁有问题。本质上,我正在读取FTPURI:defself.read_uri(uri)beginuri=open(uri).readuri=="Error"?nil:urirescueOpenURI::HTTPErrornilendend我猜我需要在这里添加一些额外的错误处理代码...我想确保我采取一切预防措施来关闭所有连接,这样我的连接就不是问题所在,但是我
我在思考流量控制的最佳实践。我应该走哪条路?1)不要检查任何东西并让程序失败(更清晰的代码,自然的错误消息):defself.fetch(feed_id)feed=Feed.find(feed_id)feed.fetchend2)通过返回nil静默失败(但是,“CleanCode”说,你永远不应该返回null):defself.fetch(feed_id)returnunlessfeed_idfeed=Feed.find(feed_id)returnunlessfeedfeed.fetchend3)抛出异常(因为不按id查找feed是异常的):defself.fetch(feed_id
我正在为毕业设计开发GEM,TravisCI构建不断失败。这是我在Travis上的链接:https://travis-ci.org/ricardobond/perpetuus/builds/8709218构建错误是:$bundleexecrakerakeaborted!Don'tknowhowtobuildtask'default'/home/travis/.rvm/gems/ruby-1.9.3-p448/bin/ruby_noexec_wrapper:14:in`eval'/home/travis/.rvm/gems/ruby-1.9.3-p448/bin/ruby_noexec_
运行:ruby1.9.3p0和Rails3.2.1尝试使用rspec但当我尝试将其安装到我的应用程序中时出现以下错误:/Users/Si/.rvm/gems/ruby-1.9.3-p0/gems/railties-3.2.1/lib/rails/railtie/configuration.rb:85:in`method_missing':undefinedmethod`generators'for#(NoMethodError)from/Users/Si/.rvm/gems/ruby-1.9.3-p0/gems/rspec-rails-2.0.0.beta.18/lib/rspec-r