What to do when HBase get(list) batch reads fail with MultiActionResultTooLarge
2022-05-29
Overview:
A user reported an HBase query problem: reads are done with get(list), and any single get(list) call containing more than 25 Gets fails, with the client returning MultiActionResultTooLarge:
2020-09-09 16:33:00,607Z+0000|INFO|custom-tomcat-51||||Https| requestId=05a89c1b-cecf-4693-a8d6-a319b6621cff|com.xxx.xxx.xxx.hbase.HBaseOperations.get(HBaseOperations.java:429)|(1078409497)get batch rows, tableName: DETAIL.
2020-09-09 16:33:00,622Z+0000|WARN|hconnection-0x39b61605-shared--pool1-t8272||||||org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.logNoResubmit(AsyncProcess.java:1313)|(1078409512)#1, table=DETAIL, attempt=1/1 failed=13ops, last exception: org.apache.hadoop.hbase.MultiActionResultTooLarge: org.apache.hadoop.hbase.MultiActionResultTooLarge: Max size exceeded CellSize: 132944 BlockSize: 109051904
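For reference, a minimal sketch of the batch-get pattern described above (the table name DETAIL comes from the log; the row keys and batch size are hypothetical):

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchGetRepro {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("DETAIL"))) {
            // Build a batch of more than 25 Gets -- the size at which the
            // user's queries started to fail.
            List<Get> gets = new ArrayList<>();
            for (int i = 0; i < 30; i++) {
                gets.add(new Get(Bytes.toBytes("row-" + i))); // hypothetical keys
            }
            // On the affected cluster this threw a client-side retries
            // exception wrapping MultiActionResultTooLarge.
            Result[] results = table.get(gets);
            System.out.println("fetched " + results.length + " rows");
        }
    }
}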
Symptoms:
1. On the customer's HBase cluster, after 69 rows had been written, querying with htable.get(list) hit a MultiActionResultTooLarge exception whenever the list held more than 25 Gets. After roughly one hour, lists larger than 25 no longer failed.
2. In the initial analysis, the MultiActionResultTooLarge error suggested that the server-side BlockSize of the query had exceeded the 100 MB threshold controlled by hbase.server.scanner.max.result.size, yet the user reported that the whole table occupies only a few hundred KB of storage.
3. The customer's table schema:
COLUMN FAMILIES DESCRIPTION
{NAME => 'CF1', BLOOMFILTER => 'ROW', VERSIONS => '1000', IN_MEMORY => 'false',
KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => '2147472000 SECONDS (24855 DAYS)',
COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
Problem analysis:
1. The request fails with MultiActionResultTooLarge. In the server code below, the exception is created when either the accumulated cell size or the accumulated block-plus-exception size exceeds maxQuotaResultSize, the quota set by hbase.server.scanner.max.result.size (100 MB); in our log it is the BlockSize (109051904 bytes) that crosses the line:
if (context != null
    && context.isRetryImmediatelySupported()
    && (context.getResponseCellSize() > maxQuotaResultSize
        || context.getResponseBlockSize() + context.getResponseExceptionSize()
            > maxQuotaResultSize)) {
  // We're storing the exception since the exception and reason string won't
  // change after the response size limit is reached.
  if (sizeIOE == null) {
    // We don't need the stack un-winding so don't throw the exception.
    // Throwing will kill the JVM's JIT.
    //
    // Instead just create the exception and then store it.
    sizeIOE = new MultiActionResultTooLarge("Max size exceeded"
        + " CellSize: " + context.getResponseCellSize()
        + " BlockSize: " + context.getResponseBlockSize());
    // Only report the exception once since there's only one request that
    // caused the exception. Otherwise this number will dominate the exceptions count.
    rpcServer.getMetrics().exception(sizeIOE);
  }
  // ... (the stored exception is then attached to this action's response)
}
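The quota here comes from hbase.server.scanner.max.result.size, which defaults to 100 MiB (104857600 bytes); the BlockSize of 109051904 bytes in the user's log is about 104 MiB, so the check above fires even though the table stores only a few hundred KB. A minimal sketch of the comparison, assuming the stock default (the exact constant lives in HConstants in the HBase source):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class QuotaArithmetic {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // hbase.server.scanner.max.result.size defaults to 100 MiB.
        long maxQuotaResultSize = conf.getLong(
            "hbase.server.scanner.max.result.size", 100L * 1024 * 1024);
        long loggedBlockSize = 109051904L; // BlockSize from the user's log
        // 109051904 > 104857600, so the server raises MultiActionResultTooLarge.
        System.out.println(loggedBlockSize > maxQuotaResultSize); // true
    }
}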
2. Next we looked at why the accumulated size can exceed 100 MB. The code below takes the cells out of each returned Result and accumulates the size of their backing blocks, skipping a cell whose backing array is the same block as the previous one:
/**
 * Method to account for the size of retained cells and retained data blocks.
 * @return an object that represents the last referenced block from this response.
 */
Object addSize(RpcCallContext context, Result r, Object lastBlock) {
  if (context != null && r != null && !r.isEmpty()) {
    for (Cell c : r.rawCells()) {
      context.incrementResponseCellSize(CellUtil.estimatedHeapSizeOf(c));
      // We're using the last block being the same as the current block as
      // a proxy for pointing to a new block. This won't be exact.
      // If there are multiple gets that bounce back and forth
      // Then it's possible that this will over count the size of
      // referenced blocks. However it's better to over count and
      // use two RPC's than to OOME the RegionServer.
      byte[] valueArray = c.getValueArray();
      if (valueArray != lastBlock) {
        context.incrementResponseBlockSize(valueArray.length);
        lastBlock = valueArray;
      }
    }
  }
  return lastBlock;
}
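One hedged reading of the numbers in the log, assuming the default MSLAB chunk size of 2 MiB (hbase.hregion.memstore.mslab.chunksize = 2097152): a cell still in the memstore is backed by an MSLAB chunk rather than an HFile block, so c.getValueArray() returns the whole chunk and valueArray.length counts 2 MiB per distinct chunk. The logged BlockSize of 109051904 bytes is exactly 52 × 2097152, consistent with the response referencing 52 distinct memstore chunks, even though the logged CellSize was only 132944 bytes.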
3. At this point we suspected the table's version count. The user confirmed that the workload does repeatedly update the same rows and that the table kept VERSIONS => 1000, but the problem remained after VERSIONS was changed from 1000 to 1. Testing showed that once the data was manually flushed, queries recovered, so the affected data had presumably not yet been persisted to HDFS and was still sitting in the WAL or the memstore (see the Admin API sketch below).
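A sketch of the equivalent Admin API calls used during troubleshooting (HBase 1.x-style descriptor API; the connection boilerplate is not from the original report):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class FlushAndAlter {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            TableName tn = TableName.valueOf("DETAIL");
            // Lower VERSIONS from 1000 to 1 -- this alone did not help.
            HColumnDescriptor cf = new HColumnDescriptor("CF1");
            cf.setMaxVersions(1);
            admin.modifyColumn(tn, cf); // modifyColumnFamily in HBase 2.x
            // Force the memstore to disk; after this the batch get recovered.
            admin.flush(tn);
        }
    }
}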
4. Searching the community for the keyword "MultiActionResultTooLarge" led to https://issues.apache.org/jira/browse/HBASE-23158, whose symptoms match this problem exactly. The issue is still Unresolved. The size check is a guard HBase added in the code to protect itself against big scans, and the issue notes that while a cell is still in the memstore, the backing array measured by the accounting code can be very large.
5. Batch Gets are a common operation, so this guard alone should not reliably break them. The test patch attached to the issue reproduces the problem by lowering the client retry count. Looking back at the customer's error log, the retry count was only 1 (attempt=1/1); once we raised it slightly, the problem stopped occurring.
org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.logNoResubmit(AsyncProcess.java:1313)|(1078409512)#1, table=DETAIL, attempt=1/1 failed=13ops, last exception: org.apache.hadoop.hbase.MultiActionResultTooLarge
The workaround is to raise the client retry count slightly. With the retry count set to 1, the client never re-requests the server after hitting such an exception, which surfaces sporadic problems like this one directly to the application. Why the problem appears at all when retries are set to 1 is something that still needs a proper solution from the HBase community.
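A sketch of the client-side workaround, raising the retry count above 1 (the value 3 is illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class RetriesWorkaround {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // MultiActionResultTooLarge is a retry-immediately exception: the
        // server fails the overflowing part of the batch and expects the
        // client to resend it. With retries=1 that resend never happens,
        // so allow a few retries.
        conf.setInt("hbase.client.retries.number", 3);
        try (Connection conn = ConnectionFactory.createConnection(conf)) {
            // ... issue the same table.get(gets) batch as before ...
        }
    }
}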