目录
大家好,我是老六。
在数据开发中,我们有大量解析json串的需求,我们选用的UDF函数无非就是:get_json_object和json_tuple。但是在使用get_json_object函数过程中,老六发现get_json_object无法解析key为中文的key:value对。带着这个问题,老六通过源码研究了get_json_object这个函数,探索其中奥秘。
本案例是在hive1.2.1版本下实测,可供参考,若有出入,请对照相应版本源码。
构造JSON串:{"gg今日":"gg今日","test":"test","gg":"gg","今日":"今日"}

1、正常案例
SQL:select get_json_object('{"gg今日":"gg今日","test":"test","gg":"gg","今日":"今日"}',"$.test");
结果:

结果很正常,就是解析json,通过key来取value
2、异常案例
1)SQL:select get_json_object('{"gg今日":"gg今日","test":"test","gg":"gg","今日":"今日"}',"$.今日");
结果:

结果开始有些不对了,按照函数使用来说,我理解是可以正常取到 今日:今日键值对,但是结果是null,开始好奇,想搞明白原因是什么。
再次测试
2)SQL:select get_json_object('{"gg今日":"gg今日","test":"test","gg":"gg","今日":"今日"}',"$.gg今日");
结果:

这就很神奇了, 以 'gg今日' 作为键却取出了以'gg'作为键的结果,缘分真是妙不可言呐。
通过测试我们发现两个问题:
1)中文作为key,get_json_object无法取出value值(见 2.1)。
2)以英文开头的混有中文的key,取出了英文部分key对应的value(见 2.2)。
通过查看源码我们发现get_json_object对应的UDF类的源码如下:
/**
* UDFJson.
*
*/
@Description(name = "get_json_object",
value = "_FUNC_(json_txt, path) - Extract a json object from path ",
extended = "Extract json object from a json string based on json path "
+ "specified, and return json string of the extracted json object. It "
+ "will return null if the input json string is invalid.\n"
+ "A limited version of JSONPath supported:\n"
+ " $ : Root object\n"
+ " . : Child operator\n"
+ " [] : Subscript operator for array\n"
+ " * : Wildcard for []\n"
+ "Syntax not supported that's worth noticing:\n"
+ " '' : Zero length string as key\n"
+ " .. : Recursive descent\n"
+ " @ : Current object/element\n"
+ " () : Script expression\n"
+ " ?() : Filter (script) expression.\n"
+ " [,] : Union operator\n"
+ " [start:end:step] : array slice operator\n")
public class UDFJson extends UDF {
private final Pattern patternKey = Pattern.compile("^([a-zA-Z0-9_\\-\\:\\s]+).*");
private final Pattern patternIndex = Pattern.compile("\\[([0-9]+|\\*)\\]");
private static final JsonFactory JSON_FACTORY = new JsonFactory();
static {
// Allows for unescaped ASCII control characters in JSON values
JSON_FACTORY.enable(Feature.ALLOW_UNQUOTED_CONTROL_CHARS);
}
private static final ObjectMapper MAPPER = new ObjectMapper(JSON_FACTORY);
private static final JavaType MAP_TYPE = TypeFactory.fromClass(Map.class);
private static final JavaType LIST_TYPE = TypeFactory.fromClass(List.class);
// An LRU cache using a linked hash map
static class HashCache<K, V> extends LinkedHashMap<K, V> {
private static final int CACHE_SIZE = 16;
private static final int INIT_SIZE = 32;
private static final float LOAD_FACTOR = 0.6f;
HashCache() {
super(INIT_SIZE, LOAD_FACTOR);
}
private static final long serialVersionUID = 1;
@Override
protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
return size() > CACHE_SIZE;
}
}
static Map<String, Object> extractObjectCache = new HashCache<String, Object>();
static Map<String, String[]> pathExprCache = new HashCache<String, String[]>();
static Map<String, ArrayList<String>> indexListCache =
new HashCache<String, ArrayList<String>>();
static Map<String, String> mKeyGroup1Cache = new HashCache<String, String>();
static Map<String, Boolean> mKeyMatchesCache = new HashCache<String, Boolean>();
Text result = new Text();
public UDFJson() {
}
/**
* Extract json object from a json string based on json path specified, and
* return json string of the extracted json object. It will return null if the
* input json string is invalid.
*
* A limited version of JSONPath supported: $ : Root object . : Child operator
* [] : Subscript operator for array * : Wildcard for []
*
* Syntax not supported that's worth noticing: '' : Zero length string as key
* .. : Recursive descent &#064; : Current object/element () : Script
* expression ?() : Filter (script) expression. [,] : Union operator
* [start:end:step] : array slice operator
*
* @param jsonString
* the json string.
* @param pathString
* the json path expression.
* @return json string or null when an error happens.
*/
public Text evaluate(String jsonString, String pathString) {
if (jsonString == null || jsonString.isEmpty() || pathString == null
|| pathString.isEmpty() || pathString.charAt(0) != '$') {
return null;
}
int pathExprStart = 1;
boolean isRootArray = false;
if (pathString.length() > 1) {
if (pathString.charAt(1) == '[') {
pathExprStart = 0;
isRootArray = true;
} else if (pathString.charAt(1) == '.') {
isRootArray = pathString.length() > 2 && pathString.charAt(2) == '[';
} else {
return null;
}
}
// Cache pathExpr
String[] pathExpr = pathExprCache.get(pathString);
if (pathExpr == null) {
pathExpr = pathString.split("\\.", -1);
pathExprCache.put(pathString, pathExpr);
}
// Cache extractObject
Object extractObject = extractObjectCache.get(jsonString);
if (extractObject == null) {
JavaType javaType = isRootArray ? LIST_TYPE : MAP_TYPE;
try {
extractObject = MAPPER.readValue(jsonString, javaType);
} catch (Exception e) {
return null;
}
extractObjectCache.put(jsonString, extractObject);
}
for (int i = pathExprStart; i < pathExpr.length; i++) {
if (extractObject == null) {
return null;
}
extractObject = extract(extractObject, pathExpr[i], i == pathExprStart && isRootArray);
}
if (extractObject instanceof Map || extractObject instanceof List) {
try {
result.set(MAPPER.writeValueAsString(extractObject));
} catch (Exception e) {
return null;
}
} else if (extractObject != null) {
result.set(extractObject.toString());
} else {
return null;
}
return result;
}
private Object extract(Object json, String path, boolean skipMapProc) {
// skip MAP processing for the first path element if root is array
if (!skipMapProc) {
// Cache patternkey.matcher(path).matches()
Matcher mKey = null;
Boolean mKeyMatches = mKeyMatchesCache.get(path);
if (mKeyMatches == null) {
mKey = patternKey.matcher(path);
mKeyMatches = mKey.matches() ? Boolean.TRUE : Boolean.FALSE;
mKeyMatchesCache.put(path, mKeyMatches);
}
if (!mKeyMatches.booleanValue()) {
return null;
}
// Cache mkey.group(1)
String mKeyGroup1 = mKeyGroup1Cache.get(path);
if (mKeyGroup1 == null) {
if (mKey == null) {
mKey = patternKey.matcher(path);
mKeyMatches = mKey.matches() ? Boolean.TRUE : Boolean.FALSE;
mKeyMatchesCache.put(path, mKeyMatches);
if (!mKeyMatches.booleanValue()) {
return null;
}
}
mKeyGroup1 = mKey.group(1);
mKeyGroup1Cache.put(path, mKeyGroup1);
}
json = extract_json_withkey(json, mKeyGroup1);
}
// Cache indexList
ArrayList<String> indexList = indexListCache.get(path);
if (indexList == null) {
Matcher mIndex = patternIndex.matcher(path);
indexList = new ArrayList<String>();
while (mIndex.find()) {
indexList.add(mIndex.group(1));
}
indexListCache.put(path, indexList);
}
if (indexList.size() > 0) {
json = extract_json_withindex(json, indexList);
}
return json;
}
private transient AddingList jsonList = new AddingList();
private static class AddingList extends ArrayList<Object> {
@Override
public Iterator<Object> iterator() {
return Iterators.forArray(toArray());
}
@Override
public void removeRange(int fromIndex, int toIndex) {
super.removeRange(fromIndex, toIndex);
}
};
@SuppressWarnings("unchecked")
private Object extract_json_withindex(Object json, ArrayList<String> indexList) {
jsonList.clear();
jsonList.add(json);
for (String index : indexList) {
int targets = jsonList.size();
if (index.equalsIgnoreCase("*")) {
for (Object array : jsonList) {
if (array instanceof List) {
for (int j = 0; j < ((List<Object>)array).size(); j++) {
jsonList.add(((List<Object>)array).get(j));
}
}
}
} else {
for (Object array : jsonList) {
int indexValue = Integer.parseInt(index);
if (!(array instanceof List)) {
continue;
}
List<Object> list = (List<Object>) array;
if (indexValue >= list.size()) {
continue;
}
jsonList.add(list.get(indexValue));
}
}
if (jsonList.size() == targets) {
return null;
}
jsonList.removeRange(0, targets);
}
if (jsonList.isEmpty()) {
return null;
}
return (jsonList.size() > 1) ? new ArrayList<Object>(jsonList) : jsonList.get(0);
}
@SuppressWarnings("unchecked")
private Object extract_json_withkey(Object json, String path) {
if (json instanceof List) {
List<Object> jsonArray = new ArrayList<Object>();
for (int i = 0; i < ((List<Object>) json).size(); i++) {
Object json_elem = ((List<Object>) json).get(i);
Object json_obj = null;
if (json_elem instanceof Map) {
json_obj = ((Map<String, Object>) json_elem).get(path);
} else {
continue;
}
if (json_obj instanceof List) {
for (int j = 0; j < ((List<Object>) json_obj).size(); j++) {
jsonArray.add(((List<Object>) json_obj).get(j));
}
} else if (json_obj != null) {
jsonArray.add(json_obj);
}
}
return (jsonArray.size() == 0) ? null : jsonArray;
} else if (json instanceof Map) {
return ((Map<String, Object>) json).get(path);
} else {
return null;
}
}
}
private final Pattern patternKey = Pattern.compile("^([a-zA-Z0-9_\\-\\:\\s]+).*"); 这个就是匹配key的模式串,它的意思是匹配以数字、字母、_、-、:、空格 为开头的字符串。那么这个匹配模式串就决定了,get_json_object函数无法匹配出key是纯中文的键值对,即select get_json_object('{"gg今日":"gg今日","test":"test","gg":"gg","今日":"今日"}',"$.今日") 结果为null; 对于 select get_json_object('{"gg今日":"gg今日","test":"test","gg":"gg","今日":"今日"}',"$.gg今日"),由于匹配key的模式串是匹配以数字、字母、_、-、:、空格 为开头的字符串,所以它将'gg今日'中的'gg'匹配出来,然后作为key去取value,所以就取出了 "gg":"gg" 键值对,即答案为'gg'。
实践是检验真理的唯一标准。遇到问题,我们就去它的源头寻找答案。
类classAprivatedeffooputs:fooendpublicdefbarputs:barendprivatedefzimputs:zimendprotecteddefdibputs:dibendendA的实例a=A.new测试a.foorescueputs:faila.barrescueputs:faila.zimrescueputs:faila.dibrescueputs:faila.gazrescueputs:fail测试输出failbarfailfailfail.发送测试[:foo,:bar,:zim,:dib,:gaz].each{|m|a.send(m)resc
我正在尝试设置一个puppet节点,但rubygems似乎不正常。如果我通过它自己的二进制文件(/usr/lib/ruby/gems/1.8/gems/facter-1.5.8/bin/facter)在cli上运行facter,它工作正常,但如果我通过由rubygems(/usr/bin/facter)安装的二进制文件,它抛出:/usr/lib/ruby/1.8/facter/uptime.rb:11:undefinedmethod`get_uptime'forFacter::Util::Uptime:Module(NoMethodError)from/usr/lib/ruby
我已经从我的命令行中获得了一切,所以我可以运行rubymyfile并且它可以正常工作。但是当我尝试从sublime中运行它时,我得到了undefinedmethod`require_relative'formain:Object有人知道我的sublime设置中缺少什么吗?我正在使用OSX并安装了rvm。 最佳答案 或者,您可以只使用“require”,它应该可以正常工作。我认为“require_relative”仅适用于ruby1.9+ 关于ruby-主要:Objectwhenrun
在我的Controller中,我通过以下方式在我的index方法中支持HTML和JSON:respond_todo|format|format.htmlformat.json{renderjson:@user}end在浏览器中拉起它时,它会自然地以HTML呈现。但是,当我对/user资源进行内容类型为application/json的curl调用时(因为它是索引方法),我仍然将HTML作为响应。如何获取JSON作为响应?我还需要说明什么? 最佳答案 您应该将.json附加到请求的url,提供的格式在routes.rb的路径中定义。这
如果您尝试在Ruby中的nil对象上调用方法,则会出现NoMethodError异常并显示消息:"undefinedmethod‘...’fornil:NilClass"然而,有一个tryRails中的方法,如果它被发送到一个nil对象,它只返回nil:require'rubygems'require'active_support/all'nil.try(:nonexisting_method)#noNoMethodErrorexceptionanymore那么try如何在内部工作以防止该异常? 最佳答案 像Ruby中的所有其他对象
我有一个非常简单的RubyRack服务器,例如:app=Proc.newdo|env|req=Rack::Request.new(env).paramspreq.inspect[200,{'Content-Type'=>'text/plain'},['Somebody']]endRack::Handler::Thin.run(app,:Port=>4001,:threaded=>true)每当我使用JSON对象向服务器发送POSTHTTP请求时:{"session":{"accountId":String,"callId":String,"from":Object,"headers":
一、引擎主循环UE版本:4.27一、引擎主循环的位置:Launch.cpp:GuardedMain函数二、、GuardedMain函数执行逻辑:1、EnginePreInit:加载大多数模块int32ErrorLevel=EnginePreInit(CmdLine);PreInit模块加载顺序:模块加载过程:(1)注册模块中定义的UObject,同时为每个类构造一个类默认对象(CDO,记录类的默认状态,作为模板用于子类实例创建)(2)调用模块的StartUpModule方法2、FEngineLoop::Init()1、检查Engine的配置文件找出使用了哪一个GameEngine类(UGame
1.错误信息:Errorresponsefromdaemon:Gethttps://registry-1.docker.io/v2/:net/http:requestcanceledwhilewaitingforconnection(Client.Timeoutexceededwhileawaitingheaders)或者:Errorresponsefromdaemon:Gethttps://registry-1.docker.io/v2/:net/http:TLShandshaketimeout2.报错原因:docker使用的镜像网址默认为国外,下载容易超时,需要修改成国内镜像地址(首先阿里
目录第1题连续问题分析:解法:第2题分组问题分析:解法:第3题间隔连续问题分析:解法:第4题打折日期交叉问题分析:解法:第5题同时在线问题分析:解法:第1题连续问题如下数据为蚂蚁森林中用户领取的减少碳排放量iddtlowcarbon10012021-12-1212310022021-12-124510012021-12-134310012021-12-134510012021-12-132310022021-12-144510012021-12-1423010022021-12-154510012021-12-1523.......找出连续3天及以上减少碳排放量在100以上的用户分析:遇到这类
我正在使用ruby2.1.0我有一个json文件。例如:test.json{"item":[{"apple":1},{"banana":2}]}用YAML.load加载这个文件安全吗?YAML.load(File.read('test.json'))我正在尝试加载一个json或yaml格式的文件。 最佳答案 YAML可以加载JSONYAML.load('{"something":"test","other":4}')=>{"something"=>"test","other"=>4}JSON将无法加载YAML。JSON.load("