test_hive_草庐IT

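hadoop - Spark SQL does not return records from Hive transactional tables on HDP

I ran into this problem on an HDP setup, where a transactional table has to be compacted once before Spark SQL can fetch its records. An Apache setup, on the other hand, does not need even a single compaction. Perhaps something is triggered in the metastore after compaction, and Spark SQL then starts recognizing the delta files. Let me know if other details are needed to find the root cause. Try this to see the complete scenario:
hive> create table default.foo(id int) clustered by (id) into 2 buckets STORED AS ORC TBLPROPERTIES ('transactional'='true');
hive> insert into default.foo values(10);
scala> sqlCo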

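hadoop - Adding a column using Hive

I have the following data table.

ID  salary  occupation
1   5000    Engineer
2   6000    Doctor
3   8000    Pilot
4   1000    Army
1   3000    Engineer
2   4000    Teacher
3   2000    Engineer
1   1000    Teacher
3   1000    Engineer
1   5000    Doctor

Now I want to add another column, Flag, to this table so that it looks like this.

ID  salary  occupation  Flag
1   5000    Engineer    0
2   6000    Doctor      0
3   8000    Pilot       0
4   1000    Army        0
1   3000    Engineer    1
2   4000    Teacher     1
3   2000    Engineer    1
1   1000    Teacher     2
3   100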
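
The excerpt is cut off before the asker states the rule for Flag, so the following is only a guess: the eight fully visible rows are consistent with Flag counting how many times the same ID has already appeared. A minimal Python sketch under that assumption (the function name `add_occurrence_flag` is hypothetical):

```python
from collections import defaultdict

# Input rows copied from the first table in the excerpt: (ID, salary, occupation).
rows = [
    (1, 5000, "Engineer"), (2, 6000, "Doctor"), (3, 8000, "Pilot"),
    (4, 1000, "Army"), (1, 3000, "Engineer"), (2, 4000, "Teacher"),
    (3, 2000, "Engineer"), (1, 1000, "Teacher"), (3, 1000, "Engineer"),
    (1, 5000, "Doctor"),
]

def add_occurrence_flag(rows):
    """Append a flag equal to the number of earlier rows with the same ID.

    ASSUMPTION: this rule is inferred from the visible rows only.
    """
    seen = defaultdict(int)  # ID -> how many times it has appeared so far
    flagged = []
    for row_id, salary, occupation in rows:
        flagged.append((row_id, salary, occupation, seen[row_id]))
        seen[row_id] += 1
    return flagged

flagged = add_occurrence_flag(rows)
for row in flagged:
    print(row)
```

In Hive, the same idea would typically be written with `row_number() over (partition by ID ...) - 1`, which needs an explicit ordering column that the excerpt does not show.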

hadoop HIVE Engineer apache-spark hiveql

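regexp_extract parameters in Hive

What do the parameters in curly braces do in the snippet below?
regexp_extract(col_value, '^(?:([^,]*)\,?){1}', 1) Id,
regexp_extract(col_value, '^(?:([^,]*)\,?){2}', 1) Score,
regexp_extract(col_value, '^(?:([^,]*)\,?){9}', 1) DisplayName,
Best answer: As you can see here, the curly braces contain the number of times the preceding token, in this case a non-capturing group, may be repeated. The group contains a (possibly empty) capturing grou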
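
To make the effect of the `{n}` quantifier concrete, here is a small sketch in Python rather than Hive. It assumes Python's `re` module treats this pattern the same way as the Java regex engine that Hive uses, which holds for this simple case: when a group is repeated, the capturing group inside it keeps the text from the last repetition. The sample line and the helper name `nth_field` are made up for illustration.

```python
import re

def nth_field(line, n):
    """Return the n-th comma-separated field using the pattern from the question.

    The non-capturing group (?: ... ) is repeated exactly n times; capturing
    group 1 ends up holding whatever it matched on the final repetition.
    """
    pattern = r'^(?:([^,]*)\,?){%d}' % n
    match = re.search(pattern, line)
    return match.group(1) if match else None

# Hypothetical sample row; not taken from the original post.
sample = "7,95,Alice"

print(nth_field(sample, 1))
print(nth_field(sample, 2))
print(nth_field(sample, 3))
```

With `{1}` the group is applied once and captures the first field, with `{2}` it captures the second, and so on, which is why the question's `{9}` picks out the ninth column.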

regexp_extract extract regex apache hadoop hive

hadoop - Creating a view in Hive

I want to create a view on a partitioned Hive table. My view definition is as follows:
create view schema.V1 as select t1.* from scehma.tab1 as t1 inner join (select record_key, max(last_update) as last_update from scehma.tab1 group by record_key) as t2 on t1.record_key = t2.record_key and t1.last_update = t2.last_update
My tab1 table is partitioned by quarter_id. When I run any query on the view, it gives the error: FAILED: SemanticE

hadoop last_update record_key hive hadoop-partitioning

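hadoop - Hive problem when using YARN

I am running Hive SQL on YARN and it throws an error on a join condition. I can create external and internal tables, but I cannot create a table with the command create table as AS SELECT name from student. The same query works fine when run through the Hive CLI, but when run from a Spring job it throws an error:
2016-03-28 04:26:50,692 [Thread-17] WARN org.apache.hadoop.hive.shims.HadoopShimsSecure - Can't fetch tasklog: TaskLogServlet is not supported in MR2 mode. Task with the most failures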

hadoop yarn hive hadoop-yarn

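hadoop - Hive - time difference in minutes is negative

I need to get a time difference in minutes for analysis in a Hive query. I am using unix_timestamp() to convert the dates to seconds, then subtracting to get the difference in seconds, then multiplying by 60 to get minutes. My problem is that the difference (later date minus earlier date) comes out negative. Here are my query and result (Hive query and result screenshot):

processed_ts        create_ts           processed_unix_timestamp  create_unix_timestamp  miniueDiff
2017-03-12 3:01:06  2017-03-12 2:58:36  1489312865                1489316315             -57.5
2017-03-12 3:01:3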
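
The numbers in the excerpt can be checked directly. The sketch below, in Python rather than Hive, redoes the arithmetic with the two epoch values shown. Converting seconds to minutes is a division by 60, and that reproduces the -57.5 in the result row. It also prints both epochs in UTC, which shows that the stored epoch for create_ts is the later instant even though its wall-clock text looks earlier. The excerpt does not say why; one possible explanation, stated here only as an assumption, is the session time zone, since 2017-03-12 is the date US daylight saving time began and local times between 2:00 and 3:00 did not exist that night.

```python
from datetime import datetime, timezone

# Epoch seconds copied from the result row in the excerpt.
processed_unix = 1489312865
create_unix = 1489316315

# Seconds -> minutes is a division by 60.
diff_minutes = (processed_unix - create_unix) / 60
print(diff_minutes)

def to_utc_text(epoch_seconds):
    """Format an epoch value as a UTC wall-clock string."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")

processed_utc = to_utc_text(processed_unix)
create_utc = to_utc_text(create_unix)
print(processed_utc)
print(create_utc)
```

So the subtraction itself is behaving correctly: the sign comes from how the timestamp strings were turned into epoch seconds.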

hadoop Hive 2017 unix-timestamp

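hadoop - How to disable logging in Hive when writing to a file

I have a use case where I run a Hive query and store the output in a file.
hive -S -e "SELECT * from test.employee where empid=1" > /mapr/Piyush/test/output.txt
The query runs fine, but the file contains log output as well as the data. I guess this is because of the log4j properties. The problem is that I do not have access to the log4j configuration file, so I cannot change it. I tried setting a few configuration options: set hive.root.logger=ERROR,console and set hive.root.logger=INFO,console and set hive.server2.logging.operation.enabled=f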

hadoop apache java logging hive

hadoop - Missing Hive Execution Jar: /usr/local/apache-hive-2.1.0-bin/lib/hive-exec-*.jar

I get the following error when running hive: Missing Hive Execution Jar: /usr/local/apache-hive-2.1.0-bin/lib/hive-exec-*.jar. I have looked at all the related posts, such as Missing Hive Execution Jar: /usr/local/hadoop/hive/lib/hive-exec-*.jar, but none of them helped. I have tried almost everything. I installed by following the steps here: http://www.bogotobogo.com/Hadoop/BigData_hadoop_Hive_Install_On_Ubuntu_16_04.php These are all my settings: #HADOOP VARIABLES

hive apache-hive export HADOOP HADOOP_INSTALL ubuntu-14.04

python - Pivoting logs in Hive or PySpark

I have a lot of log files in this format: [Windowsuser] Pâmela [Hostname] DV6000 [Localtime] 14:25:07 [Systemtime] 17:25:07 [ASCWebBrowserinfo] 1.1.1 [LastWriteTime] 07/19/2016 14:01 [HDInfo] Volumename:, Serial:1713925408, FileSystem:NTFS, MaxComponentLength:255 [NetworkInfo [Index] 48 [Type] 1 [Description] TAP-Win32AdapterOAS#6 [Name] {343D77F2-

pyspark python nwi_seq hadoop apache-spark hive pivot

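arrays - Hive: aggregate function on an array column

I would like to know whether it is possible to run an aggregate function on a column whose data type is array. The table is created as follows:
CREATE EXTERNAL TABLE tmp_table (start_date array, customer_id string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION ''
start_date contains a set of comma-separated dates. I want to use the MIN function to find the minimum of these dates:
SELECT customer_id, MIN(start_date) FROM tmp_table GROUP BY customer_id
If MIN does not work on array structures, what alternative solution is there? Thanks!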
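
The excerpt contains no answer, so the following is only one possible approach, sketched in Python rather than Hive: flatten each array into one value per row, then take the minimum per customer. In Hive the flattening step is commonly done with `LATERAL VIEW explode(start_date)`, after which `MIN` runs on scalar values. The sample rows and the function name `min_date_per_customer` are hypothetical, and the dates are assumed to be ISO-formatted strings so that string comparison matches date order.

```python
# Hypothetical sample data: (customer_id, start_date array).
rows = [
    ("c1", ["2017-03-12", "2016-07-19"]),
    ("c1", ["2016-12-01"]),
    ("c2", ["2017-01-05", "2017-01-02", "2017-02-28"]),
]

def min_date_per_customer(rows):
    """Explode each array into single dates, then keep the smallest per customer.

    ASSUMPTION: dates are 'YYYY-MM-DD' strings, so '<' orders them correctly.
    """
    smallest = {}
    for customer_id, dates in rows:
        for date in dates:  # the "explode" step: one date at a time
            if customer_id not in smallest or date < smallest[customer_id]:
                smallest[customer_id] = date
    return smallest

result = min_date_per_customer(rows)
print(result)
```

Rows with an empty array contribute nothing here, which matches how `explode` drops them unless the outer variant of the lateral view is used.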

arrays Hive customer_id start_date hadoop aggregate-functions