extract_options

performance - Tableau limited data extract: slow connection

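I'm designing visualizations in Tableau. My data lives in Hive/Hadoop and is very large. When I try to design a visualization, the queries run extremely slowly because every time Tableau pulls the data from Hadoop. For any visualization, a simple drag-and-drop typically takes 4 minutes, while a visualization may need about 10 seconds of drag-and-drop, so I end up spending most of my time waiting. I tried the data extract option, but the extract takes forever (38 minutes and still running). Question: is there a way to extract only 1,000 records, so I can build the visualization against those 1,000 records and then switch to the live connection once the design is finished? I've searched the Tableau community help but have had no luck so far. Best answer: Copy all the data in XL and …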

Tags: performance, hadoop, tableau-api, data-extraction

java - Equivalent of mongo's out:reduce option in Hadoop

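I'm rewriting a MongoDB mapreduce job to use Hadoop instead (with the mongo-hadoop connector), but when I map two datasets to the same collection, it overwrites the values instead of combining them the way {reduce: "collectionName"} does: "If documents exist for a given key in the result set and in the old collection, then a reduce operation (using the specified reduce function) will be performed on the two values and the result will be written to the output col…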
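
The quoted {reduce: ...} semantics amount to a keyed merge: insert new keys, and run the reduce function when a key already exists. The sketch below models that in plain Python. It is not part of the mongo-hadoop connector or MongoDB's API; `merge_with_reduce` and the summing reducer are hypothetical names chosen for illustration, and the old output collection is modelled as a dict.

```python
from typing import Callable, Dict, List, TypeVar

V = TypeVar("V")


def merge_with_reduce(
    existing: Dict[str, V],
    new_results: Dict[str, V],
    reduce_fn: Callable[[str, List[V]], V],
) -> Dict[str, V]:
    """Hypothetical helper mimicking a reduce-style output write.

    `existing` stands in for the old output collection; it is not modified.
    """
    merged = dict(existing)
    for key, value in new_results.items():
        if key in merged:
            # Key present in both the old collection and the new results:
            # combine the two values with the reduce function.
            merged[key] = reduce_fn(key, [merged[key], value])
        else:
            # New key: written as-is.
            merged[key] = value
    return merged


def sum_reduce(key: str, values: List[int]) -> int:
    # Example reducer (an assumption): add up the values for one key.
    return sum(values)
```

With the first dataset's output passed as `existing`, a second dataset's values for the same keys are combined instead of overwriting them. This only illustrates the semantics; how to get the same behavior from the connector is left open here.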

Tags: java, hadoop, mongodb, mapreduce

hadoop - javax.jdo.option.ConnectionURL required for Cassandra

Are the following properties in hive-site.xml correct for Hive to access Cassandra? (I copied the entire HIVE-DEFAULT.XML contents but changed only the following properties.) javax.jdo.option.ConnectionURL: cassandra://localhost:9160; javax.jdo.option.ConnectionDriverName: org.apache.cassandra.cql.jdbc.CassandraDriver; hive.stats.dbclass: jdbc:cassandra; hive.stats.jdbcdriver: org.apache.cassandra.cql…

Tags: hive, hadoop, cassandra-jdbc

regex - Using REGEX_EXTRACT_ALL, but when projecting I get "()"

I'm using Cloudera QuickStart 5.4. I have a file where each line contains data such as: 323.81.303.680 - - [25/Oct/2011:01:41:00 -0500] "GET /download/download6.zip HTTP/1.1" 200 0 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.19) Gecko/2010031422 Firefox/3.0.19". In Apache Pig, I'm using the following script: A = LOAD 'weblog.txt' using TextLoader() as (line:chararray); B = FOR…
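
REGEX_EXTRACT_ALL returns a tuple of every capture group. The sketch below assumes that, as in its default mode, the pattern has to cover the entire line. That means a single space or quote the pattern does not account for yields no fields at all. Python's `re.fullmatch` stands in for that whole-string matching. The combined-log pattern and the `parse_line` helper are assumptions, because the question's own script is truncated, and the sample line is the question's data with its spaces restored.

```python
import re

# Apache "combined" log layout (assumed):
# host ident user [time] "request" status bytes "referer" "agent"
COMBINED_LOG = re.compile(
    r'(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+) "([^"]*)" "([^"]*)"'
)

# The question's sample line with whitespace restored (an assumption).
SAMPLE = (
    '323.81.303.680 - - [25/Oct/2011:01:41:00 -0500] '
    '"GET /download/download6.zip HTTP/1.1" 200 0 "-" '
    '"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.19) '
    'Gecko/2010031422 Firefox/3.0.19"'
)


def parse_line(line):
    """Return all capture groups, or None if the pattern does not cover the whole line."""
    m = COMBINED_LOG.fullmatch(line)
    return m.groups() if m else None
```

The nine groups come back in order: host, ident, user, time, request, status, bytes, referer, agent. The same pattern should work in Pig once each backslash is doubled inside the Pig string literal, but that translation is untested here.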

Tags: regex, hadoop, apache-pig

regex_extract in Hive

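I have a string column, and I want the data after the first semicolon. Column data: Options;list:direct&ACFs:Sharemarket. I want the output to be list:direct&ACFs:Sharemarket. I tried this: select (regexp_extract(property, '^(?:([^;]*)\;?){2}', 1)), but the output is list:direct&. How can I get the complete string after the first semicolon, matching my expected output list:direct&ACFs:Sharemarket? Can anyone help? Best answer: You can try this: select regexp_extract('Options;list;d…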
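
Two regex facts help here, and the sketch below demonstrates both with Python's `re`, which treats these patterns the same way as the Java engine Hive uses. A capture group inside a repeated construct such as `(...){2}` keeps only the text from its last iteration, so it returns a single semicolon-separated field rather than "everything after". Anchoring on the first semicolon and capturing the rest with `^[^;]*;(.*)$` gives the whole tail. This pattern is an assumption, not the truncated answer's; in HiveQL it would read `regexp_extract(property, '^[^;]*;(.*)$', 1)`, which is untested here.

```python
import re

# Group 1: everything after the first ';' (assumed pattern).
AFTER_FIRST_SEMICOLON = re.compile(r"^[^;]*;(.*)$")

# The question's pattern: one capture group repeated twice. A repeated group
# keeps only its last iteration, i.e. the second ';'-separated field.
REPEATED_GROUP = re.compile(r"^(?:([^;]*);?){2}")


def after_first_semicolon(value):
    """Return the text after the first ';', or None if there is no ';'."""
    m = AFTER_FIRST_SEMICOLON.search(value)
    return m.group(1) if m else None
```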

Tags: regex, hadoop, hive

regex - Impala regexp_extract query returns null while regexp_like and regexp_replace work fine

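I need to use regexp_extract to pull the digits out of a string column. I'm using Impala on an external table. I've already checked the regular expression, and to test it I also used regexp_like and regexp_replace; both work perfectly. Here is the query: select sucursal, regexp_like(sucursal, '^[0-9]{1,3}') as match, regexp_extract(sucursal, '^[0-9]{1,3}', 1) as CodSucusal, regexp_replace(sucursal, '^[0-9]{1,3}', 'lala') as RepCodSucusal from jdv.stg…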
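
One plausible cause is that `'^[0-9]{1,3}'` contains no capture group, yet the query asks regexp_extract for group 1. regexp_like and regexp_replace never look at groups, which would explain why they succeed on the same pattern. Impala numbers group 0 as the whole match, so either wrapping the digits in parentheses or requesting index 0 should work. The Python helper below is a hypothetical stand-in that returns None for a missing group, mirroring the null the question reports. It is not Impala's implementation, and the sample value `'123 Centro'` is invented.

```python
import re


def regexp_extract_like(value, pattern, index):
    """Hypothetical stand-in: return capture group `index` of the first match, or None.

    Returns None both when nothing matches and when the pattern defines
    fewer than `index` capture groups (group 0 is the whole match).
    """
    m = re.search(pattern, value)
    if m is None or index > m.re.groups:
        return None
    return m.group(index)
```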

Tags: regex, hadoop, etl, impala

hadoop - Spark : Minimize task/partition size skew with textFile's minPartitions option?

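I'm reading tens of thousands of files into an RDD via something like sc.textFile("/data/*/*/*"). One problem is that most of these files are tiny, while others are huge. This leads to unbalanced tasks, which causes all sorts of well-known problems. Can I break up the largest partitions by reading the data with sc.textFile("/data/*/*/*", minPartitions=n_files*5), where n_files is the number of input files? As covered elsewhere on Stack Overflow, minPartitions is passed down the Hadoop rabbit hole and is used in org.apache.hadoop.mapred.TextInputFormat.getSp…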
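
In the old-API `org.apache.hadoop.mapred.FileInputFormat`, which TextInputFormat extends, the requested split count is only a hint. getSplits computes `goalSize = totalSize / numSplits` and `splitSize = max(minSize, min(goalSize, blockSize))`, then cuts each file separately, letting the last piece run up to 10% over. The Python model below re-implements that arithmetic as a simplified sketch. It assumes every file is splittable and non-empty, uses one block size for all files, and ignores block locations. It shows how raising minPartitions affects large and small files.

```python
from typing import List

SPLIT_SLOP = 1.1  # Hadoop's slack factor: the final split may be up to 10% larger


def compute_split_size(goal_size: int, min_size: int, block_size: int) -> int:
    return max(min_size, min(goal_size, block_size))


def simulate_splits(file_sizes: List[int], num_splits: int,
                    block_size: int = 128 * 1024 * 1024,
                    min_size: int = 1) -> List[int]:
    """Return split lengths for each file, in order (simplified model)."""
    total = sum(file_sizes)
    goal_size = total // (num_splits if num_splits > 0 else 1)
    # Hadoop never lets the minimum split size drop below 1 byte.
    split_size = compute_split_size(goal_size, max(min_size, 1), block_size)
    splits: List[int] = []
    for length in file_sizes:
        remaining = length
        # Emit full-size splits while more than SPLIT_SLOP splits' worth remains.
        while remaining / split_size > SPLIT_SLOP:
            splits.append(split_size)
            remaining -= split_size
        if remaining > 0:
            splits.append(remaining)  # tail split, or the whole file if it was small
    return splits
```

Under this model, a larger minPartitions shrinks the goal size and cuts big files into more pieces. Each tiny file still becomes its own tiny split, so the skew is reduced rather than removed.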

Tags: hadoop, apache-spark

regex - Hive query with regexp_extract

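I'm trying to extract the IP address from "[223.104.227.42]", that is, the 223.104.227.42 between "[" and "]". I'm using this query: select regexp_extract('[223.104.227.42]', '\\[(.*?)\\]') but I get an error: FAILED: In function regexp_extract, pattern must has one group reference at least. Best answer: Try passing the capture group index (1) as an argument: hive> select regexp_extract('[223.104.227.…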
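
After Hive unescapes the string literal `'\\[(.*?)\\]'`, the regex engine receives `\[(.*?)\]`. The sketch below checks that pattern with Python's `re`, which behaves like Java's engine here. Group 1, which the answer selects, is the address without brackets, while group 0 still includes them. The helper name `extract_bracketed` is hypothetical, and the Hive-side error itself is not reproduced.

```python
import re

# What the Hive literal '\\[(.*?)\\]' becomes once the string is unescaped.
IP_IN_BRACKETS = re.compile(r"\[(.*?)\]")


def extract_bracketed(value, index=1):
    """Return the requested group of the first [...] match, or None."""
    m = IP_IN_BRACKETS.search(value)
    return m.group(index) if m else None
```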

Tags: regex, hadoop, hive, hiveql

hadoop - How is --options-file different from --connection-param-file?

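The Sqoop documentation shows this example for --options-file:

    #
    # Options file for Sqoop import
    #
    # Specifies the tool being invoked
    import
    # Connect parameter and value
    --connect
    jdbc:mysql://localhost/db
    # Username parameter and value
    --username
    foo
    #
    # Remaining options should be specified in the command line.
    #

Going by the above, if it holds only connection information, and per the comment all remaining options should be given on the command line, why is it in --opt…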
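
An options file is expanded into ordinary command-line arguments, with blank lines and `#` comment lines ignored. `--connection-param-file`, by contrast, names a Java properties file whose key/value pairs go to the JDBC driver as connection properties, not to Sqoop as arguments. The sketch below models only the options-file side. `expand_options_file` is a hypothetical helper that assumes one token per line, as in the documentation example, and it ignores Sqoop's backslash line continuation and quoting rules. The extra `--table emp` arguments are invented for the usage example.

```python
from typing import Iterable, List

# The documentation's example options file, one token per line.
OPTIONS_TEXT = """\
#
# Options file for Sqoop import
#

# Specifies the tool being invoked
import

# Connect parameter and value
--connect
jdbc:mysql://localhost/db

# Username parameter and value
--username
foo

#
# Remaining options should be specified in the command line.
#
"""


def expand_options_file(text: str, extra_args: Iterable[str] = ()) -> List[str]:
    """Simplified expansion: drop blank and '#' lines, keep one token per line.

    Assumes --options-file appears first, so command-line arguments follow
    the file's tokens.
    """
    tokens = [line.strip() for line in text.splitlines()]
    tokens = [t for t in tokens if t and not t.startswith("#")]
    return tokens + list(extra_args)
```

For example, `expand_options_file(OPTIONS_TEXT, ["--table", "emp"])` yields the same argument list as typing `sqoop import --connect jdbc:mysql://localhost/db --username foo --table emp`.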

Tags: hadoop, sqoop

regex - REGEXP_EXTRACT does not give the expected result - Hive

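I'm trying to use the REGEXP_EXTRACT function in Hive to get the string I need from a column. The data in the column has the form words\more_words, and I need to extract the part of the string after the \. I tried things like these:
SELECT REGEXP_EXTRACT('words\more_words', '(.*)(\\+)(.*)', 3) -> returns nothing
SELECT REGEXP_EXTRACT('words\more_words', '.*(\\+)(.*)', 2) -> returns nothing
SELECT REGEXP_EXTRACT('words\more_words', '\w+(\\+)(\w+)', 2) -> returns nothing
SELECT REGE…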
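
There are two escaping layers here. After Hive unescapes the string literal, `'(\\+)'` reaches the regex engine as `\+`, which means "one or more literal plus signs", so it can never match a backslash. Matching one literal backslash needs the regex `\\`, which in a HiveQL literal is typically written `'\\\\'`. A single backslash inside the test literal `'words\more_words'` may also be consumed as an escape before the function sees it. This is the commonly cited explanation and is stated as an assumption, not verified against a specific Hive version. The Python sketch below checks only the regex layer, using raw strings so that no extra unescaping happens.

```python
import re

# The column value: one literal backslash between the two parts.
VALUE = "words\\more_words"  # Python spelling of  words\more_words

# Regex \+ means "one or more literal '+'", so it never matches a backslash.
PLUS_PATTERN = re.compile(r"(.*)(\+)(.*)")

# Regex \\ matches a single literal backslash; group 2 is the text after it.
BACKSLASH_PATTERN = re.compile(r"(.*)\\(.*)")
```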

Tags: regex, hadoop, hive
