spark-hive_草庐IT

mysql - How to go from Hive to HiveServer2 on Ubuntu

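I followed a guide and was able to set up Hadoop and Hive on my Ubuntu virtual machine. Now I want to get HiveServer2 running on Ubuntu, but I can't find any guide that tells me how to get started with it. My plan is to get HiveServer2 working first, then Beeline and MySQL, then connect MySQL to Tomcat on HDFS and develop some DB software with Eclipse. I don't expect a big answer from anyone, just some references to study. Thanks. Best answer: You can try the Hive documentation provided by Cloudera. Cloudera Installation Guide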

HiveServer2 mysql ubuntu hadoop hive ubuntu-14.04

hadoop - How to tune a Spark application that uses a custom Hadoop input format

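My Spark application processes files (average size 20 MB) using a custom Hadoop input format and stores the results in HDFS. Here is the code snippet. Configuration conf = new Configuration(); JavaPairRDD<Text, Text> baseRDD = ctx.newAPIHadoopFile(input, CustomInputFormat.class, Text.class, Text.class, conf); JavaRDD<myClass> mapPartitionsRDD = baseRDD.mapPartitions(new FlatMapFunction<Iterator<Tuple2<Text, Text>>, myClass>() { // my logic goes here } //f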

hadoop stackoverflow mapreduce apache-spark

tomcat - hive-jdbc-standalone.jar is not loaded by Tomcat 7

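I am trying to connect to the Hive thrift server (hiveserver2) from a web application. I created a Dynamic Web Project in Eclipse and added the following jars under WEB-INF/lib: hive-jdbc-0.14.0-standalone.jar, hive-jdbc-0.14.0.jar, hadoop-common-2.6.0.jar, mongo-hadoop-core.jar, mongo-hadoop-hive.jar, mongo-java-driver.jar. I am using tomcat 7.0.61. When I deploy the application to the tomcat server, it shows the following message and does not load hive-jdbc-0.14.0-st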

tomcat hive-jdbc-standalone jar standalone hive-jdbc hadoop jdbc hive tomcat7

hadoop - Apache Spark JavaSchemaRDD is empty even though its input RDD has data

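I have a large number of tab-delimited files with more than 40 columns. I want to apply aggregations to them and select only a few columns. I think Apache Spark is the best choice because my files are stored in Hadoop. I have the following program: public class MyPOJO { int field1; String field2; etc } JavaSparkContext sc; JavaRDD data = sc.textFile("path/input.csv"); JavaSQLContext sqlContext = new JavaSQLContext(sc); JavaRDD rdd_records = sc.textFile(data).map(new Function() { public Recor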

JavaSchemaRDD tab String hadoop apache-spark

hadoop - Hive won't run in Hortonworks 2.2.4

I just downloaded Hortonworks Sandbox 2.2.4, and while following Hortonworks' tutorial on Hive I got this: HCatClient error on create table: {"statement":"use default; create table nyse_stocks(`exchange` string, `stock_symbol` string, `date` string, `stock_price_open` float, `stock_price_high` float, `stock_price_low` float, `stock_price_close` float,

Hortonworks SLF4J HiveConf hadoop hive hortonworks-data-platform

hadoop - Unicode data support in Hive

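According to the Hive documentation, Hive supports unicode data in tables. I created a table with data type "string" and loaded unicode data into it, but when I run select * from I get garbage values. create table unicode(data string); load data local inpath 'unicode.txt' into table unicode; Below is the output of the select: Les caract�res accentu�s (Fran�ais) En donn�es nous avons confiance Donn�es, donn�es, partout et tous les noeuds �taient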

Unicode hadoop hive

linux - What does this notation mean in a Hive script (hivequery.hql) file: "use ${word:word}"

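The script (hivequery.hql) looks like this: Use ${platformType:platformName}; select * from hivetablename; This script is invoked from a bash script as #!/usr/bin/env bash hive -f hivequery.hql Best answer: In an hql file, the use command sets the default database. See Use Database. ${platformType:platformName} is Hive's variable notation, where platformType is the namespace and platformName is the variable name. This is covered in Using Variables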

word platformType linux bash shell hadoop hive
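
The ${namespace:name} notation from the answer above can be illustrated with a small sketch. This is not Hive's own implementation: the class HiveVarSketch, the method substitute, and the value sales_db are all hypothetical, and the example assumes the variable lives in the hivevar namespace (Hive's built-in namespaces are hivevar, hiveconf, system and env). It only shows the idea that the part before the colon selects a namespace and the part after it names a variable in that namespace.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: mimics ${namespace:name} substitution; not Hive's actual code. */
public class HiveVarSketch {
    // Matches ${namespace:name}; variable names may contain dots (e.g. hiveconf keys).
    private static final Pattern VAR = Pattern.compile("\\$\\{(\\w+):([\\w.]+)\\}");

    /** Replaces each ${ns:name} with its value; unknown references are left untouched. */
    public static String substitute(String script, Map<String, Map<String, String>> namespaces) {
        Matcher m = VAR.matcher(script);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Map<String, String> vars = namespaces.get(m.group(1));
            String value = (vars == null) ? null : vars.get(m.group(2));
            // Keep the original text when the namespace or variable is unknown.
            String replacement = (value == null) ? m.group(0) : value;
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> hivevar = new HashMap<>();
        hivevar.put("platformName", "sales_db"); // hypothetical database name
        Map<String, Map<String, String>> namespaces = new HashMap<>();
        namespaces.put("hivevar", hivevar);

        String script = "use ${hivevar:platformName}; select * from hivetablename;";
        System.out.println(substitute(script, namespaces));
    }
}
```

With a real Hive CLI, a variable in the hivevar namespace is typically supplied on the command line, for example hive --hivevar platformName=sales_db -f hivequery.hql, and the script would then refer to it as ${hivevar:platformName}.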

hadoop - Teradata to Hive table import tool using the Teradata connector

I am using TDCH to import a TD table into Hive, with the following command: hadoop jar teradata-connector-1.3.4.jar com.teradata.hadoop.tool.TeradataImportTool -url jdbc:teradata://URL -username **** -password ****** -jobtype hive -fileformat textfile -separator "," -method split.by.hash -sourcetable test -sourcefieldnames "name,id" -targettable test_td -targetfield

Teradata hadoop java apache import hive

scala - Spark Streaming with multiple socket sources

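I am new to Spark. For my project I need to merge data from different streams on different ports. To test this I did an exercise whose goal is to print the data from the streams on the different ports. Below you can see the code: object hello { def main(args: Array[String]) { val ssc = new StreamingContext(new SparkConf(), Seconds(2)) val lines9 = ssc.socketTextStream("localhost", 9999) val lines8 = ssc.socketTextStream("localhost", 9998) lines9.print() lines8.print() ssc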

lines Dstream scala hadoop apache-spark spark-streaming

hadoop - Spark error: Server IPC version 9 cannot communicate with client version 4

I am running hadoop 2.7.0, scala 2.10.4, java 1.7.0_21 and spark 1.3.0. I created a small file as shown below: hduser@ubuntu:~$ cat /home/hduser/test_sample/sample1.txt Eid1,EName1,EDept1,100 Eid2,EName2,EDept1,102 Eid3,EName3,EDept1,101 Eid4,EName4,EDept2,110 Eid5,EName5,EDept2,121 Eid6,EName6,EDept3,99 I get an error when running the following command. scala> val emp = sc.textFile("/hom

version communicate sample EName hadoop apache-spark