草庐IT

以 100GB SSB 性能测试为例,通过 ByteHouse 云数仓开启你的数据分析之路

ByteHouse 团队 2023-10-10 原文

I. 传统数仓的演进:云数仓

近年来,随着数据“爆炸式”的增长,越来越多的数据被产生、收集和存储。而挖掘海量数据中的真实价值,从其中提取商机并洞见未来,则成了现代企业和组织不可忽视的命题。

随着数据量级和复杂度的增大,数据分析处理的技术架构也在不断演进。在面对海量数据分析时,传统 OLAP 技术架构中的痛点变得越来越明显,如扩容缩容耗时长,导致资源利用率偏低,成本居高不下;以及运维配置复杂,需要专业的技术人员介入等。

为了解决这类问题,云数仓的概念应运而生。和传统数仓架构不同的是,云原生数仓借助于云平台的基础资源,实现了资源的动态扩缩容,并最大化利用资源,从而达到 Pay as you go 按实际用量付费的模式。

ByteHouse 作为云原生的数据平台,从架构层面入手,通过存储和计算分离的云原生架构完美适配云上基础设施。在字节跳动内部,ByteHouse 已经支持 80% 的分析应用场景,包括用户增长业务、广告、A/B 测试等。除了极致的分析性能之外,ByteHouse 开箱即用,按实际使用付费的特性也极大地降低了企业和个人的上手门槛,能够在短短数分钟内体验到数据分析的魅力。Talk is cheap, 接下来就让我们通过一个实战案例来体验下 ByteHouse 云数仓的强大功能。

II. 快速上手 ByteHouse —— 轻量级云数仓

本章节通过使用 ByteHouse 云数仓进行 SSB 基准测试,在带领读者了解产品性能的同时,也一并熟悉产品中各个模块的功能,开启你的数据分析之路,通过分析海量数据,加速数据洞察。ByteHouse 的架构总览如下。

SSB 基准测试

SSB(Star Schema Benchmark)是由麻省州立大学波士顿校区的研究员定义的基于现实商业应用的数据模型。SSB 是在 TPC-H 标准的基础上改进而成,主要将 TPC-H 中的雪花模型改成了更为通用的的星型模型,将基准查询从复杂的 Ad-hoc 查询改成了结构更加固定的 OLAP 查询,从而主要用于模拟测试 OLAP 引擎和轻量数仓场景下的查询性能。由于 SSB 基准测试较为中立,并贴近现实的商业场景,因此在学界及工业界有广泛的应用。

SSB 基准测试中对应的表结构如下所示,可以看到 SSB 主要采用星型模型,其中包含了 1 个事实表 lineorder 和 4 个维度表 customer, part, dwdate 以及 supplier,每张维度表通过 Primary Key 和事实表进行关联。测试通过执行 13 条 SQL 进行查询,包含了多表关联,group by,复杂条件等多种组合。更多详细信息请参考 SSB 文献 (​​https://www.cs.umb.edu/~poneil/StarSchemaB.pdf​​)。

步骤一:官网注册并开通 ByteHouse

访问ByteHouse 云数仓火山引擎官网,注册火山引擎账户,完成实名认证后,即可登录到产品控制台。开通产品进行测试,目前 ByteHouse 支持包年包月和按量付费两种模式的实例,便于您根据业务需求进行选择。

ByteHouse 云数仓火山引擎官网

​https://www.volcengine.com/product/bytehouse-cloud​

产品控制台

​https://console.volcengine.com/bytehouse​

步骤二:创建计算组

登录到控制台后,可以看到数据库表管理、数据加载、SQL 工作表、计算组、查询历史和角色管理等几大模块。分别具有如下作用:

数据库表管理:用于创建和管理数据库、数据表以及视图等数据对象 数据加载:用于从不同的离线和实时数据源如对象存储、Kafka 等地写入数据 SQL 工作表:在界面上编辑、管理并运行 SQL 查询 计算组:创建和管理虚拟的计算资源,用于执行数据查询等操作 查询历史:用于查看 SQL 的历史执行记录、状态和查询详情等

为了方便进行后续的建库建表和查询等操作,首先在 ByteHouse 控制台创建型号为 L 的计算组,如下图所示

计算组是 Bytehouse 中的计算资源集群,可按需进行横向扩展。计算组提供所需的资源如 CPU、内存及临时存储等,用于执行数据查询 DQL、DML 等操作。ByteHouse 计算组能够实现弹性扩缩容,读写分离、存算分离等,并且能对资源进行细粒度的权限控制。

步骤三:创建数据库表

在控制台页面中创建名为 ssb_``100 的数据库

创建完毕后,进入到 SQL 工作表模块,通过如下建表语句建立四个数据表(事实表),并保存对应的 SQL 语句。

CREATE TABLE ssb_100.customer
(
C_CUSTKEY UInt32,
C_NAME String,
C_ADDRESS String,
C_CITY LowCardinality(String),
C_NATION LowCardinality(String),
C_REGION LowCardinality(String),
C_PHONE String,
C_MKTSEGMENT LowCardinality(String),
C_PLACEHOLDER Nullable(String)
)
ENGINE = CnchMergeTree ORDER BY (C_CUSTKEY);
CREATE TABLE ssb_100.lineorder
(
LO_ORDERKEY UInt32,
LO_LINENUMBER UInt8,
LO_CUSTKEY UInt32,
LO_PARTKEY UInt32,
LO_SUPPKEY UInt32,
LO_ORDERDATE Date,
LO_ORDERPRIORITY LowCardinality(String),
LO_SHIPPRIORITY UInt8,
LO_QUANTITY UInt8,
LO_EXTENDEDPRICE UInt32,
LO_ORDTOTALPRICE UInt32,
LO_DISCOUNT UInt8,
LO_REVENUE UInt32,
LO_SUPPLYCOST UInt32,
LO_TAX UInt8,
LO_COMMITDATE Date,
LO_SHIPMODE LowCardinality(String),
LO_PLACEHOLDER Nullable(String)
)
ENGINE = CnchMergeTree PARTITION BY toYear(LO_ORDERDATE) ORDER BY (LO_ORDERDATE, LO_ORDERKEY);
CREATE TABLE ssb_100.part
(
P_PARTKEY UInt32,
P_NAME String,
P_MFGR LowCardinality(String),
P_CATEGORY LowCardinality(String),
P_BRAND LowCardinality(String),
P_COLOR LowCardinality(String),
P_TYPE LowCardinality(String),
P_SIZE UInt8,
P_CONTAINER LowCardinality(String),
P_PLACEHOLDER Nullable(String)
)
ENGINE = CnchMergeTree ORDER BY P_PARTKEY;
CREATE TABLE ssb_100.supplier
(
S_SUPPKEY UInt32,
S_NAME String,
S_ADDRESS String,
S_CITY LowCardinality(String),
S_NATION LowCardinality(String),
S_REGION LowCardinality(String),
S_PHONE String,
S_PLACEHOLDER Nullable(String)
)
ENGINE = CnchMergeTree ORDER BY S_SUPPKEY;
CREATE TABLE ssb_100.dwdate
(
D_DATEKEY UInt32,
D_DATE String,
D_DAYOFWEEK String, -- defined in Section 2.6 as Size 8, but Wednesday is 9 letters
D_MONTH String,
D_YEAR UInt32,
D_YEARMONTHNUM UInt32,
D_YEARMONTH String,
D_DAYNUMINWEEK UInt32,
D_DAYNUMINMONTH UInt32,
D_DAYNUMINYEAR UInt32,
D_MONTHNUMINYEAR UInt32,
D_WEEKNUMINYEAR UInt32,
D_SELLINGSEASON String,
D_LASTDAYINWEEKFL UInt32,
D_LASTDAYINMONTHFL UInt32,
D_HOLIDAYFL UInt32,
D_WEEKDAYFL UInt32,
S_PLACEHOLDER Nullable(String)
)
ENGINE=CnchMergeTree() ORDER BY (D_DATEKEY);

SQL 执行完毕后,在控制台左侧对应的数据对象页面会展示出创建完成的五个工作表,分别为 customer,dwdate,lineorder以及part 和 supplier

步骤四:从对象存储中导入 SSB 数据

通过预先生成 SSB_100 GB 的数据集并存储在对象存储(如 AWS S3 或者 火山引擎 TOS),我们可以方便且快速的将数据导入到 ByteHouse 中进行分析。本次实践中通过配置 火山引擎 TOS 的数据源对数据进行导入。

首先在数据加载模块,新建对象存储数据源,并配置对应的秘钥连接火山引擎对象存储

连接新的数据源后,选择 bytehouse-shared-dataset 的储存桶和ssb_100/lineorder.csv 相应的路径

选择之前建的数据库ssb_100和对应标表lineorder,然后按创建。重复步骤为其他四个工作表数据加载。

数据源中存储的数据条数如下所示。用于导入完成后,对数据表的行数进行统计,进行准确性校验。

创建导入任务完成后,点击“开始”启动导入任务,任务启动后会在几秒钟内分配资源并初始化导入任务,并在导入过程中展示预估的时间和导入进度。在导入任务的执行详情中,可以查看导入状态、导入详细日志、配置信息等。

步骤五:数据处理及分析

1、原始查询测试

通过执行 SSB 的 13 条查询语句,对于多表关联和排序等场景进行性能测试。查询语句如下所示:

-- pre-warm
select * from ssb_100.customer order by C_CUSTKEY desc limit 100;
select * from ssb_100.dwdate order by D_DATEKEY desc limit 100;
select * from ssb_100.lineorder order by LO_ORDERKEY desc limit 100;
select * from ssb_100.part order by P_PARTKEY desc limit 100;
select * from ssb_100.supplier order by S_SUPPKEY desc limit 100;
select * from ssb_100.lineorder_flat order by LO_ORDERKEY desc limit 100;

-- Q1.1
select sum(LO_EXTENDEDPRICE*LO_DISCOUNT) as revenue
from ssb_100.lineorder
where toYear(LO_ORDERDATE) = 1993
and LO_DISCOUNT between 1 and 3
and LO_QUANTITY < 25;

-- Q1.2
select sum(LO_EXTENDEDPRICE*LO_DISCOUNT) as revenue
from ssb_100.lineorder
where toYYYYMM(LO_ORDERDATE) = 199401
and LO_DISCOUNT between 4 and 6
and LO_QUANTITY between 26 and 35;

-- Q1.3
select sum(LO_EXTENDEDPRICE*LO_DISCOUNT) as revenue
from ssb_100.lineorder
where toISOWeek(LO_ORDERDATE) = 6
and toYear(LO_ORDERDATE)= 1994
and LO_DISCOUNT between 5 and 7
and LO_QUANTITY between 26 and 35;

-- Q2.1
select sum(LO_REVENUE), toYear(LO_ORDERDATE) AS d_year, P_BRAND
from ssb_100.lineorder, ssb_100.part, ssb_100.supplier
where LO_PARTKEY = P_PARTKEY and LO_SUPPKEY = S_SUPPKEY
and P_CATEGORY = 'MFGR#53' and S_REGION = 'AMERICA'
GROUP BY d_year, P_BRAND;

-- Q2.2
SELECT sum(LO_REVENUE), toYear(LO_ORDERDATE) AS year, P_BRAND
FROM ssb_100.lineorder, ssb_100.part, ssb_100.supplier
WHERE LO_PARTKEY = P_PARTKEY and LO_SUPPKEY = S_SUPPKEY
and P_BRAND >= 'MFGR#2221' and P_BRAND <= 'MFGR#2228' and S_REGION = 'ASIA'
GROUP BY year, P_BRAND
ORDER BY year, P_BRAND;

-- Q2.3
SELECT sum(LO_REVENUE), toYear(LO_ORDERDATE) AS year, P_BRAND
FROM ssb_100.lineorder, ssb_100.part, ssb_100.supplier
WHERE LO_PARTKEY = P_PARTKEY and LO_SUPPKEY = S_SUPPKEY
and P_BRAND = 'MFGR#2239'and S_REGION = 'EUROPE'
GROUP BY year, P_BRAND
ORDER BY year, P_BRAND;

-- Q3.1
SELECT C_NATION, S_NATION, toYear(LO_ORDERDATE) AS year, sum(LO_REVENUE) AS revenue
FROM ssb_100.lineorder, ssb_100.part, ssb_100.supplier, ssb_100.customer
WHERE LO_PARTKEY = P_PARTKEY and LO_SUPPKEY = S_SUPPKEY and LO_CUSTKEY = C_CUSTKEY
and C_REGION = 'ASIA' AND S_REGION = 'ASIA' AND year >= 1992 AND year <= 1997
GROUP BY C_NATION, S_NATION, year
ORDER BY year ASC, revenue DESC;

-- Q3.2
SELECT C_CITY, S_CITY, toYear(LO_ORDERDATE) AS year, sum(LO_REVENUE) AS revenue
FROM ssb_100.lineorder, ssb_100.part, ssb_100.supplier, ssb_100.customer
WHERE LO_PARTKEY = P_PARTKEY and LO_SUPPKEY = S_SUPPKEY and LO_CUSTKEY = C_CUSTKEY
and C_NATION = 'UNITED STATES' AND S_NATION = 'UNITED STATES' AND year >= 1992 AND year <= 1997
GROUP BY C_CITY, S_CITY, year
ORDER BY year ASC, revenue DESC;

-- Q3.3
SELECT C_CITY, S_CITY, toYear(LO_ORDERDATE) AS year, sum(LO_REVENUE) AS revenue
FROM ssb_100.lineorder, ssb_100.part, ssb_100.supplier, ssb_100.customer
WHERE LO_PARTKEY = P_PARTKEY and LO_SUPPKEY = S_SUPPKEY and LO_CUSTKEY = C_CUSTKEY
and (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND year >= 1992 AND year <= 1997
GROUP BY C_CITY, S_CITY, year
ORDER BY year ASC, revenue DESC;

-- Q3.4
SELECT C_CITY, S_CITY, toYear(LO_ORDERDATE) AS year, sum(LO_REVENUE) AS revenue
FROM ssb_100.lineorder, ssb_100.part, ssb_100.supplier, ssb_100.customer
WHERE LO_PARTKEY = P_PARTKEY and LO_SUPPKEY = S_SUPPKEY and LO_CUSTKEY = C_CUSTKEY
and (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND toYYYYMM(LO_ORDERDATE) = 199712
GROUP BY C_CITY, S_CITY, year
ORDER BY year ASC, revenue DESC;

-- Q4.1
SELECT toYear(LO_ORDERDATE) AS year, C_NATION, sum(LO_REVENUE - LO_SUPPLYCOST) AS profit
FROM ssb_100.lineorder, ssb_100.part, ssb_100.supplier, ssb_100.customer
WHERE LO_PARTKEY = P_PARTKEY and LO_SUPPKEY = S_SUPPKEY and LO_CUSTKEY = C_CUSTKEY
and C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2')
GROUP BY year, C_NATION
ORDER BY year ASC, C_NATION ASC;

-- Q4.2
SELECT toYear(LO_ORDERDATE) AS year, S_NATION, P_CATEGORY, sum(LO_REVENUE - LO_SUPPLYCOST) AS profit
FROM ssb_100.lineorder, ssb_100.part, ssb_100.supplier, ssb_100.customer
WHERE LO_PARTKEY = P_PARTKEY and LO_SUPPKEY = S_SUPPKEY and LO_CUSTKEY = C_CUSTKEY
and C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (year = 1997 OR year = 1998) AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2')
GROUP BY year, S_NATION, P_CATEGORY
ORDER BY year ASC, S_NATION ASC, P_CATEGORY ASC;

-- Q4.3
SELECT toYear(LO_ORDERDATE) AS year, S_CITY, P_BRAND, sum(LO_REVENUE - LO_SUPPLYCOST) AS profit
FROM ssb_100.lineorder, ssb_100.part, ssb_100.supplier
WHERE LO_PARTKEY = P_PARTKEY and LO_SUPPKEY = S_SUPPKEY
and S_NATION = 'UNITED STATES' AND (year = 1997 OR year = 1998) AND P_CATEGORY = 'MFGR#14'
GROUP BY year, S_CITY, P_BRAND
ORDER BY year ASC, S_CITY ASC, P_BRAND ASC;

2.打平表测试

为了方便对 SSB 数据集进行测试,我们可以通过改写 SSB,将星型模型打平转换为大宽表进行分析

注:为了确保打平表的执行,需要配置参数 SET max_memory_usage = 20000000000; 此外需要在 ByteHouse 控制台中配置查询超时为 3600s (我的账户 > 查询配置 > 查询超时),避免执行超时导致的失败。

SET max_memory_usage = 20000000000;
SET send_timeout = 3600;
SET receive_timeout = 3600;
CREATE TABLE IF NOT EXISTS ssb_100.lineorder_flat
engine = CnchMergeTree
partition by toYear(LO_ORDERDATE)
order by (LO_ORDERDATE, LO_ORDERKEY) as
select
L.LO_ORDERKEY as LO_ORDERKEY,
L.LO_LINENUMBER as LO_LINENUMBER,
L.LO_CUSTKEY as LO_CUSTKEY,
L.LO_PARTKEY as LO_PARTKEY,
L.LO_SUPPKEY as LO_SUPPKEY,
L.LO_ORDERDATE as LO_ORDERDATE,
L.LO_ORDERPRIORITY as LO_ORDERPRIORITY,
L.LO_SHIPPRIORITY as LO_SHIPPRIORITY,
L.LO_QUANTITY as LO_QUANTITY,
L.LO_EXTENDEDPRICE as LO_EXTENDEDPRICE,
L.LO_ORDTOTALPRICE as LO_ORDTOTALPRICE,
L.LO_DISCOUNT as LO_DISCOUNT,
L.LO_REVENUE as LO_REVENUE,
L.LO_SUPPLYCOST as LO_SUPPLYCOST,
L.LO_TAX as LO_TAX,
L.LO_COMMITDATE as LO_COMMITDATE,
L.LO_SHIPMODE as LO_SHIPMODE,
C.C_NAME as C_NAME,
C.C_ADDRESS as C_ADDRESS,
C.C_CITY as C_CITY,
C.C_NATION as C_NATION,
C.C_REGION as C_REGION,
C.C_PHONE as C_PHONE,
C.C_MKTSEGMENT as C_MKTSEGMENT,
S.S_NAME as S_NAME,
S.S_ADDRESS as S_ADDRESS,
S.S_CITY as S_CITY,
S.S_NATION as S_NATION,
S.S_REGION as S_REGION,
S.S_PHONE as S_PHONE,
P.P_NAME as P_NAME,
P.P_MFGR as P_MFGR,
P.P_CATEGORY as P_CATEGORY,
P.P_BRAND as P_BRAND,
P.P_COLOR as P_COLOR,
P.P_TYPE as P_TYPE,
P.P_SIZE as P_SIZE,
P.P_CONTAINER as P_CONTAINER
from ssb_100.lineorder as L
inner join ssb_100.customer as C on C.C_CUSTKEY = L.LO_CUSTKEY
inner join ssb_100.supplier as S on S.S_SUPPKEY = L.LO_SUPPKEY
inner join ssb_100.part as P on P.P_PARTKEY = L.LO_PARTKEY;

建表完成后,通过执行查询语句进行 SSB 性能测试,如下所示:

-- F1.1
SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
FROM ssb_100.lineorder_flat
WHERE toYear(LO_ORDERDATE) = 1993
AND LO_DISCOUNT BETWEEN 1 AND 3
AND LO_QUANTITY < 25;
-- F1.2
SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
FROM ssb_100.lineorder_flat
WHERE toYYYYMM(LO_ORDERDATE) = 199401
AND LO_DISCOUNT BETWEEN 4 AND 6
AND LO_QUANTITY BETWEEN 26 AND 35;
-- F1.3
SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
FROM ssb_100.lineorder_flat
WHERE toISOWeek(LO_ORDERDATE) = 6
AND toYear(LO_ORDERDATE) = 1994
AND LO_DISCOUNT BETWEEN 5 AND 7
AND LO_QUANTITY BETWEEN 26 AND 35;
-- F2.1
SELECT sum(LO_REVENUE), toYear(LO_ORDERDATE) AS year, P_BRAND
FROM ssb_100.lineorder_flat
WHERE P_CATEGORY = 'MFGR#12' AND S_REGION = 'AMERICA'
GROUP BY year, P_BRAND
ORDER BY year, P_BRAND;
-- F2.2
SELECT sum(LO_REVENUE), toYear(LO_ORDERDATE) AS year, P_BRAND
FROM ssb_100.lineorder_flat
WHERE P_BRAND >= 'MFGR#2221' AND P_BRAND <= 'MFGR#2228' AND S_REGION = 'ASIA'
GROUP BY year, P_BRAND
ORDER BY year, P_BRAND;
-- F2.3
SELECT sum(LO_REVENUE), toYear(LO_ORDERDATE) AS year, P_BRAND
FROM ssb_100.lineorder_flat
WHERE P_BRAND = 'MFGR#2239' AND S_REGION = 'EUROPE'
GROUP BY year, P_BRAND
ORDER BY year, P_BRAND;
-- F3.1
SELECT C_NATION, S_NATION, toYear(LO_ORDERDATE) AS year, sum(LO_REVENUE) AS revenue
FROM ssb_100.lineorder_flat
WHERE C_REGION = 'ASIA' AND S_REGION = 'ASIA' AND year >= 1992 AND year <= 1997
GROUP BY C_NATION, S_NATION, year
ORDER BY year ASC, revenue DESC;
-- F3.2
SELECT C_CITY, S_CITY, toYear(LO_ORDERDATE) AS year, sum(LO_REVENUE) AS revenue
FROM ssb_100.lineorder_flat
WHERE C_NATION = 'UNITED STATES' AND S_NATION = 'UNITED STATES' AND year >= 1992 AND year <= 1997
GROUP BY C_CITY, S_CITY, year
ORDER BY year ASC, revenue DESC;
-- F3.3
SELECT C_CITY, S_CITY, toYear(LO_ORDERDATE) AS year, sum(LO_REVENUE) AS revenue
FROM ssb_100.lineorder_flat
WHERE (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND year >= 1992 AND year <= 1997
GROUP BY C_CITY, S_CITY, year
ORDER BY year ASC, revenue DESC;
-- F3.4
SELECT C_CITY, S_CITY, toYear(LO_ORDERDATE) AS year, sum(LO_REVENUE) AS revenue
FROM ssb_100.lineorder_flat
WHERE (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND toYYYYMM(LO_ORDERDATE) = 199712
GROUP BY C_CITY, S_CITY, year
ORDER BY year ASC, revenue DESC;
-- F4.1
SELECT toYear(LO_ORDERDATE) AS year, C_NATION, sum(LO_REVENUE - LO_SUPPLYCOST) AS profit
FROM ssb_100.lineorder_flat
WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2')
GROUP BY year, C_NATION
ORDER BY year ASC, C_NATION ASC;
-- F4.2
SELECT toYear(LO_ORDERDATE) AS year, S_NATION, P_CATEGORY, sum(LO_REVENUE - LO_SUPPLYCOST) AS profit
FROM ssb_100.lineorder_flat
WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (year = 1997 OR year = 1998) AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2')
GROUP BY year, S_NATION, P_CATEGORY
ORDER BY year ASC, S_NATION ASC, P_CATEGORY ASC;
-- F4.3
SELECT toYear(LO_ORDERDATE) AS year, S_CITY, P_BRAND, sum(LO_REVENUE - LO_SUPPLYCOST) AS profit
FROM ssb_100.lineorder_flat
WHERE S_NATION = 'UNITED STATES' AND (year = 1997 OR year = 1998) AND P_CATEGORY = 'MFGR#14'
GROUP BY year, S_CITY, P_BRAND
ORDER BY year ASC, S_CITY ASC, P_BRAND ASC;

III. 查询结果和成本分析

执行完毕后,统计查询结果如下所示:

注:查询结果因配置参数和资源配置的不同,耗时也有差异,欢迎联系 ByteHouse 进行查询优化。

查询完成后,在 ByteHouse 计算组详情页面可以查看工作负载,包括总查询条数和 CPU/Mem 利用率等,从而确认计算资源的使用情况。

根据本次压测进行预估,消耗计算和存储资源如下表所示,由于 ByteHouse 云数仓版本按使用量计费的能力,在空闲时支持自动关闭计算组并不收取闲置费用,从而能够极大的节省资源。测试完成后,预估的总体消耗约为 31.23 元。

有关以 100GB SSB 性能测试为例,通过 ByteHouse 云数仓开启你的数据分析之路的更多相关文章

  1. Ruby 的数字方法性能 - 2

    我正在使用Ruby解决一些ProjectEuler问题,特别是这里我要讨论的问题25(Fibonacci数列中包含1000位数字的第一项的索引是多少?)。起初,我使用的是Ruby2.2.3,我将问题编码为:number=3a=1b=2whileb.to_s.length但后来我发现2.4.2版本有一个名为digits的方法,这正是我需要的。我转换为代码:whileb.digits.length当我比较这两种方法时,digits慢得多。时间./025/problem025.rb0.13s用户0.02s系统80%cpu0.190总计./025/problem025.rb2.19s用户0.0

  2. ruby - Ruby 性能中的计时器 - 2

    我正在寻找一个用ruby​​演示计时器的在线示例,并发现了下面的代码。它按预期工作,但这个简单的程序使用30Mo内存(如Windows任务管理器中所示)和太多CPU有意义吗?非常感谢deftime_blockstart_time=Time.nowThread.new{yield}Time.now-start_timeenddefrepeat_every(seconds)whiletruedotime_spent=time_block{yield}#Tohandle-vesleepinteravalsleep(seconds-time_spent)iftime_spent

  3. ruby-on-rails - 如果条件与 &&,是否有任何性能提升 - 2

    如果用户是所有者,我有一个条件来检查说删除和文章。delete_articleifuser.owner?另一种方式是user.owner?&&delete_article选择它有什么好处还是它只是一种写作风格 最佳答案 性能不太可能成为该声明的问题。第一个要好得多-它更容易阅读。您future的自己和其他将开始编写代码的人会为此感谢您。 关于ruby-on-rails-如果条件与&&,是否有任何性能提升,我们在StackOverflow上找到一个类似的问题:

  4. ruby - 如何找到我的 Ruby 应用程序中的性能瓶颈? - 2

    我编写了一个Ruby应用程序,它可以解析来自不同格式html、xml和csv文件的源中的大量数据。我如何找出代码的哪些区域花费的时间最长?有没有关于如何提高Ruby应用程序性能的好资源?或者您是否有任何始终遵循的性能编码标准?例如,你总是用加入你的字符串吗?output=String.newoutput或者你会使用output="#{part_one}#{part_two}\n" 最佳答案 好吧,有一些众所周知的做法,例如字符串连接比“#{value}”慢得多,但是为了找出您的脚本在哪里消耗了大部分时间或比所需时间更多,您需要进行分

  5. u盘安装系统(win10为例) - 2

    下载微PE工具箱进入官网下载微PE工具箱-下载 安装好后,打开微PE工具箱客户端,选择安装PE到U盘 PE壁纸可选择自己喜欢的壁纸,勾选上包含DOS工具箱,个性化盘符图标 下载原版系统进入网站下载镜像NEXT,ITELLYOU如果没有账号,注册一下就好进入选择开始使用选择win10 这里我们选择消费者版,用迅雷把BT种子下载下来 下面的两个盘符,是PE工具箱安装进U盘后,分成的盘符,注意EFI的盘符,这里面不能删东西,也不能添东西,另一个盘符可以当做正常的U盘空间使用,我们现在需要把下载下来的景象文件复制到正常的U盘空间中去 这个时候我们的系统U盘就只做好了 安装系统我们将U盘插入电脑,开机,

  6. 100个python算法超详细讲解:画直线 - 2

    1.问题描述使用Python的turtle(海龟绘图)模块提供的函数绘制直线。2.问题分析一幅复杂的图形通常都可以由点、直线、三角形、矩形、平行四边形、圆、椭圆和圆弧等基本图形组成。其中的三角形、矩形、平行四边形又可以由直线组成,而直线又是由两个点确定的。我们使用Python的turtle模块所提供的函数来绘制直线。在使用之前我们先介绍一下turtle模块的相关知识点。turtle模块提供面向对象和面向过程两种形式的海龟绘图基本组件。面向对象的接口类如下:1)TurtleScreen类:定义图形窗口作为绘图海龟的运动场。它的构造器需要一个tkinter.Canvas或ScrolledCanva

  7. STM32的HAL和LL库区别和性能对比 - 2

    LL库和HAL库简介LL:Low-Layer,底层库HAL:HardwareAbstractionLayer,硬件抽象层库LL库和hal库对比,很精简,这实际上是一个精简的库。LL库的配置选择如下:在STM32CUBEMX中,点击菜单的“ProjectManager”–>“AdvancedSettings”,在下面的界面中选择“AdvancedSettings”,然后在每个模块后面选择使用的库总结:1、如果使用的MCU是小容量的,那么STM32CubeLL将是最佳选择;2、如果结合可移植性和优化,使用STM32CubeHAL并使用特定的优化实现替换一些调用,可保持最大的可移植性。另外HAL和L

  8. ruby-on-rails - 计算数组中的项目跨越数千条记录的 100 条 - 2

    我有一个带有Postgres数据库的Rails应用程序,该数据库有一个带有jsonbgenres列的Artists表。有几十万行。该行中的每个流派列都有一个类似["rock","indie","seenlive","alternative","indierock"]的数组,其中包含不同的流派。我想要做的是在所有行中以JSON格式输出每种类型的计数。类似于:{"rock":532,"powermetal":328,"indie":862}有没有办法有效地做到这一点?更新...这是我目前得到的...genres=Artist.all.pluck(:genres).flatten.delet

  9. ruby - GC.disable 的任何性能缺点? - 2

    是否存在GC.disable会降低性能的情况?只要我使用的是真正的RAM而不是交换内存,就可以这样做吗?我正在使用MRIRuby2.0,据我所知,它是64位的,并且使用的是64位的Ubuntu:ruby2.0.0p0(2013-02-24revision39474)[x86_64-linux]Linux[redacted]3.2.0-43-generic#68-UbuntuSMPWedMay1503:33:33UTC2013x86_64x86_64x86_64GNU/Linux 最佳答案 GC.disable将禁用垃圾回收。像rub

  10. ruby-on-rails - Rails with angular 与 Rails pure(查看性能) - 2

    我尝试在Internet上搜索有关使用angularJS进入RubyonRails项目与RubyonRailspure的View性能的信息。我的问题是因为2个月前我开始使用纯AngularJS,现在我需要将AngularJS集成到一个新项目中,但需要展示使用带有RubyonRails的AngularJS呈现View的性能如何,并消除对RubyonRails的负担.例如:带Rails的Angular:使用RubyonRails获取数据(从数据库或GET请求),将信息发送到file.js.erb并使用AngularJS操作数据并显示带有解析数据的View。纯粹的Rails:(自然流程)使用

随机推荐