我在 MySQL 中有下表:
CREATE TABLE `events` (
`pv_name` varchar(60) COLLATE utf8mb4_bin NOT NULL,
`time_stamp` bigint(20) unsigned NOT NULL,
`event_type` varchar(40) COLLATE utf8mb4_bin NOT NULL,
`has_data` tinyint(1) NOT NULL,
`data` json DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin ROW_FORMAT=COMPRESSED;
ALTER TABLE `events`
ADD PRIMARY KEY (`pv_name`,`time_stamp`),
ADD UNIQUE KEY `has_data` (`pv_name`,`has_data`,`time_stamp`);
我试图找到不同的 pv_names 集,这些 pv_names 在两个给定时间之间有一些没有数据的行。以下两个查询似乎都返回此信息:
mysql> EXPLAIN SELECT pv_name FROM events
WHERE has_data = 0
AND events.time_stamp > 0 AND events.time_stamp < 9999999999999999999
GROUP BY events.pv_name;
+----+-------------+--------+------------+-------+------------------+----------+---------+------+---------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+-------+------------------+----------+---------+------+---------+----------+--------------------------+
| 1 | SIMPLE | events | NULL | index | PRIMARY,has_data | has_data | 251 | NULL | 1855281 | 1.11 | Using where; Using index |
+----+-------------+--------+------------+-------+------------------+----------+---------+------+---------+----------+--------------------------+
mysql> EXPLAIN SELECT pv_name, MAX(events.time_stamp) FROM events
WHERE has_data = 0
AND events.time_stamp > 0 AND events.time_stamp < 9999999999999999999
GROUP BY events.pv_name;
+----+-------------+--------+------------+-------+------------------+----------+---------+------+--------+----------+---------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+-------+------------------+----------+---------+------+--------+----------+---------------------------------------+
| 1 | SIMPLE | events | NULL | range | PRIMARY,has_data | has_data | 251 | NULL | 203123 | 100.00 | Using where; Using index for group-by |
+----+-------------+--------+------------+-------+------------------+----------+---------+------+--------+----------+---------------------------------------+
我不明白的是,为什么第二个查询(它对返回的内容有额外限制(我不需要)似乎比第一个查询运行的时间更少)。有没有一种方法可以改进第一个查询以匹配第二个查询的效率,而无需对 time_stamp 列进行聚合?
编辑:
根据 Rick James 的建议,我更改了 has_data 索引:
ALTER TABLE `events`
ADD PRIMARY KEY (`pv_name`,`time_stamp`), ADD KEY `has_data` (`has_data`,`pv_name`,`time_stamp`);
这将查询报告更改为:
mysql> EXPLAIN SELECT pv_name FROM events WHERE has_data = 0 AND events.time_stamp > 0 AND events.time_stamp < 9999999999999999999 GROUP BY events.pv_name;
+----+-------------+--------+------------+------+------------------+----------+---------+-------+--------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+------+------------------+----------+---------+-------+--------+----------+--------------------------+
| 1 | SIMPLE | events | NULL | ref | PRIMARY,has_data | has_data | 1 | const | 267096 | 11.11 | Using where; Using index |
+----+-------------+--------+------------+------+------------------+----------+---------+-------+--------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
mysql> EXPLAIN SELECT pv_name, MAX(events.time_stamp) FROM events WHERE has_data = 0 AND events.time_stamp > 0 AND events.time_stamp < 9999999999999999999 GROUP BY events.pv_name;
+----+-------------+--------+------------+------+------------------+----------+---------+-------+--------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+------+------------------+----------+---------+-------+--------+----------+--------------------------+
| 1 | SIMPLE | events | NULL | ref | PRIMARY,has_data | has_data | 1 | const | 267096 | 11.11 | Using where; Using index |
+----+-------------+--------+------------+------+------------------+----------+---------+-------+--------+----------+--------------------------+
1 row in set, 1 warning (0.01 sec)
这似乎运行得更快。
编辑:
Rick James 要求的测试结果:
mysql> FLUSH STATUS;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT pv_name FROM events WHERE has_data = 0 AND events.time_stamp > 0 AND events.time_stamp < 9999999999999999999 GROUP BY events.pv_name;
.
.
.
114480 rows in set (0.34 sec)
mysql> SHOW SESSION STATUS LIKE 'Handler%';
+----------------------------+--------+
| Variable_name | Value |
+----------------------------+--------+
| Handler_commit | 1 |
| Handler_delete | 0 |
| Handler_discover | 0 |
| Handler_external_lock | 2 |
| Handler_mrr_init | 0 |
| Handler_prepare | 0 |
| Handler_read_first | 0 |
| Handler_read_key | 1 |
| Handler_read_last | 0 |
| Handler_read_next | 125527 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_next | 0 |
| Handler_rollback | 0 |
| Handler_savepoint | 0 |
| Handler_savepoint_rollback | 0 |
| Handler_update | 0 |
| Handler_write | 0 |
+----------------------------+--------+
18 rows in set (0.01 sec)
mysql> SELECT COUNT(*) FROM events;
+----------+
| COUNT(*) |
+----------+
| 3683887 |
+----------+
1 row in set (11.66 sec)
编辑:
运行时间:
mysql> SHOW INDEXES FROM events;
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| events | 0 | PRIMARY | 1 | pv_name | A | 216061 | NULL | NULL | | BTREE | | |
| events | 0 | PRIMARY | 2 | time_stamp | A | 4450791 | NULL | NULL | | BTREE | | |
| events | 1 | has_data | 1 | has_data | A | 258 | NULL | NULL | | BTREE | | |
| events | 1 | has_data | 2 | pv_name | A | 496542 | NULL | NULL | | BTREE | | |
| events | 1 | has_data | 3 | time_stamp | A | 4390035 | NULL | NULL | | BTREE | | |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
5 rows in set (0.00 sec)
mysql> EXPLAIN SELECT events.pv_name FROM events WHERE has_data = 0 AND events.time_stamp > 0 AND events.time_stamp < 9999999999999999999 GROUP BY events.pv_name;
+----+-------------+--------+------------+------+------------------+----------+---------+-------+--------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+------+------------------+----------+---------+-------+--------+----------+--------------------------+
| 1 | SIMPLE | events | NULL | ref | PRIMARY,has_data | has_data | 1 | const | 267096 | 11.11 | Using where; Using index |
+----+-------------+--------+------------+------+------------------+----------+---------+-------+--------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
mysql> EXPLAIN SELECT events.pv_name, MAX(events.time_stamp) FROM events WHERE has_data = 0 AND events.time_stamp > 0 AND events.time_stamp < 9999999999999999999 GROUP BY events.pv_name;
+----+-------------+--------+------------+------+------------------+----------+---------+-------+--------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+------+------------------+----------+---------+-------+--------+----------+--------------------------+
| 1 | SIMPLE | events | NULL | ref | PRIMARY,has_data | has_data | 1 | const | 267096 | 11.11 | Using where; Using index |
+----+-------------+--------+------------+------+------------------+----------+---------+-------+--------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
SELECT events.pv_name FROM events WHERE has_data = 0 AND events.time_stamp > 0 AND events.time_stamp < 9999999999999999999 GROUP BY events.pv_name;
114480 rows in set (0.37 sec)
SELECT events.pv_name, MAX(events.time_stamp) FROM events WHERE has_data = 0 AND events.time_stamp > 0 AND events.time_stamp < 9999999999999999999 GROUP BY events.pv_name;
114480 rows in set (0.30 sec)
mysql> SHOW INDEXES FROM events;
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| events | 0 | PRIMARY | 1 | pv_name | A | 422951 | NULL | NULL | | BTREE | | |
| events | 0 | PRIMARY | 2 | time_stamp | A | 4321990 | NULL | NULL | | BTREE | | |
| events | 0 | has_data | 1 | pv_name | A | 240067 | NULL | NULL | | BTREE | | |
| events | 0 | has_data | 2 | has_data | A | 436525 | NULL | NULL | | BTREE | | |
| events | 0 | has_data | 3 | time_stamp | A | 4205163 | NULL | NULL | | BTREE | | |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
5 rows in set (0.00 sec)
mysql> EXPLAIN SELECT events.pv_name FROM events WHERE has_data = 0 AND events.time_stamp > 0 AND events.time_stamp < 9999999999999999999 GROUP BY events.pv_name;
+----+-------------+--------+------------+-------+------------------+----------+---------+------+---------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+-------+------------------+----------+---------+------+---------+----------+--------------------------+
| 1 | SIMPLE | events | NULL | index | PRIMARY,has_data | has_data | 251 | NULL | 4462633 | 1.11 | Using where; Using index |
+----+-------------+--------+------------+-------+------------------+----------+---------+------+---------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
mysql> EXPLAIN SELECT events.pv_name, MAX(events.time_stamp) FROM events WHERE has_data = 0 AND events.time_stamp > 0 AND events.time_stamp < 9999999999999999999 GROUP BY events.pv_name;
+----+-------------+--------+------------+-------+------------------+----------+---------+------+--------+----------+---------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+-------+------------------+----------+---------+------+--------+----------+---------------------------------------+
| 1 | SIMPLE | events | NULL | range | PRIMARY,has_data | has_data | 251 | NULL | 240076 | 100.00 | Using where; Using index for group-by |
+----+-------------+--------+------------+-------+------------------+----------+---------+------+--------+----------+---------------------------------------+
1 row in set, 1 warning (0.00 sec)
SELECT events.pv_name FROM events WHERE has_data = 0 AND events.time_stamp > 0 AND events.time_stamp < 9999999999999999999 GROUP BY events.pv_name;
114480 rows in set (6.79 sec)
SELECT events.pv_name, MAX(events.time_stamp) FROM events WHERE has_data = 0 AND events.time_stamp > 0 AND events.time_stamp < 9999999999999999999 GROUP BY events.pv_name;
114480 rows in set (2.65 sec)
最佳答案
根据 [文档](http://dev.mysql.com/doc/refman/5.7/en/group-by-optimization.html 用于松散索引扫描):
除查询中引用的 GROUP BY 之外,索引的任何其他部分都必须是常量(也就是说,它们必须与常量等价引用),MIN() 或 MAX( ) 功能。
在您的第一个查询中,time_stamp 被引用但不是常量。在您的第二个查询中,time_stamp 也在 MAX() 的参数中。因此,松散索引扫描适用于这种情况。
关于通过添加聚合改进了 MySQL 分组查询性能,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/40854279/
当我使用Bundler时,是否需要在我的Gemfile中将其列为依赖项?毕竟,我的代码中有些地方需要它。例如,当我进行Bundler设置时:require"bundler/setup" 最佳答案 没有。您可以尝试,但首先您必须用鞋带将自己抬离地面。 关于ruby-我需要将Bundler本身添加到Gemfile中吗?,我们在StackOverflow上找到一个类似的问题: https://stackoverflow.com/questions/4758609/
在控制台中反复尝试之后,我想到了这种方法,可以按发生日期对类似activerecord的(Mongoid)对象进行分组。我不确定这是完成此任务的最佳方法,但它确实有效。有没有人有更好的建议,或者这是一个很好的方法?#eventsisanarrayofactiverecord-likeobjectsthatincludeatimeattributeevents.map{|event|#converteventsarrayintoanarrayofhasheswiththedayofthemonthandtheevent{:number=>event.time.day,:event=>ev
我正在用Ruby编写一个简单的程序来检查域列表是否被占用。基本上它循环遍历列表,并使用以下函数进行检查。require'rubygems'require'whois'defcheck_domain(domain)c=Whois::Client.newc.query("google.com").available?end程序不断出错(即使我在google.com中进行硬编码),并打印以下消息。鉴于该程序非常简单,我已经没有什么想法了-有什么建议吗?/Library/Ruby/Gems/1.8/gems/whois-2.0.2/lib/whois/server/adapters/base.
尝试通过RVM将RubyGems升级到版本1.8.10并出现此错误:$rvmrubygemslatestRemovingoldRubygemsfiles...Installingrubygems-1.8.10forruby-1.9.2-p180...ERROR:Errorrunning'GEM_PATH="/Users/foo/.rvm/gems/ruby-1.9.2-p180:/Users/foo/.rvm/gems/ruby-1.9.2-p180@global:/Users/foo/.rvm/gems/ruby-1.9.2-p180:/Users/foo/.rvm/gems/rub
我有一个ModularSinatra应用程序,我正在尝试将Bootstrap添加到应用程序中。get'/bootstrap/application.css'doless:"bootstrap/bootstrap"end我在views/bootstrap中有所有less文件,包括bootstrap.less。我收到这个错误:Less::ParseErrorat/bootstrap/application.css'reset.less'wasn'tfound.Bootstrap.less的第一行是://CSSReset@import"reset.less";我尝试了所有不同的路径格式,但它
我正在使用puppet为ruby程序提供一组常量。我需要提供一组主机名,我的程序将对其进行迭代。在我之前使用的bash脚本中,我只是将它作为一个puppet变量hosts=>"host1,host2"我将其提供给bash脚本作为HOSTS=显然这对ruby不太适用——我需要它的格式hosts=["host1","host2"]自从phosts和putsmy_array.inspect提供输出["host1","host2"]我希望使用其中之一。不幸的是,我终其一生都无法弄清楚如何让它发挥作用。我尝试了以下各项:我发现某处他们指出我需要在函数调用前放置“function_”……这
我正在使用Sequel构建一个愿望list系统。我有一个wishlists和itemstable和一个items_wishlists连接表(该名称是续集选择的名称)。items_wishlists表还有一个用于facebookid的额外列(因此我可以存储opengraph操作),这是一个NOTNULL列。我还有Wishlist和Item具有续集many_to_many关联的模型已建立。Wishlist类也有:selectmany_to_many关联的选项设置为select:[:items.*,:items_wishlists__facebook_action_id].有没有一种方法可以
我正在编写一个gem,我必须在其中fork两个启动两个webrick服务器的进程。我想通过基类的类方法启动这个服务器,因为应该只有这两个服务器在运行,而不是多个。在运行时,我想调用这两个服务器上的一些方法来更改变量。我的问题是,我无法通过基类的类方法访问fork的实例变量。此外,我不能在我的基类中使用线程,因为在幕后我正在使用另一个不是线程安全的库。所以我必须将每个服务器派生到它自己的进程。我用类变量试过了,比如@@server。但是当我试图通过基类访问这个变量时,它是nil。我读到在Ruby中不可能在分支之间共享类变量,对吗?那么,还有其他解决办法吗?我考虑过使用单例,但我不确定这是
我的最终目标是安装当前版本的RubyonRails。我在OSXMountainLion上运行。到目前为止,这是我的过程:已安装的RVM$\curl-Lhttps://get.rvm.io|bash-sstable检查已知(我假设已批准)安装$rvmlistknown我看到当前的稳定版本可用[ruby-]2.0.0[-p247]输入命令安装$rvminstall2.0.0-p247注意:我也试过这些安装命令$rvminstallruby-2.0.0-p247$rvminstallruby=2.0.0-p247我很快就无处可去了。结果:$rvminstall2.0.0-p247Search
我在理解Enumerator.new方法的工作原理时遇到了一些困难。假设文档中的示例:fib=Enumerator.newdo|y|a=b=1loopdoy[1,1,2,3,5,8,13,21,34,55]循环中断条件在哪里,它如何知道循环应该迭代多少次(因为它没有任何明确的中断条件并且看起来像无限循环)? 最佳答案 Enumerator使用Fibers在内部。您的示例等效于:require'fiber'fiber=Fiber.newdoa=b=1loopdoFiber.yieldaa,b=b,a+bendend10.times.m