草庐IT

php - 提高(查询)性能

coder 2023-10-22 原文

我有一个数据库,其中包含大量记录(n_building/n_residence 表中的数以千计,buildinggeo 表中的数以百万计)。这是数据库的简化版本:

这是导出到 SQL 时的样子:

CREATE TABLE IF NOT EXISTS `district` (
  `districtid` INT(20) NOT NULL COMMENT 'cbs_wijk_cd',
  `description` VARCHAR(255) NOT NULL COMMENT 'cbs_wijk_oms',
  `municipalityid` INT(20) NOT NULL COMMENT 'FK gemeente',
  PRIMARY KEY (`districtid`),
  INDEX `wijk_gemeente_fk` (`municipalityid` ASC),
  CONSTRAINT `fk_district_municipality`
    FOREIGN KEY (`municipalityid`)
    REFERENCES `municipality` (`municipalityid`)
    ON DELETE CASCADE
    ON UPDATE CASCADE)
ENGINE = InnoDB;

CREATE TABLE IF NOT EXISTS `neighborhood` (
  `neighborhoodid` INT(20) NOT NULL COMMENT 'cbs_buurt_cd',
  `description` VARCHAR(255) NOT NULL COMMENT 'cbs_buurt_oms',
  `districtid` INT(20) NOT NULL COMMENT 'FK wijk',
  `municipalityid` INT(20) NOT NULL COMMENT 'FK gemeente',
  PRIMARY KEY (`neighborhoodid`),
  INDEX `buurt_gemeente_fk` (`municipalityid` ASC),
  INDEX `buurt_wijk_fk` (`districtid` ASC),
  FULLTEXT INDEX `index_neighborhood_description` (`description` ASC),
  CONSTRAINT `fk_neighborhood_municipality`
    FOREIGN KEY (`municipalityid`)
    REFERENCES `municipality` (`municipalityid`)
    ON DELETE CASCADE
    ON UPDATE CASCADE,
  CONSTRAINT `fk_neighborhood_district`
    FOREIGN KEY (`districtid`)
    REFERENCES `district` (`districtid`)
    ON DELETE CASCADE
    ON UPDATE CASCADE)
ENGINE = InnoDB;

CREATE TABLE IF NOT EXISTS `n_building` (
  `buildingid` BIGINT(20) NOT NULL,
  `neighborhoodid` INT(10) NOT NULL,
  `constructionyear` INT(4) NOT NULL,
  `height` INT(3) NOT NULL DEFAULT 9,
  `multifamily` TINYINT(1) NOT NULL DEFAULT 0,
  PRIMARY KEY (`buildingid`),
  INDEX `fk_building_buurt_idx` (`neighborhoodid` ASC),
  INDEX `index_neighborhoodid_buildingid` (`neighborhoodid` ASC, `buildingid` ASC),
  CONSTRAINT `fk_building_neighborhood`
    FOREIGN KEY (`neighborhoodid`)
    REFERENCES `neighborhood` (`neighborhoodid`)
    ON DELETE CASCADE
    ON UPDATE CASCADE)
ENGINE = InnoDB;

CREATE TABLE IF NOT EXISTS `n_buildinggeo` (
  `buildingid` BIGINT(20) NOT NULL,
  `order` INT(5) NOT NULL,
  `lat` DECIMAL(11,8) NOT NULL,
  `lon` DECIMAL(11,8) NOT NULL,
  PRIMARY KEY (`buildingid`, `order`),
  CONSTRAINT `fk_buildinggeo_building`
    FOREIGN KEY (`buildingid`)
    REFERENCES `n_building` (`buildingid`)
    ON DELETE CASCADE
    ON UPDATE CASCADE)
ENGINE = InnoDB;

CREATE TABLE IF NOT EXISTS `n_residence` (
  `residenceid` BIGINT(20) NOT NULL,
  `buildingid` BIGINT(20) NOT NULL,
  `geolat` DECIMAL(11,8) NOT NULL,
  `geolon` DECIMAL(11,8) NOT NULL,
  PRIMARY KEY (`residenceid`),
  INDEX `fk_residence_building_idx` (`buildingid` ASC),
  INDEX `index_geoloat_geolon_residenceid` (`geolat` ASC, `geolon` ASC, `residenceid` ASC),
  INDEX `index_geolat` (`geolat` ASC),
  INDEX `index_geolon` (`geolon` ASC),
  CONSTRAINT `fk_residence_building`
    FOREIGN KEY (`buildingid`)
    REFERENCES `n_building` (`buildingid`)
    ON DELETE CASCADE
    ON UPDATE CASCADE)
ENGINE = InnoDB;

CREATE TABLE IF NOT EXISTS `n_vabidata` (
  `residenceid` BIGINT(20) NOT NULL,
  `index` FLOAT NULL COMMENT '  ',
  `indexdate` VARCHAR(25) NULL,
  `type` VARCHAR(100) NULL,
  `subtype` VARCHAR(150) NULL,
  `rooftype` VARCHAR(50) NULL,
  PRIMARY KEY (`residenceid`),
  CONSTRAINT `fk_vabidata_residence`
    FOREIGN KEY (`residenceid`)
    REFERENCES `n_residence` (`residenceid`)
    ON DELETE CASCADE
    ON UPDATE CASCADE)
ENGINE = InnoDB;

我的目标是像这样创建该数据库内容的特定部分的 JSON 表示(这显然是匿名数据):

[
    {
        "buildingid": "632100000000000",
        "buurtid": "6320103",
        "constructionyear": "1969",
        "height": "9",
        "multifamily": "0",
        "gemeenteid": "632",
        "geo": [
            {
                "lat": "52.000",
                "lon": "4.000"
            },
            {
                "lat": "52.000",
                "lon": "4.000"
            },
            {
                "lat": "52.000",
                "lon": "4.000"
            },
            {
                "lat": "52.000",
                "lon": "4.000"
            },
            {
                "lat": "52.000",
                "lon": "4.000"
            }
        ],
        "res": [
            {
                "residenceid": "632010000000000",
                "surface": "159",
                "postalcode": "3400AA",
                "streetname": "Streetname",
                "housenumber": "00",
                "clusternr": "6320103533",
                "owner": "onbekend",
                "usageelec": "2463",
                "usagegas": "2006",
                "nomupd": "0",
                "cpwin": "0",
                "cpble": "0",
                "enet": "0",
                "gnet": "0",
                "type": null
            }
        ]
    }
]

有两种过滤数据库的方法:通过 neighbourhoodid(该社区的所有建筑物等)或通过边界框(所有建筑物等)。起初我决定以一种非常简单的方式这样做:

$path2 = Config::Path(2);//minlat
$path3 = Config::Path(3);//minlon
$path4 = Config::Path(4);//maxlat
$path5 = Config::Path(5);//maxlon

if (($path2 && is_numeric($path2) && $path3 && is_numeric($path3) &&
     $path4 && is_numeric($path4) && $path5 && is_numeric($path5)) ||
        ($path2 == "district" && $path3 && is_numeric($path3))) {
    if ($path2 == "neighborhood") {
        $buildings = DBUtils::FetchQuery("
SELECT  b.`buildingid`, b.`neighborhoodid` AS buurtid, b.`constructionyear`,
        b.`height`, b.`multifamily`, n.`municipalityid` AS gemeenteid
    FROM  `neighborhood` n
    INNER JOIN  `n_building` b ON b.`neighborhoodid` = n.`neighborhoodid`
    INNER JOIN  `n_residence` r ON r.`buildingid` = b.`buildingid`
    WHERE  b.`neighborhoodid` = '$path3'
    GROUP BY  b.`buildingid`;
");
    } else {
        $buildings = DBUtils::FetchQuery("
SELECT  b.`buildingid`, b.`neighborhoodid` AS buurtid, b.`constructionyear`,
        b.`height`, b.`multifamily`, n.`municipalityid` AS gemeenteid
    FROM  `neighborhood` n
    INNER JOIN  `n_building` b ON b.`neighborhoodid` = n.`neighborhoodid`
    INNER JOIN  `n_residence` r ON r.`buildingid` = b.`buildingid`
    WHERE  r.`geolat` >= '$path2'
      AND  r.`geolon` >= '$path3'
      AND  r.`geolat` <= '$path4'
      AND  r.`geolon` <= '$path5'
    GROUP BY  b.`buildingid`;
");
    }

    if ($buildings && count($buildings) > 0) {
        for ($i = 0; $i < count($buildings); $i++) {
            $building = $buildings[$i];

            $buildinggeo = DBUtils::FetchQuery("
SELECT  bg.`lat`, bg.`lon`
    FROM  `n_buildinggeo` bg
    WHERE  bg.`buildingid` = '$building[buildingid]';
");

            if ($buildinggeo && count($buildinggeo) > 0) {
                $buildings[$i]['geo'] = $buildinggeo;

                $buildingresidences = DBUtils::FetchQuery("
SELECT  r.`residenceid`, r.`surface`, r.`postalcode`, r.`streetname`,
        r.`housenumber`, r.`clusternr`, r.`owner`, r.`usageelec`,
        r.`usagegas`, r.`nomupd`, r.`cpwin`, r.`cpble`, r.`enet`,
        r.`gnet`, v.`type`
    FROM  `n_residence` r
    LEFT OUTER JOIN  `n_vabidata` v ON r.`residenceid` = v.`residenceid`
    WHERE  r.`buildingid` = '$building[buildingid]';
");

                if ($buildingresidences && count($buildingresidences) > 0) {
                    $buildings[$i]['res'] = $buildingresidences;
                }
            }
        }

        echo json_encode($buildings);
    }
}

后来我决定在单个查询中获取所有建筑物/住宅/vabidata 信息,并从中创建所需的 JSON 结构,因为每个请求(> 5 个建筑物)中的大部分时间都花在获取住宅数据上。

$path2 = Config::Path(2);//minlat
$path3 = Config::Path(3);//minlon
$path4 = Config::Path(4);//maxlat
$path5 = Config::Path(5);//maxlon

if (($path2 && is_numeric($path2) && $path3 && is_numeric($path3) &&
     $path4 && is_numeric($path4) && $path5 && is_numeric($path5)) ||
         ($path2 == "district" && $path3 && is_numeric($path3))) {
    if ($path2 == "district") {
        $results = DBUtils::FetchQuery("
SELECT  b.`buildingid`, b.`neighborhoodid`, b.`constructionyear`,
        b.`height`, b.`multifamily`, n.`municipalityid`, r.`residenceid`,
        r.`surface`, r.`postalcode`, r.`streetname`, r.`housenumber`,
        r.`clusternr`, r.`owner`, r.`usageelec`, r.`usagegas`,
        r.`nomupd`, r.`cpwin`, r.`cpble`, r.`enet`, r.`gnet`,
        v.`type`
    FROM  `neighborhood` n
    INNER JOIN  `n_building` b ON b.`neighborhoodid` = n.`neighborhoodid`
    INNER JOIN  `n_residence` r ON r.`buildingid` = b.`buildingid`
    LEFT OUTER JOIN  `n_vabidata` v ON r.`residenceid` = v.`residenceid`
    WHERE  b.`neighborhoodid` = '$path3';
");
    } else {
        $results = DBUtils::FetchQuery("
SELECT  b.`buildingid`, b.`neighborhoodid`, b.`constructionyear`,
        b.`height`, b.`multifamily`, n.`municipalityid`, r.`residenceid`,
        r.`surface`, r.`postalcode`, r.`streetname`, r.`housenumber`,
        r.`clusternr`, r.`owner`, r.`usageelec`, r.`usagegas`,
        r.`nomupd`, r.`cpwin`, r.`cpble`, r.`enet`, r.`gnet`,
        v.`type`
    FROM  `neighborhood` n
    INNER JOIN  `n_building` b ON b.`neighborhoodid` = n.`neighborhoodid`
    INNER JOIN  `n_residence` r ON r.`buildingid` = b.`buildingid`
    LEFT OUTER JOIN  `n_vabidata` v ON r.`residenceid` = v.`residenceid`
    WHERE  r.`geolat` >= '$path2'
      AND  r.`geolon` >= '$path3'
      AND  r.`geolat` <= '$path4'
      AND  r.`geolon` <= '$path5';
");
    }

    if ($results && count($results) > 0) {
        $buildings = array();

        for ($i = 0; $i < count($results); $i++) {
            $b = $results[$i];

            if (!array_key_exists($b['buildingid'],$buildings)) {
                $buildings[$b['buildingid']] = array(
                    "buildingid"        => $b['buildingid'],
                    "buurtid"           => $b['neighborhoodid'],
                    "constructionyear"  => $b['constructionyear'],
                    "height"            => $b['height'],
                    "multifamily"       => $b['multifamily'],
                    "gemeenteid"        => $b['municipalityid'],
                    "res"               => array()
                );
            }

            $buildings[$b['buildingid']]['res'][] = array(
                "residenceid"   => $b['residenceid'],
                "surface"       => $b['surface'],
                "postalcode"    => $b['postalcode'],
                "streetname"    => $b['streetname'],
                "housenumber"   => $b['housenumber'],
                "clusternr"     => $b['clusternr'],
                "owner"         => $b['owner'],
                "usageelec"     => $b['usageelec'],
                "usagegas"      => $b['usagegas'],
                "nomupd"        => $b['nomupd'],
                "cpwin"         => $b['cpwin'],
                "cpble"         => $b['cpble'],
                "enet"          => $b['enet'],
                "gnet"          => $b['gnet'],
                "type"          => $b['type']
            );
        }

        $buildings = array_values($buildings);

        for ($i = 0; $i < count($buildings); $i++) {
            $building = $buildings[$i];

            $buildinggeo = DBUtils::FetchQuery("
SELECT  bg.`lat`, bg.`lon`
    FROM  `n_buildinggeo` bg
    WHERE  bg.`buildingid` = '$building[buildingid]';
");

            if ($buildinggeo && count($buildinggeo) > 0) {
                $buildings[$i]['geo'] = $buildinggeo;
            }
        }

        echo json_encode($buildings);
    }
}

但是这种方法似乎比以前的方法慢了 30-70%。我的问题是:你们中的任何人都可以找到原因(以及可能的解决方案)为什么一个/两个查询都表现得如此糟糕吗?如果您有任何疑问或需要任何其他信息,请询问。

编辑

这是对第二个查询(一体式)的边界框版本的 EXPLAIN 的结果:

最佳答案

WHERE  b.`neighborhoodid` = '$path3'
GROUP BY  b.`buildingid`

b 需要 INDEX(neighborhoodid, buildingid) -- 按此顺序
r 需要 INDEX(buildingid)
r 需要 INDEX(lat, lon, residenceid) -- 按此顺序

(请提供SHOW CREATE TABLE 以便我们可以看到您有哪些索引。)

WHERE  r.`geolat` >= '$path2'
  AND  r.`geolon` >= '$path3'
  AND  r.`geolat` <= '$path4'
  AND  r.`geolon` <= '$path5'

目前无法优化。参见 http://mysql.rjweb.org/doc.php/latlng了解如何处理按纬度/经度进行的大量搜索。

关于php - 提高(查询)性能,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/34592922/

有关php - 提高(查询)性能的更多相关文章

  1. ruby - ECONNRESET (Whois::ConnectionError) - 尝试在 Ruby 中查询 Whois 时出错 - 2

    我正在用Ruby编写一个简单的程序来检查域列表是否被占用。基本上它循环遍历列表,并使用以下函数进行检查。require'rubygems'require'whois'defcheck_domain(domain)c=Whois::Client.newc.query("google.com").available?end程序不断出错(即使我在google.com中进行硬编码),并打印以下消息。鉴于该程序非常简单,我已经没有什么想法了-有什么建议吗?/Library/Ruby/Gems/1.8/gems/whois-2.0.2/lib/whois/server/adapters/base.

  2. ruby-on-rails - 在 Rails 和 ActiveRecord 中查询时忽略某些字段 - 2

    我知道我可以指定某些字段来使用pluck查询数据库。ids=Item.where('due_at但是我想知道,是否有一种方法可以指定我想避免从数据库查询的某些字段。某种反拔?posts=Post.where(published:true).do_not_lookup(:enormous_field) 最佳答案 Model#attribute_names应该返回列/属性数组。您可以排除其中一些并传递给pluck或select方法。像这样:posts=Post.where(published:true).select(Post.attr

  3. 程序员如何提高代码能力? - 2

    前言作为一名程序员,自己的本质工作就是做程序开发,那么程序开发的时候最直接的体现就是代码,检验一个程序员技术水平的一个核心环节就是开发时候的代码能力。众所周知,程序开发的水平提升是一个循序渐进的过程,每一位程序员都是从“菜鸟”变成“大神”的,所以程序员在程序开发过程中的代码能力也是根据平时开发中的业务实践来积累和提升的。提高代码能力核心要素程序员要想提高自身代码能力,尤其是新晋程序员的代码能力有很大的提升空间的时候,需要针对性的去提高自己的代码能力。提高代码能力其实有几个比较关键的点,只要把握住这些方面,就能很好的、快速的提高自己的一部分代码能力。1、多去阅读开源项目,如有机会可以亲自参与开源

  4. sql - 查询忽略时间戳日期的时间范围 - 2

    我正在尝试查询我的Rails数据库(Postgres)中的购买表,我想查询时间范围。例如,我想知道在所有日期的下午2点到3点之间进行了多少次购买。此表中有一个created_at列,但我不知道如何在不搜索特定日期的情况下完成此操作。我试过:Purchases.where("created_atBETWEEN?and?",Time.now-1.hour,Time.now)但这最终只会搜索今天与那些时间的日期。 最佳答案 您需要使用PostgreSQL'sdate_part/extractfunction从created_at中提取小时

  5. Ruby 的数字方法性能 - 2

    我正在使用Ruby解决一些ProjectEuler问题,特别是这里我要讨论的问题25(Fibonacci数列中包含1000位数字的第一项的索引是多少?)。起初,我使用的是Ruby2.2.3,我将问题编码为:number=3a=1b=2whileb.to_s.length但后来我发现2.4.2版本有一个名为digits的方法,这正是我需要的。我转换为代码:whileb.digits.length当我比较这两种方法时,digits慢得多。时间./025/problem025.rb0.13s用户0.02s系统80%cpu0.190总计./025/problem025.rb2.19s用户0.0

  6. ruby-on-rails - solr 清理查询 - 2

    我在Rails上使用带有ruby​​的solr。一切正常,我只需要知道是否有任何现有代码来清理用户输入,比如以?开头的查询。或* 最佳答案 我不知道执行此操作的任何代码,但理论上可以通过查看parsingcodeinLucene来完成并搜索thrownewParseException(只有16个匹配!)。在实践中,我认为您最好只捕获代码中的任何solr异常并显示“无效查询”消息或类似信息。编辑:这里有几个“sanitizer”:http://pivotallabs.com/users/zach/blog/articles/937-s

  7. ruby - Ruby 性能中的计时器 - 2

    我正在寻找一个用ruby​​演示计时器的在线示例,并发现了下面的代码。它按预期工作,但这个简单的程序使用30Mo内存(如Windows任务管理器中所示)和太多CPU有意义吗?非常感谢deftime_blockstart_time=Time.nowThread.new{yield}Time.now-start_timeenddefrepeat_every(seconds)whiletruedotime_spent=time_block{yield}#Tohandle-vesleepinteravalsleep(seconds-time_spent)iftime_spent

  8. ruby-on-rails - Rails 3 在一个查询中包含多个表 - 2

    我正在为锦标赛开发一个Rails应用程序。我在这个查询中使用了三个模型:classPlayertruehas_and_belongs_to_many:tournamentsclassTournament:destroyclassPlayerMatch"Player",:foreign_key=>"player_one"belongs_to:player_two,:class_name=>"Player",:foreign_key=>"player_two"在tournaments_controller的显示操作中,我调用以下查询:Tournament.where(:id=>params

  9. ruby-on-rails - Sunspot:如何对具有不同值的多个字段进行全文查询? - 2

    我想用sunspot重现以下原始solr查询q=exact_term_text:fooORterm_textv:foo*ORalternate_text:bar*但我无法通过标准的太阳黑子界面理解这是否可能以及如何实现,因为看起来:fulltext方法似乎不接受多个文本/搜索字段参数我不知道将什么参数作为第一个参数传递给fulltext,就好像我通过了"foo"或"bar"结果不匹配如果我传递一个空参数,我得到一个q=*:*范围过滤器(例如with(:term).starting_with('foo*')(顾名思义)作为过滤器查询应用,因此不参与评分。似乎可以手动编写字符串(或者可能使

  10. ruby-on-rails - 如果条件与 &&,是否有任何性能提升 - 2

    如果用户是所有者,我有一个条件来检查说删除和文章。delete_articleifuser.owner?另一种方式是user.owner?&&delete_article选择它有什么好处还是它只是一种写作风格 最佳答案 性能不太可能成为该声明的问题。第一个要好得多-它更容易阅读。您future的自己和其他将开始编写代码的人会为此感谢您。 关于ruby-on-rails-如果条件与&&,是否有任何性能提升,我们在StackOverflow上找到一个类似的问题:

随机推荐