How to set up auto-archiving for Matomo to improve response times

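If your website receives more than a few hundred visits per day (bravo!), waiting for Matomo to process your data can take a few minutes. The best way to avoid these waiting times is to set up a cron job on your server so that your data is processed automatically every hour.

If you use Matomo for WordPress, you do not need to do this, because it relies on WP Cron. If you are on Matomo Cloud, this is also taken care of for you automatically.

To trigger Matomo archiving automatically, you can set up a script that runs every hour.

Below are instructions for Linux/Unix systems using a crontab, for Windows users using Windows Task Scheduler, and for tools such as CPanel. If you do not have access to the server, you can also set up a web cron.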

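Linux/Unix: How to set up a crontab to automatically archive the reports

A crontab is a time-based scheduling service on Unix-like servers. The crontab requires php-cli or php-cgi to be installed. You will also need SSH access to your server in order to set it up. Let's create a new crontab with the text editor nano: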

nano /etc/cron.d/matomo-archive

and then add the following lines:

MAILTO="youremail@example.com"
5 * * * * www-data /usr/bin/php /path/to/matomo/console core:archive --url=http://example.org/matomo/ > /home/example/matomo-archive.log

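The Matomo archive script will run every hour, at 5 minutes past the hour. Generally, it completes in less than one minute. On larger websites (10,000 visits or more), Matomo archiving can take up to 30 minutes.

Breakdown of the parameters: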

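MAILTO="youremail@example.com": If an error occurs while the script runs, the script output and error messages are sent to the youremail@example.com address.
www-data: The user that the cron job is executed by. The user is sometimes "apache". It is recommended to run the crontab as the same user as your web server user (to avoid file permission mismatches).
/usr/bin/php: The path to your PHP executable. It varies depending on your server configuration and operating system. You can run the command "which php" in a Linux shell to find the path of your PHP executable. If you do not know the path, ask your web host or system administrator.
/path/to/matomo/console: The path to the Matomo console script on your server. For example, it may be /var/www/matomo/console.
--url=http://example.org/matomo/: The only required parameter of the script. It must be set to your Matomo base URL, for example http://analytics.example.org/ or http://example.org/matomo/
> /home/example/matomo-archive.log: The path where the script writes its output. You can replace this path with /dev/null if you prefer not to log the text of the last Matomo cron run. The script output contains useful information, such as which websites were archived and how long it took to process each date and website. This log file should be written to a location outside of your web server's document root, so that people cannot view it in their browser (because it contains some sensitive information about your Matomo installation). You can also replace > with >> to append the script output to the log file rather than overwrite it on each run (in that case we recommend that you rotate or delete this log file, for example once a week).
2> /home/example/matomo-archive-errors.log: The optional path where the script writes its error messages. If you omit this from the cron entry, errors are emailed to your MAILTO address. If you include it in your crontab, errors are logged to the specified error log file instead. This log file should also be written to a location outside of your web server's document root, so that people cannot view it in their browser (because it contains some sensitive information about your Matomo installation).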
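The redirections described above can also be collected in a small wrapper script that cron calls instead of the long one-line command. The following is only a sketch under stated assumptions: the function name run_archiver, the two log file names, and the example paths are placeholders chosen for illustration, not part of Matomo.

```shell
#!/bin/sh
# Sketch of a wrapper around core:archive; every path used here is a placeholder.

# run_archiver PHP CONSOLE URL LOG_DIR
# Appends stdout to matomo-archive.log and stderr to matomo-archive-errors.log,
# writing a timestamp line before each run so that runs can be told apart.
run_archiver() {
    php_bin=$1; console=$2; url=$3; log_dir=$4
    out_log="$log_dir/matomo-archive.log"
    err_log="$log_dir/matomo-archive-errors.log"

    echo "=== run started $(date '+%Y-%m-%d %H:%M:%S') ===" >> "$out_log"
    # '>>' appends instead of overwriting; '2>>' keeps errors in their own file.
    "$php_bin" "$console" core:archive --url="$url" >> "$out_log" 2>> "$err_log"
}

# Example call (hypothetical paths, adjust to your server):
# run_archiver /usr/bin/php /path/to/matomo/console http://example.org/matomo/ /home/example
```

Because this variant appends to both files, the logs grow on every run, so the advice above about rotating or deleting them regularly applies.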

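A note on the Linux cron utility: cron uses two different types of configuration files, the system crontab and user crontabs. The only difference between these two formats is the sixth field.

In the system crontab, the sixth field is the name of the user that the command runs as. This lets the system crontab run commands as any user.
In a user crontab, the sixth field is the command to run, and all commands run as the user who created the crontab; this is an important security feature.

If you set up your crontab as a user crontab, write this instead: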

5 * * * * /usr/bin/php /path/to/matomo/console core:archive --url=http://example.org/matomo/ > /dev/null

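This cron job triggers the day/week/month/year archiving process at 5 minutes past every hour. This ensures that when you visit your Matomo dashboard, the data has already been processed and Matomo loads quickly.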
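As an illustration of the user crontab format, the entry can also be generated and installed from a script rather than typed by hand. This is a minimal sketch: build_cron_line and install_cron_line are hypothetical helper names, and it assumes a cron implementation whose crontab command accepts "-" to read the new table from standard input.

```shell
#!/bin/sh
# build_cron_line MINUTE PHP CONSOLE URL
# Prints a user-crontab line: five time fields, then the command (no user field).
build_cron_line() {
    printf '%s * * * * %s %s core:archive --url=%s > /dev/null\n' "$1" "$2" "$3" "$4"
}

# install_cron_line LINE
# Replaces every existing core:archive entry in the current user's crontab with LINE.
install_cron_line() {
    { crontab -l 2>/dev/null | grep -v 'core:archive' || true; echo "$1"; } | crontab -
}

# Example (run as the web server user; the paths are placeholders):
# install_cron_line "$(build_cron_line 5 /usr/bin/php /path/to/matomo/console http://example.org/matomo/)"
```

Note that install_cron_line drops all lines containing core:archive before adding the new one, so it is not suitable as written if you deliberately keep several archiver entries.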

Test the cron command

Make sure the crontab will actually work by running the script in the shell as the crontab user:

su www-data -s /bin/bash -c "/usr/bin/php /path/to/matomo/console core:archive --url=http://example.org/matomo/"

You should see the script output, with the list of websites being archived and a summary at the end stating that there were no errors.
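When testing repeatedly, it can help to have the exit status reported explicitly in addition to reading the summary. The sketch below assumes that the console command exits with a non-zero status when the run fails, which is the usual convention for command-line tools but is not stated in this guide; check_archive_run is a hypothetical helper name.

```shell
#!/bin/sh
# check_archive_run COMMAND [ARGS...]
# Runs the given command with its normal output visible, then reports on stderr
# whether it exited with status 0, and returns that same status.
check_archive_run() {
    "$@"
    status=$?
    if [ "$status" -eq 0 ]; then
        echo "archive run OK" >&2
    else
        echo "archive run FAILED (exit status $status)" >&2
    fi
    return "$status"
}

# Example (placeholder paths, run as the crontab user):
# check_archive_run /usr/bin/php /path/to/matomo/console core:archive --url=http://example.org/matomo/
```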

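Launching multiple archivers at once

If you have many sites, you may be interested in running several archivers in parallel to archive faster. We recommend not starting them all at the same moment, but a few seconds or minutes apart, to avoid concurrency issues. For example: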

5 * * * * /usr/bin/php /path/to/matomo/console core:archive --url=http://example.org/matomo/ > /dev/null
6 * * * * /usr/bin/php /path/to/matomo/console core:archive --url=http://example.org/matomo/ > /dev/null

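In the example above, one archiver starts at minute 5 of every hour and the other starts one minute later. Alternatively, you can use a script that launches several archivers together, and then run that script regularly through a cron job: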

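#!/bin/bash
# Number of archivers to start in parallel
CONCURRENT_ARCHIVERS=2
for i in $(seq 1 "$CONCURRENT_ARCHIVERS")
do
    # Delay archiver number i by i seconds, then run it in the background
    (sleep "$i" && /path/to/matomo/console core:archive & )
done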
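A variation on the script above keeps track of the background processes and waits for them, so that the calling cron job only finishes once every archiver has finished and can report a failure. This is a sketch rather than a recommended setup; launch_staggered is a hypothetical helper name and the command to run is passed in as arguments.

```shell
#!/bin/sh
# launch_staggered COUNT DELAY COMMAND [ARGS...]
# Starts COUNT copies of COMMAND in the background, DELAY seconds apart,
# then waits for all of them and returns non-zero if any copy failed.
launch_staggered() {
    count=$1; delay=$2; shift 2
    pids=""
    i=1
    while [ "$i" -le "$count" ]; do
        "$@" &
        pids="$pids $!"
        # Pause before starting the next archiver, but not after the last one.
        if [ "$i" -lt "$count" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    failed=0
    for pid in $pids; do
        wait "$pid" || failed=1
    done
    return "$failed"
}

# Example (placeholder path): two archivers started 60 seconds apart
# launch_staggered 2 60 /path/to/matomo/console core:archive
```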

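Windows: How to set up auto-archiving using Windows Scheduler

See our dedicated FAQ to learn how to set up a scheduled task in Windows.

Plesk: How to set up the cron script using Plesk

Learn more about installing Matomo and configuring the archiving crontab on Plesk in the Plesk Matomo guide.

CPanel: How to set up the cron script using CPanel

It is easy to set up automatic archiving if you use a user interface such as CPanel, Webmin or Plesk. Here are the instructions for CPanel: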

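1. Log in to CPanel for the domain where Matomo is installed
2. Click on "Cron Jobs"
3. Leave the email field blank
4. In "Minutes" enter 00, and leave the rest blank.
5. You then need to paste the path to the PHP executable, followed by the path to the Matomo /console script, followed by the parameter with your Matomo base URL: --url=matomo.example.org/
Here is an example for a Hostgator install (in this example you would need to change "yourcpanelsitename" to your particular domain's cPanel username): /usr/local/bin/php -f /home/yourcpanelsitename/public_html/matomo/console core:archive --url=example.org/matomo/ > /home/example/matomo-archive-output.log

"yourcpanelsitename" tends to be the first eight letters of your domain (unless you changed it when you set up your cPanel account)
6. Click "Add New Cron Job"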

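Matomo will process your reports automatically on the hour.

Web cron when your web host does not support cron tasks

If possible, we strongly recommend that you run a cron job or scheduled task. However, on some shared hosting plans or particular server configurations, running a cron job or scheduled task may be difficult or impossible.

Some web hosts let you set up a web cron, which is simply a URL that the host visits automatically at a scheduled time. If your web host lets you create a web cron, you can enter the following URL in their hosting interface: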

https://matomo.your-server.example/path/to/matomo/misc/cron/archive.php?token_auth=XYZ

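Replace XYZ with the super user's 32-character token_auth. To find the token_auth, log in to Matomo as a super user, click the Administration link in the top menu, go to Personal, then click Security. Scroll down and you will find where to create a new token_auth.
Notes:

For security, if possible we recommend that you send the token_auth parameter in a POST request to https://matomo.your-server.example/path/to/matomo/misc/cron/archive.php (instead of sending token_auth as a GET parameter in the URL).
You can test the web cron by pasting the URL into your browser, waiting a few minutes for the processing to finish, and then checking the output.
The web cron should be triggered at least once per hour. You may also use a website monitoring service (free or paid) to request this page automatically every hour.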
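If the machine that triggers the web cron has curl available, the POST variant recommended in the notes above could look like the sketch below. It assumes curl is installed; trigger_web_cron and the DRY_RUN switch are hypothetical names introduced here for illustration, and XYZ remains a placeholder for your own token.

```shell
#!/bin/sh
# trigger_web_cron BASE_URL TOKEN
# Sends token_auth in the POST body so that it does not appear in the URL.
# Set DRY_RUN=1 to print the curl command (with the token hidden) instead of running it.
trigger_web_cron() {
    endpoint="${1%/}/misc/cron/archive.php"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "curl --fail --silent --show-error --data token_auth=<hidden> $endpoint"
    else
        # --data makes curl issue a POST request with a form-encoded body.
        curl --fail --silent --show-error --data "token_auth=$2" "$endpoint"
    fi
}

# Example (placeholder URL and token):
# trigger_web_cron https://matomo.your-server.example/path/to/matomo XYZ
```

Keeping the token out of the URL means it is less likely to end up in web server access logs or browser history.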

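Important tips for medium to high traffic websites

Disable browser triggers for Matomo archiving and limit Matomo reports to updating every hour

After you have set up the automatic archive script as explained above, you can configure Matomo so that requests in the user interface do not trigger archiving, but instead read the pre-archived reports. Log in as the super user, click Administration > System > General Settings, and select:

Archive reports when viewed from the browser: No
Archive reports at most every X seconds: 3600 seconds

Click "Save" to save your changes. Now that you have set up the archiving cron and changed these two settings, you can enjoy fast, pre-processed, near real-time reports in Matomo!

Today's statistics will have a one-hour lifetime, which ensures that the reports are processed every hour (near real time).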

Increase the PHP memory limit

If you receive this error:

Fatal error: Allowed memory size of 16777216 bytes exhausted (tried to allocate X bytes)

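you must increase the memory allocated to PHP. To give Matomo enough memory to process your web analytics reports, increase the memory limit. Sites with less data or fewer features enabled can use 512M or 2G. If the problem persists, we recommend increasing the setting further. 8G is a common size for a medium to large Matomo instance.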

memory_limit = 512M

To find where the php.ini file is located on your server, you can follow these steps: create a test.php file and add the following code:

 <?php phpinfo(); ?>

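and open it in your browser. It will show which php.ini file is actually being read by the PHP running on your web server. It will also show your currently set max_execution_time value.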
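Since the cron job runs PHP from the command line, it can be worth checking the command-line settings as well, because the command-line PHP may read a different php.ini than the PHP behind your web server. The sketch below uses two hypothetical helper names: bytes_to_mib turns the byte count from the error message into MiB, and show_cli_php_settings prints what the php binary on the PATH reports.

```shell
#!/bin/sh
# bytes_to_mib BYTES
# Converts the byte count from an "Allowed memory size" error into whole MiB.
bytes_to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

# show_cli_php_settings
# Prints the php.ini files used by the command-line PHP and its memory_limit,
# provided a php binary is found on the PATH.
show_cli_php_settings() {
    if command -v php > /dev/null 2>&1; then
        php --ini
        php -r 'echo "memory_limit = ", ini_get("memory_limit"), PHP_EOL;'
    else
        echo "php not found on PATH" >&2
        return 1
    fi
}

# bytes_to_mib 16777216 prints 16, so the error shown earlier corresponds to a 16M limit.
```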

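More information about Matomo archiving

If you run archiving several times per day, it re-archives today's reports, as well as any reports for date ranges that include today: the current week, the current month, and so on.
Your Matomo database will grow over time; this is normal. Matomo deletes archives that were processed for incomplete periods (that is, when you archive a week in the middle of that week), but it does not delete other archives. This means that you will have archives for every day, every week, every month and every year in the MySQL tables. This ensures a very fast UI response and fast data access, but it does require disk space.
Matomo's archiving of current reports is not incremental: running archiving several times per day does not lower the memory requirements for weekly, monthly or yearly archives. Matomo reads all the logs for the full day in order to process that day's report.
Once a day, week, month or year is complete and has been processed, it is cached and is not processed again by Matomo.
If you do not set up archiving to run automatically, archiving happens when a user requests a Matomo report. This can be slow and makes for a poor user experience (users have to wait N seconds). This is why we recommend that you set up auto-archiving for medium to large websites, as explained above.
By default, when you disable browser triggers for Matomo archiving, archive triggering is not disabled quite as completely as you might expect. Users browsing Matomo can still trigger archive processing in one particular case: when a custom segment is used. To make sure that users of your Matomo never trigger any data processing, add the following setting to your config.ini.php file, below the [General] category:
; disable browser trigger archiving for all requests (even those with a segment)
browser_archiving_disabled_enforce = 1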
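After editing config.ini.php, a quick check from the shell can confirm that the setting is present and not commented out. This is a simple sketch: enforce_is_set is a hypothetical helper name, the file path in the example is a placeholder, and the check does not verify that the line sits under the [General] category.

```shell
#!/bin/sh
# enforce_is_set CONFIG_FILE
# Succeeds if "browser_archiving_disabled_enforce = 1" appears as an active
# (uncommented) line in the given file. It does not check the section name.
enforce_is_set() {
    grep -Eq '^[[:space:]]*browser_archiving_disabled_enforce[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$1"
}

# Example (placeholder path):
# enforce_is_set /path/to/matomo/config/config.ini.php && echo "enforced"
```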

Help for the core:archive command

Here is the help output for this command:

$ ./console help core:archive
Usage:
 core:archive [--url="..."] [--skip-idsites[="..."]] [--skip-all-segments] [--force-idsites[="..."]] [--skip-segments-today] [--force-periods[="..."]] [--force-date-last-n[="..."]] [--force-date-range[="..."]] [--force-idsegments="..."] [--concurrent-requests-per-website[="..."]] [--concurrent-archivers[="..."]] [--max-websites-to-process="..."] [--max-archives-to-process="..."] [--disable-scheduled-tasks] [--accept-invalid-ssl-certificate] [--php-cli-options[="..."]] [--force-all-websites] [--force-report[="..."]]

Options:
 --url                              Forces the value of this option to be used as the URL to Matomo. 
                                    If your system does not support archiving with CLI processes, you may need to set this in order for the   archiving HTTP requests to use the desired URLs.
 --skip-idsites                     If specified, archiving will be skipped for these websites (in case these website ids would have been archived).
 --skip-all-segments                If specified, all segments will be skipped during archiving.
 --force-idsites                    If specified, archiving will be processed only for these Sites Ids (comma separated)
 --skip-segments-today              If specified, segments will be only archived for yesterday, but not today. If the segment was created or changed recently, then it will still be archived for today and the setting will be ignored for this segment.
 --force-periods                    If specified, archiving will be processed only for these Periods (comma separated eg. day,week,month,year,range)
 --force-date-last-n                Deprecated. Please use the "process_new_segments_from" INI configuration option instead.
 --force-date-range                 If specified, archiving will be processed only for periods included in this date range. Format: YYYY-MM-DD,YYYY-MM-DD
 --force-idsegments                 If specified, only these segments will be processed (if the segment should be applied to a site in the first place).
                                    Specify stored segment IDs, not the segments themselves, eg, 1,2,3. 
                                    Note: if identical segments exist w/ different IDs, they will both be skipped, even if you only supply one ID.
 --concurrent-requests-per-website  When processing a website and its segments, number of requests to process in parallel (default: 3)
 --concurrent-archivers             The number of max archivers to run in parallel. Depending on how you start the archiver as a cronjob, you  may need to double the amount of archivers allowed if the same process appears twice in the `ps ex` output. (default: false)
 --max-websites-to-process          Maximum number of websites to process during a single execution of the archiver. Can be used to limit the process lifetime e.g. to avoid increasing memory usage.
 --max-archives-to-process          Maximum number of archives to process during a single execution of the archiver. Can be used to limit the process lifetime e.g. to avoid increasing memory usage.
 --disable-scheduled-tasks          Skips executing Scheduled tasks (sending scheduled reports, db optimization, etc.).
 --accept-invalid-ssl-certificate   It is _NOT_ recommended to use this argument. Instead, you should use a valid SSL certificate!
                                    It can be useful if you specified --url=https://... or if you are using Matomo with force_ssl=1
 --php-cli-options                  Forwards the PHP configuration options to the PHP CLI command. For example "-d memory_limit=8G". Note:  These options are only applied if the archiver actually uses CLI and not HTTP. (default: "")
 --force-all-websites               Force archiving all websites.
 --force-report                     If specified, only processes invalidations for a specific report in a specific plugin. Value must be in the format of "MyPlugin.myReport".
 --help (-h)                        Display this help message
 --quiet (-q)                       Do not output any message
 --verbose (-v|vv|vvv)              Increase the verbosity of messages: 1 for normal output, 2 for more verbose output and 3 for debug
 --version (-V)                     Display this application version
 --ansi                             Force ANSI output
 --no-ansi                          Disable ANSI output
 --no-interaction (-n)              Do not ask any interactive question
 --matomo-domain                    Matomo URL (protocol and domain) eg. "http://matomo.example.org"
 --xhprof                           Enable profiling with XHProf

Help:
 * It is recommended to run the script without any option.
 * This script should be executed every hour via crontab, or as a daemon.
 * You can also run it via http:// by specifying the Super User &token_auth=XYZ as a parameter ('Web Cron'),
   but it is recommended to run it via command line/CLI instead.
 * If you have any suggestion about this script, please let the team know at feedback@matomo.org
 * Enjoy!

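Understanding the output of `core:archive`

The output log displays useful information about the archiver process, in particular which websites and segments are being processed. The core:archive output shows in particular:

Which website ID is currently being archived: INFO [2020-03-31 21:16:29] 23146 Will pre-process for website id = 1, period = month, date = last3
How many segments there are for this website, in this example 25 segments: INFO [2020-03-31 21:16:29] 23146 - pre-processing segment 1/25 countryName!=Algeria March 29, 2022
How many websites are left to process from this archiver's queue of websites, in this example it has finished processing 2 of 3 websites: INFO [2020-03-31 21:17:07] 23146 Archived website id = 3, 4 API requests, Time elapsed: 18.622s [2/3 done]
If you are running multiple core:archive processes, for example using --concurrent-archivers, you can tell the different concurrent archivers apart by looking at the number after the timestamp: INFO [2020-03-31 21:17:07] 23146 [...]. Each concurrent archiver run has a different number, so if you search your logs for this number you can find the output of that particular core:archive process. You can also allow unlimited concurrent archivers by setting --concurrent-archivers to -1.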
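Searching the log for one archiver's number can be done with grep. The sketch below assumes the log lines have the shape shown in the examples above, with the process number directly after the bracketed timestamp; archiver_lines is a hypothetical helper name and the log path is a placeholder.

```shell
#!/bin/sh
# archiver_lines LOG_FILE NUMBER
# Prints only the log lines written by the archiver with the given number,
# by matching the "] NUMBER " that follows the bracketed timestamp.
archiver_lines() {
    grep -F "] $2 " "$1"
}

# Example (placeholder log path):
# archiver_lines /home/example/matomo-archive.log 23146
```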

小瑞Blog-轻言轻愿