Using pg_profile for Historical Workload Analysis in PostgreSQL

Every DBA responsible for database performance needs a set of historical workload analysis tools to understand the cause of system degradation and hopefully prevent it from happening again. A similar set of tools may be needed for performance tuning tasks when the DBA discovers what needs to be tuned. For PostgreSQL, the pg_profile extension is a simple historical database workload profiler. In this article, I’ll show how pg_profile provides extensive performance statistics and streamlines the ability to understand performance issues.

Most DBAs use monitoring tools to examine the performance of their Postgres database and the server it is running on over long periods of time. These monitoring tools provide basic metrics regarding processor, memory, disk usage, number of transactions, locks, and many other system characteristics. However, this information is insufficient to answer several questions, such as which workload is causing the heaviest resource usage, which statements require additional resources to run, or whether an application is running queries too often or requesting an unusually large amount of data from the database.

To answer such questions, or to investigate an obvious slowdown in performance signaled by a user complaint, DBAs have several choices. One approach is to configure detailed logging for more in-depth log analysis. While this approach provides access to a tremendous amount of detail, analyzing all of this data is difficult and time-consuming.
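As a rough illustration, detailed statement logging is usually enabled with settings along the following lines in postgresql.conf; the threshold shown is an assumption and should be adjusted to the workload:

# postgresql.conf: log every statement running longer than 500 ms (threshold is an assumption)
log_min_duration_statement = 500
# also log temporary file usage and lock waits to help later analysis
log_temp_files = 0
log_lock_waits = on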

Another approach is server statistics view analysis, that is, using the Postgres Statistics Collector views together with the pg_stat_statements and pg_stat_kcache extensions. This approach allows all queries to be tracked without extensive server log generation, providing the ability to easily detect frequently called fast queries. It also tracks the statistics of database objects, and there is no need to analyze huge log files. However, this approach has some drawbacks: it does not reveal the exact parameter values that caused a long query execution, and there is no way to get the exact execution plan of a statement.
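For example, with pg_stat_statements enabled (it must be listed in shared_preload_libraries and created with CREATE EXTENSION), the most expensive statements can be read directly from the view. A minimal sketch, assuming PostgreSQL 13 or later (older versions use total_time instead of total_exec_time):

-- top ten statements by cumulative execution time
SELECT queryid, calls, total_exec_time, rows
  FROM pg_stat_statements
 ORDER BY total_exec_time DESC
 LIMIT 10;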


What is pg_profile?

pg_profile is a convenient way to perform this kind of statistics view analysis. It regularly collects data from the Postgres Statistics Collector and from statistics extensions, providing a way to detect the most resource-intensive activities for defined time periods in the past. The pg_profile extension is written in pure PL/pgSQL, i.e. it has no binaries, libraries, services, etc. This makes installation easier and allows it to be used in small environments without access to the server file system. Since one or two samples per hour is usually sufficient to detect the most resource-consuming activities, pg_profile does not create much overhead. Reports can then be built on the interval between any two samples.

pg_profile allows a DBA to set a specific time interval, such as two hours, and create a report showing many performance statistics for that time interval. pg_profile also provides a differential report comparing performance statistics for two time periods side by side. This can help DBAs see the specific differences in resource consumption that occurred during a time with a performance problem and a time without such a problem.

It is important to keep two things in mind with pg_profile. First, pg_profile must already be installed and regularly taking samples when an incident occurs. Second, pg_profile does not provide any answers or draw any conclusions. The DBA must have sufficient knowledge of Postgres to analyze a report and locate the cause of the performance problem.

Run pg_profile

Running pg_profile is straightforward.

  1. Extract the extension files to your Postgres extension folder.
  2. Create the dblink and pg_profile extensions.
  3. Configure a cron-like tool to take pg_profile samples at the desired frequency (see the sketch below).
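A minimal sketch of steps 2 and 3, assuming the extension is installed into a schema named profile (as the report examples later in this article suggest) and that a sample is taken every 30 minutes; both the schema name and the frequency are assumptions:

-- step 2: create the required extensions in the target database
CREATE EXTENSION dblink;
-- the SCHEMA clause assumes a relocatable install; adjust to your environment
CREATE EXTENSION pg_profile SCHEMA profile;

# step 3: crontab entry taking a sample every 30 minutes
# (assumes it runs as a user that can connect to the database where pg_profile is installed)
*/30 * * * * psql -c 'SELECT profile.take_sample()' > /dev/null 2>&1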

Using pg_profile

The content of a pg_profile sample is extensive and includes data from:

  • pg_stat_database
  • pg_stat_tablespaces
  • pg_stat_all_tables
  • pg_stat_all_indexes
  • pg_statio_all_tables
  • pg_settings
  • pg_stat_bgwriter
  • pg_stat_archiver
  • pg_stat_wal
  • pg_statio_all_indexes
  • pg_stat_user_functions
  • pg_stat_statements
  • pg_stat_kcache

Based on this collected workload data, pg_profile can produce a general report containing a summary of the most important database activities during the interval between two samples. The report contains the following sections:

  • Server Statistics – cluster-wide statistics for a Postgres instance
  • SQL query statistics – Lists of SQL statements sorted by various criteria
  • Schema object statistics – table and index statistics sorted by various criteria
  • User function statistics – user function execution statistics sorted by time and calls
  • Vacuum Statistics – statistics on tables and indexes influencing the vacuum process
  • Cluster settings during reporting interval – Postgres instance settings and their changes detected in samples

A standard report

A pg_profile report is an html file. To generate a report, simply call the get_report function, providing the samples or times delimiting the required interval:

$ psql -Aqtc "SELECT profile.get_report(480,482)" -o report_480_482.html
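To find the sample IDs delimiting the interval of interest, the list of recorded samples can be queried first. A minimal sketch using pg_profile's show_samples() sample-management function (its output columns may vary between versions):

$ psql -c "SELECT * FROM profile.show_samples()"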

The generated report can be opened in any web browser. Here are some examples of its sections.

Top SQL by execution time

This section shows that the first three queries alone consumed 65% of the overall cluster execution time. They are the first candidates for optimization.

Top SQL by shared blocks fetched

The red labels here show each query's rank in the previous table (by execution time); they are added for illustration only and will not appear in a real report. We can see that our time-consuming queries are also at the top in terms of data block processing. Additionally, we can see that query 581a0cb27e accounted for 41% of block processing in the cluster. Values of 100% mean that these queries did not cause any reads from the file system.

Top tables by blocks fetched

This shows the top tables and their indexes in terms of block processing. 93% of the blocks fetched came from just two tables and their indexes. We'll need to take a closer look at these tables to make sure that applications are not issuing unnecessary queries against them.

Top indexes by blocks fetched

Here we can see block fetches for some specific indexes that were not visible in the tables section.

By analyzing these reports, a DBA can determine that:

  • The top three queries consumed 65% of the execution time and 91% of the blocks
  • The top two tables, with their indexes, accounted for 93% of the blocks

We now know the exact statements and tables responsible for consuming time and blocks in our database and where optimization efforts need to be applied.


A differential report

When a database experiences sudden degradation in performance, a DBA can use a differential report to start the investigation. This report can provide information about the variation of the workload between two time intervals, for example, the same time interval on two different days. The report displays statistics for the same objects side by side, with the first interval's values in red and the second interval's values in blue.

For example, to compare the period 11:00-13:00 today (when the problem is in progress) with the same period yesterday (when system performance was normal), run the differential report function with the appropriate time boundaries:
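A minimal sketch of such a call, using pg_profile's get_diffreport function in its sample-ID form; the sample IDs below are hypothetical placeholders for samples bracketing yesterday's and today's 11:00-13:00 windows:

$ psql -Aqtc "SELECT profile.get_diffreport(434,436,480,482)" -o diffreport_11_13.html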

The report then displays statistics for both periods.

Database statistics

The values highlighted in green and red indicate an increase in workload of several orders of magnitude for the second interval in terms of block and tuple statistics. At the same time, the transaction rate has only doubled.

Top SQL by runtime

Going further, we can see that the first three queries ran much slower during the second time interval: they took much longer for almost the same number of executions. Additionally, the first query affected many more rows in the second interval. Perhaps something changed in the application that alters the number of rows returned.

Top tables by blocks fetched

The i6_n_m table definitely has a problem: it is the only table with a huge increase in workload in terms of blocks fetched.

By analyzing these reports, we can determine that:

  • The top statements ran four orders of magnitude longer
  • The slowest query returned four orders of magnitude more rows
  • Four orders of magnitude more blocks were fetched from the top table

Again, now we know the exact objects and statements related to our problem.

Conclusion

Although pg_profile cannot explain why performance degraded, it is a great source of information to help DBAs answer that question. It is also extremely easy to use, has low overhead, and requires only a database and a cron-like scheduler. Work on pg_profile is not finished, either: new reports will be incorporated in the future.

