LCOV - differential code coverage report
Current view: top level - src/backend/access/transam - xlog.c (source / functions) Coverage Total Hit UNC LBC UBC GBC GNC CBC DUB DCB
Current: c70b6db34ffeab48beef1fb4ce61bcad3772b8dd vs 06473f5a344df8c9594ead90a609b86f6724cff8 Lines: 88.8 % 2518 2236 5 3 274 14 2222 6 21
Current Date: 2025-09-06 07:49:51 +0900 Functions: 98.3 % 121 119 2 14 105 1
Baseline: lcov-20250906-005545-baseline Branches: 64.0 % 1801 1152 13 4 632 4 5 1143
Baseline Date: 2025-09-05 08:21:35 +0100
Legend: Lines: hit / not hit; Branches: + taken, - not taken, # not executed
Line coverage date bins:
(7,30] days: 100.0 % 18 18 2 16
(30,360] days: 82.5 % 57 47 5 1 4 12 35
(360..) days: 88.9 % 2443 2171 2 270 2171
Function coverage date bins:
(30,360] days: 100.0 % 2 2 1 1
(360..) days: 98.3 % 119 117 2 13 104
Branch coverage date bins:
(7,30] days: 83.3 % 12 10 1 1 1 9
(30,360] days: 50.0 % 42 21 12 9 4 17
(360..) days: 64.2 % 1747 1121 4 622 4 1117

 Age         Owner                    Branch data    TLA  Line data    Source code
                                  1                 :                : /*-------------------------------------------------------------------------
                                  2                 :                :  *
                                  3                 :                :  * xlog.c
                                  4                 :                :  *      PostgreSQL write-ahead log manager
                                  5                 :                :  *
                                  6                 :                :  * The Write-Ahead Log (WAL) functionality is split into several source
                                  7                 :                :  * files, in addition to this one:
                                  8                 :                :  *
                                  9                 :                :  * xloginsert.c - Functions for constructing WAL records
                                 10                 :                :  * xlogrecovery.c - WAL recovery and standby code
                                 11                 :                :  * xlogreader.c - Facility for reading WAL files and parsing WAL records
                                 12                 :                :  * xlogutils.c - Helper functions for WAL redo routines
                                 13                 :                :  *
                                 14                 :                :  * This file contains functions for coordinating database startup and
                                 15                 :                :  * checkpointing, and managing the write-ahead log buffers when the
                                 16                 :                :  * system is running.
                                 17                 :                :  *
                                 18                 :                :  * StartupXLOG() is the main entry point of the startup process.  It
                                 19                 :                :  * coordinates database startup, performing WAL recovery, and the
                                 20                 :                :  * transition from WAL recovery into normal operations.
                                 21                 :                :  *
                                 22                 :                :  * XLogInsertRecord() inserts a WAL record into the WAL buffers.  Most
                                 23                 :                :  * callers should not call this directly, but use the functions in
                                 24                 :                :  * xloginsert.c to construct the WAL record.  XLogFlush() can be used
                                 25                 :                :  * to force the WAL to disk.
                                 26                 :                :  *
                                 27                 :                :  * In addition to those, there are many other functions for interrogating
                                 28                 :                :  * the current system state, and for starting/stopping backups.
                                 29                 :                :  *
                                 30                 :                :  *
                                 31                 :                :  * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
                                 32                 :                :  * Portions Copyright (c) 1994, Regents of the University of California
                                 33                 :                :  *
                                 34                 :                :  * src/backend/access/transam/xlog.c
                                 35                 :                :  *
                                 36                 :                :  *-------------------------------------------------------------------------
                                 37                 :                :  */
                                 38                 :                : 
                                 39                 :                : #include "postgres.h"
                                 40                 :                : 
                                 41                 :                : #include <ctype.h>
                                 42                 :                : #include <math.h>
                                 43                 :                : #include <time.h>
                                 44                 :                : #include <fcntl.h>
                                 45                 :                : #include <sys/stat.h>
                                 46                 :                : #include <sys/time.h>
                                 47                 :                : #include <unistd.h>
                                 48                 :                : 
                                 49                 :                : #include "access/clog.h"
                                 50                 :                : #include "access/commit_ts.h"
                                 51                 :                : #include "access/heaptoast.h"
                                 52                 :                : #include "access/multixact.h"
                                 53                 :                : #include "access/rewriteheap.h"
                                 54                 :                : #include "access/subtrans.h"
                                 55                 :                : #include "access/timeline.h"
                                 56                 :                : #include "access/transam.h"
                                 57                 :                : #include "access/twophase.h"
                                 58                 :                : #include "access/xact.h"
                                 59                 :                : #include "access/xlog_internal.h"
                                 60                 :                : #include "access/xlogarchive.h"
                                 61                 :                : #include "access/xloginsert.h"
                                 62                 :                : #include "access/xlogreader.h"
                                 63                 :                : #include "access/xlogrecovery.h"
                                 64                 :                : #include "access/xlogutils.h"
                                 65                 :                : #include "backup/basebackup.h"
                                 66                 :                : #include "catalog/catversion.h"
                                 67                 :                : #include "catalog/pg_control.h"
                                 68                 :                : #include "catalog/pg_database.h"
                                 69                 :                : #include "common/controldata_utils.h"
                                 70                 :                : #include "common/file_utils.h"
                                 71                 :                : #include "executor/instrument.h"
                                 72                 :                : #include "miscadmin.h"
                                 73                 :                : #include "pg_trace.h"
                                 74                 :                : #include "pgstat.h"
                                 75                 :                : #include "port/atomics.h"
                                 76                 :                : #include "postmaster/bgwriter.h"
                                 77                 :                : #include "postmaster/startup.h"
                                 78                 :                : #include "postmaster/walsummarizer.h"
                                 79                 :                : #include "postmaster/walwriter.h"
                                 80                 :                : #include "replication/origin.h"
                                 81                 :                : #include "replication/slot.h"
                                 82                 :                : #include "replication/snapbuild.h"
                                 83                 :                : #include "replication/walreceiver.h"
                                 84                 :                : #include "replication/walsender.h"
                                 85                 :                : #include "storage/bufmgr.h"
                                 86                 :                : #include "storage/fd.h"
                                 87                 :                : #include "storage/ipc.h"
                                 88                 :                : #include "storage/large_object.h"
                                 89                 :                : #include "storage/latch.h"
                                 90                 :                : #include "storage/predicate.h"
                                 91                 :                : #include "storage/proc.h"
                                 92                 :                : #include "storage/procarray.h"
                                 93                 :                : #include "storage/reinit.h"
                                 94                 :                : #include "storage/spin.h"
                                 95                 :                : #include "storage/sync.h"
                                 96                 :                : #include "utils/guc_hooks.h"
                                 97                 :                : #include "utils/guc_tables.h"
                                 98                 :                : #include "utils/injection_point.h"
                                 99                 :                : #include "utils/pgstat_internal.h"
                                100                 :                : #include "utils/ps_status.h"
                                101                 :                : #include "utils/relmapper.h"
                                102                 :                : #include "utils/snapmgr.h"
                                103                 :                : #include "utils/timeout.h"
                                104                 :                : #include "utils/timestamp.h"
                                105                 :                : #include "utils/varlena.h"
                                106                 :                : 
                                107                 :                : #ifdef WAL_DEBUG
                                108                 :                : #include "utils/memutils.h"
                                109                 :                : #endif
                                110                 :                : 
                                111                 :                : /* timeline ID to be used when bootstrapping */
                                112                 :                : #define BootstrapTimeLineID     1
                                113                 :                : 
                                114                 :                : /* User-settable parameters */
                                115                 :                : int         max_wal_size_mb = 1024; /* 1 GB */
                                116                 :                : int         min_wal_size_mb = 80;   /* 80 MB */
                                117                 :                : int         wal_keep_size_mb = 0;
                                118                 :                : int         XLOGbuffers = -1;
                                119                 :                : int         XLogArchiveTimeout = 0;
                                120                 :                : int         XLogArchiveMode = ARCHIVE_MODE_OFF;
                                121                 :                : char       *XLogArchiveCommand = NULL;
                                122                 :                : bool        EnableHotStandby = false;
                                123                 :                : bool        fullPageWrites = true;
                                124                 :                : bool        wal_log_hints = false;
                                125                 :                : int         wal_compression = WAL_COMPRESSION_NONE;
                                126                 :                : char       *wal_consistency_checking_string = NULL;
                                127                 :                : bool       *wal_consistency_checking = NULL;
                                128                 :                : bool        wal_init_zero = true;
                                129                 :                : bool        wal_recycle = true;
                                130                 :                : bool        log_checkpoints = true;
                                131                 :                : int         wal_sync_method = DEFAULT_WAL_SYNC_METHOD;
                                132                 :                : int         wal_level = WAL_LEVEL_REPLICA;
                                133                 :                : int         CommitDelay = 0;    /* precommit delay in microseconds */
                                134                 :                : int         CommitSiblings = 5; /* # concurrent xacts needed to sleep */
                                135                 :                : int         wal_retrieve_retry_interval = 5000;
                                136                 :                : int         max_slot_wal_keep_size_mb = -1;
                                137                 :                : int         wal_decode_buffer_size = 512 * 1024;
                                138                 :                : bool        track_wal_io_timing = false;
                                139                 :                : 
                                140                 :                : #ifdef WAL_DEBUG
                                141                 :                : bool        XLOG_DEBUG = false;
                                142                 :                : #endif
                                143                 :                : 
                                144                 :                : int         wal_segment_size = DEFAULT_XLOG_SEG_SIZE;
                                145                 :                : 
                                146                 :                : /*
                                147                 :                :  * Number of WAL insertion locks to use. A higher value allows more insertions
                                148                 :                :  * to happen concurrently, but adds some CPU overhead to flushing the WAL,
                                 149                 :                :  * which needs to iterate over all the locks.
                                150                 :                :  */
                                151                 :                : #define NUM_XLOGINSERT_LOCKS  8
                                152                 :                : 
                                153                 :                : /*
                                154                 :                :  * Max distance from last checkpoint, before triggering a new xlog-based
                                155                 :                :  * checkpoint.
                                156                 :                :  */
                                157                 :                : int         CheckPointSegments;
                                158                 :                : 
                                159                 :                : /* Estimated distance between checkpoints, in bytes */
                                160                 :                : static double CheckPointDistanceEstimate = 0;
                                161                 :                : static double PrevCheckPointDistance = 0;
                                162                 :                : 
                                163                 :                : /*
                                164                 :                :  * Track whether there were any deferred checks for custom resource managers
                                165                 :                :  * specified in wal_consistency_checking.
                                166                 :                :  */
                                167                 :                : static bool check_wal_consistency_checking_deferred = false;
                                168                 :                : 
                                169                 :                : /*
                                170                 :                :  * GUC support
                                171                 :                :  */
                                172                 :                : const struct config_enum_entry wal_sync_method_options[] = {
                                173                 :                :     {"fsync", WAL_SYNC_METHOD_FSYNC, false},
                                174                 :                : #ifdef HAVE_FSYNC_WRITETHROUGH
                                175                 :                :     {"fsync_writethrough", WAL_SYNC_METHOD_FSYNC_WRITETHROUGH, false},
                                176                 :                : #endif
                                177                 :                :     {"fdatasync", WAL_SYNC_METHOD_FDATASYNC, false},
                                178                 :                : #ifdef O_SYNC
                                179                 :                :     {"open_sync", WAL_SYNC_METHOD_OPEN, false},
                                180                 :                : #endif
                                181                 :                : #ifdef O_DSYNC
                                182                 :                :     {"open_datasync", WAL_SYNC_METHOD_OPEN_DSYNC, false},
                                183                 :                : #endif
                                184                 :                :     {NULL, 0, false}
                                185                 :                : };
                                186                 :                : 
                                187                 :                : 
                                188                 :                : /*
                                189                 :                :  * Although only "on", "off", and "always" are documented,
                                190                 :                :  * we accept all the likely variants of "on" and "off".
                                191                 :                :  */
                                192                 :                : const struct config_enum_entry archive_mode_options[] = {
                                193                 :                :     {"always", ARCHIVE_MODE_ALWAYS, false},
                                194                 :                :     {"on", ARCHIVE_MODE_ON, false},
                                195                 :                :     {"off", ARCHIVE_MODE_OFF, false},
                                196                 :                :     {"true", ARCHIVE_MODE_ON, true},
                                197                 :                :     {"false", ARCHIVE_MODE_OFF, true},
                                198                 :                :     {"yes", ARCHIVE_MODE_ON, true},
                                199                 :                :     {"no", ARCHIVE_MODE_OFF, true},
                                200                 :                :     {"1", ARCHIVE_MODE_ON, true},
                                201                 :                :     {"0", ARCHIVE_MODE_OFF, true},
                                202                 :                :     {NULL, 0, false}
                                203                 :                : };
                                204                 :                : 
                                205                 :                : /*
                                206                 :                :  * Statistics for current checkpoint are collected in this global struct.
                                207                 :                :  * Because only the checkpointer or a stand-alone backend can perform
                                208                 :                :  * checkpoints, this will be unused in normal backends.
                                209                 :                :  */
                                210                 :                : CheckpointStatsData CheckpointStats;
                                211                 :                : 
                                212                 :                : /*
                                213                 :                :  * During recovery, lastFullPageWrites keeps track of full_page_writes that
                                214                 :                :  * the replayed WAL records indicate. It's initialized with full_page_writes
                                215                 :                :  * that the recovery starting checkpoint record indicates, and then updated
                                 216                 :                :  * each time an XLOG_FPW_CHANGE record is replayed.
                                217                 :                :  */
                                218                 :                : static bool lastFullPageWrites;
                                219                 :                : 
                                220                 :                : /*
                                 221                 :                :  * Local copy of the state tracked by SharedRecoveryState in shared memory.
                                222                 :                :  * It is false if SharedRecoveryState is RECOVERY_STATE_DONE.  True actually
                                223                 :                :  * means "not known, need to check the shared state".
                                224                 :                :  */
                                225                 :                : static bool LocalRecoveryInProgress = true;
                                226                 :                : 
                                227                 :                : /*
                                228                 :                :  * Local state for XLogInsertAllowed():
                                229                 :                :  *      1: unconditionally allowed to insert XLOG
                                230                 :                :  *      0: unconditionally not allowed to insert XLOG
                                231                 :                :  *      -1: must check RecoveryInProgress(); disallow until it is false
                                232                 :                :  * Most processes start with -1 and transition to 1 after seeing that recovery
                                233                 :                :  * is not in progress.  But we can also force the value for special cases.
                                234                 :                :  * The coding in XLogInsertAllowed() depends on the first two of these states
                                235                 :                :  * being numerically the same as bool true and false.
                                236                 :                :  */
                                237                 :                : static int  LocalXLogInsertAllowed = -1;
                                238                 :                : 
                                239                 :                : /*
                                240                 :                :  * ProcLastRecPtr points to the start of the last XLOG record inserted by the
                                241                 :                :  * current backend.  It is updated for all inserts.  XactLastRecEnd points to
                                242                 :                :  * end+1 of the last record, and is reset when we end a top-level transaction,
                                243                 :                :  * or start a new one; so it can be used to tell if the current transaction has
                                244                 :                :  * created any XLOG records.
                                245                 :                :  *
                                246                 :                :  * While in parallel mode, this may not be fully up to date.  When committing,
                                247                 :                :  * a transaction can assume this covers all xlog records written either by the
                                248                 :                :  * user backend or by any parallel worker which was present at any point during
                                249                 :                :  * the transaction.  But when aborting, or when still in parallel mode, other
                                250                 :                :  * parallel backends may have written WAL records at later LSNs than the value
                                251                 :                :  * stored here.  The parallel leader advances its own copy, when necessary,
                                252                 :                :  * in WaitForParallelWorkersToFinish.
                                253                 :                :  */
                                254                 :                : XLogRecPtr  ProcLastRecPtr = InvalidXLogRecPtr;
                                255                 :                : XLogRecPtr  XactLastRecEnd = InvalidXLogRecPtr;
                                256                 :                : XLogRecPtr  XactLastCommitEnd = InvalidXLogRecPtr;
                                257                 :                : 
                                258                 :                : /*
                                259                 :                :  * RedoRecPtr is this backend's local copy of the REDO record pointer
                                260                 :                :  * (which is almost but not quite the same as a pointer to the most recent
                                261                 :                :  * CHECKPOINT record).  We update this from the shared-memory copy,
                                 262                 :                :  * XLogCtl->Insert.RedoRecPtr, whenever we can safely do so (i.e., when we
                                263                 :                :  * hold an insertion lock).  See XLogInsertRecord for details.  We are also
                                264                 :                :  * allowed to update from XLogCtl->RedoRecPtr if we hold the info_lck;
                                265                 :                :  * see GetRedoRecPtr.
                                266                 :                :  *
                                267                 :                :  * NB: Code that uses this variable must be prepared not only for the
                                268                 :                :  * possibility that it may be arbitrarily out of date, but also for the
                                269                 :                :  * possibility that it might be set to InvalidXLogRecPtr. We used to
                                270                 :                :  * initialize it as a side effect of the first call to RecoveryInProgress(),
                                271                 :                :  * which meant that most code that might use it could assume that it had a
                                272                 :                :  * real if perhaps stale value. That's no longer the case.
                                273                 :                :  */
                                274                 :                : static XLogRecPtr RedoRecPtr;
                                275                 :                : 
                                276                 :                : /*
                                277                 :                :  * doPageWrites is this backend's local copy of (fullPageWrites ||
                                278                 :                :  * runningBackups > 0).  It is used together with RedoRecPtr to decide whether
                                 279                 :                :  * a full-page image of a page needs to be taken.
                                280                 :                :  *
                                281                 :                :  * NB: Initially this is false, and there's no guarantee that it will be
                                282                 :                :  * initialized to any other value before it is first used. Any code that
                                283                 :                :  * makes use of it must recheck the value after obtaining a WALInsertLock,
                                284                 :                :  * and respond appropriately if it turns out that the previous value wasn't
                                285                 :                :  * accurate.
                                286                 :                :  */
                                287                 :                : static bool doPageWrites;
                                288                 :                : 
                                289                 :                : /*----------
                                290                 :                :  * Shared-memory data structures for XLOG control
                                291                 :                :  *
                                292                 :                :  * LogwrtRqst indicates a byte position that we need to write and/or fsync
                                293                 :                :  * the log up to (all records before that point must be written or fsynced).
                                294                 :                :  * The positions already written/fsynced are maintained in logWriteResult
                                295                 :                :  * and logFlushResult using atomic access.
                                296                 :                :  * In addition to the shared variable, each backend has a private copy of
                                297                 :                :  * both in LogwrtResult, which is updated when convenient.
                                298                 :                :  *
                                299                 :                :  * The request bookkeeping is simpler: there is a shared XLogCtl->LogwrtRqst
                                300                 :                :  * (protected by info_lck), but we don't need to cache any copies of it.
                                301                 :                :  *
                                302                 :                :  * info_lck is only held long enough to read/update the protected variables,
                                303                 :                :  * so it's a plain spinlock.  The other locks are held longer (potentially
                                304                 :                :  * over I/O operations), so we use LWLocks for them.  These locks are:
                                305                 :                :  *
                                306                 :                :  * WALBufMappingLock: must be held to replace a page in the WAL buffer cache.
                                307                 :                :  * It is only held while initializing and changing the mapping.  If the
                                308                 :                :  * contents of the buffer being replaced haven't been written yet, the mapping
                                309                 :                :  * lock is released while the write is done, and reacquired afterwards.
                                310                 :                :  *
                                311                 :                :  * WALWriteLock: must be held to write WAL buffers to disk (XLogWrite or
                                312                 :                :  * XLogFlush).
                                313                 :                :  *
                                314                 :                :  * ControlFileLock: must be held to read/update control file or create
                                315                 :                :  * new log file.
                                316                 :                :  *
                                317                 :                :  *----------
                                318                 :                :  */
                                319                 :                : 
                                320                 :                : typedef struct XLogwrtRqst
                                321                 :                : {
                                322                 :                :     XLogRecPtr  Write;          /* last byte + 1 to write out */
                                323                 :                :     XLogRecPtr  Flush;          /* last byte + 1 to flush */
                                324                 :                : } XLogwrtRqst;
                                325                 :                : 
                                326                 :                : typedef struct XLogwrtResult
                                327                 :                : {
                                328                 :                :     XLogRecPtr  Write;          /* last byte + 1 written out */
                                329                 :                :     XLogRecPtr  Flush;          /* last byte + 1 flushed */
                                330                 :                : } XLogwrtResult;
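
The comment above describes shared write/flush positions that are read atomically, plus a per-backend private copy that is refreshed "when convenient". A minimal sketch of that caching pattern follows, using C11 atomics instead of PostgreSQL's pg_atomic API. All names here (ModelResult, shared_write, shared_flush, refresh_local_result, flush_needed) are hypothetical and do not appear in xlog.c. Because the positions only advance, a stale private copy can only under-report progress, so answering "already flushed" from the cache is safe.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for XLogwrtResult. */
typedef struct
{
    uint64_t Write;     /* last byte + 1 written out */
    uint64_t Flush;     /* last byte + 1 flushed */
} ModelResult;

/* Shared positions (would live in shared memory); illustrative only. */
static _Atomic uint64_t shared_write;
static _Atomic uint64_t shared_flush;

/* This process's private, possibly stale, copy. */
static ModelResult local_result;

/* Re-read the shared positions into the private copy. */
static void
refresh_local_result(void)
{
    local_result.Flush = atomic_load(&shared_flush);
    local_result.Write = atomic_load(&shared_write);
}

/*
 * Return true if WAL up to 'upto' still has to be flushed.  The cheap
 * private copy is consulted first; the shared value is read only when
 * the cached value cannot prove that the work is already done.
 */
static bool
flush_needed(uint64_t upto)
{
    if (upto <= local_result.Flush)
        return false;           /* cached copy is enough */
    refresh_local_result();
    return upto > local_result.Flush;
}
```

In this sketch a caller would consult flush_needed() before taking any lock, and only proceed to the expensive path when it returns true.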
                                331                 :                : 
                                332                 :                : /*
                                333                 :                :  * Inserting to WAL is protected by a small fixed number of WAL insertion
                                334                 :                :  * locks. To insert to the WAL, you must hold one of the locks - it doesn't
                                335                 :                :  * matter which one. To lock out other concurrent insertions, you must hold
                                336                 :                :  * all of them. Each WAL insertion lock consists of a lightweight lock, plus an
                                337                 :                :  * indicator of how far the insertion has progressed (insertingAt).
                                338                 :                :  *
                                339                 :                :  * The insertingAt values are read when a process wants to flush WAL from
                                340                 :                :  * the in-memory buffers to disk, to check that all the insertions to the
                                341                 :                :  * region the process is about to write out have finished. You could simply
                                342                 :                :  * wait for all currently in-progress insertions to finish, but the
                                343                 :                :  * insertingAt indicator allows you to ignore insertions to later in the WAL,
                                344                 :                :  * so that you only wait for the insertions that are modifying the buffers
                                345                 :                :  * you're about to write out.
                                346                 :                :  *
                                347                 :                :  * This isn't just an optimization. If all the WAL buffers are dirty, an
                                348                 :                :  * inserter that's holding a WAL insert lock might need to evict an old WAL
                                349                 :                :  * buffer, which requires flushing the WAL. If it's possible for an inserter
                                350                 :                :  * to block on another inserter unnecessarily, deadlock can arise when two
                                351                 :                :  * inserters holding a WAL insert lock wait for each other to finish their
                                352                 :                :  * insertion.
                                353                 :                :  *
                                354                 :                :  * Small WAL records that don't cross a page boundary never update the value;
                                355                 :                :  * the WAL record is just copied to the page and the lock is released. But
                                356                 :                :  * to avoid the deadlock-scenario explained above, the indicator is always
                                357                 :                :  * updated before sleeping while holding an insertion lock.
                                358                 :                :  *
                                359                 :                :  * lastImportantAt contains the LSN of the last important WAL record inserted
                                360                 :                :  * using a given lock. This value is used to detect if there has been
                                361                 :                :  * important WAL activity since the last time some action, like a checkpoint,
                                362                 :                :  * was performed, so that the action can be skipped if there was none. The LSN is
                                363                 :                :  * updated for all insertions, unless the XLOG_MARK_UNIMPORTANT flag was
                                364                 :                :  * set. lastImportantAt is never cleared, only overwritten by the LSN of newer
                                365                 :                :  * records.  Tracking the WAL activity directly in WALInsertLock has the
                                366                 :                :  * advantage of not needing any additional locks to update the value.
                                367                 :                :  */
                                368                 :                : typedef struct
                                369                 :                : {
                                370                 :                :     LWLock      lock;
                                371                 :                :     pg_atomic_uint64 insertingAt;
                                372                 :                :     XLogRecPtr  lastImportantAt;
                                373                 :                : } WALInsertLock;
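
To illustrate how the insertingAt indicator lets a flusher ignore insertions later in the WAL, here is a simplified, single-threaded model of the decision. It is a sketch under stated assumptions, not the logic of WaitXLogInsertionsToFinish(): the type ModelInsertLock, the helpers must_wait_for() and count_blockers(), and the use of 0 to mean "no progress advertised" are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one insertion lock; not the real WALInsertLock. */
typedef struct
{
    bool     held;          /* is an insertion in progress under this lock? */
    uint64_t insertingAt;   /* advertised progress; 0 = nothing advertised
                             * (assumption of this sketch) */
} ModelInsertLock;

/*
 * Would a process that wants to write out WAL up to 'upto' have to wait
 * for the holder of this lock?
 */
static bool
must_wait_for(const ModelInsertLock *lock, uint64_t upto)
{
    if (!lock->held)
        return false;       /* no insertion in progress */
    if (lock->insertingAt == 0)
        return true;        /* progress unknown, so be conservative */
    return lock->insertingAt < upto;    /* still inside our region? */
}

/* Count the lock holders that block a write-out up to 'upto'. */
static size_t
count_blockers(const ModelInsertLock *locks, size_t nlocks, uint64_t upto)
{
    size_t      n = 0;

    assert(locks != NULL || nlocks == 0);
    for (size_t i = 0; i < nlocks; i++)
    {
        if (must_wait_for(&locks[i], upto))
            n++;
    }
    return n;
}
```

In this model, an inserter that has advertised a position at or beyond 'upto' never blocks the flusher, which is the property the comment relies on to avoid the deadlock between two inserters.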
                                374                 :                : 
                                375                 :                : /*
                                376                 :                :  * All the WAL insertion locks are allocated as an array in shared memory. We
                                377                 :                :  * force the array stride to be a power of 2, which saves a few cycles in
                                378                 :                :  * indexing, but more importantly also ensures that individual slots don't
                                379                 :                :  * cross cache line boundaries. (Of course, we have to also ensure that the
                                380                 :                :  * array start address is suitably aligned.)
                                381                 :                :  */
                                382                 :                : typedef union WALInsertLockPadded
                                383                 :                : {
                                384                 :                :     WALInsertLock l;
                                385                 :                :     char        pad[PG_CACHE_LINE_SIZE];
                                386                 :                : } WALInsertLockPadded;
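
The padding trick above can be checked at compile time. The sketch below uses a hypothetical ModelSlot in place of WALInsertLock and assumes a cache line size of 128 bytes (MODEL_CACHE_LINE_SIZE stands in for PG_CACHE_LINE_SIZE, whose actual value is platform-dependent). It shows that the union rounds each slot up to one full cache line, so with a suitably aligned array start no slot straddles two lines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_CACHE_LINE_SIZE 128   /* assumed value for illustration */

/* Hypothetical payload; smaller than one cache line. */
typedef struct
{
    uint64_t lock_word;         /* placeholder for the LWLock */
    uint64_t insertingAt;
    uint64_t lastImportantAt;
} ModelSlot;

/* Pad each slot to a full cache line, as WALInsertLockPadded does. */
typedef union
{
    ModelSlot   l;
    char        pad[MODEL_CACHE_LINE_SIZE];
} ModelSlotPadded;

/* The stride is exactly one cache line, and it is a power of 2. */
_Static_assert(sizeof(ModelSlotPadded) == MODEL_CACHE_LINE_SIZE,
               "payload must fit within one cache line");
_Static_assert((MODEL_CACHE_LINE_SIZE & (MODEL_CACHE_LINE_SIZE - 1)) == 0,
               "stride must be a power of 2");

/* Byte offset of slot 'i' from a cache-line-aligned array start. */
static size_t
slot_offset(size_t i)
{
    return i * sizeof(ModelSlotPadded);
}

/* Does slot 'i' lie entirely within a single cache line? */
static int
slot_within_one_line(size_t i)
{
    size_t      first = slot_offset(i) / MODEL_CACHE_LINE_SIZE;
    size_t      last = (slot_offset(i) + sizeof(ModelSlot) - 1) / MODEL_CACHE_LINE_SIZE;

    return first == last;
}
```

If the payload ever grew beyond the assumed line size, the first static assertion would fail at compile time rather than silently degrading cache behavior.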
                                387                 :                : 
                                388                 :                : /*
                                389                 :                :  * Session status of running backup, used for sanity checks in SQL-callable
                                390                 :                :  * functions to start and stop backups.
                                391                 :                :  */
                                392                 :                : static SessionBackupState sessionBackupState = SESSION_BACKUP_NONE;
                                393                 :                : 
                                394                 :                : /*
                                395                 :                :  * Shared state data for WAL insertion.
                                396                 :                :  */
                                397                 :                : typedef struct XLogCtlInsert
                                398                 :                : {
                                399                 :                :     slock_t     insertpos_lck;  /* protects CurrBytePos and PrevBytePos */
                                400                 :                : 
                                401                 :                :     /*
                                402                 :                :      * CurrBytePos is the end of reserved WAL. The next record will be
                                403                 :                :      * inserted at that position. PrevBytePos is the start position of the
                                404                 :                :      * previously inserted (or rather, reserved) record - it is copied to the
                                405                 :                :      * prev-link of the next record. These are stored as "usable byte
                                406                 :                :      * positions" rather than XLogRecPtrs (see XLogBytePosToRecPtr()).
                                407                 :                :      */
                                408                 :                :     uint64      CurrBytePos;
                                409                 :                :     uint64      PrevBytePos;
                                410                 :                : 
                                411                 :                :     /*
                                412                 :                :      * Make sure the above heavily-contended spinlock and byte positions are
                                413                 :                :      * on their own cache line. In particular, the RedoRecPtr and full page
                                414                 :                :      * write variables below should be on a different cache line. They are
                                415                 :                :      * read on every WAL insertion, but updated rarely, and we don't want
                                416                 :                :      * those reads to steal the cache line containing Curr/PrevBytePos.
                                417                 :                :      */
                                418                 :                :     char        pad[PG_CACHE_LINE_SIZE];
                                419                 :                : 
                                420                 :                :     /*
                                421                 :                :      * fullPageWrites is the authoritative value used by all backends to
                                422                 :                :      * determine whether to write full-page images to WAL. This shared value,
                                423                 :                :      * instead of the process-local fullPageWrites, is required because, when
                                424                 :                :      * full_page_writes is changed by SIGHUP, we must WAL-log it before it
                                425                 :                :      * actually affects WAL-logging by backends.  The checkpointer sets it at
                                426                 :                :      * startup or after SIGHUP.
                                427                 :                :      *
                                428                 :                :      * To read these fields, you must hold an insertion lock. To modify them,
                                429                 :                :      * you must hold ALL the locks.
                                430                 :                :      */
                                431                 :                :     XLogRecPtr  RedoRecPtr;     /* current redo point for insertions */
                                432                 :                :     bool        fullPageWrites;
                                433                 :                : 
                                434                 :                :     /*
                                435                 :                :      * runningBackups is a counter indicating the number of backups currently
                                436                 :                :      * in progress. lastBackupStart is the latest checkpoint redo location
                                437                 :                :      * used as a starting point for an online backup.
                                438                 :                :      */
                                439                 :                :     int         runningBackups;
                                440                 :                :     XLogRecPtr  lastBackupStart;
                                441                 :                : 
                                442                 :                :     /*
                                443                 :                :      * WAL insertion locks.
                                444                 :                :      */
                                445                 :                :     WALInsertLockPadded *WALInsertLocks;
                                446                 :                : } XLogCtlInsert;
                                447                 :                : 
                                448                 :                : /*
                                449                 :                :  * Total shared-memory state for XLOG.
                                450                 :                :  */
                                451                 :                : typedef struct XLogCtlData
                                452                 :                : {
                                453                 :                :     XLogCtlInsert Insert;
                                454                 :                : 
                                455                 :                :     /* Protected by info_lck: */
                                456                 :                :     XLogwrtRqst LogwrtRqst;
                                457                 :                :     XLogRecPtr  RedoRecPtr;     /* a recent copy of Insert->RedoRecPtr */
                                458                 :                :     XLogRecPtr  asyncXactLSN;   /* LSN of newest async commit/abort */
                                459                 :                :     XLogRecPtr  replicationSlotMinLSN;  /* oldest LSN needed by any slot */
                                460                 :                : 
                                461                 :                :     XLogSegNo   lastRemovedSegNo;   /* latest removed/recycled XLOG segment */
                                462                 :                : 
                                463                 :                :     /* Fake LSN counter, for unlogged relations. */
                                464                 :                :     pg_atomic_uint64 unloggedLSN;
                                465                 :                : 
                                466                 :                :     /* Time and LSN of last xlog segment switch. Protected by WALWriteLock. */
                                467                 :                :     pg_time_t   lastSegSwitchTime;
                                468                 :                :     XLogRecPtr  lastSegSwitchLSN;
                                469                 :                : 
                                470                 :                :     /* These are accessed using atomics -- info_lck not needed */
                                471                 :                :     pg_atomic_uint64 logInsertResult;   /* last byte + 1 inserted to buffers */
                                472                 :                :     pg_atomic_uint64 logWriteResult;    /* last byte + 1 written out */
                                473                 :                :     pg_atomic_uint64 logFlushResult;    /* last byte + 1 flushed */
                                474                 :                : 
                                475                 :                :     /*
                                476                 :                :      * Latest initialized page in the cache (last byte position + 1).
                                477                 :                :      *
                                478                 :                :      * To change the identity of a buffer (and InitializedUpTo), you need to
                                479                 :                :      * hold WALBufMappingLock.  To change the identity of a buffer that's
                                480                 :                :      * still dirty, the old page needs to be written out first, and for that
                                481                 :                :      * you need WALWriteLock, and you need to ensure that there are no
                                482                 :                :      * in-progress insertions to the page by calling
                                483                 :                :      * WaitXLogInsertionsToFinish().
                                484                 :                :      */
                                485                 :                :     XLogRecPtr  InitializedUpTo;
                                486                 :                : 
                                487                 :                :     /*
                                488                 :                :      * These values do not change after startup, although the pointed-to pages
                                489                 :                :      * and xlblocks values certainly do.  xlblocks values are protected by
                                490                 :                :      * WALBufMappingLock.
                                491                 :                :      */
                                492                 :                :     char       *pages;          /* buffers for unwritten XLOG pages */
                                493                 :                :     pg_atomic_uint64 *xlblocks; /* 1st byte ptr-s + XLOG_BLCKSZ */
                                494                 :                :     int         XLogCacheBlck;  /* highest allocated xlog buffer index */
                                495                 :                : 
                                496                 :                :     /*
                                497                 :                :      * InsertTimeLineID is the timeline into which new WAL is being inserted
                                498                 :                :      * and flushed. It is zero during recovery, and does not change once set.
                                499                 :                :      *
                                500                 :                :      * If we created a new timeline when the system was started up,
                                501                 :                :      * PrevTimeLineID is the old timeline's ID that we forked off from.
                                502                 :                :      * Otherwise it's equal to InsertTimeLineID.
                                503                 :                :      *
                                504                 :                :      * We set these fields while holding info_lck. Most code that reads these
                                505                 :                :      * values knows that recovery is no longer in progress and so can safely
                                506                 :                :      * read the value without a lock, but code that could be run either during
                                507                 :                :      * or after recovery can take info_lck while reading these values.
                                508                 :                :      */
                                509                 :                :     TimeLineID  InsertTimeLineID;
                                510                 :                :     TimeLineID  PrevTimeLineID;
                                511                 :                : 
                                512                 :                :     /*
                                513                 :                :      * SharedRecoveryState indicates if we're still in crash or archive
                                514                 :                :      * recovery.  Protected by info_lck.
                                515                 :                :      */
                                516                 :                :     RecoveryState SharedRecoveryState;
                                517                 :                : 
                                518                 :                :     /*
                                519                 :                :      * InstallXLogFileSegmentActive indicates whether the checkpointer should
                                520                 :                :      * arrange for future segments by recycling and/or PreallocXlogFiles().
                                521                 :                :      * Protected by ControlFileLock.  Only the startup process changes it.  If
                                522                 :                :      * true, anyone can use InstallXLogFileSegment().  If false, the startup
                                523                 :                :      * process owns the exclusive right to install segments, by reading from
                                524                 :                :      * the archive and possibly replacing existing files.
                                525                 :                :      */
                                526                 :                :     bool        InstallXLogFileSegmentActive;
                                527                 :                : 
                                528                 :                :     /*
                                529                 :                :      * WalWriterSleeping indicates whether the WAL writer is currently in
                                530                 :                :      * low-power mode (and hence should be nudged if an async commit occurs).
                                531                 :                :      * Protected by info_lck.
                                532                 :                :      */
                                533                 :                :     bool        WalWriterSleeping;
                                534                 :                : 
                                535                 :                :     /*
                                536                 :                :      * During recovery, we keep a copy of the latest checkpoint record here.
                                537                 :                :      * lastCheckPointRecPtr points to start of checkpoint record and
                                538                 :                :      * lastCheckPointEndPtr points to end+1 of checkpoint record.  Used by the
                                539                 :                :      * checkpointer when it wants to create a restartpoint.
                                540                 :                :      *
                                541                 :                :      * Protected by info_lck.
                                542                 :                :      */
                                543                 :                :     XLogRecPtr  lastCheckPointRecPtr;
                                544                 :                :     XLogRecPtr  lastCheckPointEndPtr;
                                545                 :                :     CheckPoint  lastCheckPoint;
                                546                 :                : 
                                547                 :                :     /*
                                548                 :                :      * lastFpwDisableRecPtr points to the start of the last replayed
                                549                 :                :      * XLOG_FPW_CHANGE record indicating that full_page_writes is disabled.
                                550                 :                :      */
                                551                 :                :     XLogRecPtr  lastFpwDisableRecPtr;
                                552                 :                : 
                                553                 :                :     slock_t     info_lck;       /* locks shared variables shown above */
                                554                 :                : } XLogCtlData;
                                555                 :                : 
                                556                 :                : /*
                                557                 :                :  * Classification of XLogInsertRecord operations.
                                558                 :                :  */
                                559                 :                : typedef enum
                                560                 :                : {
                                561                 :                :     WALINSERT_NORMAL,
                                562                 :                :     WALINSERT_SPECIAL_SWITCH,
                                563                 :                :     WALINSERT_SPECIAL_CHECKPOINT
                                564                 :                : } WalInsertClass;
                                565                 :                : 
                                566                 :                : static XLogCtlData *XLogCtl = NULL;
                                567                 :                : 
                                568                 :                : /* a private copy of XLogCtl->Insert.WALInsertLocks, for convenience */
                                569                 :                : static WALInsertLockPadded *WALInsertLocks = NULL;
                                570                 :                : 
                                571                 :                : /*
                                572                 :                :  * We maintain an image of pg_control in shared memory.
                                573                 :                :  */
                                574                 :                : static ControlFileData *ControlFile = NULL;
                                575                 :                : 
                                576                 :                : /*
                                577                 :                :  * Calculate the amount of space left on the page after 'endptr'. Beware
                                578                 :                :  * multiple evaluation!
                                579                 :                :  */
                                580                 :                : #define INSERT_FREESPACE(endptr)    \
                                581                 :                :     (((endptr) % XLOG_BLCKSZ == 0) ? 0 : (XLOG_BLCKSZ - (endptr) % XLOG_BLCKSZ))
                                582                 :                : 
                                583                 :                : /* Macro to advance to next buffer index. */
                                584                 :                : #define NextBufIdx(idx)     \
                                585                 :                :         (((idx) == XLogCtl->XLogCacheBlck) ? 0 : ((idx) + 1))
                                586                 :                : 
                                587                 :                : /*
                                588                 :                :  * XLogRecPtrToBufIdx returns the index of the WAL buffer that holds, or
                                589                 :                :  * would hold if it were in cache, the page containing 'recptr'.
                                590                 :                :  */
                                591                 :                : #define XLogRecPtrToBufIdx(recptr)  \
                                592                 :                :     (((recptr) / XLOG_BLCKSZ) % (XLogCtl->XLogCacheBlck + 1))
                                593                 :                : 
                                594                 :                : /*
                                595                 :                :  * These are the number of bytes in a WAL page usable for WAL data.
                                596                 :                :  */
                                597                 :                : #define UsableBytesInPage (XLOG_BLCKSZ - SizeOfXLogShortPHD)
                                598                 :                : 
                                599                 :                : /*
                                600                 :                :  * Convert values of GUCs measured in megabytes to equiv. segment count.
                                601                 :                :  * Rounds down.
                                602                 :                :  */
                                603                 :                : #define ConvertToXSegs(x, segsize)  XLogMBVarToSegs((x), (segsize))
                                604                 :                : 
                                605                 :                : /* The number of bytes in a WAL segment usable for WAL data. */
                                606                 :                : static int  UsableBytesInSegment;
                                607                 :                : 
                                608                 :                : /*
                                609                 :                :  * Private, possibly out-of-date copy of shared LogwrtResult.
                                610                 :                :  * See discussion above.
                                611                 :                :  */
                                612                 :                : static XLogwrtResult LogwrtResult = {0, 0};
                                613                 :                : 
                                614                 :                : /*
                                615                 :                :  * Update local copy of shared XLogCtl->log{Write,Flush}Result
                                616                 :                :  *
                                617                 :                :  * It's critical that Flush always trails Write, so the order of the reads is
                                618                 :                :  * important, as is the barrier.  See also XLogWrite.
                                619                 :                :  */
                                620                 :                : #define RefreshXLogWriteResult(_target) \
                                621                 :                :     do { \
                                622                 :                :         _target.Flush = pg_atomic_read_u64(&XLogCtl->logFlushResult); \
                                623                 :                :         pg_read_barrier(); \
                                624                 :                :         _target.Write = pg_atomic_read_u64(&XLogCtl->logWriteResult); \
                                625                 :                :     } while (0)
                                626                 :                : 
                                627                 :                : /*
                                628                 :                :  * openLogFile is -1 or a kernel FD for an open log file segment.
                                629                 :                :  * openLogSegNo identifies the segment, and openLogTLI the corresponding TLI.
                                630                 :                :  * These variables are only used to write the XLOG, and so will normally refer
                                631                 :                :  * to the active segment.
                                632                 :                :  *
                                633                 :                :  * Note: call Reserve/ReleaseExternalFD to track consumption of this FD.
                                634                 :                :  */
                                635                 :                : static int  openLogFile = -1;
                                636                 :                : static XLogSegNo openLogSegNo = 0;
                                637                 :                : static TimeLineID openLogTLI = 0;
                                638                 :                : 
                                639                 :                : /*
                                640                 :                :  * Local copies of equivalent fields in the control file.  When running
                                641                 :                :  * crash recovery, LocalMinRecoveryPoint is set to InvalidXLogRecPtr as we
                                642                 :                :  * expect to replay all the WAL available, and updateMinRecoveryPoint is
                                643                 :                :  * switched to false to prevent any updates while replaying records.
                                644                 :                :  * Those values are kept consistent as long as crash recovery runs.
                                645                 :                :  */
                                646                 :                : static XLogRecPtr LocalMinRecoveryPoint;
                                647                 :                : static TimeLineID LocalMinRecoveryPointTLI;
                                648                 :                : static bool updateMinRecoveryPoint = true;
                                649                 :                : 
                                650                 :                : /* For WALInsertLockAcquire/Release functions */
                                651                 :                : static int  MyLockNo = 0;
                                652                 :                : static bool holdingAllLocks = false;
                                653                 :                : 
                                654                 :                : #ifdef WAL_DEBUG
                                655                 :                : static MemoryContext walDebugCxt = NULL;
                                656                 :                : #endif
                                657                 :                : 
                                658                 :                : static void CleanupAfterArchiveRecovery(TimeLineID EndOfLogTLI,
                                659                 :                :                                         XLogRecPtr EndOfLog,
                                660                 :                :                                         TimeLineID newTLI);
                                661                 :                : static void CheckRequiredParameterValues(void);
                                662                 :                : static void XLogReportParameters(void);
                                663                 :                : static int  LocalSetXLogInsertAllowed(void);
                                664                 :                : static void CreateEndOfRecoveryRecord(void);
                                665                 :                : static XLogRecPtr CreateOverwriteContrecordRecord(XLogRecPtr aborted_lsn,
                                666                 :                :                                                   XLogRecPtr pagePtr,
                                667                 :                :                                                   TimeLineID newTLI);
                                668                 :                : static void CheckPointGuts(XLogRecPtr checkPointRedo, int flags);
                                669                 :                : static void KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo);
                                670                 :                : static XLogRecPtr XLogGetReplicationSlotMinimumLSN(void);
                                671                 :                : 
                                672                 :                : static void AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli,
                                673                 :                :                                   bool opportunistic);
                                674                 :                : static void XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible);
                                675                 :                : static bool InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
                                676                 :                :                                    bool find_free, XLogSegNo max_segno,
                                677                 :                :                                    TimeLineID tli);
                                678                 :                : static void XLogFileClose(void);
                                679                 :                : static void PreallocXlogFiles(XLogRecPtr endptr, TimeLineID tli);
                                680                 :                : static void RemoveTempXlogFiles(void);
                                681                 :                : static void RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr lastredoptr,
                                682                 :                :                                XLogRecPtr endptr, TimeLineID insertTLI);
                                683                 :                : static void RemoveXlogFile(const struct dirent *segment_de,
                                684                 :                :                            XLogSegNo recycleSegNo, XLogSegNo *endlogSegNo,
                                685                 :                :                            TimeLineID insertTLI);
                                686                 :                : static void UpdateLastRemovedPtr(char *filename);
                                687                 :                : static void ValidateXLOGDirectoryStructure(void);
                                688                 :                : static void CleanupBackupHistory(void);
                                689                 :                : static void UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force);
                                690                 :                : static bool PerformRecoveryXLogAction(void);
                                691                 :                : static void InitControlFile(uint64 sysidentifier, uint32 data_checksum_version);
                                692                 :                : static void WriteControlFile(void);
                                693                 :                : static void ReadControlFile(void);
                                694                 :                : static void UpdateControlFile(void);
                                695                 :                : static char *str_time(pg_time_t tnow, char *buf, size_t bufsize);
                                696                 :                : 
                                697                 :                : static int  get_sync_bit(int method);
                                698                 :                : 
                                699                 :                : static void CopyXLogRecordToWAL(int write_len, bool isLogSwitch,
                                700                 :                :                                 XLogRecData *rdata,
                                701                 :                :                                 XLogRecPtr StartPos, XLogRecPtr EndPos,
                                702                 :                :                                 TimeLineID tli);
                                703                 :                : static void ReserveXLogInsertLocation(int size, XLogRecPtr *StartPos,
                                704                 :                :                                       XLogRecPtr *EndPos, XLogRecPtr *PrevPtr);
                                705                 :                : static bool ReserveXLogSwitch(XLogRecPtr *StartPos, XLogRecPtr *EndPos,
                                706                 :                :                               XLogRecPtr *PrevPtr);
                                707                 :                : static XLogRecPtr WaitXLogInsertionsToFinish(XLogRecPtr upto);
                                708                 :                : static char *GetXLogBuffer(XLogRecPtr ptr, TimeLineID tli);
                                709                 :                : static XLogRecPtr XLogBytePosToRecPtr(uint64 bytepos);
                                710                 :                : static XLogRecPtr XLogBytePosToEndRecPtr(uint64 bytepos);
                                711                 :                : static uint64 XLogRecPtrToBytePos(XLogRecPtr ptr);
                                712                 :                : 
                                713                 :                : static void WALInsertLockAcquire(void);
                                714                 :                : static void WALInsertLockAcquireExclusive(void);
                                715                 :                : static void WALInsertLockRelease(void);
                                716                 :                : static void WALInsertLockUpdateInsertingAt(XLogRecPtr insertingAt);
                                717                 :                : 
                                718                 :                : /*
                                719                 :                :  * Insert an XLOG record represented by an already-constructed chain of data
                                720                 :                :  * chunks.  This is a low-level routine; to construct the WAL record header
                                721                 :                :  * and data, use the higher-level routines in xloginsert.c.
                                722                 :                :  *
                                723                 :                :  * If 'fpw_lsn' is valid, it is the oldest LSN among the pages that this
                                724                 :                :  * WAL record applies to, that were not included in the record as full page
                                725                 :                :  * images.  If fpw_lsn <= RedoRecPtr, the function does not perform the
                                726                 :                :  * insertion and returns InvalidXLogRecPtr.  The caller can then recalculate
                                727                 :                :  * which pages need a full-page image, and retry.  If fpw_lsn is invalid, the
                                728                 :                :  * record is always inserted.
                                729                 :                :  *
                                730                 :                :  * 'flags' gives more in-depth control on the record being inserted. See
                                731                 :                :  * XLogSetRecordFlags() for details.
                                732                 :                :  *
                                733                 :                :  * 'topxid_included' tells whether the top-transaction id is logged along with
                                734                 :                :  * current subtransaction. See XLogRecordAssemble().
                                735                 :                :  *
                                736                 :                :  * The first XLogRecData in the chain must be for the record header, and its
                                737                 :                :  * data must be MAXALIGNed.  XLogInsertRecord fills in the xl_prev and
                                738                 :                :  * xl_crc fields in the header, the rest of the header must already be filled
                                739                 :                :  * by the caller.
                                740                 :                :  *
                                741                 :                :  * Returns XLOG pointer to end of record (beginning of next record).
                                742                 :                :  * This can be used as LSN for data pages affected by the logged action.
                                743                 :                :  * (LSN is the XLOG point up to which the XLOG must be flushed to disk
                                744                 :                :  * before the data page can be written out.  This implements the basic
                                745                 :                :  * WAL rule "write the log before the data".)
                                746                 :                :  */
                                747                 :                : XLogRecPtr
 3180 andres@anarazel.de        748                 :CBC    13854345 : XLogInsertRecord(XLogRecData *rdata,
                                749                 :                :                  XLogRecPtr fpw_lsn,
                                750                 :                :                  uint8 flags,
                                751                 :                :                  int num_fpi,
                                752                 :                :                  bool topxid_included)
                                753                 :                : {
 8934 bruce@momjian.us          754                 :       13854345 :     XLogCtlInsert *Insert = &XLogCtl->Insert;
                                755                 :                :     pg_crc32c   rdata_crc;
                                756                 :                :     bool        inserted;
 3957 heikki.linnakangas@i      757                 :       13854345 :     XLogRecord *rechdr = (XLogRecord *) rdata->data;
 3228 tgl@sss.pgh.pa.us         758                 :       13854345 :     uint8       info = rechdr->xl_info & ~XLR_INFO_MASK;
  688 rhaas@postgresql.org      759                 :       13854345 :     WalInsertClass class = WALINSERT_NORMAL;
                                760                 :                :     XLogRecPtr  StartPos;
                                761                 :                :     XLogRecPtr  EndPos;
 2550 akapila@postgresql.o      762                 :       13854345 :     bool        prevDoPageWrites = doPageWrites;
                                763                 :                :     TimeLineID  insertTLI;
                                764                 :                : 
                                765                 :                :     /* Does this record type require special handling? */
  688 rhaas@postgresql.org      766         [ +  + ]:       13854345 :     if (unlikely(rechdr->xl_rmid == RM_XLOG_ID))
                                767                 :                :     {
                                768         [ +  + ]:         214671 :         if (info == XLOG_SWITCH)
                                769                 :            707 :             class = WALINSERT_SPECIAL_SWITCH;
                                770         [ +  + ]:         213964 :         else if (info == XLOG_CHECKPOINT_REDO)
                                771                 :            896 :             class = WALINSERT_SPECIAL_CHECKPOINT;
                                772                 :                :     }
                                773                 :                : 
                                774                 :                :     /* we assume that all of the record header is in the first chunk */
 3943 heikki.linnakangas@i      775         [ -  + ]:       13854345 :     Assert(rdata->len >= SizeOfXLogRecord);
                                776                 :                : 
                                777                 :                :     /* cross-check on whether we should be here or not */
 5916 tgl@sss.pgh.pa.us         778         [ -  + ]:       13854345 :     if (!XLogInsertAllowed())
 5916 tgl@sss.pgh.pa.us         779         [ #  # ]:UBC           0 :         elog(ERROR, "cannot make new WAL entries during recovery");
                                780                 :                : 
                                781                 :                :     /*
                                782                 :                :      * Given that we're not in recovery, InsertTimeLineID is set and can't
                                783                 :                :      * change, so we can read it without a lock.
                                784                 :                :      */
 1396 rhaas@postgresql.org      785                 :CBC    13854345 :     insertTLI = XLogCtl->InsertTimeLineID;
                                786                 :                : 
                                787                 :                :     /*----------
                                788                 :                :      *
                                789                 :                :      * We have now done all the preparatory work we can without holding a
                                790                 :                :      * lock or modifying shared state. From here on, inserting the new WAL
                                791                 :                :      * record to the shared WAL buffer cache is a two-step process:
                                792                 :                :      *
                                793                 :                :      * 1. Reserve the right amount of space from the WAL. The current head of
                                794                 :                :      *    reserved space is kept in Insert->CurrBytePos, and is protected by
                                795                 :                :      *    insertpos_lck.
                                796                 :                :      *
                                797                 :                :      * 2. Copy the record to the reserved WAL space. This involves finding the
                                798                 :                :      *    correct WAL buffer containing the reserved space, and copying the
                                799                 :                :      *    record in place. This can be done concurrently in multiple processes.
                                800                 :                :      *
                                801                 :                :      * To keep track of which insertions are still in-progress, each concurrent
                                802                 :                :      * inserter acquires an insertion lock. In addition to just indicating that
                                803                 :                :      * an insertion is in progress, the lock tells others how far the inserter
                                804                 :                :      * has progressed. There is a small fixed number of insertion locks,
                                805                 :                :      * determined by NUM_XLOGINSERT_LOCKS. When an inserter crosses a page
                                806                 :                :      * boundary, it updates the value stored in the lock to how far it has
                                807                 :                :      * inserted, to allow the previous buffer to be flushed.
                                808                 :                :      *
                                809                 :                :      * Holding onto an insertion lock also protects RedoRecPtr and
                                810                 :                :      * fullPageWrites from changing until the insertion is finished.
                                811                 :                :      *
                                812                 :                :      * Step 2 can usually be done completely in parallel. If the required WAL
                                813                 :                :      * page is not initialized yet, you have to grab WALBufMappingLock to
                                814                 :                :      * initialize it, but the WAL writer tries to do that ahead of insertions
                                815                 :                :      * to keep that from happening in the critical path.
                                816                 :                :      *
                                817                 :                :      *----------
                                818                 :                :      */
 4987 heikki.linnakangas@i      819                 :       13854345 :     START_CRIT_SECTION();
                                820                 :                : 
  688 rhaas@postgresql.org      821         [ +  + ]:       13854345 :     if (likely(class == WALINSERT_NORMAL))
                                822                 :                :     {
  697                           823                 :       13852742 :         WALInsertLockAcquire();
                                824                 :                : 
                                825                 :                :         /*
                                826                 :                :          * Check to see if my copy of RedoRecPtr is out of date. If so, we may
                                827                 :                :          * have to go back and have the caller recompute everything. This can
                                828                 :                :          * only happen just after a checkpoint, so it's better to be slow in
                                829                 :                :          * this case and fast otherwise.
                                830                 :                :          *
                                831                 :                :          * Also check to see if fullPageWrites was just turned on or there's a
                                832                 :                :          * running backup (which forces full-page writes); if we weren't
                                833                 :                :          * already doing full-page writes then go back and recompute.
                                834                 :                :          *
                                835                 :                :          * If we aren't doing full-page writes then RedoRecPtr doesn't
                                836                 :                :          * actually affect the contents of the XLOG record, so we'll update
                                837                 :                :          * our local copy but not force a recomputation.  (If doPageWrites was
                                838                 :                :          * just turned off, we could recompute the record without full pages,
                                839                 :                :          * but we choose not to bother.)
                                840                 :                :          */
                                841         [ +  + ]:       13852742 :         if (RedoRecPtr != Insert->RedoRecPtr)
                                842                 :                :         {
                                843         [ -  + ]:           6147 :             Assert(RedoRecPtr < Insert->RedoRecPtr);
                                844                 :           6147 :             RedoRecPtr = Insert->RedoRecPtr;
                                845                 :                :         }
                                846   [ +  +  +  + ]:       13852742 :         doPageWrites = (Insert->fullPageWrites || Insert->runningBackups > 0);
                                847                 :                : 
                                848         [ +  + ]:       13852742 :         if (doPageWrites &&
                                849   [ +  +  +  + ]:       13610517 :             (!prevDoPageWrites ||
                                850         [ +  + ]:       12888867 :              (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr)))
                                851                 :                :         {
                                852                 :                :             /*
                                853                 :                :              * Oops, some buffer now needs to be backed up that the caller
                                854                 :                :              * didn't back up.  Start over.
                                855                 :                :              */
                                856                 :           6839 :             WALInsertLockRelease();
                                857         [ -  + ]:           6839 :             END_CRIT_SECTION();
                                858                 :           6839 :             return InvalidXLogRecPtr;
                                859                 :                :         }
                                860                 :                : 
                                861                 :                :         /*
                                862                 :                :          * Reserve space for the record in the WAL. This also sets the xl_prev
                                863                 :                :          * pointer.
                                864                 :                :          */
 3957 heikki.linnakangas@i      865                 :       13845903 :         ReserveXLogInsertLocation(rechdr->xl_tot_len, &StartPos, &EndPos,
                                866                 :                :                                   &rechdr->xl_prev);
                                867                 :                : 
                                868                 :                :         /* Normal records are always inserted. */
 4443                           869                 :       13845903 :         inserted = true;
                                870                 :                :     }
  688 rhaas@postgresql.org      871         [ +  + ]:           1603 :     else if (class == WALINSERT_SPECIAL_SWITCH)
                                872                 :                :     {
                                873                 :                :         /*
                                874                 :                :          * In order to insert an XLOG_SWITCH record, we need to hold all of
                                875                 :                :          * the WAL insertion locks, not just one, so that no one else can
                                876                 :                :          * begin inserting a record until we've figured out how much space
                                877                 :                :          * remains in the current WAL segment and claimed all of it.
                                878                 :                :          *
                                879                 :                :          * Nonetheless, this case is simpler than the normal cases handled
                                880                 :                :          * above, which must check for changes in doPageWrites and RedoRecPtr.
                                881                 :                :          * Those checks are only needed for records that can contain buffer
                                882                 :                :          * references, and an XLOG_SWITCH record never does.
                                883                 :                :          */
  697                           884         [ -  + ]:            707 :         Assert(fpw_lsn == InvalidXLogRecPtr);
                                885                 :            707 :         WALInsertLockAcquireExclusive();
                                886                 :            707 :         inserted = ReserveXLogSwitch(&StartPos, &EndPos, &rechdr->xl_prev);
                                887                 :                :     }
                                888                 :                :     else
                                889                 :                :     {
  688                           890         [ -  + ]:            896 :         Assert(class == WALINSERT_SPECIAL_CHECKPOINT);
                                891                 :                : 
                                892                 :                :         /*
                                893                 :                :          * We need to update both the local and shared copies of RedoRecPtr,
                                894                 :                :          * which means that we need to hold all the WAL insertion locks.
                                895                 :                :          * However, there can't be any buffer references, so as above, we need
                                896                 :                :          * not check RedoRecPtr before inserting the record; we just need to
                                897                 :                :          * update it afterwards.
                                898                 :                :          */
                                899         [ -  + ]:            896 :         Assert(fpw_lsn == InvalidXLogRecPtr);
                                900                 :            896 :         WALInsertLockAcquireExclusive();
                                901                 :            896 :         ReserveXLogInsertLocation(rechdr->xl_tot_len, &StartPos, &EndPos,
                                902                 :                :                                   &rechdr->xl_prev);
                                903                 :            896 :         RedoRecPtr = Insert->RedoRecPtr = StartPos;
                                904                 :            896 :         inserted = true;
                                905                 :                :     }
                                906                 :                : 
 4443 heikki.linnakangas@i      907         [ +  + ]:       13847506 :     if (inserted)
                                908                 :                :     {
                                909                 :                :         /*
                                910                 :                :          * Now that xl_prev has been filled in, calculate CRC of the record
                                911                 :                :          * header.
                                912                 :                :          */
 3943                           913                 :       13847439 :         rdata_crc = rechdr->xl_crc;
                                914                 :       13847439 :         COMP_CRC32C(rdata_crc, rechdr, offsetof(XLogRecord, xl_crc));
 3959                           915                 :       13847439 :         FIN_CRC32C(rdata_crc);
 4443                           916                 :       13847439 :         rechdr->xl_crc = rdata_crc;
                                917                 :                : 
                                918                 :                :         /*
                                919                 :                :          * All the record data, including the header, is now ready to be
                                920                 :                :          * inserted. Copy the record in the space reserved.
                                921                 :                :          */
  688 rhaas@postgresql.org      922                 :       13847439 :         CopyXLogRecordToWAL(rechdr->xl_tot_len,
                                923                 :                :                             class == WALINSERT_SPECIAL_SWITCH, rdata,
                                924                 :                :                             StartPos, EndPos, insertTLI);
                                925                 :                : 
                                926                 :                :         /*
                                927                 :                :          * Unless record is flagged as not important, update LSN of last
                                928                 :                :          * important record in the current slot. When holding all locks, just
                                929                 :                :          * update the first one.
                                930                 :                :          */
 3180 andres@anarazel.de        931         [ +  + ]:       13847439 :         if ((flags & XLOG_MARK_UNIMPORTANT) == 0)
                                932                 :                :         {
 3034 bruce@momjian.us          933         [ +  + ]:       13761513 :             int         lockno = holdingAllLocks ? 0 : MyLockNo;
                                934                 :                : 
 3180 andres@anarazel.de        935                 :       13761513 :             WALInsertLocks[lockno].l.lastImportantAt = StartPos;
                                936                 :                :         }
                                937                 :                :     }
                                938                 :                :     else
                                939                 :                :     {
                                940                 :                :         /*
                                941                 :                :          * This was an xlog-switch record, but the current insert location was
                                942                 :                :          * already exactly at the beginning of a segment, so there was no need
                                943                 :                :          * to do anything.
                                944                 :                :          */
                                945                 :                :     }
                                946                 :                : 
                                947                 :                :     /*
                                948                 :                :      * Done! Let others know that we're finished.
                                949                 :                :      */
 4187 heikki.linnakangas@i      950                 :       13847506 :     WALInsertLockRelease();
                                951                 :                : 
 4443                           952         [ -  + ]:       13847506 :     END_CRIT_SECTION();
                                953                 :                : 
 1404 akapila@postgresql.o      954                 :       13847506 :     MarkCurrentTransactionIdLoggedIfAny();
                                955                 :                : 
                                956                 :                :     /*
                                957                 :                :      * Mark the top transaction id as logged (if needed) so that we do not try
                                958                 :                :      * to log it again with the next WAL record in the current subtransaction.
                                959                 :                :      */
                                960         [ +  + ]:       13847506 :     if (topxid_included)
                                961                 :            219 :         MarkSubxactTopXidLogged();
                                962                 :                : 
                                963                 :                :     /*
                                964                 :                :      * Update shared LogwrtRqst.Write, if we crossed page boundary.
                                965                 :                :      */
 4443 heikki.linnakangas@i      966         [ +  + ]:       13847506 :     if (StartPos / XLOG_BLCKSZ != EndPos / XLOG_BLCKSZ)
                                967                 :                :     {
 4002 andres@anarazel.de        968         [ +  + ]:        1648538 :         SpinLockAcquire(&XLogCtl->info_lck);
                                969                 :                :         /* advance global request to include new block(s) */
                                970         [ +  + ]:        1648538 :         if (XLogCtl->LogwrtRqst.Write < EndPos)
                                971                 :        1585237 :             XLogCtl->LogwrtRqst.Write = EndPos;
                                972                 :        1648538 :         SpinLockRelease(&XLogCtl->info_lck);
  519 alvherre@alvh.no-ip.      973                 :        1648538 :         RefreshXLogWriteResult(LogwrtResult);
                                974                 :                :     }
                                975                 :                : 
                                976                 :                :     /*
                                977                 :                :      * If this was an XLOG_SWITCH record, flush the record and the empty
                                978                 :                :      * padding space that fills the rest of the segment, and perform
                                979                 :                :      * end-of-segment actions (e.g., notifying the archiver).
                                980                 :                :      */
  688 rhaas@postgresql.org      981         [ +  + ]:       13847506 :     if (class == WALINSERT_SPECIAL_SWITCH)
                                982                 :                :     {
                                983                 :                :         TRACE_POSTGRESQL_WAL_SWITCH();
 4443 heikki.linnakangas@i      984                 :            707 :         XLogFlush(EndPos);
                                985                 :                : 
                                986                 :                :         /*
                                987                 :                :          * Even though we reserved the rest of the segment for us, which is
                                988                 :                :          * reflected in EndPos, we return a pointer to just the end of the
                                989                 :                :          * xlog-switch record.
                                990                 :                :          */
                                991         [ +  + ]:            707 :         if (inserted)
                                992                 :                :         {
                                993                 :            640 :             EndPos = StartPos + SizeOfXLogRecord;
                                994         [ -  + ]:            640 :             if (StartPos / XLOG_BLCKSZ != EndPos / XLOG_BLCKSZ)
                                995                 :                :             {
 2909 andres@anarazel.de        996                 :UBC           0 :                 uint64      offset = XLogSegmentOffset(EndPos, wal_segment_size);
                                997                 :                : 
                                998         [ #  # ]:              0 :                 if (offset == EndPos % XLOG_BLCKSZ)
 4443 heikki.linnakangas@i      999                 :              0 :                     EndPos += SizeOfXLogLongPHD;
                               1000                 :                :                 else
                               1001                 :              0 :                     EndPos += SizeOfXLogShortPHD;
                               1002                 :                :             }
                               1003                 :                :         }
                               1004                 :                :     }
                               1005                 :                : 
                               1006                 :                : #ifdef WAL_DEBUG
                               1007                 :                :     if (XLOG_DEBUG)
                               1008                 :                :     {
                               1009                 :                :         static XLogReaderState *debug_reader = NULL;
                               1010                 :                :         XLogRecord *record;
                               1011                 :                :         DecodedXLogRecord *decoded;
                               1012                 :                :         StringInfoData buf;
                               1013                 :                :         StringInfoData recordBuf;
                               1014                 :                :         char       *errormsg = NULL;
                               1015                 :                :         MemoryContext oldCxt;
                               1016                 :                : 
                               1017                 :                :         oldCxt = MemoryContextSwitchTo(walDebugCxt);
                               1018                 :                : 
                               1019                 :                :         initStringInfo(&buf);
                               1020                 :                :         appendStringInfo(&buf, "INSERT @ %X/%08X: ", LSN_FORMAT_ARGS(EndPos));
                               1021                 :                : 
                               1022                 :                :         /*
                               1023                 :                :          * We have to piece together the WAL record data from the XLogRecData
                               1024                 :                :          * entries, so that we can pass it to the rm_desc function as one
                               1025                 :                :          * contiguous chunk.
                               1026                 :                :          */
                               1027                 :                :         initStringInfo(&recordBuf);
                               1028                 :                :         for (; rdata != NULL; rdata = rdata->next)
                               1029                 :                :             appendBinaryStringInfo(&recordBuf, rdata->data, rdata->len);
                               1030                 :                : 
                               1031                 :                :         /* We also need temporary space to decode the record. */
                               1032                 :                :         record = (XLogRecord *) recordBuf.data;
                               1033                 :                :         decoded = (DecodedXLogRecord *)
                               1034                 :                :             palloc(DecodeXLogRecordRequiredSpace(record->xl_tot_len));
                               1035                 :                : 
                               1036                 :                :         if (!debug_reader)
                               1037                 :                :             debug_reader = XLogReaderAllocate(wal_segment_size, NULL,
                               1038                 :                :                                               XL_ROUTINE(.page_read = NULL,
                               1039                 :                :                                                          .segment_open = NULL,
                               1040                 :                :                                                          .segment_close = NULL),
                               1041                 :                :                                               NULL);
                               1042                 :                :         if (!debug_reader)
                               1043                 :                :         {
                               1044                 :                :             appendStringInfoString(&buf, "error decoding record: out of memory while allocating a WAL reading processor");
                               1045                 :                :         }
                               1046                 :                :         else if (!DecodeXLogRecord(debug_reader,
                               1047                 :                :                                    decoded,
                               1048                 :                :                                    record,
                               1049                 :                :                                    EndPos,
                               1050                 :                :                                    &errormsg))
                               1051                 :                :         {
                               1052                 :                :             appendStringInfo(&buf, "error decoding record: %s",
                               1053                 :                :                              errormsg ? errormsg : "no error message");
                               1054                 :                :         }
                               1055                 :                :         else
                               1056                 :                :         {
                               1057                 :                :             appendStringInfoString(&buf, " - ");
                               1058                 :                : 
                               1059                 :                :             debug_reader->record = decoded;
                               1060                 :                :             xlog_outdesc(&buf, debug_reader);
                               1061                 :                :             debug_reader->record = NULL;
                               1062                 :                :         }
                               1063                 :                :         elog(LOG, "%s", buf.data);
                               1064                 :                : 
                               1065                 :                :         pfree(decoded);
                               1066                 :                :         pfree(buf.data);
                               1067                 :                :         pfree(recordBuf.data);
                               1068                 :                :         MemoryContextSwitchTo(oldCxt);
                               1069                 :                :     }
                               1070                 :                : #endif
                               1071                 :                : 
                               1072                 :                :     /*
                               1073                 :                :      * Update our global variables
                               1074                 :                :      */
 4443 heikki.linnakangas@i     1075                 :CBC    13847506 :     ProcLastRecPtr = StartPos;
                               1076                 :       13847506 :     XactLastRecEnd = EndPos;
                               1077                 :                : 
                               1078                 :                :     /* Report WAL traffic to the instrumentation. */
 1981 akapila@postgresql.o     1079         [ +  + ]:       13847506 :     if (inserted)
                               1080                 :                :     {
                               1081                 :       13847439 :         pgWalUsage.wal_bytes += rechdr->xl_tot_len;
                               1082                 :       13847439 :         pgWalUsage.wal_records++;
 1950                          1083                 :       13847439 :         pgWalUsage.wal_fpi += num_fpi;
                               1084                 :                : 
                               1085                 :                :         /* Required for the flush of pending stats WAL data */
   40 michael@paquier.xyz      1086                 :       13847439 :         pgstat_report_fixed = true;
                               1087                 :                :     }
                               1088                 :                : 
 4443 heikki.linnakangas@i     1089                 :       13847506 :     return EndPos;
                               1090                 :                : }
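
The header-CRC step in XLogInsertRecord() above can be illustrated with a small standalone sketch. It assumes the payload CRC is accumulated first and the fixed header (everything before the CRC field) is folded in last, once xl_prev is known. DemoHeader, demo_crc32c_update() and demo_record_crc() are hypothetical names for this illustration only; the bitwise CRC-32C loop stands in for the COMP_CRC32C/FIN_CRC32C macros and is not how PostgreSQL implements them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified record header; NOT the real XLogRecord layout. */
typedef struct DemoHeader
{
    uint32_t    tot_len;    /* total record length */
    uint32_t    pad;        /* explicit padding so no bytes are indeterminate */
    uint64_t    prev;       /* position of the previous record */
    uint32_t    crc;        /* CRC of payload plus the header fields above */
} DemoHeader;

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t
demo_crc32c_update(uint32_t crc, const void *buf, size_t len)
{
    const unsigned char *p = buf;

    while (len--)
    {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/*
 * Payload first, then the header up to (but excluding) the crc field,
 * then the final inversion.  The header part can only be computed after
 * the prev pointer has been filled in.
 */
static uint32_t
demo_record_crc(const DemoHeader *hdr, const void *payload, size_t len)
{
    uint32_t    crc = 0xFFFFFFFFu;

    crc = demo_crc32c_update(crc, payload, len);
    crc = demo_crc32c_update(crc, hdr, offsetof(DemoHeader, crc));
    return crc ^ 0xFFFFFFFFu;
}
```

Because the crc field itself is excluded via offsetof(), whatever value it holds before the computation does not affect the result, whereas changing prev does.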
                               1091                 :                : 
                               1092                 :                : /*
                               1093                 :                :  * Reserves the right amount of space for a record of given size from the WAL.
                               1094                 :                :  * *StartPos is set to the beginning of the reserved section, *EndPos to
                               1095                 :                :  * its end+1. *PrevPtr is set to the beginning of the previous record; it is
                               1096                 :                :  * used to set the xl_prev of this record.
                               1097                 :                :  *
                               1098                 :                :  * This is the performance critical part of XLogInsert that must be serialized
                               1099                 :                :  * across backends. The rest can happen mostly in parallel. Try to keep this
                               1100                 :                :  * section as short as possible; insertpos_lck can be heavily contended on a
                               1101                 :                :  * busy system.
                               1102                 :                :  *
                               1103                 :                :  * NB: The space calculation here must match the code in CopyXLogRecordToWAL,
                               1104                 :                :  * where we actually copy the record to the reserved space.
                               1105                 :                :  *
                               1106                 :                :  * NB: Testing shows that XLogInsertRecord runs faster if this code is inlined;
                               1107                 :                :  * however, because there are two call sites, the compiler is reluctant to
                               1108                 :                :  * inline. We use pg_attribute_always_inline here to try to convince it.
                               1109                 :                :  */
                               1110                 :                : static pg_attribute_always_inline void
                               1111                 :       13846799 : ReserveXLogInsertLocation(int size, XLogRecPtr *StartPos, XLogRecPtr *EndPos,
                               1112                 :                :                           XLogRecPtr *PrevPtr)
                               1113                 :                : {
 4002 andres@anarazel.de       1114                 :       13846799 :     XLogCtlInsert *Insert = &XLogCtl->Insert;
                               1115                 :                :     uint64      startbytepos;
                               1116                 :                :     uint64      endbytepos;
                               1117                 :                :     uint64      prevbytepos;
                               1118                 :                : 
 4443 heikki.linnakangas@i     1119                 :       13846799 :     size = MAXALIGN(size);
                               1120                 :                : 
                               1121                 :                :     /* All (non xlog-switch) records should contain data. */
                               1122         [ -  + ]:       13846799 :     Assert(size > SizeOfXLogRecord);
                               1123                 :                : 
                               1124                 :                :     /*
                               1125                 :                :      * The duration the spinlock needs to be held is minimized by minimizing
                               1126                 :                :      * the calculations that have to be done while holding the lock. The
                               1127                 :                :      * current tip of reserved WAL is kept in CurrBytePos, as a byte position
                               1128                 :                :      * that only counts "usable" bytes in WAL, that is, it excludes all WAL
                               1129                 :                :      * page headers. The mapping between "usable" byte positions and physical
                               1130                 :                :      * positions (XLogRecPtrs) can be done outside the locked region, and
                               1131                 :                :      * because the usable byte position doesn't include any headers, reserving
                               1132                 :                :      * X bytes from WAL is almost as simple as "CurrBytePos += X".
                               1133                 :                :      */
                               1134         [ +  + ]:       13846799 :     SpinLockAcquire(&Insert->insertpos_lck);
                               1135                 :                : 
                               1136                 :       13846799 :     startbytepos = Insert->CurrBytePos;
                               1137                 :       13846799 :     endbytepos = startbytepos + size;
                               1138                 :       13846799 :     prevbytepos = Insert->PrevBytePos;
                               1139                 :       13846799 :     Insert->CurrBytePos = endbytepos;
                               1140                 :       13846799 :     Insert->PrevBytePos = startbytepos;
                               1141                 :                : 
                               1142                 :       13846799 :     SpinLockRelease(&Insert->insertpos_lck);
                               1143                 :                : 
                               1144                 :       13846799 :     *StartPos = XLogBytePosToRecPtr(startbytepos);
                               1145                 :       13846799 :     *EndPos = XLogBytePosToEndRecPtr(endbytepos);
                               1146                 :       13846799 :     *PrevPtr = XLogBytePosToRecPtr(prevbytepos);
                               1147                 :                : 
                               1148                 :                :     /*
                               1149                 :                :      * Check that the conversions between "usable byte positions" and
                               1150                 :                :      * XLogRecPtrs work consistently in both directions.
                               1151                 :                :      */
                               1152         [ -  + ]:       13846799 :     Assert(XLogRecPtrToBytePos(*StartPos) == startbytepos);
                               1153         [ -  + ]:       13846799 :     Assert(XLogRecPtrToBytePos(*EndPos) == endbytepos);
                               1154         [ -  + ]:       13846799 :     Assert(XLogRecPtrToBytePos(*PrevPtr) == prevbytepos);
                               1155                 :       13846799 : }
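
The reservation pattern in ReserveXLogInsertLocation() above, where only a few loads and stores happen under the spinlock, can be sketched in isolation as follows. DemoInsert, demo_reserve() and DEMO_ALIGN are hypothetical names; a C11 atomic_flag stands in for insertpos_lck, 8-byte alignment is assumed in place of MAXALIGN, and the mapping from usable byte positions to XLogRecPtrs is deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared insert state; not XLogCtlInsert. */
typedef struct DemoInsert
{
    atomic_flag lock;           /* plays the role of insertpos_lck */
    uint64_t    curr_byte_pos;  /* end of the most recent reservation */
    uint64_t    prev_byte_pos;  /* start of the most recent reservation */
} DemoInsert;

/* Round up to a multiple of 8 (assumed alignment for this sketch). */
#define DEMO_ALIGN(x) (((uint64_t) (x) + 7) & ~(uint64_t) 7)

static void
demo_reserve(DemoInsert *ins, uint32_t size,
             uint64_t *start, uint64_t *end, uint64_t *prev)
{
    uint64_t    sz = DEMO_ALIGN(size);

    assert(sz > 0);

    /* Spin until the flag is acquired. */
    while (atomic_flag_test_and_set_explicit(&ins->lock, memory_order_acquire))
        ;

    /* The critical section is just a handful of loads and stores. */
    *start = ins->curr_byte_pos;
    *end = *start + sz;
    *prev = ins->prev_byte_pos;
    ins->curr_byte_pos = *end;
    ins->prev_byte_pos = *start;

    atomic_flag_clear_explicit(&ins->lock, memory_order_release);
}
```

Each caller gets a disjoint [start, end) range plus the start of the preceding reservation, which corresponds to what the real function hands back for xl_prev after converting byte positions to record pointers outside the lock.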
                               1156                 :                : 
                               1157                 :                : /*
                               1158                 :                :  * Like ReserveXLogInsertLocation(), but for an xlog-switch record.
                               1159                 :                :  *
                               1160                 :                :  * A log-switch record is handled slightly differently. The rest of the
                               1161                 :                :  * segment will be reserved for this insertion, as indicated by the returned
                               1162                 :                :  * *EndPos value. However, if we are already at the beginning of the current
                               1163                 :                :  * segment, *StartPos and *EndPos are set to the current location without
                               1164                 :                :  * reserving any space, and the function returns false.
                               1165                 :                :  */
                               1166                 :                : static bool
                               1167                 :            707 : ReserveXLogSwitch(XLogRecPtr *StartPos, XLogRecPtr *EndPos, XLogRecPtr *PrevPtr)
                               1168                 :                : {
 4002 andres@anarazel.de       1169                 :            707 :     XLogCtlInsert *Insert = &XLogCtl->Insert;
                               1170                 :                :     uint64      startbytepos;
                               1171                 :                :     uint64      endbytepos;
                               1172                 :                :     uint64      prevbytepos;
 3943 heikki.linnakangas@i     1173                 :            707 :     uint32      size = MAXALIGN(SizeOfXLogRecord);
                               1174                 :                :     XLogRecPtr  ptr;
                               1175                 :                :     uint32      segleft;
                               1176                 :                : 
                               1177                 :                :     /*
                               1178                 :                :      * These calculations are a bit heavy-weight to be done while holding a
                               1179                 :                :      * spinlock, but since we're holding all the WAL insertion locks, there
                               1180                 :                :      * are no other inserters competing for it. GetXLogInsertRecPtr() does
                               1181                 :                :      * compete for it, but that's not called very frequently.
                               1182                 :                :      */
 4443                          1183         [ -  + ]:            707 :     SpinLockAcquire(&Insert->insertpos_lck);
                               1184                 :                : 
                               1185                 :            707 :     startbytepos = Insert->CurrBytePos;
                               1186                 :                : 
                               1187                 :            707 :     ptr = XLogBytePosToEndRecPtr(startbytepos);
 2909 andres@anarazel.de       1188         [ +  + ]:            707 :     if (XLogSegmentOffset(ptr, wal_segment_size) == 0)
                               1189                 :                :     {
 4443 heikki.linnakangas@i     1190                 :             67 :         SpinLockRelease(&Insert->insertpos_lck);
                               1191                 :             67 :         *EndPos = *StartPos = ptr;
                               1192                 :             67 :         return false;
                               1193                 :                :     }
                               1194                 :                : 
                               1195                 :            640 :     endbytepos = startbytepos + size;
                               1196                 :            640 :     prevbytepos = Insert->PrevBytePos;
                               1197                 :                : 
                               1198                 :            640 :     *StartPos = XLogBytePosToRecPtr(startbytepos);
                               1199                 :            640 :     *EndPos = XLogBytePosToEndRecPtr(endbytepos);
                               1200                 :                : 
 2909 andres@anarazel.de       1201                 :            640 :     segleft = wal_segment_size - XLogSegmentOffset(*EndPos, wal_segment_size);
                               1202         [ +  - ]:            640 :     if (segleft != wal_segment_size)
                               1203                 :                :     {
                               1204                 :                :         /* consume the rest of the segment */
 4443 heikki.linnakangas@i     1205                 :            640 :         *EndPos += segleft;
                               1206                 :            640 :         endbytepos = XLogRecPtrToBytePos(*EndPos);
                               1207                 :                :     }
                               1208                 :            640 :     Insert->CurrBytePos = endbytepos;
                               1209                 :            640 :     Insert->PrevBytePos = startbytepos;
                               1210                 :                : 
                               1211                 :            640 :     SpinLockRelease(&Insert->insertpos_lck);
                               1212                 :                : 
                               1213                 :            640 :     *PrevPtr = XLogBytePosToRecPtr(prevbytepos);
                               1214                 :                : 
 2909 andres@anarazel.de       1215         [ -  + ]:            640 :     Assert(XLogSegmentOffset(*EndPos, wal_segment_size) == 0);
 4443 heikki.linnakangas@i     1216         [ -  + ]:            640 :     Assert(XLogRecPtrToBytePos(*EndPos) == endbytepos);
                               1217         [ -  + ]:            640 :     Assert(XLogRecPtrToBytePos(*StartPos) == startbytepos);
                               1218         [ -  + ]:            640 :     Assert(XLogRecPtrToBytePos(*PrevPtr) == prevbytepos);
                               1219                 :                : 
                               1220                 :            640 :     return true;
                               1221                 :                : }
                               1222                 :                : 
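The "consume the rest of the segment" step in ReserveXLogSwitch() above can be illustrated in isolation. The following is a minimal sketch, not the PostgreSQL implementation: the names sketch_segment_offset and sketch_round_up_to_segment_end and the constant SKETCH_SEGMENT_SIZE are hypothetical, and plain modulo stands in for XLogSegmentOffset(), whose definition is not part of this listing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for wal_segment_size; 16 MB is assumed here. */
#define SKETCH_SEGMENT_SIZE ((uint64_t) 16 * 1024 * 1024)

/* Offset of a position within its segment (assumed role of XLogSegmentOffset). */
static uint64_t
sketch_segment_offset(uint64_t pos)
{
    return pos % SKETCH_SEGMENT_SIZE;
}

/*
 * Given the end position of the switch record itself, return the end of
 * the reservation: the next segment boundary, or the position unchanged
 * if it already sits on a boundary.
 */
static uint64_t
sketch_round_up_to_segment_end(uint64_t end_pos)
{
    uint64_t    segleft = SKETCH_SEGMENT_SIZE - sketch_segment_offset(end_pos);

    /* segleft == segment size means end_pos is already on a boundary */
    if (segleft != SKETCH_SEGMENT_SIZE)
        end_pos += segleft;

    /* Same invariant the listing asserts after releasing the spinlock */
    assert(sketch_segment_offset(end_pos) == 0);
    return end_pos;
}
```

Under these assumptions, a record ending 24 bytes into the second segment has its reservation extended to the start of the third, while a position already on a boundary is returned unchanged.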
                               1223                 :                : /*
                               1224                 :                :  * Subroutine of XLogInsertRecord.  Copies a WAL record to an already-reserved
                               1225                 :                :  * area in the WAL.
                               1226                 :                :  */
                               1227                 :                : static void
                               1228                 :       13847439 : CopyXLogRecordToWAL(int write_len, bool isLogSwitch, XLogRecData *rdata,
                               1229                 :                :                     XLogRecPtr StartPos, XLogRecPtr EndPos, TimeLineID tli)
                               1230                 :                : {
                               1231                 :                :     char       *currpos;
                               1232                 :                :     int         freespace;
                               1233                 :                :     int         written;
                               1234                 :                :     XLogRecPtr  CurrPos;
                               1235                 :                :     XLogPageHeader pagehdr;
                               1236                 :                : 
                               1237                 :                :     /*
                               1238                 :                :      * Get a pointer to the right place in the right WAL buffer to start
                               1239                 :                :      * inserting to.
                               1240                 :                :      */
                               1241                 :       13847439 :     CurrPos = StartPos;
 1401 rhaas@postgresql.org     1242                 :       13847439 :     currpos = GetXLogBuffer(CurrPos, tli);
 4443 heikki.linnakangas@i     1243         [ +  - ]:       13847439 :     freespace = INSERT_FREESPACE(CurrPos);
                               1244                 :                : 
                               1245                 :                :     /*
                               1246                 :                :      * There should be enough space for at least the first field (xl_tot_len)
                               1247                 :                :      * on this page.
                               1248                 :                :      */
                               1249         [ -  + ]:       13847439 :     Assert(freespace >= sizeof(uint32));
                               1250                 :                : 
                               1251                 :                :     /* Copy record data */
                               1252                 :       13847439 :     written = 0;
                               1253         [ +  + ]:       66320011 :     while (rdata != NULL)
                               1254                 :                :     {
  368 peter@eisentraut.org     1255                 :       52472572 :         const char *rdata_data = rdata->data;
 4443 heikki.linnakangas@i     1256                 :       52472572 :         int         rdata_len = rdata->len;
                               1257                 :                : 
                               1258         [ +  + ]:       54240921 :         while (rdata_len > freespace)
                               1259                 :                :         {
                               1260                 :                :             /*
                               1261                 :                :              * Write what fits on this page, and continue on the next page.
                               1262                 :                :              */
                               1263   [ +  +  -  + ]:        1768349 :             Assert(CurrPos % XLOG_BLCKSZ >= SizeOfXLogShortPHD || freespace == 0);
                               1264                 :        1768349 :             memcpy(currpos, rdata_data, freespace);
                               1265                 :        1768349 :             rdata_data += freespace;
                               1266                 :        1768349 :             rdata_len -= freespace;
                               1267                 :        1768349 :             written += freespace;
                               1268                 :        1768349 :             CurrPos += freespace;
                               1269                 :                : 
                               1270                 :                :             /*
                               1271                 :                :              * Get pointer to beginning of next page, and set the xlp_rem_len
                               1272                 :                :              * in the page header. Set XLP_FIRST_IS_CONTRECORD.
                               1273                 :                :              *
                               1274                 :                :              * It's safe to set the contrecord flag and xlp_rem_len without a
                               1275                 :                :              * lock on the page. All the other flags were already set when the
                               1276                 :                :              * page was initialized, in AdvanceXLInsertBuffer, and we're the
                               1277                 :                :              * only backend that needs to set the contrecord flag.
                               1278                 :                :              */
 1401 rhaas@postgresql.org     1279                 :        1768349 :             currpos = GetXLogBuffer(CurrPos, tli);
 4443 heikki.linnakangas@i     1280                 :        1768349 :             pagehdr = (XLogPageHeader) currpos;
                               1281                 :        1768349 :             pagehdr->xlp_rem_len = write_len - written;
                               1282                 :        1768349 :             pagehdr->xlp_info |= XLP_FIRST_IS_CONTRECORD;
                               1283                 :                : 
                               1284                 :                :             /* skip over the page header */
 2909 andres@anarazel.de       1285         [ +  + ]:        1768349 :             if (XLogSegmentOffset(CurrPos, wal_segment_size) == 0)
                               1286                 :                :             {
 4443 heikki.linnakangas@i     1287                 :           1187 :                 CurrPos += SizeOfXLogLongPHD;
                               1288                 :           1187 :                 currpos += SizeOfXLogLongPHD;
                               1289                 :                :             }
                               1290                 :                :             else
                               1291                 :                :             {
                               1292                 :        1767162 :                 CurrPos += SizeOfXLogShortPHD;
                               1293                 :        1767162 :                 currpos += SizeOfXLogShortPHD;
                               1294                 :                :             }
                               1295         [ +  - ]:        1768349 :             freespace = INSERT_FREESPACE(CurrPos);
                               1296                 :                :         }
                               1297                 :                : 
                               1298   [ -  +  -  - ]:       52472572 :         Assert(CurrPos % XLOG_BLCKSZ >= SizeOfXLogShortPHD || rdata_len == 0);
                               1299                 :       52472572 :         memcpy(currpos, rdata_data, rdata_len);
                               1300                 :       52472572 :         currpos += rdata_len;
                               1301                 :       52472572 :         CurrPos += rdata_len;
                               1302                 :       52472572 :         freespace -= rdata_len;
                               1303                 :       52472572 :         written += rdata_len;
                               1304                 :                : 
                               1305                 :       52472572 :         rdata = rdata->next;
                               1306                 :                :     }
                               1307         [ -  + ]:       13847439 :     Assert(written == write_len);
                               1308                 :                : 
                               1309                 :                :     /*
                               1310                 :                :      * If this was an xlog-switch, it's not enough to write the switch record,
                               1311                 :                :      * we also have to consume all the remaining space in the WAL segment.  We
                               1312                 :                :      * have already reserved that space, but we need to actually fill it.
                               1313                 :                :      */
 2909 andres@anarazel.de       1314   [ +  +  +  - ]:       13847439 :     if (isLogSwitch && XLogSegmentOffset(CurrPos, wal_segment_size) != 0)
                               1315                 :                :     {
                               1316                 :                :         /* An xlog-switch record doesn't contain any data besides the header */
 4443 heikki.linnakangas@i     1317         [ -  + ]:            640 :         Assert(write_len == SizeOfXLogRecord);
                               1318                 :                : 
                               1319                 :                :         /* Assert that we did reserve the right amount of space */
 2909 andres@anarazel.de       1320         [ -  + ]:            640 :         Assert(XLogSegmentOffset(EndPos, wal_segment_size) == 0);
                               1321                 :                : 
                               1322                 :                :         /* Use up all the remaining space on the current page */
 4443 heikki.linnakangas@i     1323                 :            640 :         CurrPos += freespace;
                               1324                 :                : 
                               1325                 :                :         /*
                               1326                 :                :          * Cause all remaining pages in the segment to be flushed, leaving the
                               1327                 :                :          * XLog position where it should be, at the start of the next segment.
                               1328                 :                :          * We do this one page at a time, to make sure we don't deadlock
                               1329                 :                :          * against ourselves if wal_buffers < wal_segment_size.
                               1330                 :                :          */
                               1331         [ +  + ]:         529997 :         while (CurrPos < EndPos)
                               1332                 :                :         {
                               1333                 :                :             /*
                               1334                 :                :              * The minimal action to flush the page would be to call
                               1335                 :                :              * WALInsertLockUpdateInsertingAt(CurrPos) followed by
                               1336                 :                :              * AdvanceXLInsertBuffer(...).  The page would be left initialized
                               1337                 :                :              * mostly to zeros, except for the page header (always the short
                               1338                 :                :              * variant, as this is never a segment's first page).
                               1339                 :                :              *
                               1340                 :                :              * The large vistas of zeros are good for compressibility, but the
                               1341                 :                :              * headers interrupting them every XLOG_BLCKSZ (with values that
                               1342                 :                :              * differ from page to page) are not.  The effect varies with
                               1343                 :                :              * compression tool, but bzip2 for instance compresses about an
                               1344                 :                :              * order of magnitude worse if those headers are left in place.
                               1345                 :                :              *
                               1346                 :                :              * Rather than complicating AdvanceXLInsertBuffer itself (which is
                               1347                 :                :              * called in heavily-loaded circumstances as well as this lightly-
                               1348                 :                :              * loaded one) with variant behavior, we just use GetXLogBuffer
                               1349                 :                :              * (which itself calls the two methods we need) to get the pointer
                               1350                 :                :              * and zero most of the page.  Then we just zero the page header.
                               1351                 :                :              */
 1401 rhaas@postgresql.org     1352                 :         529357 :             currpos = GetXLogBuffer(CurrPos, tli);
 2717 tgl@sss.pgh.pa.us        1353   [ +  -  +  -  :        2117428 :             MemSet(currpos, 0, SizeOfXLogShortPHD);
                                     +  -  +  -  +  
                                                 + ]
                               1354                 :                : 
 4443 heikki.linnakangas@i     1355                 :         529357 :             CurrPos += XLOG_BLCKSZ;
                               1356                 :                :         }
                               1357                 :                :     }
                               1358                 :                :     else
                               1359                 :                :     {
                               1360                 :                :         /* Align the end position, so that the next record starts aligned */
 3943                          1361                 :       13846799 :         CurrPos = MAXALIGN64(CurrPos);
                               1362                 :                :     }
                               1363                 :                : 
 4443                          1364         [ -  + ]:       13847439 :     if (CurrPos != EndPos)
  521 dgustafsson@postgres     1365         [ #  # ]:UBC           0 :         ereport(PANIC,
                               1366                 :                :                 errcode(ERRCODE_DATA_CORRUPTED),
                               1367                 :                :                 errmsg_internal("space reserved for WAL record does not match what was written"));
 4443 heikki.linnakangas@i     1368                 :CBC    13847439 : }
                               1369                 :                : 
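The page-crossing loop in CopyXLogRecordToWAL() above may be easier to follow in a reduced model. The sketch below is an illustration under loud assumptions, not the real code: it uses a flat byte array instead of WAL buffers, a tiny hypothetical page size (SKETCH_PAGE_SIZE) and header size (SKETCH_HDR_SIZE), a single source buffer instead of an XLogRecData chain, and one header byte in the role of xlp_rem_len. It ignores long segment headers, locking, and alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 32     /* hypothetical, stands in for XLOG_BLCKSZ */
#define SKETCH_HDR_SIZE   4     /* hypothetical, stands in for SizeOfXLogShortPHD */

/*
 * Copy len bytes (len < 256 in this toy model) from src into pages,
 * starting at byte offset pos, which must lie past a page header.  Each
 * time the data spills onto a new page, the number of bytes still to come
 * is stored in the first header byte of that page and the header is
 * skipped.  The caller must supply a buffer large enough for the copy.
 * Returns the position just past the last byte written.
 */
static size_t
sketch_copy_across_pages(unsigned char *pages, size_t pos,
                         const unsigned char *src, size_t len)
{
    size_t      written = 0;
    size_t      freespace;

    assert(pos % SKETCH_PAGE_SIZE >= SKETCH_HDR_SIZE);
    freespace = SKETCH_PAGE_SIZE - pos % SKETCH_PAGE_SIZE;

    while (len - written > freespace)
    {
        /* Write what fits on this page */
        memcpy(pages + pos, src + written, freespace);
        written += freespace;
        pos += freespace;       /* now at the start of the next page */

        /* Record the remaining length in the next page's header */
        pages[pos] = (unsigned char) (len - written);

        /* Skip over the page header */
        pos += SKETCH_HDR_SIZE;
        freespace = SKETCH_PAGE_SIZE - SKETCH_HDR_SIZE;
    }

    /* The rest fits on the current page */
    memcpy(pages + pos, src + written, len - written);
    return pos + (len - written);
}
```

In this model, copying 40 bytes starting at offset 20 fills the last 12 bytes of the first page, stores 28 as the remaining length at offset 32, and places the remaining 28 bytes after the second page's header.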
                               1370                 :                : /*
                               1371                 :                :  * Acquire a WAL insertion lock, for inserting to WAL.
                               1372                 :                :  */
                               1373                 :                : static void
 4187                          1374                 :       13852753 : WALInsertLockAcquire(void)
                               1375                 :                : {
                               1376                 :                :     bool        immed;
                               1377                 :                : 
                               1378                 :                :     /*
                               1379                 :                :      * It doesn't matter which of the WAL insertion locks we acquire, so try
                               1380                 :                :      * the one we used last time.  If the system isn't particularly busy, it's
                               1381                 :                :      * a good bet that it's still available, and it's good to have some
                               1382                 :                :      * affinity to a particular lock so that you don't unnecessarily bounce
                               1383                 :                :      * cache lines between processes when there's no contention.
                               1384                 :                :      *
                               1385                 :                :      * If this is the first time through in this backend, pick a lock
                               1386                 :                :      * (semi-)randomly.  This allows the locks to be used evenly if you have a
                               1387                 :                :      * lot of very short connections.
                               1388                 :                :      */
                               1389                 :                :     static int  lockToTry = -1;
                               1390                 :                : 
                               1391         [ +  + ]:       13852753 :     if (lockToTry == -1)
  562                          1392                 :           7202 :         lockToTry = MyProcNumber % NUM_XLOGINSERT_LOCKS;
 4187                          1393                 :       13852753 :     MyLockNo = lockToTry;
                               1394                 :                : 
                               1395                 :                :     /*
                               1396                 :                :      * The insertingAt value is initially set to 0, as we don't know our
                               1397                 :                :      * insert location yet.
                               1398                 :                :      */
 3690 andres@anarazel.de       1399                 :       13852753 :     immed = LWLockAcquire(&WALInsertLocks[MyLockNo].l.lock, LW_EXCLUSIVE);
 4187 heikki.linnakangas@i     1400         [ +  + ]:       13852753 :     if (!immed)
                               1401                 :                :     {
                               1402                 :                :         /*
                               1403                 :                :          * If we couldn't get the lock immediately, try another lock next
                               1404                 :                :          * time.  On a system with more insertion locks than concurrent
                               1405                 :                :          * inserters, this causes all the inserters to eventually migrate to a
                               1406                 :                :          * lock that no-one else is using.  On a system with more inserters
                               1407                 :                :          * than locks, it still helps to distribute the inserters evenly
                               1408                 :                :          * across the locks.
                               1409                 :                :          */
 3993                          1410                 :          18775 :         lockToTry = (lockToTry + 1) % NUM_XLOGINSERT_LOCKS;
                               1411                 :                :     }
 4443                          1412                 :       13852753 : }
                               1413                 :                : 
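The lock-selection policy described in the comments of WALInsertLockAcquire() above can be sketched separately from the LWLock machinery. This is a hedged illustration only: sketch_initial_lock and sketch_next_lock_to_try are hypothetical helpers, and SKETCH_NUM_LOCKS is an assumed value standing in for NUM_XLOGINSERT_LOCKS, whose definition does not appear in this listing.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NUM_LOCKS 8      /* assumed stand-in for NUM_XLOGINSERT_LOCKS */

/* First use in a backend: spread backends over the locks by process number. */
static int
sketch_initial_lock(int proc_number)
{
    assert(proc_number >= 0);
    return proc_number % SKETCH_NUM_LOCKS;
}

/*
 * Decide which lock to try on the next insertion.  If the last acquisition
 * succeeded without waiting, keep the same lock for cache affinity;
 * otherwise move on to the next lock, wrapping around at the end.
 */
static int
sketch_next_lock_to_try(int lock_to_try, bool acquired_immediately)
{
    assert(lock_to_try >= 0 && lock_to_try < SKETCH_NUM_LOCKS);

    if (!acquired_immediately)
        return (lock_to_try + 1) % SKETCH_NUM_LOCKS;
    return lock_to_try;
}
```

With these assumptions, a backend that keeps getting its lock immediately never changes locks, while a backend that had to wait rotates to its neighbor, which is how inserters spread across the locks over time.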
                               1414                 :                : /*
                               1415                 :                :  * Acquire all WAL insertion locks, to prevent other backends from inserting
                               1416                 :                :  * to WAL.
                               1417                 :                :  */
                               1418                 :                : static void
 4187                          1419                 :           4261 : WALInsertLockAcquireExclusive(void)
                               1420                 :                : {
                               1421                 :                :     int         i;
                               1422                 :                : 
                               1423                 :                :     /*
                               1424                 :                :      * When holding all the locks, all but the last lock's insertingAt
                               1425                 :                :      * indicator is set to 0xFFFFFFFFFFFFFFFF, which is higher than any real
                               1426                 :                :      * XLogRecPtr value, to make sure that no-one blocks waiting on those.
                               1427                 :                :      */
 3993                          1428         [ +  + ]:          34088 :     for (i = 0; i < NUM_XLOGINSERT_LOCKS - 1; i++)
                               1429                 :                :     {
 3690 andres@anarazel.de       1430                 :          29827 :         LWLockAcquire(&WALInsertLocks[i].l.lock, LW_EXCLUSIVE);
                               1431                 :          29827 :         LWLockUpdateVar(&WALInsertLocks[i].l.lock,
                               1432                 :          29827 :                         &WALInsertLocks[i].l.insertingAt,
                               1433                 :                :                         PG_UINT64_MAX);
                               1434                 :                :     }
                               1435                 :                :     /* Variable value reset to 0 at release */
                               1436                 :           4261 :     LWLockAcquire(&WALInsertLocks[i].l.lock, LW_EXCLUSIVE);
                               1437                 :                : 
 4187 heikki.linnakangas@i     1438                 :           4261 :     holdingAllLocks = true;
 4443                          1439                 :           4261 : }
                               1440                 :                : 
                               1441                 :                : /*
                               1442                 :                :  * Release our insertion lock (or locks, if we're holding them all).
                               1443                 :                :  *
                               1444                 :                :  * NB: Reset all variables to 0, so they cause LWLockWaitForVar to block the
                               1445                 :                :  * next time the lock is acquired.
                               1446                 :                :  */
                               1447                 :                : static void
 4187                          1448                 :       13857014 : WALInsertLockRelease(void)
                               1449                 :                : {
                               1450         [ +  + ]:       13857014 :     if (holdingAllLocks)
                               1451                 :                :     {
                               1452                 :                :         int         i;
                               1453                 :                : 
 3993                          1454         [ +  + ]:          38349 :         for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
 3690 andres@anarazel.de       1455                 :          34088 :             LWLockReleaseClearVar(&WALInsertLocks[i].l.lock,
                               1456                 :          34088 :                                   &WALInsertLocks[i].l.insertingAt,
                               1457                 :                :                                   0);
                               1458                 :                : 
 4187 heikki.linnakangas@i     1459                 :           4261 :         holdingAllLocks = false;
                               1460                 :                :     }
                               1461                 :                :     else
                               1462                 :                :     {
 3690 andres@anarazel.de       1463                 :       13852753 :         LWLockReleaseClearVar(&WALInsertLocks[MyLockNo].l.lock,
                               1464                 :       13852753 :                               &WALInsertLocks[MyLockNo].l.insertingAt,
                               1465                 :                :                               0);
                               1466                 :                :     }
 4443 heikki.linnakangas@i     1467                 :       13857014 : }
                               1468                 :                : 
                               1469                 :                : /*
                               1470                 :                :  * Update our insertingAt value, to let others know that we've finished
                               1471                 :                :  * inserting up to that point.
                               1472                 :                :  */
                               1473                 :                : static void
 4187                          1474                 :        2235189 : WALInsertLockUpdateInsertingAt(XLogRecPtr insertingAt)
                               1475                 :                : {
                               1476         [ +  + ]:        2235189 :     if (holdingAllLocks)
                               1477                 :                :     {
                               1478                 :                :         /*
                               1479                 :                :          * We use the last lock to mark our actual position, see comments in
                               1480                 :                :          * WALInsertLockAcquireExclusive.
                               1481                 :                :          */
 3993                          1482                 :         528124 :         LWLockUpdateVar(&WALInsertLocks[NUM_XLOGINSERT_LOCKS - 1].l.lock,
 2999 tgl@sss.pgh.pa.us        1483                 :         528124 :                         &WALInsertLocks[NUM_XLOGINSERT_LOCKS - 1].l.insertingAt,
                               1484                 :                :                         insertingAt);
                               1485                 :                :     }
                               1486                 :                :     else
 4187 heikki.linnakangas@i     1487                 :        1707065 :         LWLockUpdateVar(&WALInsertLocks[MyLockNo].l.lock,
                               1488                 :        1707065 :                         &WALInsertLocks[MyLockNo].l.insertingAt,
                               1489                 :                :                         insertingAt);
 4443                          1490                 :        2235189 : }
                               1491                 :                : 
                               1492                 :                : /*
                               1493                 :                :  * Wait for any WAL insertions < upto to finish.
                               1494                 :                :  *
                               1495                 :                :  * Returns the location of the oldest insertion that is still in-progress.
                               1496                 :                :  * Any WAL prior to that point has been fully copied into WAL buffers, and
                               1497                 :                :  * can be flushed out to disk. Because this waits for any insertions older
                               1498                 :                :  * than 'upto' to finish, the return value is always >= 'upto'.
                               1499                 :                :  *
                               1500                 :                :  * Note: When you are about to write out WAL, you must call this function
                               1501                 :                :  * *before* acquiring WALWriteLock, to avoid deadlocks. This function might
                               1502                 :                :  * need to wait for an insertion to finish (or at least advance to next
                               1503                 :                :  * uninitialized page), and the inserter might need to evict an old WAL buffer
                               1504                 :                :  * to make room for a new one, which in turn requires WALWriteLock.
                               1505                 :                :  */
                               1506                 :                : static XLogRecPtr
                               1507                 :        2111604 : WaitXLogInsertionsToFinish(XLogRecPtr upto)
                               1508                 :                : {
                               1509                 :                :     uint64      bytepos;
                               1510                 :                :     XLogRecPtr  inserted;
                               1511                 :                :     XLogRecPtr  reservedUpto;
                               1512                 :                :     XLogRecPtr  finishedUpto;
 4002 andres@anarazel.de       1513                 :        2111604 :     XLogCtlInsert *Insert = &XLogCtl->Insert;
                               1514                 :                :     int         i;
                               1515                 :                : 
 4443 heikki.linnakangas@i     1516         [ -  + ]:        2111604 :     if (MyProc == NULL)
 4443 heikki.linnakangas@i     1517         [ #  # ]:UBC           0 :         elog(PANIC, "cannot wait without a PGPROC structure");
                               1518                 :                : 
                               1519                 :                :     /*
                               1520                 :                :      * Check if there's any work to do.  Use a barrier to ensure we get the
                               1521                 :                :      * freshest value.
                               1522                 :                :      */
  517 alvherre@alvh.no-ip.     1523                 :CBC     2111604 :     inserted = pg_atomic_read_membarrier_u64(&XLogCtl->logInsertResult);
                               1524         [ +  + ]:        2111604 :     if (upto <= inserted)
                               1525                 :        1665819 :         return inserted;
                               1526                 :                : 
                               1527                 :                :     /* Read the current insert position */
 4443 heikki.linnakangas@i     1528         [ +  + ]:         445785 :     SpinLockAcquire(&Insert->insertpos_lck);
                               1529                 :         445785 :     bytepos = Insert->CurrBytePos;
                               1530                 :         445785 :     SpinLockRelease(&Insert->insertpos_lck);
                               1531                 :         445785 :     reservedUpto = XLogBytePosToEndRecPtr(bytepos);
                               1532                 :                : 
                               1533                 :                :     /*
                               1534                 :                :      * No-one should request to flush a piece of WAL that hasn't even been
                               1535                 :                :      * reserved yet. However, it can happen if there is a block with a bogus
                               1536                 :                :      * LSN on disk, for example. XLogFlush checks for that situation and
                               1537                 :                :      * complains, but only after the flush. Here we just assume that to mean
                               1538                 :                :      * that all WAL that has been reserved needs to be finished. In this
                               1539                 :                :      * corner-case, the return value can be smaller than 'upto' argument.
                               1540                 :                :      */
                               1541         [ -  + ]:         445785 :     if (upto > reservedUpto)
                               1542                 :                :     {
 1737 peter@eisentraut.org     1543         [ #  # ]:UBC           0 :         ereport(LOG,
                               1544                 :                :                 errmsg("request to flush past end of generated WAL; request %X/%08X, current position %X/%08X",
                               1545                 :                :                        LSN_FORMAT_ARGS(upto), LSN_FORMAT_ARGS(reservedUpto)));
 4443 heikki.linnakangas@i     1546                 :              0 :         upto = reservedUpto;
                               1547                 :                :     }
                               1548                 :                : 
                               1549                 :                :     /*
                               1550                 :                :      * Loop through all the locks, sleeping on any in-progress insert older
                               1551                 :                :      * than 'upto'.
                               1552                 :                :      *
                               1553                 :                :      * finishedUpto is our return value, indicating the point up to which all
                               1554                 :                :      * the WAL insertions have been finished. Initialize it to the head of
                               1555                 :                :      * reserved WAL, and as we iterate through the insertion locks, back it
                               1556                 :                :      * out for any insertion that's still in progress.
                               1557                 :                :      */
 4443 heikki.linnakangas@i     1558                 :CBC      445785 :     finishedUpto = reservedUpto;
 3993                          1559         [ +  + ]:        4012065 :     for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
                               1560                 :                :     {
 4141 bruce@momjian.us         1561                 :        3566280 :         XLogRecPtr  insertingat = InvalidXLogRecPtr;
                               1562                 :                : 
                               1563                 :                :         do
                               1564                 :                :         {
                               1565                 :                :             /*
                               1566                 :                :              * See if this insertion is in progress.  LWLockWaitForVar will
                               1567                 :                :              * wait for the lock to be released, or for the 'value' to be set
                               1568                 :                :              * by a LWLockUpdateVar call.  When a lock is initially acquired,
                               1569                 :                :              * its value is 0 (InvalidXLogRecPtr), which means that we don't
                               1570                 :                :              * know where it's inserting yet.  We will have to wait for it. If
                               1571                 :                :              * it's a small insertion, the record will most likely fit on the
                               1572                 :                :              * same page and the inserter will release the lock without ever
                               1573                 :                :              * calling LWLockUpdateVar.  But if it has to sleep, it will
                               1574                 :                :              * advertise the insertion point with LWLockUpdateVar before
                               1575                 :                :              * sleeping.
                               1576                 :                :              *
                               1577                 :                :              * In this loop we are only waiting for insertions that started
                               1578                 :                :              * before WaitXLogInsertionsToFinish was called.  The lack of
                               1579                 :                :              * memory barriers in the loop means that we might see locks as
                               1580                 :                :              * "unused" that have since become used.  This is fine because
                               1581                 :                :              * they only can be used for later insertions that we would not
                               1582                 :                :              * want to wait on anyway.  Not taking a lock to acquire the
                               1583                 :                :              * current insertingAt value means that we might see older
                               1584                 :                :              * insertingAt values.  This is also fine, because if we read a
                               1585                 :                :              * value too old, we will add ourselves to the wait queue, which
                               1586                 :                :              * contains atomic operations.
                               1587                 :                :              */
 4187 heikki.linnakangas@i     1588         [ +  + ]:        3684573 :             if (LWLockWaitForVar(&WALInsertLocks[i].l.lock,
                               1589                 :        3684573 :                                  &WALInsertLocks[i].l.insertingAt,
                               1590                 :                :                                  insertingat, &insertingat))
                               1591                 :                :             {
                               1592                 :                :                 /* the lock was free, so no insertion in progress */
                               1593                 :        2543781 :                 insertingat = InvalidXLogRecPtr;
                               1594                 :        2543781 :                 break;
                               1595                 :                :             }
                               1596                 :                : 
                               1597                 :                :             /*
                               1598                 :                :              * This insertion is still in progress. Have to wait, unless the
                               1599                 :                :              * inserter has proceeded past 'upto'.
                               1600                 :                :              */
                               1601         [ +  + ]:        1140792 :         } while (insertingat < upto);
                               1602                 :                : 
                               1603   [ +  +  +  + ]:        3566280 :         if (insertingat != InvalidXLogRecPtr && insertingat < finishedUpto)
                               1604                 :         388762 :             finishedUpto = insertingat;
                               1605                 :                :     }
                               1606                 :                : 
                               1607                 :                :     /*
                               1608                 :                :      * Advance the limit we know to have been inserted and return the freshest
                               1609                 :                :      * value we know of, which might be beyond what we requested if somebody
                               1610                 :                :      * is concurrently doing this with an 'upto' pointer ahead of us.
                               1611                 :                :      */
  517 alvherre@alvh.no-ip.     1612                 :         445785 :     finishedUpto = pg_atomic_monotonic_advance_u64(&XLogCtl->logInsertResult,
                               1613                 :                :                                                    finishedUpto);
                               1614                 :                : 
 4443 heikki.linnakangas@i     1615                 :         445785 :     return finishedUpto;
                               1616                 :                : }
                               1617                 :                : 
                               1618                 :                : /*
                               1619                 :                :  * Get a pointer to the right location in the WAL buffer containing the
                               1620                 :                :  * given XLogRecPtr.
                               1621                 :                :  *
                               1622                 :                :  * If the page is not initialized yet, it is initialized. That might require
                               1623                 :                :  * evicting an old dirty buffer from the buffer cache, which means I/O.
                               1624                 :                :  *
                               1625                 :                :  * The caller must ensure that the page containing the requested location
                               1626                 :                :  * isn't evicted yet, and won't be evicted. The way to ensure that is to
                               1627                 :                :  * hold onto a WAL insertion lock with the insertingAt position set to
                               1628                 :                :  * something <= ptr. GetXLogBuffer() will update insertingAt if it needs
                               1629                 :                :  * to evict an old page from the buffer. (This means that once you call
                               1630                 :                :  * GetXLogBuffer() with a given 'ptr', you must not access anything before
                               1631                 :                :  * that point anymore, and must not call GetXLogBuffer() with an older 'ptr'
                               1632                 :                :  * later, because older buffers might be recycled already)
                               1633                 :                :  */
                               1634                 :                : static char *
 1401 rhaas@postgresql.org     1635                 :       16145156 : GetXLogBuffer(XLogRecPtr ptr, TimeLineID tli)
                               1636                 :                : {
                               1637                 :                :     int         idx;
                               1638                 :                :     XLogRecPtr  endptr;
                               1639                 :                :     static uint64 cachedPage = 0;
                               1640                 :                :     static char *cachedPos = NULL;
                               1641                 :                :     XLogRecPtr  expectedEndPtr;
                               1642                 :                : 
                               1643                 :                :     /*
                               1644                 :                :      * Fast path for the common case where we need to access the same page
                               1645                 :                :      * as last time.
                               1646                 :                :      */
 4443 heikki.linnakangas@i     1647         [ +  + ]:       16145156 :     if (ptr / XLOG_BLCKSZ == cachedPage)
                               1648                 :                :     {
                               1649         [ -  + ]:       13514383 :         Assert(((XLogPageHeader) cachedPos)->xlp_magic == XLOG_PAGE_MAGIC);
                               1650         [ -  + ]:       13514383 :         Assert(((XLogPageHeader) cachedPos)->xlp_pageaddr == ptr - (ptr % XLOG_BLCKSZ));
                               1651                 :       13514383 :         return cachedPos + ptr % XLOG_BLCKSZ;
                               1652                 :                :     }
                               1653                 :                : 
                               1654                 :                :     /*
                               1655                 :                :      * The XLog buffer cache is organized so that a page is always loaded to a
                               1656                 :                :      * particular buffer.  That way we can easily calculate the buffer a given
                               1657                 :                :      * page must be loaded into, from the XLogRecPtr alone.
                               1658                 :                :      */
                               1659                 :        2630773 :     idx = XLogRecPtrToBufIdx(ptr);
                               1660                 :                : 
                               1661                 :                :     /*
                               1662                 :                :      * See what page is loaded in the buffer at the moment. It could be the
                               1663                 :                :      * page we're looking for, or something older. It can't be anything newer
                               1664                 :                :      * - that would imply the page we're looking for has already been written
                               1665                 :                :      * out to disk and evicted, and the caller is responsible for making sure
                               1666                 :                :      * that doesn't happen.
                               1667                 :                :      *
                               1668                 :                :      * We don't hold a lock while we read the value. If someone is just about
                               1669                 :                :      * to initialize or has just initialized the page, it's possible that we
                               1670                 :                :      * get InvalidXLogRecPtr. That's ok, we'll grab the mapping lock (in
                               1671                 :                :      * AdvanceXLInsertBuffer) and retry if we see anything other than the page
                               1672                 :                :      * we're looking for.
                               1673                 :                :      */
                               1674                 :        2630773 :     expectedEndPtr = ptr;
                               1675                 :        2630773 :     expectedEndPtr += XLOG_BLCKSZ - ptr % XLOG_BLCKSZ;
                               1676                 :                : 
  627 jdavis@postgresql.or     1677                 :        2630773 :     endptr = pg_atomic_read_u64(&XLogCtl->xlblocks[idx]);
 4443 heikki.linnakangas@i     1678         [ +  + ]:        2630773 :     if (expectedEndPtr != endptr)
                               1679                 :                :     {
                               1680                 :                :         XLogRecPtr  initializedUpto;
                               1681                 :                : 
                               1682                 :                :         /*
                               1683                 :                :          * Before calling AdvanceXLInsertBuffer(), which can block, let others
                               1684                 :                :          * know how far we're finished with inserting the record.
                               1685                 :                :          *
                               1686                 :                :          * NB: If 'ptr' points to just after the page header, advertise a
                               1687                 :                :          * position at the beginning of the page rather than 'ptr' itself. If
                               1688                 :                :          * there are no other insertions running, someone might try to flush
                               1689                 :                :          * up to our advertised location. If we advertised a position after
                               1690                 :                :          * the page header, someone might try to flush the page header, even
                               1691                 :                :          * though the page might actually not be initialized yet. As the first
                               1692                 :                :          * inserter on the page, we are effectively responsible for making
                               1693                 :                :          * sure that it's initialized, before we let insertingAt move past
                               1694                 :                :          * the page header.
                               1695                 :                :          */
 3688                          1696         [ +  + ]:        2235189 :         if (ptr % XLOG_BLCKSZ == SizeOfXLogShortPHD &&
 2909 andres@anarazel.de       1697         [ +  - ]:           5302 :             XLogSegmentOffset(ptr, wal_segment_size) > XLOG_BLCKSZ)
 3688 heikki.linnakangas@i     1698                 :           5302 :             initializedUpto = ptr - SizeOfXLogShortPHD;
                               1699         [ +  + ]:        2229887 :         else if (ptr % XLOG_BLCKSZ == SizeOfXLogLongPHD &&
 2909 andres@anarazel.de       1700         [ +  + ]:            717 :                  XLogSegmentOffset(ptr, wal_segment_size) < XLOG_BLCKSZ)
 3688 heikki.linnakangas@i     1701                 :            521 :             initializedUpto = ptr - SizeOfXLogLongPHD;
                               1702                 :                :         else
                               1703                 :        2229366 :             initializedUpto = ptr;
                               1704                 :                : 
                               1705                 :        2235189 :         WALInsertLockUpdateInsertingAt(initializedUpto);
                               1706                 :                : 
 1401 rhaas@postgresql.org     1707                 :        2235189 :         AdvanceXLInsertBuffer(ptr, tli, false);
  627 jdavis@postgresql.or     1708                 :        2235189 :         endptr = pg_atomic_read_u64(&XLogCtl->xlblocks[idx]);
                               1709                 :                : 
 4443 heikki.linnakangas@i     1710         [ -  + ]:        2235189 :         if (expectedEndPtr != endptr)
   61 alvherre@kurilemu.de     1711         [ #  # ]:UNC           0 :             elog(PANIC, "could not find WAL buffer for %X/%08X",
                               1712                 :                :                  LSN_FORMAT_ARGS(ptr));
                               1713                 :                :     }
                               1714                 :                :     else
                               1715                 :                :     {
                               1716                 :                :         /*
                               1717                 :                :          * Make sure the initialization of the page is visible to us, and
                               1718                 :                :          * won't arrive later to overwrite the WAL data we write on the page.
                               1719                 :                :          */
 4443 heikki.linnakangas@i     1720                 :CBC      395584 :         pg_memory_barrier();
                               1721                 :                :     }
                               1722                 :                : 
                               1723                 :                :     /*
                               1724                 :                :      * Found the buffer holding this page. Return a pointer to the right
                               1725                 :                :      * offset within the page.
                               1726                 :                :      */
                               1727                 :        2630773 :     cachedPage = ptr / XLOG_BLCKSZ;
                               1728                 :        2630773 :     cachedPos = XLogCtl->pages + idx * (Size) XLOG_BLCKSZ;
                               1729                 :                : 
                               1730         [ -  + ]:        2630773 :     Assert(((XLogPageHeader) cachedPos)->xlp_magic == XLOG_PAGE_MAGIC);
                               1731         [ -  + ]:        2630773 :     Assert(((XLogPageHeader) cachedPos)->xlp_pageaddr == ptr - (ptr % XLOG_BLCKSZ));
                               1732                 :                : 
                               1733                 :        2630773 :     return cachedPos + ptr % XLOG_BLCKSZ;
                               1734                 :                : }
                               1735                 :                : 
                               1736                 :                : /*
                               1737                 :                :  * Read WAL data directly from WAL buffers, if available. Returns the number
                               1738                 :                :  * of bytes read successfully.
                               1739                 :                :  *
                               1740                 :                :  * Fewer than 'count' bytes may be read if some of the requested WAL data has
                               1741                 :                :  * already been evicted.
                               1742                 :                :  *
                               1743                 :                :  * No locks are taken.
                               1744                 :                :  *
                               1745                 :                :  * Caller should ensure that it reads no further than LogwrtResult.Write
                               1746                 :                :  * (which should have been updated by the caller when determining how far to
                               1747                 :                :  * read). The 'tli' argument is only used as a convenient safety check so that
                               1748                 :                :  * callers do not read from WAL buffers on a historical timeline.
                               1749                 :                :  */
                               1750                 :                : Size
  572 jdavis@postgresql.or     1751                 :          99037 : WALReadFromBuffers(char *dstbuf, XLogRecPtr startptr, Size count,
                               1752                 :                :                    TimeLineID tli)
                               1753                 :                : {
                               1754                 :          99037 :     char       *pdst = dstbuf;
                               1755                 :          99037 :     XLogRecPtr  recptr = startptr;
                               1756                 :                :     XLogRecPtr  inserted;
  568                          1757                 :          99037 :     Size        nbytes = count;
                               1758                 :                : 
  572                          1759   [ +  +  +  + ]:          99037 :     if (RecoveryInProgress() || tli != GetWALInsertionTimeLine())
                               1760                 :            845 :         return 0;
                               1761                 :                : 
                               1762         [ -  + ]:          98192 :     Assert(!XLogRecPtrIsInvalid(startptr));
                               1763                 :                : 
                               1764                 :                :     /*
                               1765                 :                :      * Caller should ensure that the requested data has been inserted into WAL
                               1766                 :                :      * buffers before we try to read it.
                               1767                 :                :      */
  517 alvherre@alvh.no-ip.     1768                 :          98192 :     inserted = pg_atomic_read_u64(&XLogCtl->logInsertResult);
                               1769         [ -  + ]:          98192 :     if (startptr + count > inserted)
  517 alvherre@alvh.no-ip.     1770         [ #  # ]:UBC           0 :         ereport(ERROR,
                               1771                 :                :                 errmsg("cannot read past end of generated WAL: requested %X/%08X, current position %X/%08X",
                               1772                 :                :                        LSN_FORMAT_ARGS(startptr + count),
                               1773                 :                :                        LSN_FORMAT_ARGS(inserted)));
                               1774                 :                : 
                               1775                 :                :     /*
                               1776                 :                :      * Loop through the buffers without a lock. For each buffer, atomically
                               1777                 :                :      * read and verify the end pointer, then copy the data out, and finally
                               1778                 :                :      * re-read and re-verify the end pointer.
                               1779                 :                :      *
                               1780                 :                :      * Once a page is evicted, it never returns to the WAL buffers, so if the
                               1781                 :                :      * end pointer matches the expected end pointer before and after we copy
                               1782                 :                :      * the data, then the right page must have been present during the data
                               1783                 :                :      * copy. Read barriers are necessary to ensure that the data copy actually
                               1784                 :                :      * happens between the two verification steps.
                               1785                 :                :      *
                               1786                 :                :      * If either verification fails, we simply terminate the loop and return
                               1787                 :                :      * with the data that had been already copied out successfully.
                               1788                 :                :      */
  572 jdavis@postgresql.or     1789         [ +  + ]:CBC      121573 :     while (nbytes > 0)
                               1790                 :                :     {
                               1791                 :         115011 :         uint32      offset = recptr % XLOG_BLCKSZ;
                               1792                 :         115011 :         int         idx = XLogRecPtrToBufIdx(recptr);
                               1793                 :                :         XLogRecPtr  expectedEndPtr;
                               1794                 :                :         XLogRecPtr  endptr;
                               1795                 :                :         const char *page;
                               1796                 :                :         const char *psrc;
                               1797                 :                :         Size        npagebytes;
                               1798                 :                : 
                               1799                 :                :         /*
                               1800                 :                :          * Calculate the end pointer we expect in the xlblocks array if the
                               1801                 :                :          * correct page is present.
                               1802                 :                :          */
                               1803                 :         115011 :         expectedEndPtr = recptr + (XLOG_BLCKSZ - offset);
                               1804                 :                : 
                               1805                 :                :         /*
                               1806                 :                :          * First verification step: check that the correct page is present in
                               1807                 :                :          * the WAL buffers.
                               1808                 :                :          */
                               1809                 :         115011 :         endptr = pg_atomic_read_u64(&XLogCtl->xlblocks[idx]);
                               1810         [ +  + ]:         115011 :         if (expectedEndPtr != endptr)
                               1811                 :          91630 :             break;
                               1812                 :                : 
                               1813                 :                :         /*
                               1814                 :                :          * The correct page is present (or was at the time the endptr was
                               1815                 :                :          * read; must re-verify later). Calculate pointer to source data and
                               1816                 :                :          * determine how much data to read from this page.
                               1817                 :                :          */
                               1818                 :          23381 :         page = XLogCtl->pages + idx * (Size) XLOG_BLCKSZ;
                               1819                 :          23381 :         psrc = page + offset;
                               1820                 :          23381 :         npagebytes = Min(nbytes, XLOG_BLCKSZ - offset);
                               1821                 :                : 
                               1822                 :                :         /*
                               1823                 :                :          * Ensure that the data copy and the first verification step are not
                               1824                 :                :          * reordered.
                               1825                 :                :          */
                               1826                 :          23381 :         pg_read_barrier();
                               1827                 :                : 
                               1828                 :                :         /* data copy */
                               1829                 :          23381 :         memcpy(pdst, psrc, npagebytes);
                               1830                 :                : 
                               1831                 :                :         /*
                               1832                 :                :          * Ensure that the data copy and the second verification step are not
                               1833                 :                :          * reordered.
                               1834                 :                :          */
                               1835                 :          23381 :         pg_read_barrier();
                               1836                 :                : 
                               1837                 :                :         /*
                               1838                 :                :          * Second verification step: check that the page we read from wasn't
                               1839                 :                :          * evicted while we were copying the data.
                               1840                 :                :          */
                               1841                 :          23381 :         endptr = pg_atomic_read_u64(&XLogCtl->xlblocks[idx]);
                               1842         [ -  + ]:          23381 :         if (expectedEndPtr != endptr)
  572 jdavis@postgresql.or     1843                 :UBC           0 :             break;
                               1844                 :                : 
  572 jdavis@postgresql.or     1845                 :CBC       23381 :         pdst += npagebytes;
                               1846                 :          23381 :         recptr += npagebytes;
                               1847                 :          23381 :         nbytes -= npagebytes;
                               1848                 :                :     }
                               1849                 :                : 
                               1850         [ -  + ]:          98192 :     Assert(pdst - dstbuf <= count);
                               1851                 :                : 
                               1852                 :          98192 :     return pdst - dstbuf;
                               1853                 :                : }
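The verify, copy, re-verify loop in WALReadFromBuffers can be illustrated outside PostgreSQL with a toy ring of pages. The following is a minimal single-threaded sketch, not the server's implementation: `PAGE_SZ`, `NSLOTS`, `slot_end`, `slot_data`, `install_page` and `read_from_ring` are hypothetical names invented for this illustration, and C11 `atomic_thread_fence(memory_order_acquire)` is used as an assumed stand-in for `pg_read_barrier()`.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SZ 16u             /* toy page size (stand-in for XLOG_BLCKSZ) */
#define NSLOTS  4u              /* toy number of buffer slots */

/* End position of the page currently held in each slot (like xlblocks[]). */
static _Atomic uint64_t slot_end[NSLOTS];
static char slot_data[NSLOTS * PAGE_SZ];

/* Hypothetical helper: place the page starting at page_start into its slot. */
static void
install_page(uint64_t page_start, char fill)
{
    size_t      idx = (size_t) ((page_start / PAGE_SZ) % NSLOTS);

    memset(slot_data + idx * PAGE_SZ, fill, PAGE_SZ);
    atomic_store(&slot_end[idx], page_start + PAGE_SZ);
}

/* Copy up to count bytes starting at start; returns the bytes copied. */
static size_t
read_from_ring(char *dst, uint64_t start, size_t count)
{
    char       *pdst = dst;
    uint64_t    pos = start;
    size_t      nbytes = count;

    while (nbytes > 0)
    {
        uint32_t    offset = (uint32_t) (pos % PAGE_SZ);
        size_t      idx = (size_t) ((pos / PAGE_SZ) % NSLOTS);
        uint64_t    expected_end = pos + (PAGE_SZ - offset);
        size_t      n = nbytes < PAGE_SZ - offset ? nbytes : PAGE_SZ - offset;

        /* First verification: is the wanted page in this slot? */
        if (atomic_load(&slot_end[idx]) != expected_end)
            break;
        atomic_thread_fence(memory_order_acquire);

        memcpy(pdst, slot_data + idx * PAGE_SZ + offset, n);

        atomic_thread_fence(memory_order_acquire);
        /* Second verification: was the page replaced during the copy? */
        if (atomic_load(&slot_end[idx]) != expected_end)
            break;

        pdst += n;
        pos += n;
        nbytes -= n;
    }
    return (size_t) (pdst - dst);
}
```

With pages installed at positions 0 and 16, a call such as `read_from_ring(out, 10, 12)` copies all 12 bytes across the page boundary. Once a later page takes over one of those slots, the same call returns only the bytes that precede the replaced page, which matches the short-read behaviour described in the function comment above.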
                               1854                 :                : 
                               1855                 :                : /*
                               1856                 :                :  * Converts a "usable byte position" to XLogRecPtr. A usable byte position
                               1857                 :                :  * is the position starting from the beginning of WAL, excluding all WAL
                               1858                 :                :  * page headers.
                               1859                 :                :  */
                               1860                 :                : static XLogRecPtr
 4443 heikki.linnakangas@i     1861                 :       27697401 : XLogBytePosToRecPtr(uint64 bytepos)
                               1862                 :                : {
                               1863                 :                :     uint64      fullsegs;
                               1864                 :                :     uint64      fullpages;
                               1865                 :                :     uint64      bytesleft;
                               1866                 :                :     uint32      seg_offset;
                               1867                 :                :     XLogRecPtr  result;
                               1868                 :                : 
                               1869                 :       27697401 :     fullsegs = bytepos / UsableBytesInSegment;
                               1870                 :       27697401 :     bytesleft = bytepos % UsableBytesInSegment;
                               1871                 :                : 
                               1872         [ +  + ]:       27697401 :     if (bytesleft < XLOG_BLCKSZ - SizeOfXLogLongPHD)
                               1873                 :                :     {
                               1874                 :                :         /* fits on first page of segment */
                               1875                 :          50569 :         seg_offset = bytesleft + SizeOfXLogLongPHD;
                               1876                 :                :     }
                               1877                 :                :     else
                               1878                 :                :     {
                               1879                 :                :         /* account for the first page on segment with long header */
                               1880                 :       27646832 :         seg_offset = XLOG_BLCKSZ;
                               1881                 :       27646832 :         bytesleft -= XLOG_BLCKSZ - SizeOfXLogLongPHD;
                               1882                 :                : 
                               1883                 :       27646832 :         fullpages = bytesleft / UsableBytesInPage;
                               1884                 :       27646832 :         bytesleft = bytesleft % UsableBytesInPage;
                               1885                 :                : 
                               1886                 :       27646832 :         seg_offset += fullpages * XLOG_BLCKSZ + bytesleft + SizeOfXLogShortPHD;
                               1887                 :                :     }
                               1888                 :                : 
 2616 alvherre@alvh.no-ip.     1889                 :       27697401 :     XLogSegNoOffsetToRecPtr(fullsegs, seg_offset, wal_segment_size, result);
                               1890                 :                : 
 4443 heikki.linnakangas@i     1891                 :       27697401 :     return result;
                               1892                 :                : }
                               1893                 :                : 
                               1894                 :                : /*
                               1895                 :                :  * Like XLogBytePosToRecPtr, but if the position is at a page boundary,
                               1896                 :                :  * returns a pointer to the beginning of the page (i.e. before the page header),
                               1897                 :                :  * not to where the first xlog record on that page would go. This is used
                               1898                 :                :  * when converting a pointer to the end of a record.
                               1899                 :                :  */
                               1900                 :                : static XLogRecPtr
                               1901                 :       14293931 : XLogBytePosToEndRecPtr(uint64 bytepos)
                               1902                 :                : {
                               1903                 :                :     uint64      fullsegs;
                               1904                 :                :     uint64      fullpages;
                               1905                 :                :     uint64      bytesleft;
                               1906                 :                :     uint32      seg_offset;
                               1907                 :                :     XLogRecPtr  result;
                               1908                 :                : 
                               1909                 :       14293931 :     fullsegs = bytepos / UsableBytesInSegment;
                               1910                 :       14293931 :     bytesleft = bytepos % UsableBytesInSegment;
                               1911                 :                : 
                               1912         [ +  + ]:       14293931 :     if (bytesleft < XLOG_BLCKSZ - SizeOfXLogLongPHD)
                               1913                 :                :     {
                               1914                 :                :         /* fits on first page of segment */
                               1915         [ +  + ]:          80723 :         if (bytesleft == 0)
                               1916                 :          54202 :             seg_offset = 0;
                               1917                 :                :         else
                               1918                 :          26521 :             seg_offset = bytesleft + SizeOfXLogLongPHD;
                               1919                 :                :     }
                               1920                 :                :     else
                               1921                 :                :     {
                               1922                 :                :         /* account for the first page on segment with long header */
                               1923                 :       14213208 :         seg_offset = XLOG_BLCKSZ;
                               1924                 :       14213208 :         bytesleft -= XLOG_BLCKSZ - SizeOfXLogLongPHD;
                               1925                 :                : 
                               1926                 :       14213208 :         fullpages = bytesleft / UsableBytesInPage;
                               1927                 :       14213208 :         bytesleft = bytesleft % UsableBytesInPage;
                               1928                 :                : 
                               1929         [ +  + ]:       14213208 :         if (bytesleft == 0)
                               1930                 :          13500 :             seg_offset += fullpages * XLOG_BLCKSZ + bytesleft;
                               1931                 :                :         else
                               1932                 :       14199708 :             seg_offset += fullpages * XLOG_BLCKSZ + bytesleft + SizeOfXLogShortPHD;
                               1933                 :                :     }
                               1934                 :                : 
 2616 alvherre@alvh.no-ip.     1935                 :       14293931 :     XLogSegNoOffsetToRecPtr(fullsegs, seg_offset, wal_segment_size, result);
                               1936                 :                : 
 4443 heikki.linnakangas@i     1937                 :       14293931 :     return result;
                               1938                 :                : }
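The difference between XLogBytePosToRecPtr and XLogBytePosToEndRecPtr is easiest to see with concrete numbers. The sketch below folds both into one hypothetical function, `bytepos_to_recptr`, with an `as_end` flag. The constants are assumptions chosen for illustration (8 kB pages, 16 MB segments, a 40-byte long page header and a 24-byte short page header); a real build may use different values, and the `USABLE_IN_SEG` formula is assumed to mirror how UsableBytesInSegment is derived.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration only; not read from any PostgreSQL build. */
#define BLCKSZ          8192u
#define SEG_SIZE        (16u * 1024u * 1024u)
#define LONG_PHD        40u     /* long header on a segment's first page */
#define SHORT_PHD       24u     /* short header on every other page */
#define USABLE_IN_PAGE  (BLCKSZ - SHORT_PHD)
#define USABLE_IN_SEG   ((uint64_t) (SEG_SIZE / BLCKSZ) * USABLE_IN_PAGE - \
                         (LONG_PHD - SHORT_PHD))

/*
 * Map a usable byte position to a WAL position. With as_end set, a position
 * that falls exactly on a page boundary maps to the start of the page
 * instead of the first byte after its header.
 */
static uint64_t
bytepos_to_recptr(uint64_t bytepos, bool as_end)
{
    uint64_t    fullsegs = bytepos / USABLE_IN_SEG;
    uint64_t    bytesleft = bytepos % USABLE_IN_SEG;
    uint64_t    seg_offset;

    if (bytesleft < BLCKSZ - LONG_PHD)
    {
        /* fits on the first page of the segment */
        seg_offset = (as_end && bytesleft == 0) ? 0 : bytesleft + LONG_PHD;
    }
    else
    {
        uint64_t    fullpages;

        /* skip the first page, which carries the long header */
        bytesleft -= BLCKSZ - LONG_PHD;
        fullpages = bytesleft / USABLE_IN_PAGE;
        bytesleft %= USABLE_IN_PAGE;

        seg_offset = BLCKSZ + fullpages * BLCKSZ + bytesleft;
        if (!(as_end && bytesleft == 0))
            seg_offset += SHORT_PHD;
    }
    return fullsegs * SEG_SIZE + seg_offset;
}
```

Under these assumed constants, byte position 8152 (exactly the usable bytes of the first page) maps to 8216 as a start pointer and to 8192 as an end pointer. The two results differ only at page boundaries; everywhere else they agree.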
                               1939                 :                : 
                               1940                 :                : /*
                               1941                 :                :  * Convert an XLogRecPtr to a "usable byte position".
                               1942                 :                :  */
                               1943                 :                : static uint64
                               1944                 :       41544621 : XLogRecPtrToBytePos(XLogRecPtr ptr)
                               1945                 :                : {
                               1946                 :                :     uint64      fullsegs;
                               1947                 :                :     uint32      fullpages;
                               1948                 :                :     uint32      offset;
                               1949                 :                :     uint64      result;
                               1950                 :                : 
 2909 andres@anarazel.de       1951                 :       41544621 :     XLByteToSeg(ptr, fullsegs, wal_segment_size);
                               1952                 :                : 
                               1953                 :       41544621 :     fullpages = (XLogSegmentOffset(ptr, wal_segment_size)) / XLOG_BLCKSZ;
 4443 heikki.linnakangas@i     1954                 :       41544621 :     offset = ptr % XLOG_BLCKSZ;
                               1955                 :                : 
                               1956         [ +  + ]:       41544621 :     if (fullpages == 0)
                               1957                 :                :     {
                               1958                 :          76561 :         result = fullsegs * UsableBytesInSegment;
                               1959         [ +  + ]:          76561 :         if (offset > 0)
                               1960                 :                :         {
                               1961         [ -  + ]:          75242 :             Assert(offset >= SizeOfXLogLongPHD);
                               1962                 :          75242 :             result += offset - SizeOfXLogLongPHD;
                               1963                 :                :         }
                               1964                 :                :     }
                               1965                 :                :     else
                               1966                 :                :     {
                               1967                 :       41468060 :         result = fullsegs * UsableBytesInSegment +
 4141 bruce@momjian.us         1968                 :       41468060 :             (XLOG_BLCKSZ - SizeOfXLogLongPHD) + /* account for first page */
 2999 tgl@sss.pgh.pa.us        1969                 :       41468060 :             (fullpages - 1) * UsableBytesInPage;    /* full pages */
 4443 heikki.linnakangas@i     1970         [ +  + ]:       41468060 :         if (offset > 0)
                               1971                 :                :         {
                               1972         [ -  + ]:       41454786 :             Assert(offset >= SizeOfXLogShortPHD);
                               1973                 :       41454786 :             result += offset - SizeOfXLogShortPHD;
                               1974                 :                :         }
                               1975                 :                :     }
                               1976                 :                : 
                               1977                 :       41544621 :     return result;
                               1978                 :                : }
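The reverse mapping in XLogRecPtrToBytePos discards page-header bytes, so a pointer to the start of a page and a pointer just past that page's header yield the same usable byte position. A minimal sketch follows, using the hypothetical name `recptr_to_bytepos` and assumed constants (8 kB pages, 16 MB segments, 40-byte long and 24-byte short page headers) that may not match a given build.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only; not read from any PostgreSQL build. */
#define BLCKSZ          8192u
#define SEG_SIZE        (16u * 1024u * 1024u)
#define LONG_PHD        40u     /* long header on a segment's first page */
#define SHORT_PHD       24u     /* short header on every other page */
#define USABLE_IN_PAGE  (BLCKSZ - SHORT_PHD)
#define USABLE_IN_SEG   ((uint64_t) (SEG_SIZE / BLCKSZ) * USABLE_IN_PAGE - \
                         (LONG_PHD - SHORT_PHD))

/* Map a WAL position to a usable byte position (page headers excluded). */
static uint64_t
recptr_to_bytepos(uint64_t ptr)
{
    uint64_t    fullsegs = ptr / SEG_SIZE;
    uint32_t    fullpages = (uint32_t) ((ptr % SEG_SIZE) / BLCKSZ);
    uint32_t    offset = (uint32_t) (ptr % BLCKSZ);
    uint64_t    result = fullsegs * USABLE_IN_SEG;

    if (fullpages == 0)
    {
        /* first page of the segment: long header */
        if (offset > 0)
        {
            assert(offset >= LONG_PHD);
            result += offset - LONG_PHD;
        }
    }
    else
    {
        /* usable bytes of the first page, then the full pages before ours */
        result += (BLCKSZ - LONG_PHD) +
            (uint64_t) (fullpages - 1) * USABLE_IN_PAGE;
        if (offset > 0)
        {
            assert(offset >= SHORT_PHD);
            result += offset - SHORT_PHD;
        }
    }
    return result;
}
```

For example, with these constants both 8192 (start of the second page) and 8216 (first byte after its short header) map to byte position 8152. A pointer that lands inside a page header is not a valid input, which is what the assertions guard against.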
                               1979                 :                : 
                               1980                 :                : /*
                               1981                 :                :  * Initialize XLOG buffers, writing out old buffers if they still contain
                               1982                 :                :  * unwritten data, up to the page containing 'upto'. Or if 'opportunistic' is
                               1983                 :                :  * true, initialize as many pages as we can without having to write out
                               1984                 :                :  * unwritten data. Any new pages are initialized to zeros, with page headers
                               1985                 :                :  * initialized properly.
                               1986                 :                :  */
                               1987                 :                : static void
 1401 rhaas@postgresql.org     1988                 :        2238351 : AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli, bool opportunistic)
                               1989                 :                : {
 8943 tgl@sss.pgh.pa.us        1990                 :        2238351 :     XLogCtlInsert *Insert = &XLogCtl->Insert;
                               1991                 :                :     int         nextidx;
                               1992                 :                :     XLogRecPtr  OldPageRqstPtr;
                               1993                 :                :     XLogwrtRqst WriteRqst;
 4443 heikki.linnakangas@i     1994                 :        2238351 :     XLogRecPtr  NewPageEndPtr = InvalidXLogRecPtr;
                               1995                 :                :     XLogRecPtr  NewPageBeginPtr;
                               1996                 :                :     XLogPageHeader NewPage;
 1082 tgl@sss.pgh.pa.us        1997                 :        2238351 :     int         npages pg_attribute_unused() = 0;
                               1998                 :                : 
   15 akorotkov@postgresql     1999                 :        2238351 :     LWLockAcquire(WALBufMappingLock, LW_EXCLUSIVE);
                               2000                 :                : 
                               2001                 :                :     /*
                               2002                 :                :      * Now that we have the lock, check if someone initialized the page
                               2003                 :                :      * already.
                               2004                 :                :      */
                               2005   [ +  +  +  + ]:        6575582 :     while (upto >= XLogCtl->InitializedUpTo || opportunistic)
                               2006                 :                :     {
                               2007                 :        4340393 :         nextidx = XLogRecPtrToBufIdx(XLogCtl->InitializedUpTo);
                               2008                 :                : 
                               2009                 :                :         /*
                               2010                 :                :          * Get ending-offset of the buffer page we need to replace (this may
                               2011                 :                :          * be zero if the buffer hasn't been used yet).  Fall through if it's
                               2012                 :                :          * already written out.
                               2013                 :                :          */
                               2014                 :        4340393 :         OldPageRqstPtr = pg_atomic_read_u64(&XLogCtl->xlblocks[nextidx]);
                               2015         [ +  + ]:        4340393 :         if (LogwrtResult.Write < OldPageRqstPtr)
                               2016                 :                :         {
                               2017                 :                :             /*
                               2018                 :                :              * Nope, got work to do. If we just want to pre-initialize as much
                               2019                 :                :              * as we can without flushing, give up now.
                               2020                 :                :              */
                               2021         [ +  + ]:        1996994 :             if (opportunistic)
                               2022                 :           3162 :                 break;
                               2023                 :                : 
                               2024                 :                :             /* Advance shared memory write request position */
 4002 andres@anarazel.de       2025         [ +  + ]:        1993832 :             SpinLockAcquire(&XLogCtl->info_lck);
                               2026         [ +  + ]:        1993832 :             if (XLogCtl->LogwrtRqst.Write < OldPageRqstPtr)
                               2027                 :         501253 :                 XLogCtl->LogwrtRqst.Write = OldPageRqstPtr;
                               2028                 :        1993832 :             SpinLockRelease(&XLogCtl->info_lck);
                               2029                 :                : 
                               2030                 :                :             /*
                               2031                 :                :              * Acquire an up-to-date LogwrtResult value and see if we still
                               2032                 :                :              * need to write it or if someone else already did.
                               2033                 :                :              */
  519 alvherre@alvh.no-ip.     2034                 :        1993832 :             RefreshXLogWriteResult(LogwrtResult);
 4443 heikki.linnakangas@i     2035         [ +  + ]:        1993832 :             if (LogwrtResult.Write < OldPageRqstPtr)
                               2036                 :                :             {
                               2037                 :                :                 /*
                               2038                 :                :                  * Must acquire write lock. Release WALBufMappingLock first,
                               2039                 :                :                  * to make sure that all insertions that we need to wait for
                               2040                 :                :                  * can finish (up to this same position). Otherwise we risk
                               2041                 :                :                  * deadlock.
                               2042                 :                :                  */
   15 akorotkov@postgresql     2043                 :        1979610 :                 LWLockRelease(WALBufMappingLock);
                               2044                 :                : 
 4443 heikki.linnakangas@i     2045                 :        1979610 :                 WaitXLogInsertionsToFinish(OldPageRqstPtr);
                               2046                 :                : 
                               2047                 :        1979610 :                 LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);
                               2048                 :                : 
  521 alvherre@alvh.no-ip.     2049                 :        1979610 :                 RefreshXLogWriteResult(LogwrtResult);
 4443 heikki.linnakangas@i     2050         [ +  + ]:        1979610 :                 if (LogwrtResult.Write >= OldPageRqstPtr)
                               2051                 :                :                 {
                               2052                 :                :                     /* OK, someone wrote it already */
                               2053                 :         115476 :                     LWLockRelease(WALWriteLock);
                               2054                 :                :                 }
                               2055                 :                :                 else
                               2056                 :                :                 {
                               2057                 :                :                     /* Have to write it ourselves */
                               2058                 :                :                     TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_START();
                               2059                 :        1864134 :                     WriteRqst.Write = OldPageRqstPtr;
                               2060                 :        1864134 :                     WriteRqst.Flush = 0;
 1401 rhaas@postgresql.org     2061                 :        1864134 :                     XLogWrite(WriteRqst, tli, false);
 4443 heikki.linnakangas@i     2062                 :        1864134 :                     LWLockRelease(WALWriteLock);
  201 michael@paquier.xyz      2063                 :        1864134 :                     pgWalUsage.wal_buffers_full++;
                               2064                 :                :                     TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_DONE();
                               2065                 :                : 
                               2066                 :                :                     /*
                               2067                 :                :                      * pgWalUsage was just updated, so flag that pending WAL
                               2068                 :                :                      * statistics need to be flushed.
                               2069                 :                :                      */
   40                          2070                 :        1864134 :                     pgstat_report_fixed = true;
                               2071                 :                :                 }
                               2072                 :                :                 /* Re-acquire WALBufMappingLock and retry */
   15 akorotkov@postgresql     2073                 :        1979610 :                 LWLockAcquire(WALBufMappingLock, LW_EXCLUSIVE);
                               2074                 :        1979610 :                 continue;
                               2075                 :                :             }
                               2076                 :                :         }
                               2077                 :                : 
                               2078                 :                :         /*
                               2079                 :                :          * Now the next buffer slot is free and we can set it up to be the
                               2080                 :                :          * next output page.
                               2081                 :                :          */
                               2082                 :        2357621 :         NewPageBeginPtr = XLogCtl->InitializedUpTo;
 4443 heikki.linnakangas@i     2083                 :        2357621 :         NewPageEndPtr = NewPageBeginPtr + XLOG_BLCKSZ;
                               2084                 :                : 
   15 akorotkov@postgresql     2085         [ -  + ]:        2357621 :         Assert(XLogRecPtrToBufIdx(NewPageBeginPtr) == nextidx);
                               2086                 :                : 
 4443 heikki.linnakangas@i     2087                 :        2357621 :         NewPage = (XLogPageHeader) (XLogCtl->pages + nextidx * (Size) XLOG_BLCKSZ);
                               2088                 :                : 
                               2089                 :                :         /*
                               2090                 :                :          * Mark the xlblock with InvalidXLogRecPtr and issue a write barrier
                               2091                 :                :          * before initializing. Otherwise, the old page may be partially
                               2092                 :                :          * zeroed but look valid.
                               2093                 :                :          */
  627 jdavis@postgresql.or     2094                 :        2357621 :         pg_atomic_write_u64(&XLogCtl->xlblocks[nextidx], InvalidXLogRecPtr);
                               2095                 :        2357621 :         pg_write_barrier();
                               2096                 :                : 
                               2097                 :                :         /*
                               2098                 :                :          * Be sure to re-zero the buffer so that bytes beyond what we've
                               2099                 :                :          * written will look like zeroes and not valid XLOG records...
                               2100                 :                :          */
  206 peter@eisentraut.org     2101   [ +  -  +  -  :        2357621 :         MemSet(NewPage, 0, XLOG_BLCKSZ);
                                     +  -  -  +  -  
                                                 - ]
                               2102                 :                : 
                               2103                 :                :         /*
                               2104                 :                :          * Fill the new page's header
                               2105                 :                :          */
 3759 bruce@momjian.us         2106                 :        2357621 :         NewPage->xlp_magic = XLOG_PAGE_MAGIC;
                               2107                 :                : 
                               2108                 :                :         /* NewPage->xlp_info = 0; */ /* done by memset */
 1401 rhaas@postgresql.org     2109                 :        2357621 :         NewPage->xlp_tli = tli;
 3759 bruce@momjian.us         2110                 :        2357621 :         NewPage->xlp_pageaddr = NewPageBeginPtr;
                               2111                 :                : 
                               2112                 :                :         /* NewPage->xlp_rem_len = 0; */  /* done by memset */
                               2113                 :                : 
                               2114                 :                :         /*
                               2115                 :                :          * If online backup is not in progress, mark the header to indicate
                               2116                 :                :          * that WAL records beginning in this page have removable backup
                               2117                 :                :          * blocks.  This allows the WAL archiver to know whether it is safe to
                               2118                 :                :          * compress archived WAL data by transforming full-block records into
                               2119                 :                :          * the non-full-block format.  It is sufficient to record this at the
                               2120                 :                :          * page level because we force a page switch (in fact a segment
                               2121                 :                :          * switch) when starting a backup, so the flag will be off before any
                               2122                 :                :          * records can be written during the backup.  At the end of a backup,
                               2123                 :                :          * the last page will be marked as all unsafe when perhaps only part
                               2124                 :                :          * is unsafe, but at worst the archiver would miss the opportunity to
                               2125                 :                :          * compress a few records.
                               2126                 :                :          */
 1053 alvherre@alvh.no-ip.     2127         [ +  + ]:        2357621 :         if (Insert->runningBackups == 0)
 3759 bruce@momjian.us         2128                 :        2233387 :             NewPage->xlp_info |= XLP_BKP_REMOVABLE;
                               2129                 :                : 
                               2130                 :                :         /*
                               2131                 :                :          * If first page of an XLOG segment file, make it a long header.
                               2132                 :                :          */
 2909 andres@anarazel.de       2133         [ +  + ]:        2357621 :         if ((XLogSegmentOffset(NewPage->xlp_pageaddr, wal_segment_size)) == 0)
                               2134                 :                :         {
 4443 heikki.linnakangas@i     2135                 :           1752 :             XLogLongPageHeader NewLongPage = (XLogLongPageHeader) NewPage;
                               2136                 :                : 
                               2137                 :           1752 :             NewLongPage->xlp_sysid = ControlFile->system_identifier;
 2909 andres@anarazel.de       2138                 :           1752 :             NewLongPage->xlp_seg_size = wal_segment_size;
 4443 heikki.linnakangas@i     2139                 :           1752 :             NewLongPage->xlp_xlog_blcksz = XLOG_BLCKSZ;
 3759 bruce@momjian.us         2140                 :           1752 :             NewPage->xlp_info |= XLP_LONG_HEADER;
                               2141                 :                :         }
                               2142                 :                : 
                               2143                 :                :         /*
                               2144                 :                :          * Make sure the initialization of the page becomes visible to others
                               2145                 :                :          * before the xlblocks update. GetXLogBuffer() reads xlblocks without
                               2146                 :                :          * holding a lock.
                               2147                 :                :          */
 4443 heikki.linnakangas@i     2148                 :        2357621 :         pg_write_barrier();
                               2149                 :                : 
  627 jdavis@postgresql.or     2150                 :        2357621 :         pg_atomic_write_u64(&XLogCtl->xlblocks[nextidx], NewPageEndPtr);
   15 akorotkov@postgresql     2151                 :        2357621 :         XLogCtl->InitializedUpTo = NewPageEndPtr;
                               2152                 :                : 
 4443 heikki.linnakangas@i     2153                 :        2357621 :         npages++;
                               2154                 :                :     }
   15 akorotkov@postgresql     2155                 :        2238351 :     LWLockRelease(WALBufMappingLock);
                               2156                 :                : 
                               2157                 :                : #ifdef WAL_DEBUG
                               2158                 :                :     if (XLOG_DEBUG && npages > 0)
                               2159                 :                :     {
                               2160                 :                :         elog(DEBUG1, "initialized %d pages, up to %X/%08X",
                               2161                 :                :              npages, LSN_FORMAT_ARGS(NewPageEndPtr));
                               2162                 :                :     }
                               2163                 :                : #endif
 9476 vadim4o@yahoo.com        2164                 :        2238351 : }
                               2165                 :                : 
                               2166                 :                : /*
                               2167                 :                :  * Calculate CheckPointSegments based on max_wal_size_mb and
                               2168                 :                :  * checkpoint_completion_target.
                               2169                 :                :  */
                               2170                 :                : static void
 3848 heikki.linnakangas@i     2171                 :           7451 : CalculateCheckpointSegments(void)
                               2172                 :                : {
                               2173                 :                :     double      target;
                               2174                 :                : 
                               2175                 :                :     /*-------
                               2176                 :                :      * Calculate the distance at which to trigger a checkpoint, to avoid
                               2177                 :                :      * exceeding max_wal_size_mb. This is based on two assumptions:
                               2178                 :                :      *
                               2179                 :                :      * a) we keep WAL for only one checkpoint cycle (prior to PG11 we kept
                               2180                 :                :      *    WAL for two checkpoint cycles to allow us to recover from the
                               2181                 :                :      *    secondary checkpoint if the first checkpoint failed, though we
                               2182                 :                :      *    only did this on the primary anyway, not on standby. Keeping just
                               2183                 :                :      *    one checkpoint simplifies processing and reduces disk space in
                               2184                 :                :      *    many smaller databases.)
                               2185                 :                :      * b) during checkpoint, we consume checkpoint_completion_target *
                               2186                 :                :      *    number of segments consumed between checkpoints.
                               2187                 :                :      *-------
                               2188                 :                :      */
 2909 andres@anarazel.de       2189                 :           7451 :     target = (double) ConvertToXSegs(max_wal_size_mb, wal_segment_size) /
 2860 simon@2ndQuadrant.co     2190                 :           7451 :         (1.0 + CheckPointCompletionTarget);
                               2191                 :                : 
                               2192                 :                :     /* round down */
 3848 heikki.linnakangas@i     2193                 :           7451 :     CheckPointSegments = (int) target;
                               2194                 :                : 
                               2195         [ +  + ]:           7451 :     if (CheckPointSegments < 1)
                               2196                 :             10 :         CheckPointSegments = 1;
                               2197                 :           7451 : }
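
The arithmetic in CalculateCheckpointSegments() can be tried in isolation. The sketch below is illustrative only: mb_to_segments() and checkpoint_segments() are hypothetical names, and mb_to_segments() assumes that ConvertToXSegs() simply divides a size in megabytes by the segment size in megabytes.

```c
#include <assert.h>

/* Hypothetical stand-in for ConvertToXSegs(): megabytes -> whole segments. */
static int
mb_to_segments(int size_mb, int segment_size_bytes)
{
    /* Assumes segments are at least 1 MB, as check_wal_segment_size() enforces. */
    assert(segment_size_bytes >= 1024 * 1024);
    return size_mb / (segment_size_bytes / (1024 * 1024));
}

/* Mirrors the computation above: divide, round down, clamp to at least 1. */
static int
checkpoint_segments(int max_wal_size_mb, int segment_size_bytes,
                    double completion_target)
{
    double      target;
    int         segs;

    target = (double) mb_to_segments(max_wal_size_mb, segment_size_bytes) /
        (1.0 + completion_target);

    segs = (int) target;        /* round down */
    if (segs < 1)
        segs = 1;
    return segs;
}
```

With a 1024 MB limit, 16 MB segments and a completion target of 0.9, the limit is 64 segments and the trigger distance is 64 / 1.9, which rounds down to 33.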
                               2198                 :                : 
                               2199                 :                : void
                               2200                 :           5447 : assign_max_wal_size(int newval, void *extra)
                               2201                 :                : {
 3077 simon@2ndQuadrant.co     2202                 :           5447 :     max_wal_size_mb = newval;
 3848 heikki.linnakangas@i     2203                 :           5447 :     CalculateCheckpointSegments();
                               2204                 :           5447 : }
                               2205                 :                : 
                               2206                 :                : void
                               2207                 :           1067 : assign_checkpoint_completion_target(double newval, void *extra)
                               2208                 :                : {
                               2209                 :           1067 :     CheckPointCompletionTarget = newval;
                               2210                 :           1067 :     CalculateCheckpointSegments();
                               2211                 :           1067 : }
                               2212                 :                : 
                               2213                 :                : bool
  740 peter@eisentraut.org     2214                 :           2055 : check_wal_segment_size(int *newval, void **extra, GucSource source)
                               2215                 :                : {
                               2216   [ +  -  +  -  :           2055 :     if (!IsValidWalSegSize(*newval))
                                        +  -  -  + ]
                               2217                 :                :     {
  740 peter@eisentraut.org     2218                 :UBC           0 :         GUC_check_errdetail("The WAL segment size must be a power of two between 1 MB and 1 GB.");
                               2219                 :              0 :         return false;
                               2220                 :                :     }
                               2221                 :                : 
  740 peter@eisentraut.org     2222                 :CBC        2055 :     return true;
                               2223                 :                : }
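
The error detail above states the rule that check_wal_segment_size() enforces: a power of two between 1 MB and 1 GB. The sketch below spells that rule out; is_valid_wal_seg_size() is a hypothetical name, and its body is an assumption about what IsValidWalSegSize() checks, derived only from that message.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_SEG_MIN (1024 * 1024)            /* 1 MB, per the error detail */
#define SKETCH_SEG_MAX (1024 * 1024 * 1024)     /* 1 GB, per the error detail */

/* Hypothetical equivalent of the validity test, written out longhand. */
static bool
is_valid_wal_seg_size(int size)
{
    /* A positive power of two has exactly one bit set. */
    bool        power_of_two = size > 0 && (size & (size - 1)) == 0;

    return power_of_two && size >= SKETCH_SEG_MIN && size <= SKETCH_SEG_MAX;
}
```

Under this reading, 16 MB is accepted, while 3 MB (not a power of two) and 512 kB (too small) are rejected.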
                               2224                 :                : 
                               2225                 :                : /*
                               2226                 :                :  * At a checkpoint, how many WAL segments to recycle as preallocated future
                               2227                 :                :  * XLOG segments? Returns the highest segment that should be preallocated.
                               2228                 :                :  */
                               2229                 :                : static XLogSegNo
 2089 michael@paquier.xyz      2230                 :           1677 : XLOGfileslop(XLogRecPtr lastredoptr)
                               2231                 :                : {
                               2232                 :                :     XLogSegNo   minSegNo;
                               2233                 :                :     XLogSegNo   maxSegNo;
                               2234                 :                :     double      distance;
                               2235                 :                :     XLogSegNo   recycleSegNo;
                               2236                 :                : 
                               2237                 :                :     /*
                               2238                 :                :      * Calculate the segment numbers that min_wal_size_mb and max_wal_size_mb
                               2239                 :                :      * correspond to. Always recycle enough segments to meet the minimum, and
                               2240                 :                :      * remove enough segments to stay below the maximum.
                               2241                 :                :      */
                               2242                 :           1677 :     minSegNo = lastredoptr / wal_segment_size +
 2909 andres@anarazel.de       2243                 :           1677 :         ConvertToXSegs(min_wal_size_mb, wal_segment_size) - 1;
 2089 michael@paquier.xyz      2244                 :           1677 :     maxSegNo = lastredoptr / wal_segment_size +
 2909 andres@anarazel.de       2245                 :           1677 :         ConvertToXSegs(max_wal_size_mb, wal_segment_size) - 1;
                               2246                 :                : 
                               2247                 :                :     /*
                               2248                 :                :      * Between those limits, recycle enough segments to get us through to the
                               2249                 :                :      * estimated end of next checkpoint.
                               2250                 :                :      *
                               2251                 :                :      * To estimate where the next checkpoint will finish, assume that the
                               2252                 :                :      * system runs steadily consuming CheckPointDistanceEstimate bytes between
                               2253                 :                :      * every checkpoint.
                               2254                 :                :      */
 2860 simon@2ndQuadrant.co     2255                 :           1677 :     distance = (1.0 + CheckPointCompletionTarget) * CheckPointDistanceEstimate;
                               2256                 :                :     /* add 10% for good measure. */
 3848 heikki.linnakangas@i     2257                 :           1677 :     distance *= 1.10;
                               2258                 :                : 
 2089 michael@paquier.xyz      2259                 :           1677 :     recycleSegNo = (XLogSegNo) ceil(((double) lastredoptr + distance) /
                               2260                 :                :                                     wal_segment_size);
                               2261                 :                : 
 3848 heikki.linnakangas@i     2262         [ +  + ]:           1677 :     if (recycleSegNo < minSegNo)
                               2263                 :           1174 :         recycleSegNo = minSegNo;
                               2264         [ +  + ]:           1677 :     if (recycleSegNo > maxSegNo)
                               2265                 :            393 :         recycleSegNo = maxSegNo;
                               2266                 :                : 
                               2267                 :           1677 :     return recycleSegNo;
                               2268                 :                : }
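
The clamping in XLOGfileslop() is easier to follow with concrete numbers. The sketch below is a simplified, standalone version under stated assumptions: recycle_target_segno() and ceil_to_u64() are hypothetical names, the minimum and maximum are passed in as segment counts rather than derived from min_wal_size_mb and max_wal_size_mb, and the ceiling is computed by hand instead of calling ceil().

```c
#include <assert.h>
#include <stdint.h>

/* Round a non-negative double up to a whole number (stand-in for ceil()). */
static uint64_t
ceil_to_u64(double x)
{
    uint64_t    whole = (uint64_t) x;

    return ((double) whole < x) ? whole + 1 : whole;
}

/* Highest segment number to keep preallocated, clamped to [min, max]. */
static uint64_t
recycle_target_segno(uint64_t last_redo_ptr, uint64_t seg_size,
                     uint64_t min_segs, uint64_t max_segs,
                     double completion_target, double distance_estimate)
{
    uint64_t    min_segno;
    uint64_t    max_segno;
    uint64_t    recycle_segno;
    double      distance;

    assert(seg_size > 0 && min_segs >= 1 && max_segs >= min_segs);

    min_segno = last_redo_ptr / seg_size + min_segs - 1;
    max_segno = last_redo_ptr / seg_size + max_segs - 1;

    /* Estimated WAL consumed until the next checkpoint ends, plus 10%. */
    distance = (1.0 + completion_target) * distance_estimate;
    distance *= 1.10;

    recycle_segno = ceil_to_u64(((double) last_redo_ptr + distance) /
                                (double) seg_size);

    if (recycle_segno < min_segno)
        recycle_segno = min_segno;
    if (recycle_segno > max_segno)
        recycle_segno = max_segno;
    return recycle_segno;
}
```

For example, with 16 MB segments, a redo pointer at the start of segment 10, limits of 5 and 64 segments, a completion target of 0.5 and an estimate of 10 segments per cycle, the distance is 16.5 segments, so the function returns segment 27, inside the allowed range of 14 to 73.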
                               2269                 :                : 
                               2270                 :                : /*
                               2271                 :                :  * Check whether we've consumed enough xlog space that a checkpoint is needed.
                               2272                 :                :  *
                               2273                 :                :  * new_segno indicates a log file that has just been filled up (or read
                               2274                 :                :  * during recovery). We measure the distance from RedoRecPtr to new_segno
                               2275                 :                :  * and see if that exceeds CheckPointSegments.
                               2276                 :                :  *
                               2277                 :                :  * Note: it is the caller's responsibility to ensure RedoRecPtr is up to date.
                               2278                 :                :  */
                               2279                 :                : bool
 4822                          2280                 :           4606 : XLogCheckpointNeeded(XLogSegNo new_segno)
                               2281                 :                : {
                               2282                 :                :     XLogSegNo   old_segno;
                               2283                 :                : 
 2909 andres@anarazel.de       2284                 :           4606 :     XLByteToSeg(RedoRecPtr, old_segno, wal_segment_size);
                               2285                 :                : 
 4822 heikki.linnakangas@i     2286         [ +  + ]:           4606 :     if (new_segno >= old_segno + (uint64) (CheckPointSegments - 1))
 6539 tgl@sss.pgh.pa.us        2287                 :           2894 :         return true;
                               2288                 :           1712 :     return false;
                               2289                 :                : }
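
The threshold test in XLogCheckpointNeeded() can be sketched as a pure function. This is a minimal illustration, not the real code path: checkpoint_needed() is a hypothetical name, RedoRecPtr and CheckPointSegments are passed as parameters, and XLByteToSeg() is assumed to be an integer division of the LSN by the segment size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True once new_segno is at least (checkpoint_segments - 1) past the redo segment. */
static bool
checkpoint_needed(uint64_t redo_ptr, uint64_t new_segno,
                  uint64_t seg_size, int checkpoint_segments)
{
    uint64_t    old_segno;

    assert(seg_size > 0 && checkpoint_segments >= 1);

    old_segno = redo_ptr / seg_size;    /* assumed meaning of XLByteToSeg() */
    return new_segno >= old_segno + (uint64_t) (checkpoint_segments - 1);
}
```

With the redo pointer in segment 10 and a trigger distance of 33 segments, filling segment 41 does not request a checkpoint but filling segment 42 does.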
                               2290                 :                : 
                               2291                 :                : /*
                               2292                 :                :  * Write and/or fsync the log at least as far as WriteRqst indicates.
                               2293                 :                :  *
                               2294                 :                :  * If flexible == true, we don't have to write as far as WriteRqst, but
                               2295                 :                :  * may stop at any convenient boundary (such as a cache or logfile boundary).
                               2296                 :                :  * This option allows us to avoid uselessly issuing multiple writes when a
                               2297                 :                :  * single one would do.
                               2298                 :                :  *
                               2299                 :                :  * Must be called with WALWriteLock held. WaitXLogInsertionsToFinish(WriteRqst)
                               2300                 :                :  * must be called before grabbing the lock, to make sure the data is ready to
                               2301                 :                :  * write.
                               2302                 :                :  */
                               2303                 :                : static void
 1401 rhaas@postgresql.org     2304                 :        1990750 : XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
                               2305                 :                : {
                               2306                 :                :     bool        ispartialpage;
                               2307                 :                :     bool        last_iteration;
                               2308                 :                :     bool        finishing_seg;
                               2309                 :                :     int         curridx;
                               2310                 :                :     int         npages;
                               2311                 :                :     int         startidx;
                               2312                 :                :     uint32      startoffset;
                               2313                 :                : 
                               2314                 :                :     /* We should always be inside a critical section here */
 7449 tgl@sss.pgh.pa.us        2315         [ -  + ]:        1990750 :     Assert(CritSectionCount > 0);
                               2316                 :                : 
                               2317                 :                :     /*
                               2318                 :                :      * Update local LogwrtResult (caller probably did this already, but...)
                               2319                 :                :      */
  521 alvherre@alvh.no-ip.     2320                 :        1990750 :     RefreshXLogWriteResult(LogwrtResult);
                               2321                 :                : 
                               2322                 :                :     /*
                               2323                 :                :      * Since successive pages in the xlog cache are consecutively allocated,
                               2324                 :                :      * we can usually gather multiple pages together and issue just one
                               2325                 :                :      * write() call.  npages is the number of pages we have determined can be
                               2326                 :                :      * written together; startidx is the cache block index of the first one,
                               2327                 :                :      * and startoffset is the file offset at which it should go. The latter
                               2328                 :                :      * two variables are only valid when npages > 0, but we must initialize
                               2329                 :                :      * all of them to keep the compiler quiet.
                               2330                 :                :      */
 7320 tgl@sss.pgh.pa.us        2331                 :        1990750 :     npages = 0;
                               2332                 :        1990750 :     startidx = 0;
                               2333                 :        1990750 :     startoffset = 0;
                               2334                 :                : 
                               2335                 :                :     /*
                               2336                 :                :      * Within the loop, curridx is the cache block index of the page to
                               2337                 :                :      * consider writing.  Begin at the buffer containing the next unwritten
                               2338                 :                :      * page, or last partially written page.
                               2339                 :                :      */
 4434 heikki.linnakangas@i     2340                 :        1990750 :     curridx = XLogRecPtrToBufIdx(LogwrtResult.Write);
                               2341                 :                : 
 4635 alvherre@alvh.no-ip.     2342         [ +  + ]:        4294038 :     while (LogwrtResult.Write < WriteRqst.Write)
                               2343                 :                :     {
                               2344                 :                :         /*
                               2345                 :                :          * Make sure we're not ahead of the insert process.  This could happen
                               2346                 :                :          * if we're passed a bogus WriteRqst.Write that is past the end of the
                               2347                 :                :          * last page that's been initialized by AdvanceXLInsertBuffer.
                               2348                 :                :          */
  627 jdavis@postgresql.or     2349                 :        2424893 :         XLogRecPtr  EndPtr = pg_atomic_read_u64(&XLogCtl->xlblocks[curridx]);
                               2350                 :                : 
 4443 heikki.linnakangas@i     2351         [ -  + ]:        2424893 :         if (LogwrtResult.Write >= EndPtr)
   61 alvherre@kurilemu.de     2352         [ #  # ]:UNC           0 :             elog(PANIC, "xlog write request %X/%08X is past end of log %X/%08X",
                               2353                 :                :                  LSN_FORMAT_ARGS(LogwrtResult.Write),
                               2354                 :                :                  LSN_FORMAT_ARGS(EndPtr));
                               2355                 :                : 
                               2356                 :                :         /* Advance LogwrtResult.Write to end of current buffer page */
 4443 heikki.linnakangas@i     2357                 :CBC     2424893 :         LogwrtResult.Write = EndPtr;
 4635 alvherre@alvh.no-ip.     2358                 :        2424893 :         ispartialpage = WriteRqst.Write < LogwrtResult.Write;
                               2359                 :                : 
 2909 andres@anarazel.de       2360         [ +  + ]:        2424893 :         if (!XLByteInPrevSeg(LogwrtResult.Write, openLogSegNo,
                               2361                 :                :                              wal_segment_size))
                               2362                 :                :         {
                               2363                 :                :             /*
                               2364                 :                :              * Switch to new logfile segment.  We cannot have any pending
                               2365                 :                :              * pages here (since we dump what we have at segment end).
                               2366                 :                :              */
 7320 tgl@sss.pgh.pa.us        2367         [ -  + ]:          12708 :             Assert(npages == 0);
 8943                          2368         [ +  + ]:          12708 :             if (openLogFile >= 0)
 7023 bruce@momjian.us         2369                 :           6017 :                 XLogFileClose();
 2909 andres@anarazel.de       2370                 :          12708 :             XLByteToPrevSeg(LogwrtResult.Write, openLogSegNo,
                               2371                 :                :                             wal_segment_size);
 1401 rhaas@postgresql.org     2372                 :          12708 :             openLogTLI = tli;
                               2373                 :                : 
                               2374                 :                :             /* create/use new log file */
                               2375                 :          12708 :             openLogFile = XLogFileInit(openLogSegNo, tli);
 2021 tgl@sss.pgh.pa.us        2376                 :          12708 :             ReserveExternalFD();
                               2377                 :                :         }
                               2378                 :                : 
                               2379                 :                :         /* Make sure we have the current logfile open */
 8943                          2380         [ -  + ]:        2424893 :         if (openLogFile < 0)
                               2381                 :                :         {
 2909 andres@anarazel.de       2382                 :UBC           0 :             XLByteToPrevSeg(LogwrtResult.Write, openLogSegNo,
                               2383                 :                :                             wal_segment_size);
 1401 rhaas@postgresql.org     2384                 :              0 :             openLogTLI = tli;
                               2385                 :              0 :             openLogFile = XLogFileOpen(openLogSegNo, tli);
 2021 tgl@sss.pgh.pa.us        2386                 :              0 :             ReserveExternalFD();
                               2387                 :                :         }
                               2388                 :                : 
                               2389                 :                :         /* Add current page to the set of pending pages-to-dump */
 7320 tgl@sss.pgh.pa.us        2390         [ +  + ]:CBC     2424893 :         if (npages == 0)
                               2391                 :                :         {
                               2392                 :                :             /* first of group */
                               2393                 :        2003075 :             startidx = curridx;
 2909 andres@anarazel.de       2394                 :        2003075 :             startoffset = XLogSegmentOffset(LogwrtResult.Write - XLOG_BLCKSZ,
                               2395                 :                :                                             wal_segment_size);
                               2396                 :                :         }
 7320 tgl@sss.pgh.pa.us        2397                 :        2424893 :         npages++;
                               2398                 :                : 
                               2399                 :                :         /*
                               2400                 :                :          * Dump the set if this will be the last loop iteration, or if we are
                               2401                 :                :          * at the last page of the cache area (since the next page won't be
                               2402                 :                :          * contiguous in memory), or if we are at the end of the logfile
                               2403                 :                :          * segment.
                               2404                 :                :          */
 4635 alvherre@alvh.no-ip.     2405                 :        2424893 :         last_iteration = WriteRqst.Write <= LogwrtResult.Write;
                               2406                 :                : 
 7320 tgl@sss.pgh.pa.us        2407         [ +  + ]:        4730598 :         finishing_seg = !ispartialpage &&
 2909 andres@anarazel.de       2408         [ +  + ]:        2305705 :             (startoffset + npages * XLOG_BLCKSZ) >= wal_segment_size;
                               2409                 :                : 
 6971 tgl@sss.pgh.pa.us        2410         [ +  + ]:        2424893 :         if (last_iteration ||
 7320                          2411   [ +  +  -  + ]:         436394 :             curridx == XLogCtl->XLogCacheBlck ||
                               2412                 :                :             finishing_seg)
                               2413                 :                :         {
                               2414                 :                :             char       *from;
                               2415                 :                :             Size        nbytes;
                               2416                 :                :             Size        nleft;
                               2417                 :                :             ssize_t     written;
                               2418                 :                :             instr_time  start;
                               2419                 :                : 
                               2420                 :                :             /* OK to write the page(s) */
 7096                          2421                 :        2003075 :             from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ;
                               2422                 :        2003075 :             nbytes = npages * (Size) XLOG_BLCKSZ;
 4450 heikki.linnakangas@i     2423                 :        2003075 :             nleft = nbytes;
                               2424                 :                :             do
                               2425                 :                :             {
                               2426                 :        2003075 :                 errno = 0;
                               2427                 :                : 
                               2428                 :                :                 /*
                               2429                 :                :                  * Measure I/O timing to write WAL data, for pg_stat_io.
                               2430                 :                :                  */
  192 michael@paquier.xyz      2431                 :        2003075 :                 start = pgstat_prepare_io_time(track_wal_io_timing);
                               2432                 :                : 
 3094 rhaas@postgresql.org     2433                 :        2003075 :                 pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE);
 1073 tmunro@postgresql.or     2434                 :        2003075 :                 written = pg_pwrite(openLogFile, from, nleft, startoffset);
 3094 rhaas@postgresql.org     2435                 :        2003075 :                 pgstat_report_wait_end();
                               2436                 :                : 
  214 michael@paquier.xyz      2437                 :        2003075 :                 pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL,
                               2438                 :                :                                         IOOP_WRITE, start, 1, written);
                               2439                 :                : 
 4450 heikki.linnakangas@i     2440         [ -  + ]:        2003075 :                 if (written <= 0)
                               2441                 :                :                 {
                               2442                 :                :                     char        xlogfname[MAXFNAMELEN];
                               2443                 :                :                     int         save_errno;
                               2444                 :                : 
 4450 heikki.linnakangas@i     2445         [ #  # ]:UBC           0 :                     if (errno == EINTR)
                               2446                 :              0 :                         continue;
                               2447                 :                : 
 2104 michael@paquier.xyz      2448                 :              0 :                     save_errno = errno;
 1401 rhaas@postgresql.org     2449                 :              0 :                     XLogFileName(xlogfname, tli, openLogSegNo,
                               2450                 :                :                                  wal_segment_size);
 2104 michael@paquier.xyz      2451                 :              0 :                     errno = save_errno;
 4450 heikki.linnakangas@i     2452         [ #  # ]:              0 :                     ereport(PANIC,
                               2453                 :                :                             (errcode_for_file_access(),
                               2454                 :                :                              errmsg("could not write to log file \"%s\" at offset %u, length %zu: %m",
                               2455                 :                :                                     xlogfname, startoffset, nleft)));
                               2456                 :                :                 }
 4450 heikki.linnakangas@i     2457                 :CBC     2003075 :                 nleft -= written;
                               2458                 :        2003075 :                 from += written;
 2495 tmunro@postgresql.or     2459                 :        2003075 :                 startoffset += written;
 4450 heikki.linnakangas@i     2460         [ -  + ]:        2003075 :             } while (nleft > 0);
                               2461                 :                : 
 7320 tgl@sss.pgh.pa.us        2462                 :        2003075 :             npages = 0;
                               2463                 :                : 
                               2464                 :                :             /*
                               2465                 :                :              * If we just wrote the whole last page of a logfile segment,
                               2466                 :                :              * fsync the segment immediately.  This avoids having to go back
                               2467                 :                :              * and re-open prior segments when an fsync request comes along
                               2468                 :                :              * later. Doing it here ensures that one and only one backend will
                               2469                 :                :              * perform this fsync.
                               2470                 :                :              *
                               2471                 :                :              * This is also the right place to notify the Archiver that the
                               2472                 :                :              * segment is ready to copy to archival storage, and to update the
                               2473                 :                :              * timer for archive_timeout, and to signal for a checkpoint if
                               2474                 :                :              * too many logfile segments have been used since the last
                               2475                 :                :              * checkpoint.
                               2476                 :                :              */
 4443 heikki.linnakangas@i     2477         [ +  + ]:        2003075 :             if (finishing_seg)
                               2478                 :                :             {
 1401 rhaas@postgresql.org     2479                 :           1843 :                 issue_xlog_fsync(openLogFile, openLogSegNo, tli);
                               2480                 :                : 
                               2481                 :                :                 /* signal that we need to wake up walsenders later */
 4814                          2482                 :           1843 :                 WalSndWakeupRequest();
                               2483                 :                : 
 2999 tgl@sss.pgh.pa.us        2484                 :           1843 :                 LogwrtResult.Flush = LogwrtResult.Write;    /* end of page */
                               2485                 :                : 
 7320                          2486   [ +  +  -  +  :           1843 :                 if (XLogArchivingActive())
                                              +  + ]
 1401 rhaas@postgresql.org     2487                 :            401 :                     XLogArchiveNotifySeg(openLogSegNo, tli);
                               2488                 :                : 
 4434 heikki.linnakangas@i     2489                 :           1843 :                 XLogCtl->lastSegSwitchTime = (pg_time_t) time(NULL);
 3180 andres@anarazel.de       2490                 :           1843 :                 XLogCtl->lastSegSwitchLSN = LogwrtResult.Flush;
                               2491                 :                : 
                               2492                 :                :                 /*
                               2493                 :                :                  * Request a checkpoint if we've consumed too much xlog since
                               2494                 :                :                  * the last one.  For speed, we first check using the local
                               2495                 :                :                  * copy of RedoRecPtr, which might be out of date; if it looks
                               2496                 :                :                  * like a checkpoint is needed, forcibly update RedoRecPtr and
                               2497                 :                :                  * recheck.
                               2498                 :                :                  */
 4822 heikki.linnakangas@i     2499   [ +  +  +  + ]:           1843 :                 if (IsUnderPostmaster && XLogCheckpointNeeded(openLogSegNo))
                               2500                 :                :                 {
 6539 tgl@sss.pgh.pa.us        2501                 :            234 :                     (void) GetRedoRecPtr();
 4822 heikki.linnakangas@i     2502         [ +  + ]:            234 :                     if (XLogCheckpointNeeded(openLogSegNo))
 6643 tgl@sss.pgh.pa.us        2503                 :            190 :                         RequestCheckpoint(CHECKPOINT_CAUSE_XLOG);
                               2504                 :                :                 }
                               2505                 :                :             }
                               2506                 :                :         }
                               2507                 :                : 
 8943                          2508         [ +  + ]:        2424893 :         if (ispartialpage)
                               2509                 :                :         {
                               2510                 :                :             /* Only asked to write a partial page */
                               2511                 :         119188 :             LogwrtResult.Write = WriteRqst.Write;
                               2512                 :         119188 :             break;
                               2513                 :                :         }
 7320                          2514         [ +  + ]:        2305705 :         curridx = NextBufIdx(curridx);
                               2515                 :                : 
                               2516                 :                :         /* If flexible, break out of loop as soon as we wrote something */
                               2517   [ +  +  +  + ]:        2305705 :         if (flexible && npages == 0)
                               2518                 :           2417 :             break;
                               2519                 :                :     }
                               2520                 :                : 
                               2521         [ -  + ]:        1990750 :     Assert(npages == 0);
                               2522                 :                : 
                               2523                 :                :     /*
                               2524                 :                :      * If asked to flush, do so
                               2525                 :                :      */
 4635 alvherre@alvh.no-ip.     2526         [ +  + ]:        1990750 :     if (LogwrtResult.Flush < WriteRqst.Flush &&
                               2527         [ +  + ]:         125959 :         LogwrtResult.Flush < LogwrtResult.Write)
                               2528                 :                :     {
                               2529                 :                :         /*
                               2530                 :                :          * Could get here without iterating above loop, in which case we might
                               2531                 :                :          * have no open file or the wrong one.  However, we do not need to
                               2532                 :                :          * fsync more than one file.
                               2533                 :                :          */
  694 nathan@postgresql.or     2534         [ +  - ]:         125894 :         if (wal_sync_method != WAL_SYNC_METHOD_OPEN &&
                               2535         [ +  - ]:         125894 :             wal_sync_method != WAL_SYNC_METHOD_OPEN_DSYNC)
                               2536                 :                :         {
 8940 tgl@sss.pgh.pa.us        2537         [ +  + ]:         125894 :             if (openLogFile >= 0 &&
 2909 andres@anarazel.de       2538         [ +  + ]:         125873 :                 !XLByteInPrevSeg(LogwrtResult.Write, openLogSegNo,
                               2539                 :                :                                  wal_segment_size))
 7023 bruce@momjian.us         2540                 :            120 :                 XLogFileClose();
 8940 tgl@sss.pgh.pa.us        2541         [ +  + ]:         125894 :             if (openLogFile < 0)
                               2542                 :                :             {
 2909 andres@anarazel.de       2543                 :            141 :                 XLByteToPrevSeg(LogwrtResult.Write, openLogSegNo,
                               2544                 :                :                                 wal_segment_size);
 1401 rhaas@postgresql.org     2545                 :            141 :                 openLogTLI = tli;
                               2546                 :            141 :                 openLogFile = XLogFileOpen(openLogSegNo, tli);
 2021 tgl@sss.pgh.pa.us        2547                 :            141 :                 ReserveExternalFD();
                               2548                 :                :             }
                               2549                 :                : 
 1401 rhaas@postgresql.org     2550                 :         125894 :             issue_xlog_fsync(openLogFile, openLogSegNo, tli);
                               2551                 :                :         }
                               2552                 :                : 
                               2553                 :                :         /* signal that we need to wake up walsenders later */
 4814                          2554                 :         125894 :         WalSndWakeupRequest();
                               2555                 :                : 
 8943 tgl@sss.pgh.pa.us        2556                 :         125894 :         LogwrtResult.Flush = LogwrtResult.Write;
                               2557                 :                :     }
                               2558                 :                : 
                               2559                 :                :     /*
                               2560                 :                :      * Update shared-memory status
                               2561                 :                :      *
                               2562                 :                :      * We make sure that the shared 'request' values do not fall behind the
                               2563                 :                :      * 'result' values.  This is not absolutely essential, but it saves some
                               2564                 :                :      * code in a couple of places.
                               2565                 :                :      */
  519 alvherre@alvh.no-ip.     2566         [ +  + ]:        1990750 :     SpinLockAcquire(&XLogCtl->info_lck);
                               2567         [ +  + ]:        1990750 :     if (XLogCtl->LogwrtRqst.Write < LogwrtResult.Write)
                               2568                 :         111719 :         XLogCtl->LogwrtRqst.Write = LogwrtResult.Write;
                               2569         [ +  + ]:        1990750 :     if (XLogCtl->LogwrtRqst.Flush < LogwrtResult.Flush)
                               2570                 :         127342 :         XLogCtl->LogwrtRqst.Flush = LogwrtResult.Flush;
                               2571                 :        1990750 :     SpinLockRelease(&XLogCtl->info_lck);
                               2572                 :                : 
                               2573                 :                :     /*
                               2574                 :                :      * We write Write first, then a barrier, then Flush.  When reading, the
                               2575                 :                :      * opposite must be done (with a matching barrier in between), so that we
                               2576                 :                :      * always see a Flush value that trails behind the Write value seen.
                               2577                 :                :      */
                               2578                 :        1990750 :     pg_atomic_write_u64(&XLogCtl->logWriteResult, LogwrtResult.Write);
                               2579                 :        1990750 :     pg_write_barrier();
                               2580                 :        1990750 :     pg_atomic_write_u64(&XLogCtl->logFlushResult, LogwrtResult.Flush);
                               2581                 :                : 
                               2582                 :                : #ifdef USE_ASSERT_CHECKING
                               2583                 :                :     {
                               2584                 :                :         XLogRecPtr  Flush;
                               2585                 :                :         XLogRecPtr  Write;
                               2586                 :                :         XLogRecPtr  Insert;
                               2587                 :                : 
                               2588                 :        1990750 :         Flush = pg_atomic_read_u64(&XLogCtl->logFlushResult);
                               2589                 :        1990750 :         pg_read_barrier();
                               2590                 :        1990750 :         Write = pg_atomic_read_u64(&XLogCtl->logWriteResult);
  517                          2591                 :        1990750 :         pg_read_barrier();
                               2592                 :        1990750 :         Insert = pg_atomic_read_u64(&XLogCtl->logInsertResult);
                               2593                 :                : 
                               2594                 :                :         /* WAL written to disk is always ahead of WAL flushed */
  519                          2595         [ -  + ]:        1990750 :         Assert(Write >= Flush);
                               2596                 :                : 
                               2597                 :                :         /* WAL inserted to buffers is always ahead of WAL written */
  517                          2598         [ -  + ]:        1990750 :         Assert(Insert >= Write);
                               2599                 :                :     }
                               2600                 :                : #endif
 8943 tgl@sss.pgh.pa.us        2601                 :        1990750 : }
                               2602                 :                : 
                               2603                 :                : /*
                               2604                 :                :  * Record the LSN for an asynchronous transaction commit/abort
                               2605                 :                :  * and nudge the WALWriter if there is work for it to do.
                               2606                 :                :  * (This should not be called for synchronous commits.)
                               2607                 :                :  */
                               2608                 :                : void
 5518 simon@2ndQuadrant.co     2609                 :          29263 : XLogSetAsyncXactLSN(XLogRecPtr asyncXactLSN)
                               2610                 :                : {
 5046                          2611                 :          29263 :     XLogRecPtr  WriteRqstPtr = asyncXactLSN;
                               2612                 :                :     bool        sleeping;
  649 heikki.linnakangas@i     2613                 :          29263 :     bool        wakeup = false;
                               2614                 :                :     XLogRecPtr  prevAsyncXactLSN;
                               2615                 :                : 
 4002 andres@anarazel.de       2616         [ +  + ]:          29263 :     SpinLockAcquire(&XLogCtl->info_lck);
                               2617                 :          29263 :     sleeping = XLogCtl->WalWriterSleeping;
  649 heikki.linnakangas@i     2618                 :          29263 :     prevAsyncXactLSN = XLogCtl->asyncXactLSN;
 4002 andres@anarazel.de       2619         [ +  + ]:          29263 :     if (XLogCtl->asyncXactLSN < asyncXactLSN)
                               2620                 :          28799 :         XLogCtl->asyncXactLSN = asyncXactLSN;
                               2621                 :          29263 :     SpinLockRelease(&XLogCtl->info_lck);
                               2622                 :                : 
                               2623                 :                :     /*
                               2624                 :                :      * If somebody else already called this function with a more aggressive
                               2625                 :                :      * LSN, they will have done what we needed (and perhaps more).
                               2626                 :                :      */
  649 heikki.linnakangas@i     2627         [ +  + ]:          29263 :     if (asyncXactLSN <= prevAsyncXactLSN)
                               2628                 :            464 :         return;
                               2629                 :                : 
                               2630                 :                :     /*
                               2631                 :                :      * If the WALWriter is sleeping, kick it to make it come out of low-power
                               2632                 :                :      * mode, so that this async commit will reach disk within the expected
                               2633                 :                :      * amount of time.  Otherwise, determine whether it has enough WAL
                               2634                 :                :      * available to flush, the same way that XLogBackgroundFlush() does.
                               2635                 :                :      */
                               2636         [ +  + ]:          28799 :     if (sleeping)
                               2637                 :              8 :         wakeup = true;
                               2638                 :                :     else
                               2639                 :                :     {
                               2640                 :                :         int         flushblocks;
                               2641                 :                : 
  519 alvherre@alvh.no-ip.     2642                 :          28791 :         RefreshXLogWriteResult(LogwrtResult);
                               2643                 :                : 
  649 heikki.linnakangas@i     2644                 :          28791 :         flushblocks =
                               2645                 :          28791 :             WriteRqstPtr / XLOG_BLCKSZ - LogwrtResult.Flush / XLOG_BLCKSZ;
                               2646                 :                : 
                               2647   [ +  -  +  + ]:          28791 :         if (WalWriterFlushAfter == 0 || flushblocks >= WalWriterFlushAfter)
                               2648                 :           3701 :             wakeup = true;
                               2649                 :                :     }
                               2650                 :                : 
  309                          2651         [ +  + ]:          28799 :     if (wakeup)
                               2652                 :                :     {
                               2653                 :           3709 :         volatile PROC_HDR *procglobal = ProcGlobal;
                               2654                 :           3709 :         ProcNumber  walwriterProc = procglobal->walwriterProc;
                               2655                 :                : 
                               2656         [ +  + ]:           3709 :         if (walwriterProc != INVALID_PROC_NUMBER)
                               2657                 :            181 :             SetLatch(&GetPGProcByNumber(walwriterProc)->procLatch);
                               2658                 :                :     }
                               2659                 :                : }
                               2660                 :                : 
                               2661                 :                : /*
                               2662                 :                :  * Record the LSN up to which we can remove WAL because it's not required by
                               2663                 :                :  * any replication slot.
                               2664                 :                :  */
                               2665                 :                : void
 4236 rhaas@postgresql.org     2666                 :          64526 : XLogSetReplicationSlotMinimumLSN(XLogRecPtr lsn)
                               2667                 :                : {
 4002 andres@anarazel.de       2668         [ +  + ]:          64526 :     SpinLockAcquire(&XLogCtl->info_lck);
                               2669                 :          64526 :     XLogCtl->replicationSlotMinLSN = lsn;
                               2670                 :          64526 :     SpinLockRelease(&XLogCtl->info_lck);
 4236 rhaas@postgresql.org     2671                 :          64526 : }
                               2672                 :                : 
                               2673                 :                : 
                               2674                 :                : /*
                               2675                 :                :  * Return the oldest LSN we must retain to satisfy the needs of some
                               2676                 :                :  * replication slot.
                               2677                 :                :  */
                               2678                 :                : static XLogRecPtr
                               2679                 :           2042 : XLogGetReplicationSlotMinimumLSN(void)
                               2680                 :                : {
                               2681                 :                :     XLogRecPtr  retval;
                               2682                 :                : 
 4002 andres@anarazel.de       2683         [ -  + ]:           2042 :     SpinLockAcquire(&XLogCtl->info_lck);
                               2684                 :           2042 :     retval = XLogCtl->replicationSlotMinLSN;
                               2685                 :           2042 :     SpinLockRelease(&XLogCtl->info_lck);
                               2686                 :                : 
 4236 rhaas@postgresql.org     2687                 :           2042 :     return retval;
                               2688                 :                : }
                               2689                 :                : 
                               2690                 :                : /*
                               2691                 :                :  * Advance minRecoveryPoint in control file.
                               2692                 :                :  *
                               2693                 :                :  * If we crash during recovery, we must reach this point again before the
                               2694                 :                :  * database is consistent.
                               2695                 :                :  *
                               2696                 :                :  * If 'force' is true, 'lsn' argument is ignored. Otherwise, minRecoveryPoint
                               2697                 :                :  * is only updated if it's not already greater than or equal to 'lsn'.
                               2698                 :                :  */
                               2699                 :                : static void
 6044 heikki.linnakangas@i     2700                 :         104889 : UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force)
                               2701                 :                : {
                               2702                 :                :     /* Quick check using our local copy of the variable */
 1298                          2703   [ +  +  +  +  :         104889 :     if (!updateMinRecoveryPoint || (!force && lsn <= LocalMinRecoveryPoint))
                                              +  + ]
 6044                          2704                 :          98199 :         return;
                               2705                 :                : 
                               2706                 :                :     /*
                               2707                 :                :      * An invalid minRecoveryPoint means that we need to recover all the WAL,
                               2708                 :                :      * i.e., we're doing crash recovery.  We never modify the control file's
                               2709                 :                :      * value in that case, so we can short-circuit future checks here too. The
                               2710                 :                :      * local values of minRecoveryPoint and minRecoveryPointTLI should not be
                               2711                 :                :      * updated until crash recovery finishes.  We only do this for the startup
                               2712                 :                :      * process as it should not update its own reference of minRecoveryPoint
                               2713                 :                :      * until it has finished crash recovery to make sure that all WAL
                               2714                 :                :      * available is replayed in this case.  This also saves from extra locks
                               2715                 :                :      * taken on the control file from the startup process.
                               2716                 :                :      */
 1298                          2717   [ +  +  +  + ]:           6690 :     if (XLogRecPtrIsInvalid(LocalMinRecoveryPoint) && InRecovery)
                               2718                 :                :     {
 2620 michael@paquier.xyz      2719                 :             31 :         updateMinRecoveryPoint = false;
                               2720                 :             31 :         return;
                               2721                 :                :     }
                               2722                 :                : 
 6044 heikki.linnakangas@i     2723                 :           6659 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               2724                 :                : 
                               2725                 :                :     /* update local copy */
 1298                          2726                 :           6659 :     LocalMinRecoveryPoint = ControlFile->minRecoveryPoint;
                               2727                 :           6659 :     LocalMinRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
                               2728                 :                : 
                               2729         [ +  + ]:           6659 :     if (XLogRecPtrIsInvalid(LocalMinRecoveryPoint))
 2563 michael@paquier.xyz      2730                 :              1 :         updateMinRecoveryPoint = false;
 1298 heikki.linnakangas@i     2731   [ +  +  +  + ]:           6658 :     else if (force || LocalMinRecoveryPoint < lsn)
                               2732                 :                :     {
                               2733                 :                :         XLogRecPtr  newMinRecoveryPoint;
                               2734                 :                :         TimeLineID  newMinRecoveryPointTLI;
                               2735                 :                : 
                               2736                 :                :         /*
                               2737                 :                :          * To avoid having to update the control file too often, we update it
                               2738                 :                :          * all the way to the last record being replayed, even though 'lsn'
                               2739                 :                :          * would suffice for correctness.  This also allows the 'force' case
                               2740                 :                :          * to not need a valid 'lsn' value.
                               2741                 :                :          *
                               2742                 :                :          * Another important reason for doing it this way is that the passed
                               2743                 :                :          * 'lsn' value could be bogus, i.e., past the end of available WAL, if
                               2744                 :                :          * the caller got it from a corrupted heap page.  Accepting such a
                               2745                 :                :          * value as the min recovery point would prevent us from coming up at
                               2746                 :                :          * all.  Instead, we just log a warning and continue with recovery.
                               2747                 :                :          * (See also the comments about corrupt LSNs in XLogFlush.)
                               2748                 :                :          */
                               2749                 :           5307 :         newMinRecoveryPoint = GetCurrentReplayRecPtr(&newMinRecoveryPointTLI);
 4635 alvherre@alvh.no-ip.     2750   [ +  +  -  + ]:           5307 :         if (!force && newMinRecoveryPoint < lsn)
 5916 tgl@sss.pgh.pa.us        2751         [ #  # ]:UBC           0 :             elog(WARNING,
                               2752                 :                :                  "xlog min recovery request %X/%08X is past current point %X/%08X",
                               2753                 :                :                  LSN_FORMAT_ARGS(lsn), LSN_FORMAT_ARGS(newMinRecoveryPoint));
                               2754                 :                : 
                               2755                 :                :         /* update control file */
 4635 alvherre@alvh.no-ip.     2756         [ +  + ]:CBC        5307 :         if (ControlFile->minRecoveryPoint < newMinRecoveryPoint)
                               2757                 :                :         {
 6044 heikki.linnakangas@i     2758                 :           4961 :             ControlFile->minRecoveryPoint = newMinRecoveryPoint;
 4659                          2759                 :           4961 :             ControlFile->minRecoveryPointTLI = newMinRecoveryPointTLI;
 6044                          2760                 :           4961 :             UpdateControlFile();
 1298                          2761                 :           4961 :             LocalMinRecoveryPoint = newMinRecoveryPoint;
                               2762                 :           4961 :             LocalMinRecoveryPointTLI = newMinRecoveryPointTLI;
                               2763                 :                : 
 6044                          2764         [ +  + ]:           4961 :             ereport(DEBUG2,
                               2765                 :                :                     errmsg_internal("updated min recovery point to %X/%08X on timeline %u",
                               2766                 :                :                                     LSN_FORMAT_ARGS(newMinRecoveryPoint),
                               2767                 :                :                                     newMinRecoveryPointTLI));
                               2768                 :                :         }
                               2769                 :                :     }
                               2770                 :           6659 :     LWLockRelease(ControlFileLock);
                               2771                 :                : }
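
The comments in UpdateMinRecoveryPoint describe a cheap local check, a reload of the shared value, and an advance all the way to the current replay position rather than just to 'lsn'. The block below is a minimal single-process sketch of that control flow, not the PostgreSQL implementation: `Lsn`, `ControlSketch`, `LocalSketch`, and `advance_min_recovery_point` are hypothetical names, the replay position is passed in as a plain argument, and locking, timelines, the InRecovery test, and the WARNING for a bogus 'lsn' are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Lsn;               /* hypothetical stand-in for XLogRecPtr */
#define INVALID_LSN ((Lsn) 0)

typedef struct
{
    Lsn min_recovery_point;         /* stand-in for the control file field */
    int writes;                     /* how often the "control file" was rewritten */
} ControlSketch;

typedef struct
{
    Lsn  local_min;                 /* process-local cached copy */
    bool update_enabled;            /* cleared once updates are known to be pointless */
} LocalSketch;

/* Returns true if the shared minimum recovery point was advanced. */
static bool
advance_min_recovery_point(ControlSketch *ctl, LocalSketch *loc,
                           Lsn lsn, bool force, Lsn replay_pos)
{
    /* Quick check against the local copy; no shared state is touched. */
    if (!loc->update_enabled || (!force && lsn <= loc->local_min))
        return false;

    /* The real code acquires ControlFileLock here; this sketch is unlocked. */
    loc->local_min = ctl->min_recovery_point;

    /* An invalid shared value means "replay everything": stop updating. */
    if (loc->local_min == INVALID_LSN)
    {
        loc->update_enabled = false;
        return false;
    }

    /* Somebody else may already have advanced far enough. */
    if (!force && loc->local_min >= lsn)
        return false;

    /* Advance to the replay position, which is at least as far as needed. */
    if (ctl->min_recovery_point < replay_pos)
    {
        ctl->min_recovery_point = replay_pos;
        ctl->writes++;
        loc->local_min = replay_pos;
        assert(loc->local_min == ctl->min_recovery_point);
        return true;
    }
    return false;
}
```

Jumping to the replay position means later calls with a smaller 'lsn' are answered by the quick check, so the shared value is rewritten far less often than it is requested.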
                               2772                 :                : 
                               2773                 :                : /*
                               2774                 :                :  * Ensure that all XLOG data through the given position is flushed to disk.
                               2775                 :                :  *
                               2776                 :                :  * NOTE: this differs from XLogWrite mainly in that the WALWriteLock is not
                               2777                 :                :  * already held, and we try to avoid acquiring it if possible.
                               2778                 :                :  */
                               2779                 :                : void
 8943 tgl@sss.pgh.pa.us        2780                 :         677629 : XLogFlush(XLogRecPtr record)
                               2781                 :                : {
                               2782                 :                :     XLogRecPtr  WriteRqstPtr;
                               2783                 :                :     XLogwrtRqst WriteRqst;
 1396 rhaas@postgresql.org     2784                 :         677629 :     TimeLineID  insertTLI = XLogCtl->InsertTimeLineID;
                               2785                 :                : 
                               2786                 :                :     /*
                               2787                 :                :      * During REDO, we are reading not writing WAL.  Therefore, instead of
                               2788                 :                :      * trying to flush the WAL, we should update minRecoveryPoint instead. We
                               2789                 :                :      * test XLogInsertAllowed(), not InRecovery, because we need checkpointer
                               2790                 :                :      * to act this way too, and because when it tries to write the
                               2791                 :                :      * end-of-recovery checkpoint, it should indeed flush.
                               2792                 :                :      */
 5916 tgl@sss.pgh.pa.us        2793         [ +  + ]:         677629 :     if (!XLogInsertAllowed())
                               2794                 :                :     {
 6044 heikki.linnakangas@i     2795                 :         104452 :         UpdateMinRecoveryPoint(record, false);
 8943 tgl@sss.pgh.pa.us        2796                 :         540924 :         return;
                               2797                 :                :     }
                               2798                 :                : 
                               2799                 :                :     /* Quick exit if already known flushed */
 4635 alvherre@alvh.no-ip.     2800         [ +  + ]:         573177 :     if (record <= LogwrtResult.Flush)
 8943 tgl@sss.pgh.pa.us        2801                 :         436472 :         return;
                               2802                 :                : 
                               2803                 :                : #ifdef WAL_DEBUG
                               2804                 :                :     if (XLOG_DEBUG)
                               2805                 :                :         elog(LOG, "xlog flush request %X/%08X; write %X/%08X; flush %X/%08X",
                               2806                 :                :              LSN_FORMAT_ARGS(record),
                               2807                 :                :              LSN_FORMAT_ARGS(LogwrtResult.Write),
                               2808                 :                :              LSN_FORMAT_ARGS(LogwrtResult.Flush));
                               2809                 :                : #endif
                               2810                 :                : 
                               2811                 :         136705 :     START_CRIT_SECTION();
                               2812                 :                : 
                               2813                 :                :     /*
                               2814                 :                :      * Since fsync is usually a horribly expensive operation, we try to
                               2815                 :                :      * piggyback as much data as we can on each fsync: if we see any more data
                               2816                 :                :      * entered into the xlog buffer, we'll write and fsync that too, so that
                               2817                 :                :      * the final value of LogwrtResult.Flush is as large as possible. This
                               2818                 :                :      * gives us some chance of avoiding another fsync immediately after.
                               2819                 :                :      */
                               2820                 :                : 
                               2821                 :                :     /* initialize to given target; may increase below */
                               2822                 :         136705 :     WriteRqstPtr = record;
                               2823                 :                : 
                               2824                 :                :     /*
                               2825                 :                :      * Now wait until we get the write lock, or someone else does the flush
                               2826                 :                :      * for us.
                               2827                 :                :      */
                               2828                 :                :     for (;;)
 8653                          2829                 :           2247 :     {
                               2830                 :                :         XLogRecPtr  insertpos;
                               2831                 :                : 
                               2832                 :                :         /* done already? */
  519 alvherre@alvh.no-ip.     2833                 :         138952 :         RefreshXLogWriteResult(LogwrtResult);
 4635                          2834         [ +  + ]:         138952 :         if (record <= LogwrtResult.Flush)
 4968 heikki.linnakangas@i     2835                 :          10120 :             break;
                               2836                 :                : 
                               2837                 :                :         /*
                               2838                 :                :          * Before actually performing the write, wait for all in-flight
                               2839                 :                :          * insertions to the pages we're about to write to finish.
                               2840                 :                :          */
  519 alvherre@alvh.no-ip.     2841         [ +  + ]:         128832 :         SpinLockAcquire(&XLogCtl->info_lck);
                               2842         [ +  + ]:         128832 :         if (WriteRqstPtr < XLogCtl->LogwrtRqst.Write)
                               2843                 :           8575 :             WriteRqstPtr = XLogCtl->LogwrtRqst.Write;
                               2844                 :         128832 :         SpinLockRelease(&XLogCtl->info_lck);
 4443 heikki.linnakangas@i     2845                 :         128832 :         insertpos = WaitXLogInsertionsToFinish(WriteRqstPtr);
                               2846                 :                : 
                               2847                 :                :         /*
                               2848                 :                :          * Try to get the write lock. If we can't get it immediately, wait
                               2849                 :                :          * until it's released, and recheck if we still need to do the flush
                               2850                 :                :          * or if the backend that held the lock did it for us already. This
                               2851                 :                :          * helps to maintain a good rate of group committing when the system
                               2852                 :                :          * is bottlenecked by the speed of fsyncing.
                               2853                 :                :          */
 4959                          2854         [ +  + ]:         128832 :         if (!LWLockAcquireOrWait(WALWriteLock, LW_EXCLUSIVE))
                               2855                 :                :         {
                               2856                 :                :             /*
                               2857                 :                :              * The lock is now free, but we didn't acquire it yet. Before we
                               2858                 :                :              * do, loop back to check if someone else flushed the record for
                               2859                 :                :              * us already.
                               2860                 :                :              */
 4968                          2861                 :           2247 :             continue;
                               2862                 :                :         }
                               2863                 :                : 
                               2864                 :                :         /* Got the lock; recheck whether request is satisfied */
  521 alvherre@alvh.no-ip.     2865                 :         126585 :         RefreshXLogWriteResult(LogwrtResult);
 4635                          2866         [ +  + ]:         126585 :         if (record <= LogwrtResult.Flush)
                               2867                 :                :         {
 4814 rhaas@postgresql.org     2868                 :           3072 :             LWLockRelease(WALWriteLock);
                               2869                 :           3072 :             break;
                               2870                 :                :         }
                               2871                 :                : 
                               2872                 :                :         /*
                               2873                 :                :          * Sleep before flush! By adding a delay here, we may give further
                               2874                 :                :          * backends the opportunity to join the backlog of group commit
                               2875                 :                :          * followers; this can significantly improve transaction throughput,
                               2876                 :                :          * at the risk of increasing transaction latency.
                               2877                 :                :          *
                               2878                 :                :          * We do not sleep if enableFsync is not turned on, nor if there are
                               2879                 :                :          * fewer than CommitSiblings other backends with active transactions.
                               2880                 :                :          */
                               2881   [ -  +  -  -  :         123513 :         if (CommitDelay > 0 && enableFsync &&
                                              -  - ]
 4814 rhaas@postgresql.org     2882                 :UBC           0 :             MinimumActiveBackends(CommitSiblings))
                               2883                 :                :         {
                               2884                 :              0 :             pg_usleep(CommitDelay);
                               2885                 :                : 
                               2886                 :                :             /*
                               2887                 :                :              * Re-check how far we can now flush the WAL. It's generally not
                               2888                 :                :              * safe to call WaitXLogInsertionsToFinish while holding
                               2889                 :                :              * WALWriteLock, because an in-progress insertion might need to
                               2890                 :                :              * also grab WALWriteLock to make progress. But we know that all
                               2891                 :                :              * the insertions up to insertpos have already finished, because
                               2892                 :                :              * that's what the earlier WaitXLogInsertionsToFinish() returned.
                               2893                 :                :              * We're only calling it again to allow insertpos to be moved
                               2894                 :                :              * further forward, not to actually wait for anyone.
                               2895                 :                :              */
 4443 heikki.linnakangas@i     2896                 :              0 :             insertpos = WaitXLogInsertionsToFinish(insertpos);
                               2897                 :                :         }
                               2898                 :                : 
                               2899                 :                :         /* try to write/flush later additions to XLOG as well */
 4443 heikki.linnakangas@i     2900                 :CBC      123513 :         WriteRqst.Write = insertpos;
                               2901                 :         123513 :         WriteRqst.Flush = insertpos;
                               2902                 :                : 
 1401 rhaas@postgresql.org     2903                 :         123513 :         XLogWrite(WriteRqst, insertTLI, false);
                               2904                 :                : 
 8743 tgl@sss.pgh.pa.us        2905                 :         123513 :         LWLockRelease(WALWriteLock);
                               2906                 :                :         /* done */
 4968 heikki.linnakangas@i     2907                 :         123513 :         break;
                               2908                 :                :     }
                               2909                 :                : 
 8943 tgl@sss.pgh.pa.us        2910         [ -  + ]:         136705 :     END_CRIT_SECTION();
                               2911                 :                : 
                               2912                 :                :     /* wake up walsenders now that we've released heavily contended locks */
  882 andres@anarazel.de       2913                 :         136705 :     WalSndWakeupProcessRequests(true, !RecoveryInProgress());
                               2914                 :                : 
                               2915                 :                :     /*
                               2916                 :                :      * If we still haven't flushed to the request point then we have a
                               2917                 :                :      * problem; most likely, the requested flush point is past end of XLOG.
                               2918                 :                :      * This has been seen to occur when a disk page has a corrupted LSN.
                               2919                 :                :      *
                               2920                 :                :      * Formerly we treated this as a PANIC condition, but that hurts the
                               2921                 :                :      * system's robustness rather than helping it: we do not want to take down
                               2922                 :                :      * the whole system due to corruption on one data page.  In particular, if
                               2923                 :                :      * the bad page is encountered again during recovery then we would be
                               2924                 :                :      * unable to restart the database at all!  (This scenario actually
                               2925                 :                :      * happened in the field several times with 7.1 releases.)  As of 8.4, bad
                               2926                 :                :      * LSNs encountered during recovery are UpdateMinRecoveryPoint's problem;
                               2927                 :                :      * the only time we can reach here during recovery is while flushing the
                               2928                 :                :      * end-of-recovery checkpoint record, and we don't expect that to have a
                               2929                 :                :      * bad LSN.
                               2930                 :                :      *
                               2931                 :                :      * Note that for calls from xact.c, the ERROR will be promoted to PANIC
                               2932                 :                :      * since xact.c calls this routine inside a critical section.  However,
                               2933                 :                :      * calls from bufmgr.c are not within critical sections and so we will not
                               2934                 :                :      * force a restart for a bad LSN on a data page.
                               2935                 :                :      */
 4635 alvherre@alvh.no-ip.     2936         [ -  + ]:         136705 :     if (LogwrtResult.Flush < record)
 5916 tgl@sss.pgh.pa.us        2937         [ #  # ]:UBC           0 :         elog(ERROR,
                               2938                 :                :              "xlog flush request %X/%08X is not satisfied --- flushed only to %X/%08X",
                               2939                 :                :              LSN_FORMAT_ARGS(record),
                               2940                 :                :              LSN_FORMAT_ARGS(LogwrtResult.Flush));
                               2941                 :                : }
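
The piggybacking idea in XLogFlush (flush everything inserted so far, so that later requests find their record already flushed) can be illustrated with a small single-threaded model. This is only a sketch under simplifying assumptions: `Lsn`, `WalSketch`, `FlushOutcome`, and `flush_through` are hypothetical names, there is no locking or commit delay, and the "fsync" is just a counter.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t Lsn;               /* hypothetical stand-in for XLogRecPtr */

typedef struct
{
    Lsn inserted;                   /* end of WAL inserted so far */
    Lsn flushed;                    /* end of WAL known to be durable */
    int fsyncs;                     /* number of simulated fsync calls */
} WalSketch;

typedef enum
{
    FLUSH_ALREADY_DONE,             /* request was satisfied by an earlier flush */
    FLUSH_PERFORMED,                /* this call flushed and satisfied the request */
    FLUSH_UNSATISFIED               /* request lies past the end of inserted WAL */
} FlushOutcome;

static FlushOutcome
flush_through(WalSketch *wal, Lsn record)
{
    /* Quick exit if already known flushed. */
    if (record <= wal->flushed)
        return FLUSH_ALREADY_DONE;

    /* Piggyback: flush everything inserted so far, not just 'record'. */
    assert(wal->flushed <= wal->inserted);
    wal->flushed = wal->inserted;
    wal->fsyncs++;

    /* A request beyond the inserted WAL cannot be satisfied (bad LSN). */
    return (wal->flushed < record) ? FLUSH_UNSATISFIED : FLUSH_PERFORMED;
}
```

In this model, one call that flushes through the insert position lets every later request at or below that position return without another fsync, which is the effect the group-commit comments above aim for. The unsatisfied outcome corresponds to the case where the real function reports an error because the requested point is past the end of XLOG.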
                               2942                 :                : 
                               2943                 :                : /*
                               2944                 :                :  * Write & flush xlog, but without specifying exactly where to.
                               2945                 :                :  *
                               2946                 :                :  * We normally write only completed blocks; but if there is nothing to do on
                               2947                 :                :  * that basis, we check for unwritten async commits in the current incomplete
                               2948                 :                :  * block, and write through the latest one of those.  Thus, if async commits
                               2949                 :                :  * are not being used, we will write complete blocks only.
                               2950                 :                :  *
                               2951                 :                :  * If, based on the above, there's anything to write we do so immediately. But
                               2952                 :                :  * to avoid calling fsync, fdatasync et al. at a rate that'd impact
                               2953                 :                :  * concurrent IO, we only flush WAL every wal_writer_delay ms, or if there's
                               2954                 :                :  * more than wal_writer_flush_after unflushed blocks.
                               2955                 :                :  *
                               2956                 :                :  * We can guarantee that async commits reach disk after at most three
                               2957                 :                :  * wal_writer_delay cycles. (When flushing complete blocks, we allow XLogWrite
                               2958                 :                :  * to write "flexibly", meaning it can stop at the end of the buffer ring;
                               2959                 :                :  * this makes a difference only with very high load or long wal_writer_delay,
                               2960                 :                :  * but imposes one extra cycle for the worst case for async commits.)
                               2961                 :                :  *
                               2962                 :                :  * This routine is invoked periodically by the background walwriter process.
                               2963                 :                :  *
                               2964                 :                :  * Returns true if there was any work to do, even if we skipped flushing due
                               2965                 :                :  * to wal_writer_delay/wal_writer_flush_after.
                               2966                 :                :  */
                               2967                 :                : bool
 6619 tgl@sss.pgh.pa.us        2968                 :CBC       13060 : XLogBackgroundFlush(void)
                               2969                 :                : {
                               2970                 :                :     XLogwrtRqst WriteRqst;
                               2971                 :          13060 :     bool        flexible = true;
                               2972                 :                :     static TimestampTz lastflush;
                               2973                 :                :     TimestampTz now;
                               2974                 :                :     int         flushblocks;
                               2975                 :                :     TimeLineID  insertTLI;
                               2976                 :                : 
                               2977                 :                :     /* XLOG doesn't need flushing during recovery */
 6044 heikki.linnakangas@i     2978         [ -  + ]:          13060 :     if (RecoveryInProgress())
 4869 tgl@sss.pgh.pa.us        2979                 :UBC           0 :         return false;
                               2980                 :                : 
                               2981                 :                :     /*
                               2982                 :                :      * Since we're not in recovery, InsertTimeLineID is set and can't change,
                               2983                 :                :      * so we can read it without a lock.
                               2984                 :                :      */
 1396 rhaas@postgresql.org     2985                 :CBC       13060 :     insertTLI = XLogCtl->InsertTimeLineID;
                               2986                 :                : 
                               2987                 :                :     /* read updated LogwrtRqst */
 4002 andres@anarazel.de       2988         [ -  + ]:          13060 :     SpinLockAcquire(&XLogCtl->info_lck);
 3491                          2989                 :          13060 :     WriteRqst = XLogCtl->LogwrtRqst;
 4002                          2990                 :          13060 :     SpinLockRelease(&XLogCtl->info_lck);
                               2991                 :                : 
                               2992                 :                :     /* back off to last completed page boundary */
 3491                          2993                 :          13060 :     WriteRqst.Write -= WriteRqst.Write % XLOG_BLCKSZ;
                               2994                 :                : 
                               2995                 :                :     /* if we have already flushed that far, consider async commit records */
  519 alvherre@alvh.no-ip.     2996                 :          13060 :     RefreshXLogWriteResult(LogwrtResult);
 3491 andres@anarazel.de       2997         [ +  + ]:          13060 :     if (WriteRqst.Write <= LogwrtResult.Flush)
                               2998                 :                :     {
 4002                          2999         [ +  + ]:          10522 :         SpinLockAcquire(&XLogCtl->info_lck);
 3491                          3000                 :          10522 :         WriteRqst.Write = XLogCtl->asyncXactLSN;
 4002                          3001                 :          10522 :         SpinLockRelease(&XLogCtl->info_lck);
 6619 tgl@sss.pgh.pa.us        3002                 :          10522 :         flexible = false;       /* ensure it all gets written */
                               3003                 :                :     }
                               3004                 :                : 
                               3005                 :                :     /*
                               3006                 :                :      * If already known flushed, we're done. Just need to check if we are
                               3007                 :                :      * holding an open file handle to a logfile that's no longer in use,
                               3008                 :                :      * preventing the file from being deleted.
                               3009                 :                :      */
 3491 andres@anarazel.de       3010         [ +  + ]:          13060 :     if (WriteRqst.Write <= LogwrtResult.Flush)
                               3011                 :                :     {
 5541 bruce@momjian.us         3012         [ +  + ]:           9898 :         if (openLogFile >= 0)
                               3013                 :                :         {
 2909 andres@anarazel.de       3014         [ +  + ]:           4650 :             if (!XLByteInPrevSeg(LogwrtResult.Write, openLogSegNo,
                               3015                 :                :                                  wal_segment_size))
                               3016                 :                :             {
 5568 magnus@hagander.net      3017                 :            201 :                 XLogFileClose();
                               3018                 :                :             }
                               3019                 :                :         }
 4869 tgl@sss.pgh.pa.us        3020                 :           9898 :         return false;
                               3021                 :                :     }
                               3022                 :                : 
                               3023                 :                :     /*
                               3024                 :                :      * Determine how far to flush WAL, based on the wal_writer_delay and
                               3025                 :                :      * wal_writer_flush_after GUCs.
                               3026                 :                :      *
                               3027                 :                :      * Note that XLogSetAsyncXactLSN() performs a similar calculation based on
                               3028                 :                :      * wal_writer_flush_after, to decide when to wake us up.  Make sure the
                               3029                 :                :      * logic is the same in both places if you change this.
                               3030                 :                :      */
 3491 andres@anarazel.de       3031                 :           3162 :     now = GetCurrentTimestamp();
  649 heikki.linnakangas@i     3032                 :           3162 :     flushblocks =
 3491 andres@anarazel.de       3033                 :           3162 :         WriteRqst.Write / XLOG_BLCKSZ - LogwrtResult.Flush / XLOG_BLCKSZ;
                               3034                 :                : 
                               3035   [ +  -  +  + ]:           3162 :     if (WalWriterFlushAfter == 0 || lastflush == 0)
                               3036                 :                :     {
                               3037                 :                :         /* first call, or block based limits disabled */
                               3038                 :            260 :         WriteRqst.Flush = WriteRqst.Write;
                               3039                 :            260 :         lastflush = now;
                               3040                 :                :     }
                               3041         [ +  + ]:           2902 :     else if (TimestampDifferenceExceeds(lastflush, now, WalWriterDelay))
                               3042                 :                :     {
                               3043                 :                :         /*
                               3044                 :                :          * Flush the writes at least every WalWriterDelay ms. This is
                               3045                 :                :          * important to bound the amount of time it takes for an asynchronous
                               3046                 :                :          * commit to hit disk.
                               3047                 :                :          */
                               3048                 :           2755 :         WriteRqst.Flush = WriteRqst.Write;
                               3049                 :           2755 :         lastflush = now;
                               3050                 :                :     }
  649 heikki.linnakangas@i     3051         [ +  + ]:            147 :     else if (flushblocks >= WalWriterFlushAfter)
                               3052                 :                :     {
                               3053                 :                :         /* exceeded wal_writer_flush_after blocks, flush */
 3491 andres@anarazel.de       3054                 :            135 :         WriteRqst.Flush = WriteRqst.Write;
                               3055                 :            135 :         lastflush = now;
                               3056                 :                :     }
                               3057                 :                :     else
                               3058                 :                :     {
                               3059                 :                :         /* no flushing, this time round */
                               3060                 :             12 :         WriteRqst.Flush = 0;
                               3061                 :                :     }
                               3062                 :                : 
                               3063                 :                : #ifdef WAL_DEBUG
                               3064                 :                :     if (XLOG_DEBUG)
                               3065                 :                :         elog(LOG, "xlog bg flush request write %X/%08X; flush: %X/%08X, current is write %X/%08X; flush %X/%08X",
                               3066                 :                :              LSN_FORMAT_ARGS(WriteRqst.Write),
                               3067                 :                :              LSN_FORMAT_ARGS(WriteRqst.Flush),
                               3068                 :                :              LSN_FORMAT_ARGS(LogwrtResult.Write),
                               3069                 :                :              LSN_FORMAT_ARGS(LogwrtResult.Flush));
                               3070                 :                : #endif
                               3071                 :                : 
 6619 tgl@sss.pgh.pa.us        3072                 :           3162 :     START_CRIT_SECTION();
                               3073                 :                : 
                               3074                 :                :     /* now wait for any in-progress insertions to finish and get write lock */
 3491 andres@anarazel.de       3075                 :           3162 :     WaitXLogInsertionsToFinish(WriteRqst.Write);
 6619 tgl@sss.pgh.pa.us        3076                 :           3162 :     LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);
  521 alvherre@alvh.no-ip.     3077                 :           3162 :     RefreshXLogWriteResult(LogwrtResult);
 3491 andres@anarazel.de       3078         [ +  + ]:           3162 :     if (WriteRqst.Write > LogwrtResult.Write ||
                               3079         [ +  + ]:            128 :         WriteRqst.Flush > LogwrtResult.Flush)
                               3080                 :                :     {
 1401 rhaas@postgresql.org     3081                 :           3103 :         XLogWrite(WriteRqst, insertTLI, flexible);
                               3082                 :                :     }
 6619 tgl@sss.pgh.pa.us        3083                 :           3162 :     LWLockRelease(WALWriteLock);
                               3084                 :                : 
                               3085         [ -  + ]:           3162 :     END_CRIT_SECTION();
                               3086                 :                : 
                               3087                 :                :     /* wake up walsenders now that we've released heavily contended locks */
  882 andres@anarazel.de       3088                 :           3162 :     WalSndWakeupProcessRequests(true, !RecoveryInProgress());
                               3089                 :                : 
                               3090                 :                :     /*
                               3091                 :                :      * Great, done. To take some work off the critical path, try to initialize
                               3092                 :                :      * as many of the no-longer-needed WAL buffers for future use as we can.
                               3093                 :                :      */
 1401 rhaas@postgresql.org     3094                 :           3162 :     AdvanceXLInsertBuffer(InvalidXLogRecPtr, insertTLI, true);
                               3095                 :                : 
                               3096                 :                :     /*
                               3097                 :                :      * If we determined that we need to write data, but somebody else
                               3098                 :                :      * wrote/flushed already, it should be considered as being active, to
                               3099                 :                :      * avoid hibernating too early.
                               3100                 :                :      */
 3491 andres@anarazel.de       3101                 :           3162 :     return true;
                               3102                 :                : }
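
The flush decision in XLogBackgroundFlush (first call or block limit disabled, then elapsed time, then block count) can be sketched in isolation. This is a minimal illustration under stated assumptions, not the PostgreSQL implementation: the names Lsn, Micros, FlushPolicy and choose_flush_target are hypothetical, and timestamps are assumed to be in microseconds so that the elapsed-time test compares against delay_ms * 1000.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for XLogRecPtr and TimestampTz. */
typedef uint64_t Lsn;
typedef int64_t Micros;

/* Hypothetical bundle of the settings the decision depends on. */
typedef struct FlushPolicy
{
    uint32_t    block_size;     /* stand-in for XLOG_BLCKSZ */
    int         delay_ms;       /* stand-in for wal_writer_delay */
    int         flush_after;    /* stand-in for wal_writer_flush_after */
} FlushPolicy;

/*
 * Return the LSN to flush through, or 0 for "write only, skip the flush".
 * *last_flush is updated whenever a flush is chosen, mirroring the static
 * lastflush variable in the listing.
 */
static Lsn
choose_flush_target(const FlushPolicy *p, Lsn write_rqst, Lsn flushed,
                    Micros now, Micros *last_flush)
{
    int64_t     flush_blocks;
    bool        do_flush;

    assert(p->block_size > 0);

    /* number of whole blocks between the flushed point and the request */
    flush_blocks = (int64_t) (write_rqst / p->block_size) -
        (int64_t) (flushed / p->block_size);

    if (p->flush_after == 0 || *last_flush == 0)
        do_flush = true;        /* first call, or block limit disabled */
    else if (now - *last_flush >= (Micros) p->delay_ms * 1000)
        do_flush = true;        /* at least delay_ms since the last flush */
    else if (flush_blocks >= p->flush_after)
        do_flush = true;        /* enough unflushed blocks accumulated */
    else
        do_flush = false;       /* no flushing this time round */

    if (!do_flush)
        return 0;

    *last_flush = now;
    return write_rqst;
}
```

A return value of 0 corresponds to the branch that sets WriteRqst.Flush = 0: data may still be written, but the sync is deferred to a later cycle.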
                               3103                 :                : 
                               3104                 :                : /*
                               3105                 :                :  * Test whether XLOG data has been flushed up to (at least) the given position.
                               3106                 :                :  *
                               3107                 :                :  * Returns true if a flush is still needed.  (It may be that someone else
                               3108                 :                :  * is already in the process of flushing that far, however.)
                               3109                 :                :  */
                               3110                 :                : bool
 6674 tgl@sss.pgh.pa.us        3111                 :        8599683 : XLogNeedsFlush(XLogRecPtr record)
                               3112                 :                : {
                               3113                 :                :     /*
                               3114                 :                :      * During recovery, we don't flush WAL but update minRecoveryPoint
                               3115                 :                :      * instead. So "needs flush" is taken to mean whether minRecoveryPoint
                               3116                 :                :      * would need to be updated.
                               3117                 :                :      */
 6044 heikki.linnakangas@i     3118         [ +  + ]:        8599683 :     if (RecoveryInProgress())
                               3119                 :                :     {
                               3120                 :                :         /*
                               3121                 :                :          * An invalid minRecoveryPoint means that we need to recover all the
                               3122                 :                :          * WAL, i.e., we're doing crash recovery.  We never modify the control
                               3123                 :                :          * file's value in that case, so we can short-circuit future checks
                               3124                 :                :          * here too.  This triggers a quick exit path for the startup process,
                               3125                 :                :          * which cannot update its local copy of minRecoveryPoint as long as
                               3126                 :                :          * it has not replayed all WAL available when doing crash recovery.
                               3127                 :                :          */
 1298                          3128   [ +  +  -  + ]:         614243 :         if (XLogRecPtrIsInvalid(LocalMinRecoveryPoint) && InRecovery)
 2620 michael@paquier.xyz      3129                 :UBC           0 :             updateMinRecoveryPoint = false;
                               3130                 :                : 
                               3131                 :                :         /* Quick exit if already known to be updated or cannot be updated */
 1298 heikki.linnakangas@i     3132   [ +  +  -  + ]:CBC      614243 :         if (record <= LocalMinRecoveryPoint || !updateMinRecoveryPoint)
 5740 simon@2ndQuadrant.co     3133                 :         605537 :             return false;
                               3134                 :                : 
                               3135                 :                :         /*
                               3136                 :                :          * Update local copy of minRecoveryPoint. But if the lock is busy,
                               3137                 :                :          * just return a conservative guess.
                               3138                 :                :          */
                               3139         [ -  + ]:           8706 :         if (!LWLockConditionalAcquire(ControlFileLock, LW_SHARED))
 5740 simon@2ndQuadrant.co     3140                 :UBC           0 :             return true;
 1298 heikki.linnakangas@i     3141                 :CBC        8706 :         LocalMinRecoveryPoint = ControlFile->minRecoveryPoint;
                               3142                 :           8706 :         LocalMinRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
 5740 simon@2ndQuadrant.co     3143                 :           8706 :         LWLockRelease(ControlFileLock);
                               3144                 :                : 
                               3145                 :                :         /*
                               3146                 :                :          * Check minRecoveryPoint for any other process than the startup
                               3147                 :                :          * process doing crash recovery, which should not update the control
                               3148                 :                :          * file value if crash recovery is still running.
                               3149                 :                :          */
 1298 heikki.linnakangas@i     3150         [ -  + ]:           8706 :         if (XLogRecPtrIsInvalid(LocalMinRecoveryPoint))
 2563 michael@paquier.xyz      3151                 :UBC           0 :             updateMinRecoveryPoint = false;
                               3152                 :                : 
                               3153                 :                :         /* check again */
 1298 heikki.linnakangas@i     3154   [ +  +  -  + ]:CBC        8706 :         if (record <= LocalMinRecoveryPoint || !updateMinRecoveryPoint)
 2563 michael@paquier.xyz      3155                 :             71 :             return false;
                               3156                 :                :         else
                               3157                 :           8635 :             return true;
                               3158                 :                :     }
                               3159                 :                : 
                               3160                 :                :     /* Quick exit if already known flushed */
 4635 alvherre@alvh.no-ip.     3161         [ +  + ]:        7985440 :     if (record <= LogwrtResult.Flush)
 6674 tgl@sss.pgh.pa.us        3162                 :        7834091 :         return false;
                               3163                 :                : 
                               3164                 :                :     /* read LogwrtResult and update local state */
  521 alvherre@alvh.no-ip.     3165                 :         151349 :     RefreshXLogWriteResult(LogwrtResult);
                               3166                 :                : 
                               3167                 :                :     /* check again */
 4635                          3168         [ +  + ]:         151349 :     if (record <= LogwrtResult.Flush)
 6674 tgl@sss.pgh.pa.us        3169                 :           3678 :         return false;
                               3170                 :                : 
                               3171                 :         147671 :     return true;
                               3172                 :                : }
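
The non-recovery path of XLogNeedsFlush follows a check, refresh, check-again pattern: consult a cheap process-local copy first, and read the shared value only if the local copy cannot answer. A minimal sketch of that pattern follows. It uses a C11 atomic as a stand-in for the shared flush position; the names shared_flush_lsn, local_flush_lsn and needs_flush are hypothetical, and how RefreshXLogWriteResult actually reads shared state is not shown in this listing.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Lsn;           /* hypothetical stand-in for XLogRecPtr */

/* Hypothetical shared state: flush position published by whoever flushes. */
static _Atomic Lsn shared_flush_lsn;

/* Process-local cached copy; it may lag behind the shared value. */
static Lsn  local_flush_lsn;

/* Return true if "record" is not yet known to be flushed. */
static bool
needs_flush(Lsn record)
{
    /* quick exit: the possibly stale local copy already covers the record */
    if (record <= local_flush_lsn)
        return false;

    /* refresh the local copy from shared state, then check again */
    local_flush_lsn = atomic_load(&shared_flush_lsn);

    return record > local_flush_lsn;
}
```

Because the flush position only moves forward, a stale local copy can produce an unnecessary refresh but never a wrong "already flushed" answer.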
                               3173                 :                : 
                               3174                 :                : /*
                               3175                 :                :  * Try to make a given XLOG file segment exist.
                               3176                 :                :  *
                               3177                 :                :  * logsegno: identify segment.
                               3178                 :                :  *
                               3179                 :                :  * *added: on return, true if this call raised the number of extant segments.
                               3180                 :                :  *
                               3181                 :                :  * path: on return, this char[MAXPGPATH] has the path to the logsegno file.
                               3182                 :                :  *
                               3183                 :                :  * Returns -1 or FD of opened file.  A -1 here is not an error; a caller
                               3184                 :                :  * wanting an open segment should attempt to open "path", which usually will
                               3185                 :                :  * succeed.  (This is weird, but it's efficient for the callers.)
                               3186                 :                :  */
                               3187                 :                : static int
 1401 rhaas@postgresql.org     3188                 :          13808 : XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli,
                               3189                 :                :                      bool *added, char *path)
                               3190                 :                : {
                               3191                 :                :     char        tmppath[MAXPGPATH];
                               3192                 :                :     XLogSegNo   installed_segno;
                               3193                 :                :     XLogSegNo   max_segno;
                               3194                 :                :     int         fd;
                               3195                 :                :     int         save_errno;
  882 tmunro@postgresql.or     3196                 :          13808 :     int         open_flags = O_RDWR | O_CREAT | O_EXCL | PG_BINARY;
                               3197                 :                :     instr_time  io_start;
                               3198                 :                : 
 1401 rhaas@postgresql.org     3199         [ -  + ]:          13808 :     Assert(logtli != 0);
                               3200                 :                : 
                               3201                 :          13808 :     XLogFilePath(path, logtli, logsegno, wal_segment_size);
                               3202                 :                : 
                               3203                 :                :     /*
                               3204                 :                :      * Try to use an existing file (checkpoint maker may have created it already)
                               3205                 :                :      */
 1531 noah@leadboat.com        3206                 :          13808 :     *added = false;
  918 tmunro@postgresql.or     3207                 :          13808 :     fd = BasicOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC |
  694 nathan@postgresql.or     3208                 :          13808 :                        get_sync_bit(wal_sync_method));
 1531 noah@leadboat.com        3209         [ +  + ]:          13808 :     if (fd < 0)
                               3210                 :                :     {
                               3211         [ -  + ]:           1357 :         if (errno != ENOENT)
 1531 noah@leadboat.com        3212         [ #  # ]:UBC           0 :             ereport(ERROR,
                               3213                 :                :                     (errcode_for_file_access(),
                               3214                 :                :                      errmsg("could not open file \"%s\": %m", path)));
                               3215                 :                :     }
                               3216                 :                :     else
 1531 noah@leadboat.com        3217                 :CBC       12451 :         return fd;
                               3218                 :                : 
                               3219                 :                :     /*
                               3220                 :                :      * Initialize an empty (all zeroes) segment.  NOTE: it is possible that
                               3221                 :                :      * another process is doing the same thing.  If so, we will end up
                               3222                 :                :      * pre-creating an extra log segment.  That seems OK, and better than
                               3223                 :                :      * holding the lock throughout this lengthy process.
                               3224                 :                :      */
 6643 tgl@sss.pgh.pa.us        3225         [ +  + ]:           1357 :     elog(DEBUG2, "creating and filling new WAL file");
                               3226                 :                : 
 7369                          3227                 :           1357 :     snprintf(tmppath, MAXPGPATH, XLOGDIR "/xlogtemp.%d", (int) getpid());
                               3228                 :                : 
 8939                          3229                 :           1357 :     unlink(tmppath);
                               3230                 :                : 
  882 tmunro@postgresql.or     3231         [ -  + ]:           1357 :     if (io_direct_flags & IO_DIRECT_WAL_INIT)
  882 tmunro@postgresql.or     3232                 :UBC           0 :         open_flags |= PG_O_DIRECT;
                               3233                 :                : 
                               3234                 :                :     /* do not use get_sync_bit() here --- want to fsync only at end of fill */
  882 tmunro@postgresql.or     3235                 :CBC        1357 :     fd = BasicOpenFile(tmppath, open_flags);
 9476 vadim4o@yahoo.com        3236         [ -  + ]:           1357 :     if (fd < 0)
 7449 tgl@sss.pgh.pa.us        3237         [ #  # ]:UBC           0 :         ereport(ERROR,
                               3238                 :                :                 (errcode_for_file_access(),
                               3239                 :                :                  errmsg("could not create file \"%s\": %m", tmppath)));
                               3240                 :                : 
                               3241                 :                :     /* Measure I/O timing when initializing segment */
  192 michael@paquier.xyz      3242                 :CBC        1357 :     io_start = pgstat_prepare_io_time(track_wal_io_timing);
                               3243                 :                : 
 2349 tmunro@postgresql.or     3244                 :           1357 :     pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_WRITE);
                               3245                 :           1357 :     save_errno = 0;
                               3246         [ +  - ]:           1357 :     if (wal_init_zero)
                               3247                 :                :     {
                               3248                 :                :         ssize_t     rc;
                               3249                 :                : 
                               3250                 :                :         /*
                               3251                 :                :          * Zero-fill the file.  With this setting, we do this the hard way to
                               3252                 :                :          * ensure that all the file space has really been allocated.  On
                               3253                 :                :          * platforms that allow "holes" in files, just seeking to the end
                               3254                 :                :          * doesn't allocate intermediate space.  This way, we know that we
                               3255                 :                :          * have all the space and (after the fsync below) that all the
                               3256                 :                :          * indirect blocks are down on disk.  Therefore, fdatasync(2) or
                               3257                 :                :          * O_DSYNC will be sufficient to sync future writes to the log file.
                               3258                 :                :          */
  915 michael@paquier.xyz      3259                 :           1357 :         rc = pg_pwrite_zeros(fd, wal_segment_size, 0);
                               3260                 :                : 
 1033                          3261         [ -  + ]:           1357 :         if (rc < 0)
 1033 michael@paquier.xyz      3262                 :UBC           0 :             save_errno = errno;
                               3263                 :                :     }
                               3264                 :                :     else
                               3265                 :                :     {
                               3266                 :                :         /*
                               3267                 :                :          * Otherwise, seeking to the end and writing a solitary byte is
                               3268                 :                :          * enough.
                               3269                 :                :          */
 4385 jdavis@postgresql.or     3270                 :              0 :         errno = 0;
 1033 michael@paquier.xyz      3271         [ #  # ]:              0 :         if (pg_pwrite(fd, "\0", 1, wal_segment_size - 1) != 1)
                               3272                 :                :         {
                               3273                 :                :             /* if write didn't set errno, assume no disk space */
 2349 tmunro@postgresql.or     3274         [ #  # ]:              0 :             save_errno = errno ? errno : ENOSPC;
                               3275                 :                :         }
                               3276                 :                :     }
 2349 tmunro@postgresql.or     3277                 :CBC        1357 :     pgstat_report_wait_end();
                               3278                 :                : 
                               3279                 :                :     /*
                               3280                 :                :      * A full segment worth of data is written when using wal_init_zero. One
                               3281                 :                :      * byte is written when not using it.
                               3282                 :                :      */
  214 michael@paquier.xyz      3283                 :           1357 :     pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_INIT, IOOP_WRITE,
                               3284                 :                :                             io_start, 1,
                               3285         [ +  - ]:           1357 :                             wal_init_zero ? wal_segment_size : 1);
                               3286                 :                : 
 2349 tmunro@postgresql.or     3287         [ -  + ]:           1357 :     if (save_errno)
                               3288                 :                :     {
                               3289                 :                :         /*
                               3290                 :                :          * If we fail to make the file, delete it to release disk space.
                               3291                 :                :          */
 2349 tmunro@postgresql.or     3292                 :UBC           0 :         unlink(tmppath);
                               3293                 :                : 
                               3294                 :              0 :         close(fd);
                               3295                 :                : 
                               3296                 :              0 :         errno = save_errno;
                               3297                 :                : 
                               3298         [ #  # ]:              0 :         ereport(ERROR,
                               3299                 :                :                 (errcode_for_file_access(),
                               3300                 :                :                  errmsg("could not write to file \"%s\": %m", tmppath)));
                               3301                 :                :     }
                               3302                 :                : 
                               3303                 :                :     /* Measure I/O timing when flushing segment */
  192 michael@paquier.xyz      3304                 :CBC        1357 :     io_start = pgstat_prepare_io_time(track_wal_io_timing);
                               3305                 :                : 
 3094 rhaas@postgresql.org     3306                 :           1357 :     pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_SYNC);
 9038 tgl@sss.pgh.pa.us        3307         [ -  + ]:           1357 :     if (pg_fsync(fd) != 0)
                               3308                 :                :     {
 1107 drowley@postgresql.o     3309                 :UBC           0 :         save_errno = errno;
 4666 heikki.linnakangas@i     3310                 :              0 :         close(fd);
 2630 michael@paquier.xyz      3311                 :              0 :         errno = save_errno;
 7449 tgl@sss.pgh.pa.us        3312         [ #  # ]:              0 :         ereport(ERROR,
                               3313                 :                :                 (errcode_for_file_access(),
                               3314                 :                :                  errmsg("could not fsync file \"%s\": %m", tmppath)));
                               3315                 :                :     }
 3094 rhaas@postgresql.org     3316                 :CBC        1357 :     pgstat_report_wait_end();
                               3317                 :                : 
  214 michael@paquier.xyz      3318                 :           1357 :     pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_INIT,
                               3319                 :                :                             IOOP_FSYNC, io_start, 1, 0);
                               3320                 :                : 
 2254 peter@eisentraut.org     3321         [ -  + ]:           1357 :     if (close(fd) != 0)
 7449 tgl@sss.pgh.pa.us        3322         [ #  # ]:UBC           0 :         ereport(ERROR,
                               3323                 :                :                 (errcode_for_file_access(),
                               3324                 :                :                  errmsg("could not close file \"%s\": %m", tmppath)));
                               3325                 :                : 
                               3326                 :                :     /*
                               3327                 :                :      * Now move the segment into place with its final name.  Cope with the
                               3328                 :                :      * possibility that someone else has created the file while we were
                               3329                 :                :      * filling ours: if so, use ours to pre-create a future log segment.
                               3330                 :                :      */
 4822 heikki.linnakangas@i     3331                 :CBC        1357 :     installed_segno = logsegno;
                               3332                 :                : 
                               3333                 :                :     /*
                               3334                 :                :      * XXX: What should we use as max_segno? We used to use XLOGfileslop when
                               3335                 :                :      * that was a constant, but that was always a bit dubious: normally, at a
                               3336                 :                :      * checkpoint, XLOGfileslop was the offset from the checkpoint record, but
                               3337                 :                :      * here, it was the offset from the insert location. We can't do the
                               3338                 :                :      * normal XLOGfileslop calculation here because we don't have access to
                               3339                 :                :      * the prior checkpoint's redo location. So somewhat arbitrarily, just use
                               3340                 :                :      * CheckPointSegments.
                               3341                 :                :      */
 3848                          3342                 :           1357 :     max_segno = logsegno + CheckPointSegments;
 1401 rhaas@postgresql.org     3343         [ +  - ]:           1357 :     if (InstallXLogFileSegment(&installed_segno, tmppath, true, max_segno,
                               3344                 :                :                                logtli))
                               3345                 :                :     {
 1531 noah@leadboat.com        3346                 :           1357 :         *added = true;
                               3347         [ +  + ]:           1357 :         elog(DEBUG2, "done creating and filling new WAL file");
                               3348                 :                :     }
                               3349                 :                :     else
                               3350                 :                :     {
                               3351                 :                :         /*
                               3352                 :                :          * No need for any more future segments, or InstallXLogFileSegment()
                               3353                 :                :          * failed to rename the file into place. If the rename failed, a
                               3354                 :                :          * caller opening the file may fail.
                               3355                 :                :          */
 8815 tgl@sss.pgh.pa.us        3356                 :UBC           0 :         unlink(tmppath);
 1531 noah@leadboat.com        3357         [ #  # ]:              0 :         elog(DEBUG2, "abandoned new WAL file");
                               3358                 :                :     }
                               3359                 :                : 
 1531 noah@leadboat.com        3360                 :CBC        1357 :     return -1;
                               3361                 :                : }
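
The two pre-allocation strategies in the function above (zero-filling the whole file versus writing a single byte at the last offset) can be sketched outside PostgreSQL with plain POSIX calls. This is a minimal sketch under stated assumptions, not the server's implementation: `fill_with_zeros` and `preallocate_file` are hypothetical names, the 8192-byte block size is arbitrary, and the file is removed at the end only because this is a demonstration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write zeros over [0, size) so every block is allocated. */
static int
fill_with_zeros(int fd, off_t size)
{
    char        block[8192];
    off_t       off = 0;

    memset(block, 0, sizeof(block));
    while (off < size)
    {
        size_t      want = sizeof(block);
        ssize_t     rc;

        if ((off_t) want > size - off)
            want = (size_t) (size - off);
        errno = 0;
        rc = pwrite(fd, block, want, off);
        if (rc <= 0)
        {
            /* if the write didn't set errno, assume no disk space */
            if (errno == 0)
                errno = ENOSPC;
            return -1;
        }
        off += rc;
    }
    return 0;
}

/*
 * Hypothetical helper: create "path" with "size" bytes using one of the two
 * strategies, fsync it, and return the resulting file size (or -1 on error).
 */
static off_t
preallocate_file(const char *path, off_t size, int zero_fill)
{
    struct stat st;
    int         fd;
    int         rc;
    int         save_errno;

    assert(size > 0);
    memset(&st, 0, sizeof(st));

    unlink(path);               /* discard any leftover from a previous run */
    fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;

    if (zero_fill)
        rc = fill_with_zeros(fd, size);
    else
    {
        /* seek-free variant: one byte at the last offset sets the length */
        errno = 0;
        rc = (pwrite(fd, "\0", 1, size - 1) == 1) ? 0 : -1;
        if (rc != 0 && errno == 0)
            errno = ENOSPC;
    }

    if (rc == 0)
        rc = fsync(fd);
    if (rc == 0)
        rc = fstat(fd, &st);

    save_errno = errno;         /* keep the first error across cleanup */
    close(fd);
    unlink(path);               /* demo only: do not leave the file behind */
    errno = save_errno;

    return (rc == 0) ? st.st_size : -1;
}
```

Both variants yield the same file length. On filesystems that support holes, only the zero-fill variant is expected to allocate the intermediate blocks.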
                               3362                 :                : 
                               3363                 :                : /*
                               3364                 :                :  * Create a new XLOG file segment, or open a pre-existing one.
                               3365                 :                :  *
                               3366                 :                :  * logsegno: identify segment to be created/opened.
                               3367                 :                :  *
                               3368                 :                :  * Returns FD of opened file.
                               3369                 :                :  *
                               3370                 :                :  * Note: errors here are ERROR not PANIC because we might or might not be
                               3371                 :                :  * inside a critical section (eg, during checkpoint there is no reason to
                               3372                 :                :  * take down the system on failure).  They will promote to PANIC if we are
                               3373                 :                :  * in a critical section.
                               3374                 :                :  */
                               3375                 :                : int
 1401 rhaas@postgresql.org     3376                 :          13584 : XLogFileInit(XLogSegNo logsegno, TimeLineID logtli)
                               3377                 :                : {
                               3378                 :                :     bool        ignore_added;
                               3379                 :                :     char        path[MAXPGPATH];
                               3380                 :                :     int         fd;
                               3381                 :                : 
                               3382         [ -  + ]:          13584 :     Assert(logtli != 0);
                               3383                 :                : 
                               3384                 :          13584 :     fd = XLogFileInitInternal(logsegno, logtli, &ignore_added, path);
 1531 noah@leadboat.com        3385         [ +  + ]:          13584 :     if (fd >= 0)
                               3386                 :          12276 :         return fd;
                               3387                 :                : 
                               3388                 :                :     /* Now open original target segment (might not be file I just made) */
  918 tmunro@postgresql.or     3389                 :           1308 :     fd = BasicOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC |
  694 nathan@postgresql.or     3390                 :           1308 :                        get_sync_bit(wal_sync_method));
 8815 tgl@sss.pgh.pa.us        3391         [ -  + ]:           1308 :     if (fd < 0)
 7449 tgl@sss.pgh.pa.us        3392         [ #  # ]:UBC           0 :         ereport(ERROR,
                               3393                 :                :                 (errcode_for_file_access(),
                               3394                 :                :                  errmsg("could not open file \"%s\": %m", path)));
 7178 neilc@samurai.com        3395                 :CBC        1308 :     return fd;
                               3396                 :                : }
                               3397                 :                : 
                               3398                 :                : /*
                               3399                 :                :  * Create a new XLOG file segment by copying a pre-existing one.
                               3400                 :                :  *
                               3401                 :                :  * destsegno: identify segment to be created.
                               3402                 :                :  *
                               3403                 :                :  * srcTLI, srcsegno: identify segment to be copied (could be from
                               3404                 :                :  *      a different timeline)
                               3405                 :                :  *
                               3406                 :                :  * upto: how much of the source file to copy (the rest is filled with
                               3407                 :                :  *      zeros)
                               3408                 :                :  *
                               3409                 :                :  * Currently this is only used during recovery, and so there are no locking
                               3410                 :                :  * considerations.  But we should be just as careful as XLogFileInit to avoid
                               3411                 :                :  * emplacing a bogus file.
                               3412                 :                :  */
                               3413                 :                : static void
 1401 rhaas@postgresql.org     3414                 :             37 : XLogFileCopy(TimeLineID destTLI, XLogSegNo destsegno,
                               3415                 :                :              TimeLineID srcTLI, XLogSegNo srcsegno,
                               3416                 :                :              int upto)
                               3417                 :                : {
                               3418                 :                :     char        path[MAXPGPATH];
                               3419                 :                :     char        tmppath[MAXPGPATH];
                               3420                 :                :     PGAlignedXLogBlock buffer;
                               3421                 :                :     int         srcfd;
                               3422                 :                :     int         fd;
                               3423                 :                :     int         nbytes;
                               3424                 :                : 
                               3425                 :                :     /*
                               3426                 :                :      * Open the source file
                               3427                 :                :      */
 2909 andres@anarazel.de       3428                 :             37 :     XLogFilePath(path, srcTLI, srcsegno, wal_segment_size);
 2905 peter_e@gmx.net          3429                 :             37 :     srcfd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
 7717 tgl@sss.pgh.pa.us        3430         [ -  + ]:             37 :     if (srcfd < 0)
 7449 tgl@sss.pgh.pa.us        3431         [ #  # ]:UBC           0 :         ereport(ERROR,
                               3432                 :                :                 (errcode_for_file_access(),
                               3433                 :                :                  errmsg("could not open file \"%s\": %m", path)));
                               3434                 :                : 
                               3435                 :                :     /*
                               3436                 :                :      * Copy into a temp file name.
                               3437                 :                :      */
 7369 tgl@sss.pgh.pa.us        3438                 :CBC          37 :     snprintf(tmppath, MAXPGPATH, XLOGDIR "/xlogtemp.%d", (int) getpid());
                               3439                 :                : 
 7717                          3440                 :             37 :     unlink(tmppath);
                               3441                 :                : 
                               3442                 :                :     /* do not use get_sync_bit() here --- want to fsync only at end of fill */
 2905 peter_e@gmx.net          3443                 :             37 :     fd = OpenTransientFile(tmppath, O_RDWR | O_CREAT | O_EXCL | PG_BINARY);
 7717 tgl@sss.pgh.pa.us        3444         [ -  + ]:             37 :     if (fd < 0)
 7449 tgl@sss.pgh.pa.us        3445         [ #  # ]:UBC           0 :         ereport(ERROR,
                               3446                 :                :                 (errcode_for_file_access(),
                               3447                 :                :                  errmsg("could not create file \"%s\": %m", tmppath)));
                               3448                 :                : 
                               3449                 :                :     /*
                               3450                 :                :      * Do the data copying.
                               3451                 :                :      */
 2909 andres@anarazel.de       3452         [ +  + ]:CBC       75813 :     for (nbytes = 0; nbytes < wal_segment_size; nbytes += sizeof(buffer))
                               3453                 :                :     {
                               3454                 :                :         int         nread;
                               3455                 :                : 
 3915 heikki.linnakangas@i     3456                 :          75776 :         nread = upto - nbytes;
                               3457                 :                : 
                               3458                 :                :         /*
                               3459                 :                :          * The part that is not read from the source file is filled with
                               3460                 :                :          * zeros.
                               3461                 :                :          */
                               3462         [ +  + ]:          75776 :         if (nread < sizeof(buffer))
 2562 tgl@sss.pgh.pa.us        3463                 :             37 :             memset(buffer.data, 0, sizeof(buffer));
                               3464                 :                : 
 3915 heikki.linnakangas@i     3465         [ +  + ]:          75776 :         if (nread > 0)
                               3466                 :                :         {
                               3467                 :                :             int         r;
                               3468                 :                : 
                               3469         [ +  + ]:           2166 :             if (nread > sizeof(buffer))
                               3470                 :           2129 :                 nread = sizeof(buffer);
 3094 rhaas@postgresql.org     3471                 :           2166 :             pgstat_report_wait_start(WAIT_EVENT_WAL_COPY_READ);
 2562 tgl@sss.pgh.pa.us        3472                 :           2166 :             r = read(srcfd, buffer.data, nread);
 2607 michael@paquier.xyz      3473         [ -  + ]:           2166 :             if (r != nread)
                               3474                 :                :             {
 2607 michael@paquier.xyz      3475         [ #  # ]:UBC           0 :                 if (r < 0)
 3915 heikki.linnakangas@i     3476         [ #  # ]:              0 :                     ereport(ERROR,
                               3477                 :                :                             (errcode_for_file_access(),
                               3478                 :                :                              errmsg("could not read file \"%s\": %m",
                               3479                 :                :                                     path)));
                               3480                 :                :                 else
                               3481         [ #  # ]:              0 :                     ereport(ERROR,
                               3482                 :                :                             (errcode(ERRCODE_DATA_CORRUPTED),
                               3483                 :                :                              errmsg("could not read file \"%s\": read %d of %zu",
                               3484                 :                :                                     path, r, (Size) nread)));
                               3485                 :                :             }
 3094 rhaas@postgresql.org     3486                 :CBC        2166 :             pgstat_report_wait_end();
                               3487                 :                :         }
 7717 tgl@sss.pgh.pa.us        3488                 :          75776 :         errno = 0;
 3094 rhaas@postgresql.org     3489                 :          75776 :         pgstat_report_wait_start(WAIT_EVENT_WAL_COPY_WRITE);
 2562 tgl@sss.pgh.pa.us        3490         [ -  + ]:          75776 :         if ((int) write(fd, buffer.data, sizeof(buffer)) != (int) sizeof(buffer))
                               3491                 :                :         {
 7717 tgl@sss.pgh.pa.us        3492                 :UBC           0 :             int         save_errno = errno;
                               3493                 :                : 
                               3494                 :                :             /*
                               3495                 :                :              * If we fail to make the file, delete it to release disk space.
                               3496                 :                :              */
                               3497                 :              0 :             unlink(tmppath);
                               3498                 :                :             /* if write didn't set errno, assume problem is no disk space */
                               3499         [ #  # ]:              0 :             errno = save_errno ? save_errno : ENOSPC;
                               3500                 :                : 
 7449                          3501         [ #  # ]:              0 :             ereport(ERROR,
                               3502                 :                :                     (errcode_for_file_access(),
                               3503                 :                :                      errmsg("could not write to file \"%s\": %m", tmppath)));
                               3504                 :                :         }
 3094 rhaas@postgresql.org     3505                 :CBC       75776 :         pgstat_report_wait_end();
                               3506                 :                :     }
                               3507                 :                : 
                               3508                 :             37 :     pgstat_report_wait_start(WAIT_EVENT_WAL_COPY_SYNC);
 7717 tgl@sss.pgh.pa.us        3509         [ -  + ]:             37 :     if (pg_fsync(fd) != 0)
 2483 tmunro@postgresql.or     3510         [ #  # ]:UBC           0 :         ereport(data_sync_elevel(ERROR),
                               3511                 :                :                 (errcode_for_file_access(),
                               3512                 :                :                  errmsg("could not fsync file \"%s\": %m", tmppath)));
 3094 rhaas@postgresql.org     3513                 :CBC          37 :     pgstat_report_wait_end();
                               3514                 :                : 
 2254 peter@eisentraut.org     3515         [ -  + ]:             37 :     if (CloseTransientFile(fd) != 0)
 7449 tgl@sss.pgh.pa.us        3516         [ #  # ]:UBC           0 :         ereport(ERROR,
                               3517                 :                :                 (errcode_for_file_access(),
                               3518                 :                :                  errmsg("could not close file \"%s\": %m", tmppath)));
                               3519                 :                : 
 2254 peter@eisentraut.org     3520         [ -  + ]:CBC          37 :     if (CloseTransientFile(srcfd) != 0)
 2373 michael@paquier.xyz      3521         [ #  # ]:UBC           0 :         ereport(ERROR,
                               3522                 :                :                 (errcode_for_file_access(),
                               3523                 :                :                  errmsg("could not close file \"%s\": %m", path)));
                               3524                 :                : 
                               3525                 :                :     /*
                               3526                 :                :      * Now move the segment into place with its final name.
                               3527                 :                :      */
 1401 rhaas@postgresql.org     3528         [ -  + ]:CBC          37 :     if (!InstallXLogFileSegment(&destsegno, tmppath, false, 0, destTLI))
 3720 fujii@postgresql.org     3529         [ #  # ]:UBC           0 :         elog(ERROR, "InstallXLogFileSegment should not have failed");
 7717 tgl@sss.pgh.pa.us        3530                 :CBC          37 : }
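
The copy loop above takes the first `upto` bytes from the source and pads the rest of the segment with zeros, one block at a time. The block-level logic can be sketched as follows, using in-memory buffers as a stand-in for the file descriptors. The name `copy_with_zero_padding` and the tiny `BLOCK_SIZE` are assumptions made for illustration, and signed arithmetic is used throughout so that a negative remainder is handled explicitly.

```c
#include <assert.h>
#include <string.h>

#define BLOCK_SIZE 8            /* hypothetical; tiny for demonstration */

/*
 * Hypothetical helper: fill dst[0, total) block by block.  The first "upto"
 * bytes come from src; everything after that is zero.
 */
static void
copy_with_zero_padding(const char *src, long upto, char *dst, long total)
{
    char        block[BLOCK_SIZE];
    long        off;

    assert(total % BLOCK_SIZE == 0);
    assert(upto >= 0 && upto <= total);

    for (off = 0; off < total; off += BLOCK_SIZE)
    {
        long        remaining = upto - off;

        /* a block that is not fully covered by source data starts as zeros */
        if (remaining < BLOCK_SIZE)
            memset(block, 0, sizeof(block));

        if (remaining > 0)
        {
            long        n = (remaining > BLOCK_SIZE) ? BLOCK_SIZE : remaining;

            memcpy(block, src + off, (size_t) n);
        }

        /* stands in for the write() of one full block to the temp file */
        memcpy(dst + off, block, sizeof(block));
    }
}
```

With `upto = 11` and `total = 24`, the first block is copied whole, the second carries three source bytes followed by zeros, and the third is entirely zero.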
                               3531                 :                : 
                               3532                 :                : /*
                               3533                 :                :  * Install a new XLOG segment file as a current or future log segment.
                               3534                 :                :  *
                               3535                 :                :  * This is used both to install a newly-created segment (which has a temp
                               3536                 :                :  * filename while it's being created) and to recycle an old segment.
                               3537                 :                :  *
                               3538                 :                :  * *segno: identify segment to install as (or first possible target).
                               3539                 :                :  * When find_free is true, this is modified on return to indicate the
                               3540                 :                :  * actual installation location or last segment searched.
                               3541                 :                :  *
                               3542                 :                :  * tmppath: initial name of file to install.  It will be renamed into place.
                               3543                 :                :  *
                               3544                 :                :  * find_free: if true, install the new segment at the first free segment
                               3545                 :                :  * number at or after *segno.  If false, install the new segment
                               3546                 :                :  * exactly where specified, deleting any existing segment file there.
                               3547                 :                :  *
                               3548                 :                :  * max_segno: maximum segment number to install the new file as.  Fail if no
                               3549                 :                :  * free slot is found between *segno and max_segno. (Ignored when find_free
                               3550                 :                :  * is false.)
                               3551                 :                :  *
                               3552                 :                :  * tli: The timeline on which the new segment should be installed.
                               3553                 :                :  *
                               3554                 :                :  * Returns true if the file was installed successfully.  false indicates that
                               3555                 :                :  * the max_segno limit was exceeded, the startup process has disabled this
                               3556                 :                :  * function for now, or an error occurred while renaming the file into place.
                               3557                 :                :  */
                               3558                 :                : static bool
 4822 heikki.linnakangas@i     3559                 :           2876 : InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
                               3560                 :                :                        bool find_free, XLogSegNo max_segno, TimeLineID tli)
                               3561                 :                : {
                               3562                 :                :     char        path[MAXPGPATH];
                               3563                 :                :     struct stat stat_buf;
                               3564                 :                : 
 1401 rhaas@postgresql.org     3565         [ -  + ]:           2876 :     Assert(tli != 0);
                               3566                 :                : 
                               3567                 :           2876 :     XLogFilePath(path, tli, *segno, wal_segment_size);
                               3568                 :                : 
 1531 noah@leadboat.com        3569                 :           2876 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               3570         [ -  + ]:           2876 :     if (!XLogCtl->InstallXLogFileSegmentActive)
                               3571                 :                :     {
 1531 noah@leadboat.com        3572                 :UBC           0 :         LWLockRelease(ControlFileLock);
                               3573                 :              0 :         return false;
                               3574                 :                :     }
                               3575                 :                : 
 8815 tgl@sss.pgh.pa.us        3576         [ +  + ]:CBC        2876 :     if (!find_free)
                               3577                 :                :     {
                               3578                 :                :         /* Force installation: get rid of any pre-existing segment file */
 3085 teodor@sigaev.ru         3579                 :             37 :         durable_unlink(path, DEBUG1);
                               3580                 :                :     }
                               3581                 :                :     else
                               3582                 :                :     {
                               3583                 :                :         /* Find a free slot to put it in */
 8260 tgl@sss.pgh.pa.us        3584         [ +  + ]:           3999 :         while (stat(path, &stat_buf) == 0)
                               3585                 :                :         {
 3848 heikki.linnakangas@i     3586         [ +  + ]:           1269 :             if ((*segno) >= max_segno)
                               3587                 :                :             {
                               3588                 :                :                 /* Failed to find a free slot within specified range */
 1531 noah@leadboat.com        3589                 :            109 :                 LWLockRelease(ControlFileLock);
 8815 tgl@sss.pgh.pa.us        3590                 :            109 :                 return false;
                               3591                 :                :             }
 4822 heikki.linnakangas@i     3592                 :           1160 :             (*segno)++;
 1401 rhaas@postgresql.org     3593                 :           1160 :             XLogFilePath(path, tli, *segno, wal_segment_size);
                               3594                 :                :         }
                               3595                 :                :     }
                               3596                 :                : 
 1159 michael@paquier.xyz      3597   [ +  -  -  + ]:           2767 :     Assert(access(path, F_OK) != 0 && errno == ENOENT);
                               3598         [ -  + ]:           2767 :     if (durable_rename(tmppath, path, LOG) != 0)
                               3599                 :                :     {
 1531 noah@leadboat.com        3600                 :UBC           0 :         LWLockRelease(ControlFileLock);
                               3601                 :                :         /* durable_rename already emitted log message */
 5837 heikki.linnakangas@i     3602                 :              0 :         return false;
                               3603                 :                :     }
                               3604                 :                : 
 1531 noah@leadboat.com        3605                 :CBC        2767 :     LWLockRelease(ControlFileLock);
                               3606                 :                : 
 8815 tgl@sss.pgh.pa.us        3607                 :           2767 :     return true;
                               3608                 :                : }
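
The find_free branch of InstallXLogFileSegment() above probes successive segment numbers with stat() until it reaches a path with no file, and gives up once the number hits max_segno. A minimal standalone sketch of that probing loop follows. The names demo_segment_path() and demo_find_free_slot() and the "seg-<n>" naming scheme are hypothetical stand-ins for illustration; locking and the final durable rename are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: build "<dir>/seg-<n>" (not the real WAL naming). */
static void
demo_segment_path(char *path, size_t len, const char *dir, uint64_t segno)
{
    snprintf(path, len, "%s/seg-%020llu", dir, (unsigned long long) segno);
}

/*
 * Advance *segno until no file exists at the corresponding path.
 * Returns false if an existing file is found at max_segno or beyond,
 * i.e. there is no free slot in the allowed range.
 */
static bool
demo_find_free_slot(const char *dir, uint64_t *segno, uint64_t max_segno)
{
    char        path[1024];
    struct stat stat_buf;

    assert(dir != NULL && segno != NULL);

    demo_segment_path(path, sizeof(path), dir, *segno);
    while (stat(path, &stat_buf) == 0)
    {
        /* Slot is occupied; stop if we may not go any further */
        if (*segno >= max_segno)
            return false;
        (*segno)++;
        demo_segment_path(path, sizeof(path), dir, *segno);
    }
    return true;                /* *segno now names a free slot */
}
```

As in the original, the caller's segment number is updated in place, so on success it identifies the slot that was actually chosen.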
                               3609                 :                : 
                               3610                 :                : /*
                               3611                 :                :  * Open a pre-existing logfile segment for writing.
                               3612                 :                :  */
                               3613                 :                : int
 1401 rhaas@postgresql.org     3614                 :            141 : XLogFileOpen(XLogSegNo segno, TimeLineID tli)
                               3615                 :                : {
                               3616                 :                :     char        path[MAXPGPATH];
                               3617                 :                :     int         fd;
                               3618                 :                : 
                               3619                 :            141 :     XLogFilePath(path, tli, segno, wal_segment_size);
                               3620                 :                : 
  918 tmunro@postgresql.or     3621                 :            141 :     fd = BasicOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC |
  694 nathan@postgresql.or     3622                 :            141 :                        get_sync_bit(wal_sync_method));
 9476 vadim4o@yahoo.com        3623         [ -  + ]:            141 :     if (fd < 0)
 8083 tgl@sss.pgh.pa.us        3624         [ #  # ]:UBC           0 :         ereport(PANIC,
                               3625                 :                :                 (errcode_for_file_access(),
                               3626                 :                :                  errmsg("could not open file \"%s\": %m", path)));
                               3627                 :                : 
 7717 tgl@sss.pgh.pa.us        3628                 :CBC         141 :     return fd;
                               3629                 :                : }
                               3630                 :                : 
                               3631                 :                : /*
                               3632                 :                :  * Close the current logfile segment for writing.
                               3633                 :                :  */
                               3634                 :                : static void
 7023 bruce@momjian.us         3635                 :           6338 : XLogFileClose(void)
                               3636                 :                : {
                               3637         [ -  + ]:           6338 :     Assert(openLogFile >= 0);
                               3638                 :                : 
                               3639                 :                :     /*
                               3640                 :                :      * WAL segment files will not be re-read in normal operation, so we advise
                               3641                 :                :      * the OS to release any cached pages.  But do not do so if WAL archiving
                               3642                 :                :      * or streaming is active, because archiver and walsender processes could
                               3643                 :                :      * use the cache to read the WAL segment.
                               3644                 :                :      */
                               3645                 :                : #if defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED)
  882 tmunro@postgresql.or     3646   [ +  +  +  - ]:           6338 :     if (!XLogIsNeeded() && (io_direct_flags & IO_DIRECT_WAL) == 0)
 6082 tgl@sss.pgh.pa.us        3647                 :           1569 :         (void) posix_fadvise(openLogFile, 0, 0, POSIX_FADV_DONTNEED);
                               3648                 :                : #endif
                               3649                 :                : 
 2254 peter@eisentraut.org     3650         [ -  + ]:           6338 :     if (close(openLogFile) != 0)
                               3651                 :                :     {
                               3652                 :                :         char        xlogfname[MAXFNAMELEN];
 2104 michael@paquier.xyz      3653                 :UBC           0 :         int         save_errno = errno;
                               3654                 :                : 
 1401 rhaas@postgresql.org     3655                 :              0 :         XLogFileName(xlogfname, openLogTLI, openLogSegNo, wal_segment_size);
 2104 michael@paquier.xyz      3656                 :              0 :         errno = save_errno;
 7023 bruce@momjian.us         3657         [ #  # ]:              0 :         ereport(PANIC,
                               3658                 :                :                 (errcode_for_file_access(),
                               3659                 :                :                  errmsg("could not close file \"%s\": %m", xlogfname)));
                               3660                 :                :     }
                               3661                 :                : 
 7023 bruce@momjian.us         3662                 :CBC        6338 :     openLogFile = -1;
 2021 tgl@sss.pgh.pa.us        3663                 :           6338 :     ReleaseExternalFD();
 7023 bruce@momjian.us         3664                 :           6338 : }
                               3665                 :                : 
                               3666                 :                : /*
                               3667                 :                :  * Preallocate log files beyond the specified log endpoint.
                               3668                 :                :  *
                               3669                 :                :  * XXX this is currently extremely conservative, since it forces only one
                               3670                 :                :  * future log segment to exist, and even that only if we are 75% done with
                               3671                 :                :  * the current one.  This is only appropriate for very low-WAL-volume systems.
                               3672                 :                :  * High-volume systems will be OK once they've built up a sufficient set of
                               3673                 :                :  * recycled log segments, but the startup transient is likely to include
                               3674                 :                :  * a lot of segment creations by foreground processes, which is not so good.
                               3675                 :                :  *
                               3676                 :                :  * XLogFileInitInternal() can ereport(ERROR).  All known causes indicate big
                               3677                 :                :  * trouble; for example, a full filesystem is one cause.  The checkpoint WAL
                               3678                 :                :  * and/or ControlFile updates already completed.  If a RequestCheckpoint()
                               3679                 :                :  * initiated the present checkpoint and an ERROR ends this function, the
                               3680                 :                :  * command that called RequestCheckpoint() fails.  That's not ideal, but it's
                               3681                 :                :  * not worth contorting more functions to use caller-specified elevel values.
                               3682                 :                :  * (With or without RequestCheckpoint(), an ERROR forestalls some inessential
                               3683                 :                :  * reporting and resource reclamation.)
                               3684                 :                :  */
                               3685                 :                : static void
 1401 rhaas@postgresql.org     3686                 :           1926 : PreallocXlogFiles(XLogRecPtr endptr, TimeLineID tli)
                               3687                 :                : {
                               3688                 :                :     XLogSegNo   _logSegNo;
                               3689                 :                :     int         lf;
                               3690                 :                :     bool        added;
                               3691                 :                :     char        path[MAXPGPATH];
                               3692                 :                :     uint64      offset;
                               3693                 :                : 
 1531 noah@leadboat.com        3694         [ +  + ]:           1926 :     if (!XLogCtl->InstallXLogFileSegmentActive)
                               3695                 :             11 :         return;                 /* unlocked check says no */
                               3696                 :                : 
 2909 andres@anarazel.de       3697                 :           1915 :     XLByteToPrevSeg(endptr, _logSegNo, wal_segment_size);
                               3698                 :           1915 :     offset = XLogSegmentOffset(endptr - 1, wal_segment_size);
                               3699         [ +  + ]:           1915 :     if (offset >= (uint32) (0.75 * wal_segment_size))
                               3700                 :                :     {
 4822 heikki.linnakangas@i     3701                 :            224 :         _logSegNo++;
 1401 rhaas@postgresql.org     3702                 :            224 :         lf = XLogFileInitInternal(_logSegNo, tli, &added, path);
 1531 noah@leadboat.com        3703         [ +  + ]:            224 :         if (lf >= 0)
                               3704                 :            175 :             close(lf);
                               3705         [ +  + ]:            224 :         if (added)
 6643 tgl@sss.pgh.pa.us        3706                 :             49 :             CheckpointStats.ckpt_segs_added++;
                               3707                 :                :     }
                               3708                 :                : }
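
The 75% test in PreallocXlogFiles() above can be checked in isolation. The sketch below reproduces only the arithmetic: it takes the offset of the last written byte (endptr - 1) within its segment and compares it against three quarters of the segment size. The function name demo_should_preallocate() is hypothetical, and the offset is computed with a bit mask on the assumption that the segment size is a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if the byte just before endptr lies in the last quarter
 * of its segment.  seg_size is assumed to be a power of two.
 */
static bool
demo_should_preallocate(uint64_t endptr, uint32_t seg_size)
{
    uint64_t    offset;

    assert(endptr > 0);
    assert(seg_size > 0 && (seg_size & (seg_size - 1)) == 0);

    /* Offset of the last written byte within its segment */
    offset = (endptr - 1) & (uint64_t) (seg_size - 1);

    return offset >= (uint32_t) (0.75 * seg_size);
}
```

With a 16 MB segment the threshold is byte offset 12582912, so an endptr exactly on a segment boundary still counts as being at the end of the previous segment.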
                               3709                 :                : 
                               3710                 :                : /*
                               3711                 :                :  * Throws an error if the given log segment has already been removed or
                               3712                 :                :  * recycled. The caller should only pass a segment that it knows to have
                               3713                 :                :  * existed while the server has been running, as this function always
                               3714                 :                :  * succeeds if no WAL segments have been removed since startup.
                               3715                 :                :  * 'tli' is only used in the error message.
                               3716                 :                :  *
                               3717                 :                :  * Note: this function guarantees to keep errno unchanged on return.
                               3718                 :                :  * This supports callers that use this to possibly deliver a better
                               3719                 :                :  * error message about a missing file, while still being able to throw
                               3720                 :                :  * a normal file-access error afterwards, if this does return.
                               3721                 :                :  */
                               3722                 :                : void
 4629 heikki.linnakangas@i     3723                 :         122179 : CheckXLogRemoved(XLogSegNo segno, TimeLineID tli)
                               3724                 :                : {
 2833 tgl@sss.pgh.pa.us        3725                 :         122179 :     int         save_errno = errno;
                               3726                 :                :     XLogSegNo   lastRemovedSegNo;
                               3727                 :                : 
 4002 andres@anarazel.de       3728         [ +  + ]:         122179 :     SpinLockAcquire(&XLogCtl->info_lck);
                               3729                 :         122179 :     lastRemovedSegNo = XLogCtl->lastRemovedSegNo;
                               3730                 :         122179 :     SpinLockRelease(&XLogCtl->info_lck);
                               3731                 :                : 
 4629 heikki.linnakangas@i     3732         [ -  + ]:         122179 :     if (segno <= lastRemovedSegNo)
                               3733                 :                :     {
                               3734                 :                :         char        filename[MAXFNAMELEN];
                               3735                 :                : 
 2909 andres@anarazel.de       3736                 :UBC           0 :         XLogFileName(filename, tli, segno, wal_segment_size);
 2833 tgl@sss.pgh.pa.us        3737                 :              0 :         errno = save_errno;
 4629 heikki.linnakangas@i     3738         [ #  # ]:              0 :         ereport(ERROR,
                               3739                 :                :                 (errcode_for_file_access(),
                               3740                 :                :                  errmsg("requested WAL segment %s has already been removed",
                               3741                 :                :                         filename)));
                               3742                 :                :     }
 2833 tgl@sss.pgh.pa.us        3743                 :CBC      122179 :     errno = save_errno;
 5626 heikki.linnakangas@i     3744                 :         122179 : }
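
CheckXLogRemoved() above promises to leave errno unchanged, so a caller can still report its own file-access error afterwards. A minimal sketch of that save-and-restore pattern follows. The names demo_last_removed and demo_segment_still_present() are hypothetical, and the sketch returns a bool where the original raises an ERROR.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared "last removed segment" value. */
static uint64_t demo_last_removed = 0;

/*
 * Return true if segno is newer than the last removed segment.
 * errno is guaranteed to have the same value on return as on entry.
 */
static bool
demo_segment_still_present(uint64_t segno)
{
    int         save_errno = errno;
    bool        present;

    /* Simulate intermediate work that clobbers errno */
    errno = EINVAL;

    present = segno > demo_last_removed;

    /* Restore the caller's errno before returning */
    errno = save_errno;
    return present;
}
```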
                               3745                 :                : 
                               3746                 :                : /*
                               3747                 :                :  * Return the last WAL segment removed, or 0 if no segment has been removed
                               3748                 :                :  * since startup.
                               3749                 :                :  *
                               3750                 :                :  * NB: the result can be out of date arbitrarily fast; the caller has to deal
                               3751                 :                :  * with that.
                               3752                 :                :  */
                               3753                 :                : XLogSegNo
 4205 rhaas@postgresql.org     3754                 :            944 : XLogGetLastRemovedSegno(void)
                               3755                 :                : {
                               3756                 :                :     XLogSegNo   lastRemovedSegNo;
                               3757                 :                : 
 4002 andres@anarazel.de       3758         [ -  + ]:            944 :     SpinLockAcquire(&XLogCtl->info_lck);
                               3759                 :            944 :     lastRemovedSegNo = XLogCtl->lastRemovedSegNo;
                               3760                 :            944 :     SpinLockRelease(&XLogCtl->info_lck);
                               3761                 :                : 
 4205 rhaas@postgresql.org     3762                 :            944 :     return lastRemovedSegNo;
                               3763                 :                : }
                               3764                 :                : 
                               3765                 :                : /*
                               3766                 :                :  * Return the oldest WAL segment on the given TLI that still exists in
                               3767                 :                :  * XLOGDIR, or 0 if none.
                               3768                 :                :  */
                               3769                 :                : XLogSegNo
  626                          3770                 :              5 : XLogGetOldestSegno(TimeLineID tli)
                               3771                 :                : {
                               3772                 :                :     DIR        *xldir;
                               3773                 :                :     struct dirent *xlde;
                               3774                 :              5 :     XLogSegNo   oldest_segno = 0;
                               3775                 :                : 
                               3776                 :              5 :     xldir = AllocateDir(XLOGDIR);
                               3777         [ +  + ]:             33 :     while ((xlde = ReadDir(xldir, XLOGDIR)) != NULL)
                               3778                 :                :     {
                               3779                 :                :         TimeLineID  file_tli;
                               3780                 :                :         XLogSegNo   file_segno;
                               3781                 :                : 
                               3782                 :                :         /* Ignore files that are not XLOG segments. */
                               3783         [ +  + ]:             28 :         if (!IsXLogFileName(xlde->d_name))
                               3784                 :             20 :             continue;
                               3785                 :                : 
                               3786                 :                :         /* Parse filename to get TLI and segno. */
                               3787                 :              8 :         XLogFromFileName(xlde->d_name, &file_tli, &file_segno,
                               3788                 :                :                          wal_segment_size);
                               3789                 :                : 
                               3790                 :                :         /* Ignore anything that's not from the TLI of interest. */
                               3791         [ -  + ]:              8 :         if (tli != file_tli)
  626 rhaas@postgresql.org     3792                 :UBC           0 :             continue;
                               3793                 :                : 
                               3794                 :                :         /* If it's the oldest so far, update oldest_segno. */
  626 rhaas@postgresql.org     3795   [ +  +  +  + ]:CBC           8 :         if (oldest_segno == 0 || file_segno < oldest_segno)
                               3796                 :              6 :             oldest_segno = file_segno;
                               3797                 :                :     }
                               3798                 :                : 
                               3799                 :              5 :     FreeDir(xldir);
                               3800                 :              5 :     return oldest_segno;
                               3801                 :                : }
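
XLogGetOldestSegno() above relies on parsing each directory entry into a timeline ID and a segment number. The sketch below shows one way that decoding can work, assuming the usual 24-character name made of three 8-digit uppercase hexadecimal fields (timeline, high part, low part). demo_parse_wal_name() is a hypothetical name, not the macro used in the source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Decode a 24-character WAL segment file name into timeline and segment
 * number.  Returns false if the name does not have the expected shape.
 */
static bool
demo_parse_wal_name(const char *fname, uint64_t seg_size,
                    uint32_t *tli, uint64_t *segno)
{
    unsigned int t, log, seg;
    uint64_t    segs_per_id;

    assert(seg_size > 0);

    /* Exactly 24 uppercase hex digits, nothing else */
    if (strlen(fname) != 24 || strspn(fname, "0123456789ABCDEF") != 24)
        return false;
    if (sscanf(fname, "%08X%08X%08X", &t, &log, &seg) != 3)
        return false;

    /* Number of segments covered by one value of the middle field */
    segs_per_id = UINT64_C(0x100000000) / seg_size;

    *tli = t;
    *segno = (uint64_t) log * segs_per_id + seg;
    return true;
}
```

Names that fail the shape check, such as temporary files, are rejected, which corresponds to the "ignore files that are not XLOG segments" step in the loop above.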
                               3802                 :                : 
                               3803                 :                : /*
                               3804                 :                :  * Update the last removed segno pointer in shared memory, to reflect that the
                               3805                 :                :  * given XLOG file has been removed.
                               3806                 :                :  */
                               3807                 :                : static void
 5626 heikki.linnakangas@i     3808                 :           2516 : UpdateLastRemovedPtr(char *filename)
                               3809                 :                : {
                               3810                 :                :     uint32      tli;
                               3811                 :                :     XLogSegNo   segno;
                               3812                 :                : 
 2909 andres@anarazel.de       3813                 :           2516 :     XLogFromFileName(filename, &tli, &segno, wal_segment_size);
                               3814                 :                : 
 4002                          3815         [ +  + ]:           2516 :     SpinLockAcquire(&XLogCtl->info_lck);
                               3816         [ +  + ]:           2516 :     if (segno > XLogCtl->lastRemovedSegNo)
                               3817                 :           1112 :         XLogCtl->lastRemovedSegNo = segno;
                               3818                 :           2516 :     SpinLockRelease(&XLogCtl->info_lck);
 5626 heikki.linnakangas@i     3819                 :           2516 : }
                               3820                 :                : 
                               3821                 :                : /*
                               3822                 :                :  * Remove all temporary log files in pg_wal
                               3823                 :                :  *
                               3824                 :                :  * This is called at the beginning of recovery after a previous crash,
                               3825                 :                :  * at a point where no other processes write fresh WAL data.
                               3826                 :                :  */
                               3827                 :                : static void
 2612 michael@paquier.xyz      3828                 :            169 : RemoveTempXlogFiles(void)
                               3829                 :                : {
                               3830                 :                :     DIR        *xldir;
                               3831                 :                :     struct dirent *xlde;
                               3832                 :                : 
                               3833         [ +  + ]:            169 :     elog(DEBUG2, "removing all temporary WAL segments");
                               3834                 :                : 
                               3835                 :            169 :     xldir = AllocateDir(XLOGDIR);
                               3836         [ +  + ]:           1094 :     while ((xlde = ReadDir(xldir, XLOGDIR)) != NULL)
                               3837                 :                :     {
                               3838                 :                :         char        path[MAXPGPATH];
                               3839                 :                : 
                               3840         [ +  - ]:            925 :         if (strncmp(xlde->d_name, "xlogtemp.", 9) != 0)
                               3841                 :            925 :             continue;
                               3842                 :                : 
 2612 michael@paquier.xyz      3843                 :UBC           0 :         snprintf(path, MAXPGPATH, XLOGDIR "/%s", xlde->d_name);
                               3844                 :              0 :         unlink(path);
                               3845         [ #  # ]:              0 :         elog(DEBUG2, "removed temporary WAL segment \"%s\"", path);
                               3846                 :                :     }
 2612 michael@paquier.xyz      3847                 :CBC         169 :     FreeDir(xldir);
                               3848                 :            169 : }
                               3849                 :                : 
                               3850                 :                : /*
                               3851                 :                :  * Recycle or remove all log files older than or equal to the passed segno.
                               3852                 :                :  *
                               3853                 :                :  * endptr is current (or recent) end of xlog, and lastredoptr is the
                               3854                 :                :  * redo pointer of the last checkpoint. These are used to determine
                               3855                 :                :  * whether we want to recycle rather than delete no-longer-wanted log files.
                               3856                 :                :  *
                               3857                 :                :  * insertTLI is the current timeline for XLOG insertion. Any recycled
                               3858                 :                :  * segments should be reused for this timeline.
                               3859                 :                :  */
                               3860                 :                : static void
 1401 rhaas@postgresql.org     3861                 :           1677 : RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr lastredoptr, XLogRecPtr endptr,
                               3862                 :                :                    TimeLineID insertTLI)
                               3863                 :                : {
                               3864                 :                :     DIR        *xldir;
                               3865                 :                :     struct dirent *xlde;
                               3866                 :                :     char        lastoff[MAXFNAMELEN];
                               3867                 :                :     XLogSegNo   endlogSegNo;
                               3868                 :                :     XLogSegNo   recycleSegNo;
                               3869                 :                : 
                               3870                 :                :     /* Initialize info about where to try to recycle to */
 1695 michael@paquier.xyz      3871                 :           1677 :     XLByteToSeg(endptr, endlogSegNo, wal_segment_size);
                               3872                 :           1677 :     recycleSegNo = XLOGfileslop(lastredoptr);
                               3873                 :                : 
                               3874                 :                :     /*
                               3875                 :                :      * Construct a filename of the last segment to be kept. The timeline ID
                               3876                 :                :      * doesn't matter, we ignore that in the comparison. (During recovery,
                               3877                 :                :      * InsertTimeLineID isn't set, so we can't use that.)
                               3878                 :                :      */
 2909 andres@anarazel.de       3879                 :           1677 :     XLogFileName(lastoff, 0, segno, wal_segment_size);
                               3880                 :                : 
 5486 simon@2ndQuadrant.co     3881         [ +  + ]:           1677 :     elog(DEBUG2, "attempting to remove WAL segments older than log file %s",
                               3882                 :                :          lastoff);
                               3883                 :                : 
 2833 tgl@sss.pgh.pa.us        3884                 :           1677 :     xldir = AllocateDir(XLOGDIR);
                               3885                 :                : 
 7369                          3886         [ +  + ]:          45131 :     while ((xlde = ReadDir(xldir, XLOGDIR)) != NULL)
                               3887                 :                :     {
                               3888                 :                :         /* Ignore files that are not XLOG segments */
 3774 heikki.linnakangas@i     3889         [ +  + ]:          43454 :         if (!IsXLogFileName(xlde->d_name) &&
                               3890         [ +  + ]:           7151 :             !IsPartialXLogFileName(xlde->d_name))
 3799                          3891                 :           7147 :             continue;
                               3892                 :                : 
                               3893                 :                :         /*
                               3894                 :                :          * We ignore the timeline part of the XLOG segment identifiers in
                               3895                 :                :          * deciding whether a segment is still needed.  This ensures that we
                               3896                 :                :          * won't prematurely remove a segment from a parent timeline. We could
                               3897                 :                :          * probably be a little more proactive about removing segments of
                               3898                 :                :          * non-parent timelines, but that would be a whole lot more
                               3899                 :                :          * complicated.
                               3900                 :                :          *
                               3901                 :                :          * We use the alphanumeric sorting property of the filenames to decide
                               3902                 :                :          * which ones are earlier than the lastoff segment.
                               3903                 :                :          */
                               3904         [ +  + ]:          36307 :         if (strcmp(xlde->d_name + 8, lastoff + 8) <= 0)
                               3905                 :                :         {
 4586                          3906         [ +  + ]:          30066 :             if (XLogArchiveCheckDone(xlde->d_name))
                               3907                 :                :             {
                               3908                 :                :                 /* Update the last removed location in shared memory first */
 5626                          3909                 :           2516 :                 UpdateLastRemovedPtr(xlde->d_name);
                               3910                 :                : 
 1100 michael@paquier.xyz      3911                 :           2516 :                 RemoveXlogFile(xlde, recycleSegNo, &endlogSegNo, insertTLI);
                               3912                 :                :             }
                               3913                 :                :         }
                               3914                 :                :     }
                               3915                 :                : 
 3799 heikki.linnakangas@i     3916                 :           1677 :     FreeDir(xldir);
                               3917                 :           1677 : }
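
The name-based cutoff test in the loop above can be illustrated in isolation. This is a minimal sketch rather than PostgreSQL code: it assumes well-formed 24-character segment names whose first 8 hex digits encode the timeline ID, and the helper name `segment_at_or_before` is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Length of the timeline ID prefix in a WAL segment file name (8 hex digits). */
#define TLI_PREFIX_LEN 8

/*
 * Hypothetical helper, not part of PostgreSQL.
 * Returns true if 'name' sorts at or before 'cutoff' once the timeline
 * prefix of both names is skipped.  Both arguments are assumed to be
 * well-formed segment names of at least 24 characters.
 */
static bool
segment_at_or_before(const char *name, const char *cutoff)
{
    return strcmp(name + TLI_PREFIX_LEN, cutoff + TLI_PREFIX_LEN) <= 0;
}
```

Because the comparison starts at offset 8, a segment on timeline 2 and a segment on timeline 1 with the same segment number compare as equal, which is why the cutoff name can be built with a timeline ID of 0.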
                               3918                 :                : 
                               3919                 :                : /*
                               3920                 :                :  * Recycle or remove WAL files that are not part of the given timeline's
                               3921                 :                :  * history.
                               3922                 :                :  *
                               3923                 :                :  * This is called during recovery, whenever we switch to follow a new
                               3924                 :                :  * timeline, and at the end of recovery when we create a new timeline. We
                               3925                 :                :  * wouldn't otherwise care about extra WAL files lying in pg_wal, but they
                               3926                 :                :  * might be leftover pre-allocated or recycled WAL segments on the old timeline
                               3927                 :                :  * that we haven't used yet, and contain garbage. If we just leave them in
                               3928                 :                :  * pg_wal, they will eventually be archived, and we can't let that happen.
                               3929                 :                :  * Files that belong to our timeline history are valid, because we have
                               3930                 :                :  * successfully replayed them, but from others we can't be sure.
                               3931                 :                :  *
                               3932                 :                :  * 'switchpoint' is the current point in WAL where we switch to new timeline,
                               3933                 :                :  * and 'newTLI' is the new timeline we switch to.
                               3934                 :                :  */
                               3935                 :                : void
                               3936                 :             57 : RemoveNonParentXlogFiles(XLogRecPtr switchpoint, TimeLineID newTLI)
                               3937                 :                : {
                               3938                 :                :     DIR        *xldir;
                               3939                 :                :     struct dirent *xlde;
                               3940                 :                :     char        switchseg[MAXFNAMELEN];
                               3941                 :                :     XLogSegNo   endLogSegNo;
                               3942                 :                :     XLogSegNo   switchLogSegNo;
                               3943                 :                :     XLogSegNo   recycleSegNo;
                               3944                 :                : 
                               3945                 :                :     /*
                               3946                 :                :      * Initialize info about where to begin the work.  This will recycle,
                               3947                 :                :      * somewhat arbitrarily, 10 future segments.
                               3948                 :                :      */
 1695 michael@paquier.xyz      3949                 :             57 :     XLByteToPrevSeg(switchpoint, switchLogSegNo, wal_segment_size);
                               3950                 :             57 :     XLByteToSeg(switchpoint, endLogSegNo, wal_segment_size);
                               3951                 :             57 :     recycleSegNo = endLogSegNo + 10;
                               3952                 :                : 
                               3953                 :                :     /*
                               3954                 :                :      * Construct a filename of the last segment to be kept.
                               3955                 :                :      */
                               3956                 :             57 :     XLogFileName(switchseg, newTLI, switchLogSegNo, wal_segment_size);
                               3957                 :                : 
 3799 heikki.linnakangas@i     3958         [ +  + ]:             57 :     elog(DEBUG2, "attempting to remove WAL segments newer than log file %s",
                               3959                 :                :          switchseg);
                               3960                 :                : 
 2833 tgl@sss.pgh.pa.us        3961                 :             57 :     xldir = AllocateDir(XLOGDIR);
                               3962                 :                : 
 3799 heikki.linnakangas@i     3963         [ +  + ]:            541 :     while ((xlde = ReadDir(xldir, XLOGDIR)) != NULL)
                               3964                 :                :     {
                               3965                 :                :         /* Ignore files that are not XLOG segments */
 3774                          3966         [ +  + ]:            484 :         if (!IsXLogFileName(xlde->d_name))
 3799                          3967                 :            303 :             continue;
                               3968                 :                : 
                               3969                 :                :         /*
                               3970                 :                :          * Remove files that are on a timeline older than the new one we're
                               3971                 :                :          * switching to, but with a segment number >= the first segment on the
                               3972                 :                :          * new timeline.
                               3973                 :                :          */
                               3974         [ +  + ]:            181 :         if (strncmp(xlde->d_name, switchseg, 8) < 0 &&
                               3975         [ +  + ]:            117 :             strcmp(xlde->d_name + 8, switchseg + 8) > 0)
                               3976                 :                :         {
                               3977                 :                :             /*
                               3978                 :                :              * If the file has already been marked as .ready, however, don't
                               3979                 :                :              * remove it yet. It should be OK to remove it - files that are
                               3980                 :                :              * not part of our timeline history are not required for recovery
                               3981                 :                :              * - but it seems safer to let them be archived and removed later.
                               3982                 :                :              */
                               3983         [ +  - ]:             14 :             if (!XLogArchiveIsReady(xlde->d_name))
 1100 michael@paquier.xyz      3984                 :             14 :                 RemoveXlogFile(xlde, recycleSegNo, &endLogSegNo, newTLI);
                               3985                 :                :         }
                               3986                 :                :     }
                               3987                 :                : 
 3799 heikki.linnakangas@i     3988                 :             57 :     FreeDir(xldir);
                               3989                 :             57 : }
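
The two-part filename test used in RemoveNonParentXlogFiles() can be sketched on its own. This is an illustration under the same naming assumption (8 hex digits of timeline ID followed by the segment identifier), and `is_stale_future_segment` is a hypothetical name, not a PostgreSQL function.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper, not part of PostgreSQL.
 * Returns true if 'name' belongs to a timeline that sorts before the one
 * in 'switchseg' AND its segment part sorts strictly after the segment
 * part of 'switchseg'.  Both arguments are assumed to be well-formed
 * 24-character segment names.
 */
static bool
is_stale_future_segment(const char *name, const char *switchseg)
{
    return strncmp(name, switchseg, 8) < 0 &&
           strcmp(name + 8, switchseg + 8) > 0;
}
```

With `switchseg` set to `000000030000000000000004`, a file named `000000020000000000000005` qualifies, while files on timeline 3 or files at or before segment 4 do not.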
                               3990                 :                : 
                               3991                 :                : /*
                               3992                 :                :  * Recycle or remove a log file that's no longer needed.
                               3993                 :                :  *
                               3994                 :                :  * segment_de is the dirent structure of the segment to recycle or remove.
                               3995                 :                :  * recycleSegNo is the segment number to recycle up to.  endlogSegNo is
                               3996                 :                :  * the segment number of the current (or recent) end of WAL.
                               3997                 :                :  *
                               3998                 :                :  * endlogSegNo gets incremented if the segment is recycled, so that it is
                               3999                 :                :  * not checked again by future calls of this function.
                               4000                 :                :  *
                               4001                 :                :  * insertTLI is the current timeline for XLOG insertion. Any recycled segments
                               4002                 :                :  * should be used for this timeline.
                               4003                 :                :  */
                               4004                 :                : static void
 1100 michael@paquier.xyz      4005                 :           2530 : RemoveXlogFile(const struct dirent *segment_de,
                               4006                 :                :                XLogSegNo recycleSegNo, XLogSegNo *endlogSegNo,
                               4007                 :                :                TimeLineID insertTLI)
                               4008                 :                : {
                               4009                 :                :     char        path[MAXPGPATH];
                               4010                 :                : #ifdef WIN32
                               4011                 :                :     char        newpath[MAXPGPATH];
                               4012                 :                : #endif
                               4013                 :           2530 :     const char *segname = segment_de->d_name;
                               4014                 :                : 
 3799 heikki.linnakangas@i     4015                 :           2530 :     snprintf(path, MAXPGPATH, XLOGDIR "/%s", segname);
                               4016                 :                : 
                               4017                 :                :     /*
                               4018                 :                :      * Before deleting the file, see if it can be recycled as a future log
                               4019                 :                :      * segment. Only recycle normal files, because we don't want to recycle
                               4020                 :                :      * symbolic links pointing to a separate archive directory.
                               4021                 :                :      */
 2349 tmunro@postgresql.or     4022         [ +  - ]:           2530 :     if (wal_recycle &&
 1695 michael@paquier.xyz      4023         [ +  + ]:           2530 :         *endlogSegNo <= recycleSegNo &&
 1531 noah@leadboat.com        4024   [ +  +  +  - ]:           3292 :         XLogCtl->InstallXLogFileSegmentActive && /* callee rechecks this */
 1100 michael@paquier.xyz      4025         [ +  + ]:           2964 :         get_dirent_type(path, segment_de, false, DEBUG2) == PGFILETYPE_REG &&
 1695                          4026                 :           1482 :         InstallXLogFileSegment(endlogSegNo, path,
                               4027                 :                :                                true, recycleSegNo, insertTLI))
                               4028                 :                :     {
 3799 heikki.linnakangas@i     4029         [ +  + ]:           1373 :         ereport(DEBUG2,
                               4030                 :                :                 (errmsg_internal("recycled write-ahead log file \"%s\"",
                               4031                 :                :                                  segname)));
                               4032                 :           1373 :         CheckpointStats.ckpt_segs_recycled++;
                               4033                 :                :         /* Needn't recheck that slot on future iterations */
 1695 michael@paquier.xyz      4034                 :           1373 :         (*endlogSegNo)++;
                               4035                 :                :     }
                               4036                 :                :     else
                               4037                 :                :     {
                               4038                 :                :         /* No need for any more future segments, or recycling failed ... */
                               4039                 :                :         int         rc;
                               4040                 :                : 
 3799 heikki.linnakangas@i     4041         [ +  + ]:           1157 :         ereport(DEBUG2,
                               4042                 :                :                 (errmsg_internal("removing write-ahead log file \"%s\"",
                               4043                 :                :                                  segname)));
                               4044                 :                : 
                               4045                 :                : #ifdef WIN32
                               4046                 :                : 
                               4047                 :                :         /*
                               4048                 :                :          * On Windows, if another process (e.g. another backend) holds the file
                               4049                 :                :          * open in FILE_SHARE_DELETE mode, unlink will succeed, but the file
                               4050                 :                :          * will still show up in directory listing until the last handle is
                               4051                 :                :          * closed. To avoid confusing the lingering deleted file for a live
                               4052                 :                :          * WAL file that needs to be archived, rename it before deleting it.
                               4053                 :                :          *
                               4054                 :                :          * If another process holds the file open without FILE_SHARE_DELETE
                               4055                 :                :          * flag, rename will fail. We'll try again at the next checkpoint.
                               4056                 :                :          */
                               4057                 :                :         snprintf(newpath, MAXPGPATH, "%s.deleted", path);
                               4058                 :                :         if (rename(path, newpath) != 0)
                               4059                 :                :         {
                               4060                 :                :             ereport(LOG,
                               4061                 :                :                     (errcode_for_file_access(),
                               4062                 :                :                      errmsg("could not rename file \"%s\": %m",
                               4063                 :                :                             path)));
                               4064                 :                :             return;
                               4065                 :                :         }
                               4066                 :                :         rc = durable_unlink(newpath, LOG);
                               4067                 :                : #else
 3085 teodor@sigaev.ru         4068                 :           1157 :         rc = durable_unlink(path, LOG);
                               4069                 :                : #endif
 3799 heikki.linnakangas@i     4070         [ -  + ]:           1157 :         if (rc != 0)
                               4071                 :                :         {
                               4072                 :                :             /* Message already logged by durable_unlink() */
 3799 heikki.linnakangas@i     4073                 :UBC           0 :             return;
                               4074                 :                :         }
 3799 heikki.linnakangas@i     4075                 :CBC        1157 :         CheckpointStats.ckpt_segs_removed++;
                               4076                 :                :     }
                               4077                 :                : 
                               4078                 :           2530 :     XLogArchiveCleanup(segname);
                               4079                 :                : }
                               4080                 :                : 
                               4081                 :                : /*
                               4082                 :                :  * Verify whether pg_wal, pg_wal/archive_status, and pg_wal/summaries exist.
                               4083                 :                :  * If the latter two do not exist, recreate them.
                               4084                 :                :  *
                               4085                 :                :  * It is not the goal of this function to verify the contents of these
                               4086                 :                :  * directories, but to help in cases where someone has performed a cluster
                               4087                 :                :  * copy for PITR purposes but omitted pg_wal from the copy.
                               4088                 :                :  *
                               4089                 :                :  * We could also recreate pg_wal if it doesn't exist, but a deliberate
                               4090                 :                :  * policy decision was made not to.  It is fairly common for pg_wal to be
                               4091                 :                :  * a symlink, and if that was the DBA's intent then automatically making a
                               4092                 :                :  * plain directory would result in degraded performance with no notice.
                               4093                 :                :  */
                               4094                 :                : static void
 6145 tgl@sss.pgh.pa.us        4095                 :            887 : ValidateXLOGDirectoryStructure(void)
                               4096                 :                : {
                               4097                 :                :     char        path[MAXPGPATH];
                               4098                 :                :     struct stat stat_buf;
                               4099                 :                : 
                               4100                 :                :     /* Check for pg_wal; if it doesn't exist, error out */
                               4101         [ +  - ]:            887 :     if (stat(XLOGDIR, &stat_buf) != 0 ||
                               4102         [ -  + ]:            887 :         !S_ISDIR(stat_buf.st_mode))
 5931 bruce@momjian.us         4103         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4104                 :                :                 (errcode_for_file_access(),
                               4105                 :                :                  errmsg("required WAL directory \"%s\" does not exist",
                               4106                 :                :                         XLOGDIR)));
                               4107                 :                : 
                               4108                 :                :     /* Check for archive_status */
 6145 tgl@sss.pgh.pa.us        4109                 :CBC         887 :     snprintf(path, MAXPGPATH, XLOGDIR "/archive_status");
                               4110         [ +  + ]:            887 :     if (stat(path, &stat_buf) == 0)
                               4111                 :                :     {
                               4112                 :                :         /* Check for weird cases where it exists but isn't a directory */
                               4113         [ -  + ]:            886 :         if (!S_ISDIR(stat_buf.st_mode))
 5931 bruce@momjian.us         4114         [ #  # ]:UBC           0 :             ereport(FATAL,
                               4115                 :                :                     (errcode_for_file_access(),
                               4116                 :                :                      errmsg("required WAL directory \"%s\" does not exist",
                               4117                 :                :                             path)));
                               4118                 :                :     }
                               4119                 :                :     else
                               4120                 :                :     {
 6145 tgl@sss.pgh.pa.us        4121         [ +  - ]:CBC           1 :         ereport(LOG,
                               4122                 :                :                 (errmsg("creating missing WAL directory \"%s\"", path)));
 2709 sfrost@snowman.net       4123         [ -  + ]:              1 :         if (MakePGDirectory(path) < 0)
 5931 bruce@momjian.us         4124         [ #  # ]:UBC           0 :             ereport(FATAL,
                               4125                 :                :                     (errcode_for_file_access(),
                               4126                 :                :                      errmsg("could not create missing directory \"%s\": %m",
                               4127                 :                :                             path)));
                               4128                 :                :     }
                               4129                 :                : 
                               4130                 :                :     /* Check for summaries */
  626 rhaas@postgresql.org     4131                 :CBC         887 :     snprintf(path, MAXPGPATH, XLOGDIR "/summaries");
                               4132         [ +  + ]:            887 :     if (stat(path, &stat_buf) == 0)
                               4133                 :                :     {
                               4134                 :                :         /* Check for weird cases where it exists but isn't a directory */
                               4135         [ -  + ]:            886 :         if (!S_ISDIR(stat_buf.st_mode))
  626 rhaas@postgresql.org     4136         [ #  # ]:UBC           0 :             ereport(FATAL,
                               4137                 :                :                     (errmsg("required WAL directory \"%s\" does not exist",
                               4138                 :                :                             path)));
                               4139                 :                :     }
                               4140                 :                :     else
                               4141                 :                :     {
  626 rhaas@postgresql.org     4142         [ +  - ]:CBC           1 :         ereport(LOG,
                               4143                 :                :                 (errmsg("creating missing WAL directory \"%s\"", path)));
                               4144         [ -  + ]:              1 :         if (MakePGDirectory(path) < 0)
  626 rhaas@postgresql.org     4145         [ #  # ]:UBC           0 :             ereport(FATAL,
                               4146                 :                :                     (errmsg("could not create missing directory \"%s\": %m",
                               4147                 :                :                             path)));
                               4148                 :                :     }
 6145 tgl@sss.pgh.pa.us        4149                 :CBC         887 : }
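
The check-then-create pattern applied to archive_status and summaries above can be reduced to a small POSIX sketch. This is not the PostgreSQL implementation: `ensure_directory` is a hypothetical helper, it returns status codes instead of calling ereport(), and a plain mkdir() with mode 0700 is assumed as a stand-in for MakePGDirectory().

```c
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not part of PostgreSQL.
 * Returns 0 if 'path' already exists as a directory, 1 if it was missing
 * and has been created, and -1 on error (including the case where 'path'
 * exists but is not a directory).
 */
static int
ensure_directory(const char *path)
{
    struct stat st;

    /* If stat() succeeds, the only remaining question is the file type. */
    if (stat(path, &st) == 0)
        return S_ISDIR(st.st_mode) ? 0 : -1;

    /* Assumption: mkdir() with mode 0700 stands in for MakePGDirectory(). */
    if (mkdir(path, 0700) < 0)
        return -1;
    return 1;
}
```

As in the function above, any stat() failure is treated as "missing" and followed by a creation attempt, so a genuine error such as a permissions problem surfaces when mkdir() fails.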
                               4150                 :                : 
                               4151                 :                : /*
                               4152                 :                :  * Remove previous backup history files.  This also retries creation of
                               4153                 :                :  * .ready files for any backup history files for which XLogArchiveNotify
                               4154                 :                :  * failed earlier.
                               4155                 :                :  */
                               4156                 :                : static void
 7016                          4157                 :            155 : CleanupBackupHistory(void)
                               4158                 :                : {
                               4159                 :                :     DIR        *xldir;
                               4160                 :                :     struct dirent *xlde;
                               4161                 :                :     char        path[MAXPGPATH + sizeof(XLOGDIR)];
                               4162                 :                : 
 7369                          4163                 :            155 :     xldir = AllocateDir(XLOGDIR);
                               4164                 :                : 
                               4165         [ +  + ]:           1552 :     while ((xlde = ReadDir(xldir, XLOGDIR)) != NULL)
                               4166                 :                :     {
 3774 heikki.linnakangas@i     4167         [ +  + ]:           1242 :         if (IsBackupHistoryFileName(xlde->d_name))
                               4168                 :                :         {
 6207 tgl@sss.pgh.pa.us        4169         [ +  + ]:            163 :             if (XLogArchiveCheckDone(xlde->d_name))
                               4170                 :                :             {
 3039 peter_e@gmx.net          4171         [ +  + ]:            131 :                 elog(DEBUG2, "removing WAL backup history file \"%s\"",
                               4172                 :                :                      xlde->d_name);
 3070                          4173                 :            131 :                 snprintf(path, sizeof(path), XLOGDIR "/%s", xlde->d_name);
 7388 bruce@momjian.us         4174                 :            131 :                 unlink(path);
                               4175                 :            131 :                 XLogArchiveCleanup(xlde->d_name);
                               4176                 :                :             }
                               4177                 :                :         }
                               4178                 :                :     }
                               4179                 :                : 
                               4180                 :            155 :     FreeDir(xldir);
                               4181                 :            155 : }
                               4182                 :                : 
                               4183                 :                : /*
                               4184                 :                :  * I/O routines for pg_control
                               4185                 :                :  *
                               4186                 :                :  * *ControlFile is a buffer in shared memory that holds an image of the
                               4187                 :                :  * contents of pg_control.  WriteControlFile() initializes pg_control
                               4188                 :                :  * given a preloaded buffer, ReadControlFile() loads the buffer from
                               4189                 :                :  * the pg_control file (during postmaster or standalone-backend startup),
                               4190                 :                :  * and UpdateControlFile() rewrites pg_control after we modify xlog state.
                               4191                 :                :  * InitControlFile() fills the buffer with initial values.
                               4192                 :                :  *
                               4193                 :                :  * For simplicity, WriteControlFile() initializes the fields of pg_control
                               4194                 :                :  * that are related to checking backend/database compatibility, and
                               4195                 :                :  * ReadControlFile() verifies they are correct.  We could split out the
                               4196                 :                :  * I/O and compatibility-check functions, but there seems no need currently.
                               4197                 :                :  */
                               4198                 :                : 
                               4199                 :                : static void
  410 peter@eisentraut.org     4200                 :             50 : InitControlFile(uint64 sysidentifier, uint32 data_checksum_version)
                               4201                 :                : {
                               4202                 :                :     char        mock_auth_nonce[MOCK_AUTH_NONCE_LEN];
                               4203                 :                : 
                               4204                 :                :     /*
                               4205                 :                :      * Generate a random nonce. This is used for authentication requests that
                               4206                 :                :      * will fail because the user does not exist. The nonce is used to create
                               4207                 :                :      * a genuine-looking password challenge for the non-existent user, in lieu
                               4208                 :                :      * of an actual stored password.
                               4209                 :                :      */
 1298 heikki.linnakangas@i     4210         [ -  + ]:             50 :     if (!pg_strong_random(mock_auth_nonce, MOCK_AUTH_NONCE_LEN))
 1298 heikki.linnakangas@i     4211         [ #  # ]:UBC           0 :         ereport(PANIC,
                               4212                 :                :                 (errcode(ERRCODE_INTERNAL_ERROR),
                               4213                 :                :                  errmsg("could not generate secret authorization token")));
                               4214                 :                : 
 1298 heikki.linnakangas@i     4215                 :CBC          50 :     memset(ControlFile, 0, sizeof(ControlFileData));
                               4216                 :                :     /* Initialize pg_control status fields */
                               4217                 :             50 :     ControlFile->system_identifier = sysidentifier;
                               4218                 :             50 :     memcpy(ControlFile->mock_authentication_nonce, mock_auth_nonce, MOCK_AUTH_NONCE_LEN);
                               4219                 :             50 :     ControlFile->state = DB_SHUTDOWNED;
                               4220                 :             50 :     ControlFile->unloggedLSN = FirstNormalUnloggedLSN;
                               4221                 :                : 
                               4222                 :                :     /* Set important parameter values for use when replaying WAL */
 2028 peter@eisentraut.org     4223                 :             50 :     ControlFile->MaxConnections = MaxConnections;
                               4224                 :             50 :     ControlFile->max_worker_processes = max_worker_processes;
                               4225                 :             50 :     ControlFile->max_wal_senders = max_wal_senders;
                               4226                 :             50 :     ControlFile->max_prepared_xacts = max_prepared_xacts;
                               4227                 :             50 :     ControlFile->max_locks_per_xact = max_locks_per_xact;
                               4228                 :             50 :     ControlFile->wal_level = wal_level;
                               4229                 :             50 :     ControlFile->wal_log_hints = wal_log_hints;
                               4230                 :             50 :     ControlFile->track_commit_timestamp = track_commit_timestamp;
  410                          4231                 :             50 :     ControlFile->data_checksum_version = data_checksum_version;
 2028                          4232                 :             50 : }
                               4233                 :                : 
                               4234                 :                : static void
 9051 tgl@sss.pgh.pa.us        4235                 :             50 : WriteControlFile(void)
                               4236                 :                : {
                               4237                 :                :     int         fd;
                               4238                 :                :     char        buffer[PG_CONTROL_FILE_SIZE];   /* need not be aligned */
                               4239                 :                : 
                               4240                 :                :     /*
                               4241                 :                :      * Initialize version and compatibility-check fields
                               4242                 :                :      */
 8943                          4243                 :             50 :     ControlFile->pg_control_version = PG_CONTROL_VERSION;
                               4244                 :             50 :     ControlFile->catalog_version_no = CATALOG_VERSION_NO;
                               4245                 :                : 
 7278                          4246                 :             50 :     ControlFile->maxAlign = MAXIMUM_ALIGNOF;
                               4247                 :             50 :     ControlFile->floatFormat = FLOATFORMAT_VALUE;
                               4248                 :                : 
 9051                          4249                 :             50 :     ControlFile->blcksz = BLCKSZ;
                               4250                 :             50 :     ControlFile->relseg_size = RELSEG_SIZE;
 7096                          4251                 :             50 :     ControlFile->xlog_blcksz = XLOG_BLCKSZ;
 2909 andres@anarazel.de       4252                 :             50 :     ControlFile->xlog_seg_size = wal_segment_size;
                               4253                 :                : 
 8539 lockhart@fourpalms.o     4254                 :             50 :     ControlFile->nameDataLen = NAMEDATALEN;
 7466 tgl@sss.pgh.pa.us        4255                 :             50 :     ControlFile->indexMaxKeys = INDEX_MAX_KEYS;
                               4256                 :                : 
 6731                          4257                 :             50 :     ControlFile->toast_max_chunk_size = TOAST_MAX_CHUNK_SIZE;
 4111                          4258                 :             50 :     ControlFile->loblksize = LOBLKSIZE;
                               4259                 :                : 
   24 tgl@sss.pgh.pa.us        4260                 :GNC          50 :     ControlFile->float8ByVal = true; /* vestigial */
                               4261                 :                : 
                               4262                 :                :     /*
                               4263                 :                :      * Initialize the default 'char' signedness.
                               4264                 :                :      *
                               4265                 :                :      * The signedness of the char type is implementation-defined. For instance
                               4266                 :                :      * on x86 architecture CPUs, the char data type is typically treated as
                               4267                 :                :      * signed by default, whereas on aarch architecture CPUs, it is typically
                               4268                 :                :      * treated as unsigned by default. In v17 or earlier, we accidentally let
                               4269                 :                :      * C implementation signedness affect persistent data. This led to
                               4270                 :                :      * inconsistent results when comparing char data across different
                               4271                 :                :      * platforms.
                               4272                 :                :      *
                               4273                 :                :      * This flag can be used as a hint to ensure consistent behavior for
                               4274                 :                :      * pre-v18 data files that store data sorted by the 'char' type on disk,
                               4275                 :                :      * especially in cross-platform replication scenarios.
                               4276                 :                :      *
                               4277                 :                :      * Newly created database clusters unconditionally set the default char
                               4278                 :                :      * signedness to true. pg_upgrade changes this flag for clusters that were
                               4279                 :                :      * initialized on signedness=false platforms. As a result,
                               4280                 :                :      * signedness=false setting will become rare over time. If we had known
                               4281                 :                :      * about this problem during the last development cycle that forced initdb
                               4282                 :                :      * (v8.3), we would have made all clusters signed or all clusters
                               4283                 :                :      * unsigned. Making pg_upgrade the only source of signedness=false will
                               4284                 :                :      * cause the population of database clusters to converge toward that
                               4285                 :                :      * retrospective ideal.
                               4286                 :                :      */
  197 msawada@postgresql.o     4287                 :CBC          50 :     ControlFile->default_char_signedness = true;
                               4288                 :                : 
                               4289                 :                :     /* Contents are protected with a CRC */
 3959 heikki.linnakangas@i     4290                 :             50 :     INIT_CRC32C(ControlFile->crc);
                               4291                 :             50 :     COMP_CRC32C(ControlFile->crc,
                               4292                 :                :                 ControlFile,
                               4293                 :                :                 offsetof(ControlFileData, crc));
                               4294                 :             50 :     FIN_CRC32C(ControlFile->crc);
                               4295                 :                : 
                               4296                 :                :     /*
                               4297                 :                :      * We write out PG_CONTROL_FILE_SIZE bytes into pg_control, zero-padding
                               4298                 :                :      * the excess over sizeof(ControlFileData).  This reduces the odds of
                               4299                 :                :      * premature-EOF errors when reading pg_control.  We'll still fail when we
                               4300                 :                :      * check the contents of the file, but hopefully with a more specific
                               4301                 :                :      * error than "couldn't read pg_control".
                               4302                 :                :      */
 2971 tgl@sss.pgh.pa.us        4303                 :             50 :     memset(buffer, 0, PG_CONTROL_FILE_SIZE);
 9051                          4304                 :             50 :     memcpy(buffer, ControlFile, sizeof(ControlFileData));
                               4305                 :                : 
 7369                          4306                 :             50 :     fd = BasicOpenFile(XLOG_CONTROL_FILE,
                               4307                 :                :                        O_RDWR | O_CREAT | O_EXCL | PG_BINARY);
 9051                          4308         [ -  + ]:             50 :     if (fd < 0)
 8083 tgl@sss.pgh.pa.us        4309         [ #  # ]:UBC           0 :         ereport(PANIC,
                               4310                 :                :                 (errcode_for_file_access(),
                               4311                 :                :                  errmsg("could not create file \"%s\": %m",
                               4312                 :                :                         XLOG_CONTROL_FILE)));
                               4313                 :                : 
 8858 tgl@sss.pgh.pa.us        4314                 :CBC          50 :     errno = 0;
 3094 rhaas@postgresql.org     4315                 :             50 :     pgstat_report_wait_start(WAIT_EVENT_CONTROL_FILE_WRITE);
 2971 tgl@sss.pgh.pa.us        4316         [ -  + ]:             50 :     if (write(fd, buffer, PG_CONTROL_FILE_SIZE) != PG_CONTROL_FILE_SIZE)
                               4317                 :                :     {
                               4318                 :                :         /* if write didn't set errno, assume problem is no disk space */
 8858 tgl@sss.pgh.pa.us        4319         [ #  # ]:UBC           0 :         if (errno == 0)
                               4320                 :              0 :             errno = ENOSPC;
 8083                          4321         [ #  # ]:              0 :         ereport(PANIC,
                               4322                 :                :                 (errcode_for_file_access(),
                               4323                 :                :                  errmsg("could not write to file \"%s\": %m",
                               4324                 :                :                         XLOG_CONTROL_FILE)));
                               4325                 :                :     }
 3094 rhaas@postgresql.org     4326                 :CBC          50 :     pgstat_report_wait_end();
                               4327                 :                : 
                               4328                 :             50 :     pgstat_report_wait_start(WAIT_EVENT_CONTROL_FILE_SYNC);
 9038 tgl@sss.pgh.pa.us        4329         [ -  + ]:             50 :     if (pg_fsync(fd) != 0)
 8083 tgl@sss.pgh.pa.us        4330         [ #  # ]:UBC           0 :         ereport(PANIC,
                               4331                 :                :                 (errcode_for_file_access(),
                               4332                 :                :                  errmsg("could not fsync file \"%s\": %m",
                               4333                 :                :                         XLOG_CONTROL_FILE)));
 3094 rhaas@postgresql.org     4334                 :CBC          50 :     pgstat_report_wait_end();
                               4335                 :                : 
 2254 peter@eisentraut.org     4336         [ -  + ]:             50 :     if (close(fd) != 0)
 7894 tgl@sss.pgh.pa.us        4337         [ #  # ]:UBC           0 :         ereport(PANIC,
                               4338                 :                :                 (errcode_for_file_access(),
                               4339                 :                :                  errmsg("could not close file \"%s\": %m",
                               4340                 :                :                         XLOG_CONTROL_FILE)));
 9051 tgl@sss.pgh.pa.us        4341                 :CBC          50 : }
                               4342                 :                : 
                               4343                 :                : static void
                               4344                 :            937 : ReadControlFile(void)
                               4345                 :                : {
                               4346                 :                :     pg_crc32c   crc;
                               4347                 :                :     int         fd;
                               4348                 :                :     char        wal_segsz_str[20];
                               4349                 :                :     int         r;
                               4350                 :                : 
                               4351                 :                :     /*
                               4352                 :                :      * Read data...
                               4353                 :                :      */
 7369                          4354                 :            937 :     fd = BasicOpenFile(XLOG_CONTROL_FILE,
                               4355                 :                :                        O_RDWR | PG_BINARY);
 9051                          4356         [ -  + ]:            937 :     if (fd < 0)
 8083 tgl@sss.pgh.pa.us        4357         [ #  # ]:UBC           0 :         ereport(PANIC,
                               4358                 :                :                 (errcode_for_file_access(),
                               4359                 :                :                  errmsg("could not open file \"%s\": %m",
                               4360                 :                :                         XLOG_CONTROL_FILE)));
                               4361                 :                : 
 3094 rhaas@postgresql.org     4362                 :CBC         937 :     pgstat_report_wait_start(WAIT_EVENT_CONTROL_FILE_READ);
 2668 magnus@hagander.net      4363                 :            937 :     r = read(fd, ControlFile, sizeof(ControlFileData));
                               4364         [ -  + ]:            937 :     if (r != sizeof(ControlFileData))
                               4365                 :                :     {
 2668 magnus@hagander.net      4366         [ #  # ]:UBC           0 :         if (r < 0)
                               4367         [ #  # ]:              0 :             ereport(PANIC,
                               4368                 :                :                     (errcode_for_file_access(),
                               4369                 :                :                      errmsg("could not read file \"%s\": %m",
                               4370                 :                :                             XLOG_CONTROL_FILE)));
                               4371                 :                :         else
                               4372         [ #  # ]:              0 :             ereport(PANIC,
                               4373                 :                :                     (errcode(ERRCODE_DATA_CORRUPTED),
                               4374                 :                :                      errmsg("could not read file \"%s\": read %d of %zu",
                               4375                 :                :                             XLOG_CONTROL_FILE, r, sizeof(ControlFileData))));
                               4376                 :                :     }
 3094 rhaas@postgresql.org     4377                 :CBC         937 :     pgstat_report_wait_end();
                               4378                 :                : 
 9051 tgl@sss.pgh.pa.us        4379                 :            937 :     close(fd);
                               4380                 :                : 
                               4381                 :                :     /*
                               4382                 :                :      * Check for expected pg_control format version.  If this is wrong, the
                               4383                 :                :      * CRC check will likely fail because we'll be checking the wrong number
                               4384                 :                :      * of bytes.  Complaining about wrong version will probably be more
                               4385                 :                :      * enlightening than complaining about wrong CRC.
                               4386                 :                :      */
                               4387                 :                : 
 6438 peter_e@gmx.net          4388   [ -  +  -  -  :            937 :     if (ControlFile->pg_control_version != PG_CONTROL_VERSION && ControlFile->pg_control_version % 65536 == 0 && ControlFile->pg_control_version / 65536 != 0)
                                              -  - ]
 6438 peter_e@gmx.net          4389         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4390                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4391                 :                :                  errmsg("database files are incompatible with server"),
                               4392                 :                :                  errdetail("The database cluster was initialized with PG_CONTROL_VERSION %d (0x%08x),"
                               4393                 :                :                            " but the server was compiled with PG_CONTROL_VERSION %d (0x%08x).",
                               4394                 :                :                            ControlFile->pg_control_version, ControlFile->pg_control_version,
                               4395                 :                :                            PG_CONTROL_VERSION, PG_CONTROL_VERSION),
                               4396                 :                :                  errhint("This could be a problem of mismatched byte ordering.  It looks like you need to initdb.")));
                               4397                 :                : 
 8943 tgl@sss.pgh.pa.us        4398         [ -  + ]:CBC         937 :     if (ControlFile->pg_control_version != PG_CONTROL_VERSION)
 8083 tgl@sss.pgh.pa.us        4399         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4400                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4401                 :                :                  errmsg("database files are incompatible with server"),
                               4402                 :                :                  errdetail("The database cluster was initialized with PG_CONTROL_VERSION %d,"
                               4403                 :                :                            " but the server was compiled with PG_CONTROL_VERSION %d.",
                               4404                 :                :                            ControlFile->pg_control_version, PG_CONTROL_VERSION),
                               4405                 :                :                  errhint("It looks like you need to initdb.")));
                               4406                 :                : 
                               4407                 :                :     /* Now check the CRC. */
 3959 heikki.linnakangas@i     4408                 :CBC         937 :     INIT_CRC32C(crc);
                               4409                 :            937 :     COMP_CRC32C(crc,
                               4410                 :                :                 ControlFile,
                               4411                 :                :                 offsetof(ControlFileData, crc));
                               4412                 :            937 :     FIN_CRC32C(crc);
                               4413                 :                : 
                               4414         [ -  + ]:            937 :     if (!EQ_CRC32C(crc, ControlFile->crc))
 8083 tgl@sss.pgh.pa.us        4415         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4416                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4417                 :                :                  errmsg("incorrect checksum in control file")));
                               4418                 :                : 
                               4419                 :                :     /*
                               4420                 :                :      * Do compatibility checking immediately.  If the database isn't
                               4421                 :                :      * compatible with the backend executable, we want to abort before we can
                               4422                 :                :      * possibly do any damage.
                               4423                 :                :      */
 8943 tgl@sss.pgh.pa.us        4424         [ -  + ]:CBC         937 :     if (ControlFile->catalog_version_no != CATALOG_VERSION_NO)
 8083 tgl@sss.pgh.pa.us        4425         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4426                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4427                 :                :                  errmsg("database files are incompatible with server"),
                               4428                 :                :         /* translator: %s is a variable name and %d is its value */
                               4429                 :                :                  errdetail("The database cluster was initialized with %s %d,"
                               4430                 :                :                            " but the server was compiled with %s %d.",
                               4431                 :                :                            "CATALOG_VERSION_NO", ControlFile->catalog_version_no,
                               4432                 :                :                            "CATALOG_VERSION_NO", CATALOG_VERSION_NO),
                               4433                 :                :                  errhint("It looks like you need to initdb.")));
 7278 tgl@sss.pgh.pa.us        4434         [ -  + ]:CBC         937 :     if (ControlFile->maxAlign != MAXIMUM_ALIGNOF)
 7278 tgl@sss.pgh.pa.us        4435         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4436                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4437                 :                :                  errmsg("database files are incompatible with server"),
                               4438                 :                :         /* translator: %s is a variable name and %d is its value */
                               4439                 :                :                  errdetail("The database cluster was initialized with %s %d,"
                               4440                 :                :                            " but the server was compiled with %s %d.",
                               4441                 :                :                            "MAXALIGN", ControlFile->maxAlign,
                               4442                 :                :                            "MAXALIGN", MAXIMUM_ALIGNOF),
                               4443                 :                :                  errhint("It looks like you need to initdb.")));
 7278 tgl@sss.pgh.pa.us        4444         [ -  + ]:CBC         937 :     if (ControlFile->floatFormat != FLOATFORMAT_VALUE)
 7278 tgl@sss.pgh.pa.us        4445         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4446                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4447                 :                :                  errmsg("database files are incompatible with server"),
                               4448                 :                :                  errdetail("The database cluster appears to use a different floating-point number format than the server executable."),
                               4449                 :                :                  errhint("It looks like you need to initdb.")));
 9051 tgl@sss.pgh.pa.us        4450         [ -  + ]:CBC         937 :     if (ControlFile->blcksz != BLCKSZ)
 8083 tgl@sss.pgh.pa.us        4451         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4452                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4453                 :                :                  errmsg("database files are incompatible with server"),
                               4454                 :                :         /* translator: %s is a variable name and %d is its value */
                               4455                 :                :                  errdetail("The database cluster was initialized with %s %d,"
                               4456                 :                :                            " but the server was compiled with %s %d.",
                               4457                 :                :                            "BLCKSZ", ControlFile->blcksz,
                               4458                 :                :                            "BLCKSZ", BLCKSZ),
                               4459                 :                :                  errhint("It looks like you need to recompile or initdb.")));
 9051 tgl@sss.pgh.pa.us        4460         [ -  + ]:CBC         937 :     if (ControlFile->relseg_size != RELSEG_SIZE)
 8083 tgl@sss.pgh.pa.us        4461         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4462                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4463                 :                :                  errmsg("database files are incompatible with server"),
                               4464                 :                :         /* translator: %s is a variable name and %d is its value */
                               4465                 :                :                  errdetail("The database cluster was initialized with %s %d,"
                               4466                 :                :                            " but the server was compiled with %s %d.",
                               4467                 :                :                            "RELSEG_SIZE", ControlFile->relseg_size,
                               4468                 :                :                            "RELSEG_SIZE", RELSEG_SIZE),
                               4469                 :                :                  errhint("It looks like you need to recompile or initdb.")));
 7096 tgl@sss.pgh.pa.us        4470         [ -  + ]:CBC         937 :     if (ControlFile->xlog_blcksz != XLOG_BLCKSZ)
 7096 tgl@sss.pgh.pa.us        4471         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4472                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4473                 :                :                  errmsg("database files are incompatible with server"),
                               4474                 :                :         /* translator: %s is a variable name and %d is its value */
                               4475                 :                :                  errdetail("The database cluster was initialized with %s %d,"
                               4476                 :                :                            " but the server was compiled with %s %d.",
                               4477                 :                :                            "XLOG_BLCKSZ", ControlFile->xlog_blcksz,
                               4478                 :                :                            "XLOG_BLCKSZ", XLOG_BLCKSZ),
                               4479                 :                :                  errhint("It looks like you need to recompile or initdb.")));
 8539 lockhart@fourpalms.o     4480         [ -  + ]:CBC         937 :     if (ControlFile->nameDataLen != NAMEDATALEN)
 8083 tgl@sss.pgh.pa.us        4481         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4482                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4483                 :                :                  errmsg("database files are incompatible with server"),
                               4484                 :                :         /* translator: %s is a variable name and %d is its value */
                               4485                 :                :                  errdetail("The database cluster was initialized with %s %d,"
                               4486                 :                :                            " but the server was compiled with %s %d.",
                               4487                 :                :                            "NAMEDATALEN", ControlFile->nameDataLen,
                               4488                 :                :                            "NAMEDATALEN", NAMEDATALEN),
                               4489                 :                :                  errhint("It looks like you need to recompile or initdb.")));
 7466 tgl@sss.pgh.pa.us        4490         [ -  + ]:CBC         937 :     if (ControlFile->indexMaxKeys != INDEX_MAX_KEYS)
 8083 tgl@sss.pgh.pa.us        4491         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4492                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4493                 :                :                  errmsg("database files are incompatible with server"),
                               4494                 :                :         /* translator: %s is a variable name and %d is its value */
                               4495                 :                :                  errdetail("The database cluster was initialized with %s %d,"
                               4496                 :                :                            " but the server was compiled with %s %d.",
                               4497                 :                :                            "INDEX_MAX_KEYS", ControlFile->indexMaxKeys,
                               4498                 :                :                            "INDEX_MAX_KEYS", INDEX_MAX_KEYS),
                               4499                 :                :                  errhint("It looks like you need to recompile or initdb.")));
 6731 tgl@sss.pgh.pa.us        4500         [ -  + ]:CBC         937 :     if (ControlFile->toast_max_chunk_size != TOAST_MAX_CHUNK_SIZE)
 6731 tgl@sss.pgh.pa.us        4501         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4502                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4503                 :                :                  errmsg("database files are incompatible with server"),
                               4504                 :                :         /* translator: %s is a variable name and %d is its value */
                               4505                 :                :                  errdetail("The database cluster was initialized with %s %d,"
                               4506                 :                :                            " but the server was compiled with %s %d.",
                               4507                 :                :                            "TOAST_MAX_CHUNK_SIZE", ControlFile->toast_max_chunk_size,
                               4508                 :                :                            "TOAST_MAX_CHUNK_SIZE", (int) TOAST_MAX_CHUNK_SIZE),
                               4509                 :                :                  errhint("It looks like you need to recompile or initdb.")));
 4111 tgl@sss.pgh.pa.us        4510         [ -  + ]:CBC         937 :     if (ControlFile->loblksize != LOBLKSIZE)
 4111 tgl@sss.pgh.pa.us        4511         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4512                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4513                 :                :                  errmsg("database files are incompatible with server"),
                               4514                 :                :         /* translator: %s is a variable name and %d is its value */
                               4515                 :                :                  errdetail("The database cluster was initialized with %s %d,"
                               4516                 :                :                            " but the server was compiled with %s %d.",
                               4517                 :                :                            "LOBLKSIZE", ControlFile->loblksize,
                               4518                 :                :                            "LOBLKSIZE", (int) LOBLKSIZE),
                               4519                 :                :                  errhint("It looks like you need to recompile or initdb.")));
                               4520                 :                : 
   24 tgl@sss.pgh.pa.us        4521         [ -  + ]:GNC         937 :     Assert(ControlFile->float8ByVal);    /* vestigial, not worth an error msg */
                               4522                 :                : 
 2909 andres@anarazel.de       4523                 :CBC         937 :     wal_segment_size = ControlFile->xlog_seg_size;
                               4524                 :                : 
                               4525   [ +  -  +  -  :            937 :     if (!IsValidWalSegSize(wal_segment_size))
                                        +  -  -  + ]
 2909 andres@anarazel.de       4526         [ #  # ]:UBC           0 :         ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                               4527                 :                :                         errmsg_plural("invalid WAL segment size in control file (%d byte)",
                               4528                 :                :                                       "invalid WAL segment size in control file (%d bytes)",
                               4529                 :                :                                       wal_segment_size,
                               4530                 :                :                                       wal_segment_size),
                               4531                 :                :                         errdetail("The WAL segment size must be a power of two between 1 MB and 1 GB.")));
                               4532                 :                : 
 2909 andres@anarazel.de       4533                 :CBC         937 :     snprintf(wal_segsz_str, sizeof(wal_segsz_str), "%d", wal_segment_size);
                               4534                 :            937 :     SetConfigOption("wal_segment_size", wal_segsz_str, PGC_INTERNAL,
                               4535                 :                :                     PGC_S_DYNAMIC_DEFAULT);
                               4536                 :                : 
                               4537                 :                :     /* check and update variables dependent on wal_segment_size */
                               4538         [ -  + ]:            937 :     if (ConvertToXSegs(min_wal_size_mb, wal_segment_size) < 2)
 2909 andres@anarazel.de       4539         [ #  # ]:UBC           0 :         ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                               4540                 :                :         /* translator: both %s are GUC names */
                               4541                 :                :                         errmsg("\"%s\" must be at least twice \"%s\"",
                               4542                 :                :                                "min_wal_size", "wal_segment_size")));
                               4543                 :                : 
 2909 andres@anarazel.de       4544         [ -  + ]:CBC         937 :     if (ConvertToXSegs(max_wal_size_mb, wal_segment_size) < 2)
 2909 andres@anarazel.de       4545         [ #  # ]:UBC           0 :         ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                               4546                 :                :         /* translator: both %s are GUC names */
                               4547                 :                :                         errmsg("\"%s\" must be at least twice \"%s\"",
                               4548                 :                :                                "max_wal_size", "wal_segment_size")));
                               4549                 :                : 
 2909 andres@anarazel.de       4550                 :CBC         937 :     UsableBytesInSegment =
                               4551                 :            937 :         (wal_segment_size / XLOG_BLCKSZ * UsableBytesInPage) -
                               4552                 :                :         (SizeOfXLogLongPHD - SizeOfXLogShortPHD);
                               4553                 :                : 
                               4554                 :            937 :     CalculateCheckpointSegments();
                               4555                 :                : 
                               4556                 :                :     /* Make the initdb settings visible as GUC variables, too */
 2707 magnus@hagander.net      4557         [ +  + ]:            937 :     SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
                               4558                 :                :                     PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT);
 9051 tgl@sss.pgh.pa.us        4559                 :            937 : }
                               4560                 :                : 
                               4561                 :                : /*
                               4562                 :                :  * Utility wrapper to update the control file.  Note that the control
                               4563                 :                :  * file gets flushed.
                               4564                 :                :  */
                               4565                 :                : static void
                               4566                 :           8846 : UpdateControlFile(void)
                               4567                 :                : {
 2350 peter@eisentraut.org     4568                 :           8846 :     update_controlfile(DataDir, ControlFile, true);
 9467 vadim4o@yahoo.com        4569                 :           8846 : }
                               4570                 :                : 
                               4571                 :                : /*
                               4572                 :                :  * Returns the unique system identifier from control file.
                               4573                 :                :  */
                               4574                 :                : uint64
 5713 heikki.linnakangas@i     4575                 :           1345 : GetSystemIdentifier(void)
                               4576                 :                : {
                               4577         [ -  + ]:           1345 :     Assert(ControlFile != NULL);
                               4578                 :           1345 :     return ControlFile->system_identifier;
                               4579                 :                : }
                               4580                 :                : 
                               4581                 :                : /*
                               4582                 :                :  * Returns the random nonce from control file.
                               4583                 :                :  */
                               4584                 :                : char *
 3105                          4585                 :              1 : GetMockAuthenticationNonce(void)
                               4586                 :                : {
                               4587         [ -  + ]:              1 :     Assert(ControlFile != NULL);
                               4588                 :              1 :     return ControlFile->mock_authentication_nonce;
                               4589                 :                : }
                               4590                 :                : 
                               4591                 :                : /*
                               4592                 :                :  * Are checksums enabled for data pages?
                               4593                 :                :  */
                               4594                 :                : bool
 2707 magnus@hagander.net      4595                 :        8916351 : DataChecksumsEnabled(void)
                               4596                 :                : {
 4551 simon@2ndQuadrant.co     4597         [ -  + ]:        8916351 :     Assert(ControlFile != NULL);
 4512                          4598                 :        8916351 :     return (ControlFile->data_checksum_version > 0);
                               4599                 :                : }
                               4600                 :                : 
                               4601                 :                : /*
                               4602                 :                :  * Return true if the cluster was initialized on a platform where the
                               4603                 :                :  * default signedness of char is "signed". This function exists for code
                               4604                 :                :  * that deals with pre-v18 data files that store data sorted by the 'char'
                               4605                 :                :  * type on disk (e.g., GIN and GiST indexes). See the comments in
                               4606                 :                :  * WriteControlFile() for details.
                               4607                 :                :  */
                               4608                 :                : bool
  197 msawada@postgresql.o     4609                 :              3 : GetDefaultCharSignedness(void)
                               4610                 :                : {
                               4611                 :              3 :     return ControlFile->default_char_signedness;
                               4612                 :                : }
                               4613                 :                : 
                               4614                 :                : /*
                               4615                 :                :  * Returns a fake LSN for unlogged relations.
                               4616                 :                :  *
                               4617                 :                :  * Each call generates an LSN that is greater than any previous value
                               4618                 :                :  * returned. The current counter value is saved and restored across clean
                               4619                 :                :  * shutdowns, but like unlogged relations, does not survive a crash. This can
                               4620                 :                :  * be used in lieu of real LSN values returned by XLogInsert, if you need an
                               4621                 :                :  * LSN-like increasing sequence of numbers without writing any WAL.
                               4622                 :                :  */
                               4623                 :                : XLogRecPtr
 4590 heikki.linnakangas@i     4624                 :             33 : GetFakeLSNForUnloggedRel(void)
                               4625                 :                : {
  555 nathan@postgresql.or     4626                 :             33 :     return pg_atomic_fetch_add_u64(&XLogCtl->unloggedLSN, 1);
                               4627                 :                : }
                               4628                 :                : 
                               4629                 :                : /*
                               4630                 :                :  * Auto-tune the number of XLOG buffers.
                               4631                 :                :  *
                               4632                 :                :  * The preferred setting for wal_buffers is about 3% of shared_buffers, with
                               4633                 :                :  * a maximum of one XLOG segment (there is little reason to think that more
                               4634                 :                :  * is helpful, at least so long as we force an fsync when switching log files)
                               4635                 :                :  * and a minimum of 8 blocks (which was the default value prior to PostgreSQL
                               4636                 :                :  * 9.1, when auto-tuning was added).
                               4637                 :                :  *
                               4638                 :                :  * This should not be called until NBuffers has received its final value.
                               4639                 :                :  */
                               4640                 :                : static int
 5266 tgl@sss.pgh.pa.us        4641                 :           1028 : XLOGChooseNumBuffers(void)
                               4642                 :                : {
                               4643                 :                :     int         xbuffers;
                               4644                 :                : 
                               4645                 :           1028 :     xbuffers = NBuffers / 32;
 2909 andres@anarazel.de       4646         [ +  + ]:           1028 :     if (xbuffers > (wal_segment_size / XLOG_BLCKSZ))
                               4647                 :             24 :         xbuffers = (wal_segment_size / XLOG_BLCKSZ);
 5266 tgl@sss.pgh.pa.us        4648         [ +  + ]:           1028 :     if (xbuffers < 8)
                               4649                 :            409 :         xbuffers = 8;
                               4650                 :           1028 :     return xbuffers;
                               4651                 :                : }
                               4652                 :                : 
                               4653                 :                : /*
                               4654                 :                :  * GUC check_hook for wal_buffers
                               4655                 :                :  */
                               4656                 :                : bool
                               4657                 :           2095 : check_wal_buffers(int *newval, void **extra, GucSource source)
                               4658                 :                : {
                               4659                 :                :     /*
                               4660                 :                :      * -1 indicates a request for auto-tune.
                               4661                 :                :      */
                               4662         [ +  + ]:           2095 :     if (*newval == -1)
                               4663                 :                :     {
                               4664                 :                :         /*
                               4665                 :                :          * If we haven't yet changed the boot_val default of -1, just let it
                               4666                 :                :          * be.  We'll fix it when XLOGShmemSize is called.
                               4667                 :                :          */
                               4668         [ +  - ]:           1067 :         if (XLOGbuffers == -1)
                               4669                 :           1067 :             return true;
                               4670                 :                : 
                               4671                 :                :         /* Otherwise, substitute the auto-tune value */
 5266 tgl@sss.pgh.pa.us        4672                 :UBC           0 :         *newval = XLOGChooseNumBuffers();
                               4673                 :                :     }
                               4674                 :                : 
                               4675                 :                :     /*
                               4676                 :                :      * We clamp manually-set values to at least 4 blocks.  Prior to PostgreSQL
                               4677                 :                :      * 9.1, a minimum of 4 was enforced by guc.c, but since that is no longer
                               4678                 :                :      * the case, we just silently treat such values as a request for the
                               4679                 :                :      * minimum.  (We could throw an error instead, but that doesn't seem very
                               4680                 :                :      * helpful.)
                               4681                 :                :      */
 5266 tgl@sss.pgh.pa.us        4682         [ -  + ]:CBC        1028 :     if (*newval < 4)
 5266 tgl@sss.pgh.pa.us        4683                 :UBC           0 :         *newval = 4;
                               4684                 :                : 
 5266 tgl@sss.pgh.pa.us        4685                 :CBC        1028 :     return true;
                               4686                 :                : }
                               4687                 :                : 
                               4688                 :                : /*
                               4689                 :                :  * GUC check_hook for wal_consistency_checking
                               4690                 :                :  */
                               4691                 :                : bool
 1089                          4692                 :           1971 : check_wal_consistency_checking(char **newval, void **extra, GucSource source)
                               4693                 :                : {
                               4694                 :                :     char       *rawstring;
                               4695                 :                :     List       *elemlist;
                               4696                 :                :     ListCell   *l;
                               4697                 :                :     bool        newwalconsistency[RM_MAX_ID + 1];
                               4698                 :                : 
                               4699                 :                :     /* Initialize the array */
                               4700   [ +  -  +  -  :          65043 :     MemSet(newwalconsistency, 0, (RM_MAX_ID + 1) * sizeof(bool));
                                     +  -  +  -  +  
                                                 + ]
                               4701                 :                : 
                               4702                 :                :     /* Need a modifiable copy of string */
                               4703                 :           1971 :     rawstring = pstrdup(*newval);
                               4704                 :                : 
                               4705                 :                :     /* Parse string into list of identifiers */
                               4706         [ -  + ]:           1971 :     if (!SplitIdentifierString(rawstring, ',', &elemlist))
                               4707                 :                :     {
                               4708                 :                :         /* syntax error in list */
 1089 tgl@sss.pgh.pa.us        4709                 :UBC           0 :         GUC_check_errdetail("List syntax is invalid.");
                               4710                 :              0 :         pfree(rawstring);
                               4711                 :              0 :         list_free(elemlist);
                               4712                 :              0 :         return false;
                               4713                 :                :     }
                               4714                 :                : 
 1089 tgl@sss.pgh.pa.us        4715   [ +  +  +  +  :CBC        2426 :     foreach(l, elemlist)
                                              +  + ]
                               4716                 :                :     {
                               4717                 :            455 :         char       *tok = (char *) lfirst(l);
                               4718                 :                :         int         rmid;
                               4719                 :                : 
                               4720                 :                :         /* Check for 'all'. */
                               4721         [ +  + ]:            455 :         if (pg_strcasecmp(tok, "all") == 0)
                               4722                 :                :         {
                               4723         [ +  + ]:         116421 :             for (rmid = 0; rmid <= RM_MAX_ID; rmid++)
                               4724   [ +  +  +  + ]:         115968 :                 if (RmgrIdExists(rmid) && GetRmgr(rmid).rm_mask != NULL)
                               4725                 :           4530 :                     newwalconsistency[rmid] = true;
                               4726                 :                :         }
                               4727                 :                :         else
                               4728                 :                :         {
                               4729                 :                :             /* Check if the token matches any known resource manager. */
                               4730                 :              2 :             bool        found = false;
                               4731                 :                : 
                               4732         [ +  - ]:             36 :             for (rmid = 0; rmid <= RM_MAX_ID; rmid++)
                               4733                 :                :             {
                               4734   [ +  -  +  +  :             54 :                 if (RmgrIdExists(rmid) && GetRmgr(rmid).rm_mask != NULL &&
                                              +  + ]
                               4735                 :             18 :                     pg_strcasecmp(tok, GetRmgr(rmid).rm_name) == 0)
                               4736                 :                :                 {
                               4737                 :              2 :                     newwalconsistency[rmid] = true;
                               4738                 :              2 :                     found = true;
                               4739                 :              2 :                     break;
                               4740                 :                :                 }
                               4741                 :                :             }
                               4742         [ -  + ]:              2 :             if (!found)
                               4743                 :                :             {
                               4744                 :                :                 /*
                               4745                 :                :                  * During startup, it might be a not-yet-loaded custom
                               4746                 :                :                  * resource manager.  Defer checking until
                               4747                 :                :                  * InitializeWalConsistencyChecking().
                               4748                 :                :                  */
 1089 tgl@sss.pgh.pa.us        4749         [ #  # ]:UBC           0 :                 if (!process_shared_preload_libraries_done)
                               4750                 :                :                 {
                               4751                 :              0 :                     check_wal_consistency_checking_deferred = true;
                               4752                 :                :                 }
                               4753                 :                :                 else
                               4754                 :                :                 {
                               4755                 :              0 :                     GUC_check_errdetail("Unrecognized key word: \"%s\".", tok);
                               4756                 :              0 :                     pfree(rawstring);
                               4757                 :              0 :                     list_free(elemlist);
                               4758                 :              0 :                     return false;
                               4759                 :                :                 }
                               4760                 :                :             }
                               4761                 :                :         }
                               4762                 :                :     }
                               4763                 :                : 
 1089 tgl@sss.pgh.pa.us        4764                 :CBC        1971 :     pfree(rawstring);
                               4765                 :           1971 :     list_free(elemlist);
                               4766                 :                : 
                               4767                 :                :     /* assign new value */
  163 dgustafsson@postgres     4768                 :           1971 :     *extra = guc_malloc(LOG, (RM_MAX_ID + 1) * sizeof(bool));
                               4769         [ -  + ]:           1971 :     if (!*extra)
  163 dgustafsson@postgres     4770                 :UBC           0 :         return false;
 1089 tgl@sss.pgh.pa.us        4771                 :CBC        1971 :     memcpy(*extra, newwalconsistency, (RM_MAX_ID + 1) * sizeof(bool));
                               4772                 :           1971 :     return true;
                               4773                 :                : }
                               4774                 :                : 
                               4775                 :                : /*
                               4776                 :                :  * GUC assign_hook for wal_consistency_checking
                               4777                 :                :  */
                               4778                 :                : void
                               4779                 :           1970 : assign_wal_consistency_checking(const char *newval, void *extra)
                               4780                 :                : {
                               4781                 :                :     /*
                               4782                 :                :      * If some checks were deferred, it's possible that the checks will fail
                               4783                 :                :      * later during InitializeWalConsistencyChecking(). But in that case, the
                               4784                 :                :      * postmaster will exit anyway, so it's safe to proceed with the
                               4785                 :                :      * assignment.
                               4786                 :                :      *
                               4787                 :                :      * Any built-in resource managers specified are assigned immediately,
                               4788                 :                :      * which affects WAL created before shared_preload_libraries are
                               4789                 :                :      * processed. Any custom resource managers specified won't be assigned
                               4790                 :                :      * until after shared_preload_libraries are processed, but that's OK
                               4791                 :                :      * because WAL for a custom resource manager can't be written before the
                               4792                 :                :      * module is loaded anyway.
                               4793                 :                :      */
                               4794                 :           1970 :     wal_consistency_checking = extra;
                               4795                 :           1970 : }
                               4796                 :                : 
                               4797                 :                : /*
                               4798                 :                :  * InitializeWalConsistencyChecking: run after loading custom resource managers
                               4799                 :                :  *
                               4800                 :                :  * If any unknown resource managers were specified in the
                               4801                 :                :  * wal_consistency_checking GUC, processing was deferred.  Now that
                               4802                 :                :  * shared_preload_libraries have been loaded, process wal_consistency_checking
                               4803                 :                :  * again.
                               4804                 :                :  */
                               4805                 :                : void
                               4806                 :            878 : InitializeWalConsistencyChecking(void)
                               4807                 :                : {
                               4808         [ -  + ]:            878 :     Assert(process_shared_preload_libraries_done);
                               4809                 :                : 
                               4810         [ -  + ]:            878 :     if (check_wal_consistency_checking_deferred)
                               4811                 :                :     {
                               4812                 :                :         struct config_generic *guc;
                               4813                 :                : 
 1089 tgl@sss.pgh.pa.us        4814                 :UBC           0 :         guc = find_option("wal_consistency_checking", false, false, ERROR);
                               4815                 :                : 
                               4816                 :              0 :         check_wal_consistency_checking_deferred = false;
                               4817                 :                : 
                               4818                 :              0 :         set_config_option_ext("wal_consistency_checking",
                               4819                 :                :                               wal_consistency_checking_string,
                               4820                 :                :                               guc->scontext, guc->source, guc->srole,
                               4821                 :                :                               GUC_ACTION_SET, true, ERROR, false);
                               4822                 :                : 
                               4823                 :                :         /* checking should not be deferred again */
                               4824         [ #  # ]:              0 :         Assert(!check_wal_consistency_checking_deferred);
                               4825                 :                :     }
 1089 tgl@sss.pgh.pa.us        4826                 :CBC         878 : }
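
The check hook and InitializeWalConsistencyChecking() above cooperate in a "validate now, or defer until the libraries are loaded" pattern. The sketch below illustrates that pattern in isolation. It is not the PostgreSQL implementation: every name in it (known_names, check_name, register_name, finish_deferred_check) is hypothetical, and the GUC machinery is replaced by plain globals.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins; none of these names come from xlog.c. */
#define MAX_NAMES 8

const char *known_names[MAX_NAMES] = {"heap", "btree"};
int         n_known = 2;
bool        libraries_loaded = false;   /* set once extension loading is done */
bool        check_deferred = false;     /* an unknown name was accepted provisionally */

/* Return true if name is currently registered. */
bool
name_is_known(const char *name)
{
    for (int i = 0; i < n_known; i++)
    {
        if (strcmp(known_names[i], name) == 0)
            return true;
    }
    return false;
}

/*
 * Check hook: reject unknown names, except that before the libraries are
 * loaded an unknown name might belong to a module not yet registered, so it
 * is accepted provisionally and the deferral is remembered.
 */
bool
check_name(const char *name)
{
    if (name_is_known(name))
        return true;
    if (!libraries_loaded)
    {
        check_deferred = true;
        return true;
    }
    return false;
}

/* Register a name, as a loaded module might; false if the table is full. */
bool
register_name(const char *name)
{
    if (n_known >= MAX_NAMES)
        return false;
    known_names[n_known++] = name;
    return true;
}

/*
 * Run after loading: if a check was deferred, repeat it now that every
 * module has had its chance to register.  It cannot be deferred again
 * because libraries_loaded is already true.
 */
bool
finish_deferred_check(const char *name)
{
    libraries_loaded = true;
    if (!check_deferred)
        return true;
    check_deferred = false;
    return check_name(name);
}
```

In this sketch a name that is still unknown after loading makes finish_deferred_check() return false; the real code instead reports the failure through the GUC error path at ERROR level.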
                               4827                 :                : 
                               4828                 :                : /*
                               4829                 :                :  * GUC show_hook for archive_command
                               4830                 :                :  */
                               4831                 :                : const char *
                               4832                 :           1694 : show_archive_command(void)
                               4833                 :                : {
                               4834   [ +  +  -  +  :           1694 :     if (XLogArchivingActive())
                                              +  + ]
                               4835                 :              2 :         return XLogArchiveCommand;
                               4836                 :                :     else
                               4837                 :           1692 :         return "(disabled)";
                               4838                 :                : }
                               4839                 :                : 
                               4840                 :                : /*
                               4841                 :                :  * GUC show_hook for in_hot_standby
                               4842                 :                :  */
                               4843                 :                : const char *
                               4844                 :          13781 : show_in_hot_standby(void)
                               4845                 :                : {
                               4846                 :                :     /*
                               4847                 :                :      * We display the actual state based on shared memory, so that this GUC
                               4848                 :                :      * reports up-to-date state if examined intra-query.  The underlying
                               4849                 :                :      * variable (in_hot_standby_guc) changes only when we transmit a new value
                               4850                 :                :      * to the client.
                               4851                 :                :      */
                               4852         [ +  + ]:          13781 :     return RecoveryInProgress() ? "on" : "off";
                               4853                 :                : }
                               4854                 :                : 
                               4855                 :                : /*
                               4856                 :                :  * Read the control file, set respective GUCs.
                               4857                 :                :  *
                               4858                 :                :  * This is to be called during startup, including a crash recovery cycle,
                               4859                 :                :  * unless in bootstrap mode, where no control file yet exists.  As there's no
                               4860                 :                :  * usable shared memory yet (its sizing can depend on the contents of the
                               4861                 :                :  * control file!), first store the contents in local memory. XLOGShmemInit()
                               4862                 :                :  * will then copy it to shared memory later.
                               4863                 :                :  *
                               4864                 :                :  * reset just controls whether previous contents are to be expected (in the
                               4865                 :                :  * reset case, there's a dangling pointer into old shared memory), or not.
                               4866                 :                :  */
                               4867                 :                : void
 2911 andres@anarazel.de       4868                 :            887 : LocalProcessControlFile(bool reset)
                               4869                 :                : {
                               4870   [ +  +  -  + ]:            887 :     Assert(reset || ControlFile == NULL);
 2915                          4871                 :            887 :     ControlFile = palloc(sizeof(ControlFileData));
                               4872                 :            887 :     ReadControlFile();
                               4873                 :            887 : }
                               4874                 :                : 
                               4875                 :                : /*
                               4876                 :                :  * Get the wal_level from the control file. For a standby, this value should be
                               4877                 :                :  * considered as its active wal_level, because it may be different from what
                               4878                 :                :  * was originally configured on the standby.
                               4879                 :                :  */
                               4880                 :                : WalLevel
  882                          4881                 :              1 : GetActiveWalLevelOnStandby(void)
                               4882                 :                : {
                               4883                 :              1 :     return ControlFile->wal_level;
                               4884                 :                : }
                               4885                 :                : 
                               4886                 :                : /*
                               4887                 :                :  * Initialization of shared memory for XLOG
                               4888                 :                :  */
                               4889                 :                : Size
 9055 peter_e@gmx.net          4890                 :           2938 : XLOGShmemSize(void)
                               4891                 :                : {
                               4892                 :                :     Size        size;
                               4893                 :                : 
                               4894                 :                :     /*
                               4895                 :                :      * If the value of wal_buffers is -1, use the preferred auto-tune value.
                               4896                 :                :      * This isn't an amazingly clean place to do this, but we must wait till
                               4897                 :                :      * NBuffers has received its final value, and must do it before using the
                               4898                 :                :      * value of XLOGbuffers to do anything important.
                               4899                 :                :      *
                               4900                 :                :      * We prefer to report this value's source as PGC_S_DYNAMIC_DEFAULT.
                               4901                 :                :      * However, if the DBA explicitly set wal_buffers = -1 in the config file,
                               4902                 :                :      * then PGC_S_DYNAMIC_DEFAULT will fail to override that and we must force
                               4903                 :                :      * the matter with PGC_S_OVERRIDE.
                               4904                 :                :      */
 5266 tgl@sss.pgh.pa.us        4905         [ +  + ]:           2938 :     if (XLOGbuffers == -1)
                               4906                 :                :     {
                               4907                 :                :         char        buf[32];
                               4908                 :                : 
                               4909                 :           1028 :         snprintf(buf, sizeof(buf), "%d", XLOGChooseNumBuffers());
 1186                          4910                 :           1028 :         SetConfigOption("wal_buffers", buf, PGC_POSTMASTER,
                               4911                 :                :                         PGC_S_DYNAMIC_DEFAULT);
                               4912         [ -  + ]:           1028 :         if (XLOGbuffers == -1)  /* failed to apply it? */
 1186 tgl@sss.pgh.pa.us        4913                 :UBC           0 :             SetConfigOption("wal_buffers", buf, PGC_POSTMASTER,
                               4914                 :                :                             PGC_S_OVERRIDE);
                               4915                 :                :     }
 5341 tgl@sss.pgh.pa.us        4916         [ -  + ]:CBC        2938 :     Assert(XLOGbuffers > 0);
                               4917                 :                : 
                               4918                 :                :     /* XLogCtl */
 7322                          4919                 :           2938 :     size = sizeof(XLogCtlData);
                               4920                 :                : 
                               4921                 :                :     /* WAL insertion locks, plus alignment */
 3993 heikki.linnakangas@i     4922                 :           2938 :     size = add_size(size, mul_size(sizeof(WALInsertLockPadded), NUM_XLOGINSERT_LOCKS + 1));
                               4923                 :                :     /* xlblocks array */
  627 jdavis@postgresql.or     4924                 :           2938 :     size = add_size(size, mul_size(sizeof(pg_atomic_uint64), XLOGbuffers));
                               4925                 :                :     /* extra alignment padding for XLOG I/O buffers */
  882 tmunro@postgresql.or     4926                 :           2938 :     size = add_size(size, Max(XLOG_BLCKSZ, PG_IO_ALIGN_SIZE));
                               4927                 :                :     /* and the buffers themselves */
 7096 tgl@sss.pgh.pa.us        4928                 :           2938 :     size = add_size(size, mul_size(XLOG_BLCKSZ, XLOGbuffers));
                               4929                 :                : 
                               4930                 :                :     /*
                               4931                 :                :      * Note: we don't count ControlFileData, it comes out of the "slop factor"
                               4932                 :                :      * added by CreateSharedMemoryAndSemaphores.  This lets us use this
                               4933                 :                :      * routine again below to compute the actual allocation size.
                               4934                 :                :      */
                               4935                 :                : 
 7322                          4936                 :           2938 :     return size;
                               4937                 :                : }
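
XLOGShmemSize() builds its total through add_size() and mul_size() rather than bare + and *, so that an oversized wal_buffers setting cannot silently wrap the result. The sketch below shows the same idea with hypothetical helpers (checked_add, checked_mul, region_size) that report overflow through a return value; this is an assumption made for illustration, since the PostgreSQL functions raise an error instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: *result = a + b, or return false on overflow. */
bool
checked_add(size_t a, size_t b, size_t *result)
{
    if (a > SIZE_MAX - b)
        return false;
    *result = a + b;
    return true;
}

/* Hypothetical helper: *result = a * b, or return false on overflow. */
bool
checked_mul(size_t a, size_t b, size_t *result)
{
    if (a != 0 && b > SIZE_MAX / a)
        return false;
    *result = a * b;
    return true;
}

/*
 * Size of a region laid out as: a fixed header, one small element per
 * buffer, alignment padding, then the buffers themselves.  The layout
 * mirrors the shape of the computation above, not its exact terms.
 */
bool
region_size(size_t header, size_t elem_size, size_t block_size,
            size_t count, size_t padding, size_t *total)
{
    size_t      size = header;
    size_t      part;

    if (!checked_mul(elem_size, count, &part) ||
        !checked_add(size, part, &size))
        return false;
    if (!checked_add(size, padding, &size))
        return false;
    if (!checked_mul(block_size, count, &part) ||
        !checked_add(size, part, &size))
        return false;

    *total = size;
    return true;
}
```

Each step is checked individually because an intermediate product can overflow even when the caller's inputs look harmless on their own.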
                               4938                 :                : 
                               4939                 :                : void
 9467 vadim4o@yahoo.com        4940                 :           1029 : XLOGShmemInit(void)
                               4941                 :                : {
                               4942                 :                :     bool        foundCFile,
                               4943                 :                :                 foundXLog;
                               4944                 :                :     char       *allocptr;
                               4945                 :                :     int         i;
                               4946                 :                :     ControlFileData *localControlFile;
                               4947                 :                : 
                               4948                 :                : #ifdef WAL_DEBUG
                               4949                 :                : 
                               4950                 :                :     /*
                               4951                 :                :      * Create a memory context for WAL debugging that's exempt from the normal
                               4952                 :                :      * "no pallocs in critical section" rule. Yes, that can lead to a PANIC if
                               4953                 :                :      * an allocation fails, but wal_debug is not for production use anyway.
                               4954                 :                :      */
                               4955                 :                :     if (walDebugCxt == NULL)
                               4956                 :                :     {
                               4957                 :                :         walDebugCxt = AllocSetContextCreate(TopMemoryContext,
                               4958                 :                :                                             "WAL Debug",
                               4959                 :                :                                             ALLOCSET_DEFAULT_SIZES);
                               4960                 :                :         MemoryContextAllowInCriticalSection(walDebugCxt, true);
                               4961                 :                :     }
                               4962                 :                : #endif
                               4963                 :                : 
                               4964                 :                : 
 2911 andres@anarazel.de       4965                 :           1029 :     XLogCtl = (XLogCtlData *)
                               4966                 :           1029 :         ShmemInitStruct("XLOG Ctl", XLOGShmemSize(), &foundXLog);
                               4967                 :                : 
 2915                          4968                 :           1029 :     localControlFile = ControlFile;
 9051 tgl@sss.pgh.pa.us        4969                 :           1029 :     ControlFile = (ControlFileData *)
 7931 bruce@momjian.us         4970                 :           1029 :         ShmemInitStruct("Control File", sizeof(ControlFileData), &foundCFile);
                               4971                 :                : 
 7320 tgl@sss.pgh.pa.us        4972   [ +  -  -  + ]:           1029 :     if (foundCFile || foundXLog)
                               4973                 :                :     {
                               4974                 :                :         /* both should be present or neither */
 7320 tgl@sss.pgh.pa.us        4975   [ #  #  #  # ]:UBC           0 :         Assert(foundCFile && foundXLog);
                               4976                 :                : 
                               4977                 :                :         /* Initialize local copy of WALInsertLocks */
 4062 rhaas@postgresql.org     4978                 :              0 :         WALInsertLocks = XLogCtl->Insert.WALInsertLocks;
                               4979                 :                : 
 2911 andres@anarazel.de       4980         [ #  # ]:              0 :         if (localControlFile)
                               4981                 :              0 :             pfree(localControlFile);
 7931 bruce@momjian.us         4982                 :              0 :         return;
                               4983                 :                :     }
 8943 tgl@sss.pgh.pa.us        4984                 :CBC        1029 :     memset(XLogCtl, 0, sizeof(XLogCtlData));
                               4985                 :                : 
                               4986                 :                :     /*
                               4987                 :                :      * The control file was already read locally, unless in bootstrap mode. Move
                               4988                 :                :      * contents into shared memory.
                               4989                 :                :      */
 2911 andres@anarazel.de       4990         [ +  + ]:           1029 :     if (localControlFile)
                               4991                 :                :     {
                               4992                 :            879 :         memcpy(ControlFile, localControlFile, sizeof(ControlFileData));
                               4993                 :            879 :         pfree(localControlFile);
                               4994                 :                :     }
                               4995                 :                : 
                               4996                 :                :     /*
                               4997                 :                :      * Since XLogCtlData contains XLogRecPtr fields, its sizeof should be a
                               4998                 :                :      * multiple of the alignment for same, so no extra alignment padding is
                               4999                 :                :      * needed here.
                               5000                 :                :      */
 4443 heikki.linnakangas@i     5001                 :           1029 :     allocptr = ((char *) XLogCtl) + sizeof(XLogCtlData);
  627 jdavis@postgresql.or     5002                 :           1029 :     XLogCtl->xlblocks = (pg_atomic_uint64 *) allocptr;
                               5003                 :           1029 :     allocptr += sizeof(pg_atomic_uint64) * XLOGbuffers;
                               5004                 :                : 
                               5005         [ +  + ]:         287467 :     for (i = 0; i < XLOGbuffers; i++)
                               5006                 :                :     {
                               5007                 :         286438 :         pg_atomic_init_u64(&XLogCtl->xlblocks[i], InvalidXLogRecPtr);
                               5008                 :                :     }
                               5009                 :                : 
                               5010                 :                :     /* WAL insertion locks. Ensure they're aligned to the full padded size */
 4187 heikki.linnakangas@i     5011                 :           1029 :     allocptr += sizeof(WALInsertLockPadded) -
 2999 tgl@sss.pgh.pa.us        5012                 :           1029 :         ((uintptr_t) allocptr) % sizeof(WALInsertLockPadded);
 4187 heikki.linnakangas@i     5013                 :           1029 :     WALInsertLocks = XLogCtl->Insert.WALInsertLocks =
                               5014                 :                :         (WALInsertLockPadded *) allocptr;
 3993                          5015                 :           1029 :     allocptr += sizeof(WALInsertLockPadded) * NUM_XLOGINSERT_LOCKS;
                               5016                 :                : 
                               5017         [ +  + ]:           9261 :     for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
                               5018                 :                :     {
 3553 rhaas@postgresql.org     5019                 :           8232 :         LWLockInitialize(&WALInsertLocks[i].l.lock, LWTRANCHE_WAL_INSERT);
  774 michael@paquier.xyz      5020                 :           8232 :         pg_atomic_init_u64(&WALInsertLocks[i].l.insertingAt, InvalidXLogRecPtr);
 3180 andres@anarazel.de       5021                 :           8232 :         WALInsertLocks[i].l.lastImportantAt = InvalidXLogRecPtr;
                               5022                 :                :     }
                               5023                 :                : 
                               5024                 :                :     /*
                               5025                 :                :      * Align the start of the page buffers to a full xlog block size boundary.
                               5026                 :                :      * This simplifies some calculations in XLOG insertion. It is also
                               5027                 :                :      * required for O_DIRECT.
                               5028                 :                :      */
 4443 heikki.linnakangas@i     5029                 :           1029 :     allocptr = (char *) TYPEALIGN(XLOG_BLCKSZ, allocptr);
 7322 tgl@sss.pgh.pa.us        5030                 :           1029 :     XLogCtl->pages = allocptr;
 7096                          5031                 :           1029 :     memset(XLogCtl->pages, 0, (Size) XLOG_BLCKSZ * XLOGbuffers);
                               5032                 :                : 
                               5033                 :                :     /*
                               5034                 :                :      * Do basic initialization of XLogCtl shared data. (StartupXLOG will fill
                               5035                 :                :      * in additional info.)
                               5036                 :                :      */
 8943                          5037                 :           1029 :     XLogCtl->XLogCacheBlck = XLOGbuffers - 1;
 1961 michael@paquier.xyz      5038                 :           1029 :     XLogCtl->SharedRecoveryState = RECOVERY_STATE_CRASH;
 1531 noah@leadboat.com        5039                 :           1029 :     XLogCtl->InstallXLogFileSegmentActive = false;
 4869 tgl@sss.pgh.pa.us        5040                 :           1029 :     XLogCtl->WalWriterSleeping = false;
                               5041                 :                : 
 4443 heikki.linnakangas@i     5042                 :           1029 :     SpinLockInit(&XLogCtl->Insert.insertpos_lck);
 8743 tgl@sss.pgh.pa.us        5043                 :           1029 :     SpinLockInit(&XLogCtl->info_lck);
  517 alvherre@alvh.no-ip.     5044                 :           1029 :     pg_atomic_init_u64(&XLogCtl->logInsertResult, InvalidXLogRecPtr);
  519                          5045                 :           1029 :     pg_atomic_init_u64(&XLogCtl->logWriteResult, InvalidXLogRecPtr);
                               5046                 :           1029 :     pg_atomic_init_u64(&XLogCtl->logFlushResult, InvalidXLogRecPtr);
  555 nathan@postgresql.or     5047                 :           1029 :     pg_atomic_init_u64(&XLogCtl->unloggedLSN, InvalidXLogRecPtr);
                               5048                 :                : }
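
XLOGShmemInit() uses two different alignment computations: a modulo step for the WAL insertion locks and TYPEALIGN() for the page buffers. The sketch below contrasts them with two hypothetical functions, align_up_pow2 and advance_to_multiple. It assumes TYPEALIGN behaves like the usual power-of-two mask idiom. The modulo form always moves the pointer forward, by a full element when the pointer is already aligned, which is consistent with XLOGShmemSize() reserving NUM_XLOGINSERT_LOCKS + 1 slots.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round addr up to the next multiple of alignment, which must be a power
 * of two.  An already-aligned address is returned unchanged.
 */
uintptr_t
align_up_pow2(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + (alignment - 1)) & ~(alignment - 1);
}

/*
 * Advance addr to a multiple of size using the modulo form
 * "addr += size - addr % size".  The result is always strictly greater
 * than addr: an already-aligned address moves forward by a whole size,
 * so the caller must budget one extra element of space.
 */
uintptr_t
advance_to_multiple(uintptr_t addr, uintptr_t size)
{
    assert(size != 0);
    return addr + (size - addr % size);
}
```

With a size of 128, advance_to_multiple() maps 100 to 128 but maps 128 to 256, whereas align_up_pow2() leaves 128 where it is.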
                               5049                 :                : 
                               5050                 :                : /*
                               5051                 :                :  * This function must be called ONCE on system install.  It creates pg_control
                               5052                 :                :  * and the initial XLOG segment.
                               5053                 :                :  */
                               5054                 :                : void
  410 peter@eisentraut.org     5055                 :             50 : BootStrapXLOG(uint32 data_checksum_version)
                               5056                 :                : {
                               5057                 :                :     CheckPoint  checkPoint;
                               5058                 :                :     char       *buffer;
                               5059                 :                :     XLogPageHeader page;
                               5060                 :                :     XLogLongPageHeader longpage;
                               5061                 :                :     XLogRecord *record;
                               5062                 :                :     char       *recptr;
                               5063                 :                :     uint64      sysidentifier;
                               5064                 :                :     struct timeval tv;
                               5065                 :                :     pg_crc32c   crc;
                               5066                 :                : 
                               5067                 :                :     /* allow ordinary WAL segment creation, like StartupXLOG() would */
 1116 michael@paquier.xyz      5068                 :             50 :     SetInstallXLogFileSegmentActive();
                               5069                 :                : 
                               5070                 :                :     /*
                               5071                 :                :      * Select a hopefully-unique system identifier code for this installation.
                               5072                 :                :      * We use the result of gettimeofday(), including the fractional seconds
                               5073                 :                :      * field, as being about as unique as we can easily get.  (Do not use
                               5074                 :                :      * random() instead, since it hasn't been seeded and there's no portable way
                               5075                 :                :      * to seed it other than the system clock value...)  The upper half of the
                               5076                 :                :      * uint64 value is just the tv_sec part, while the lower half contains the
                               5077                 :                :      * tv_usec part (which must fit in 20 bits), plus 12 bits from our current
                               5078                 :                :      * PID for a little extra uniqueness.  A person knowing this encoding can
                               5079                 :                :      * determine the initialization time of the installation, which could
                               5080                 :                :      * perhaps be useful sometimes.
                               5081                 :                :      */
 7878 tgl@sss.pgh.pa.us        5082                 :             50 :     gettimeofday(&tv, NULL);
                               5083                 :             50 :     sysidentifier = ((uint64) tv.tv_sec) << 32;
 4151                          5084                 :             50 :     sysidentifier |= ((uint64) tv.tv_usec) << 12;
                               5085                 :             50 :     sysidentifier |= getpid() & 0xFFF;
                               5086                 :                : 
                               5087                 :                :     /* page buffer must be aligned suitably for O_DIRECT */
 4443 heikki.linnakangas@i     5088                 :             50 :     buffer = (char *) palloc(XLOG_BLCKSZ + XLOG_BLCKSZ);
                               5089                 :             50 :     page = (XLogPageHeader) TYPEALIGN(XLOG_BLCKSZ, buffer);
 7096 tgl@sss.pgh.pa.us        5090                 :             50 :     memset(page, 0, XLOG_BLCKSZ);
                               5091                 :                : 
                               5092                 :                :     /*
                               5093                 :                :      * Set up information for the initial checkpoint record
                               5094                 :                :      *
                               5095                 :                :      * The initial checkpoint record is written to the beginning of the WAL
                               5096                 :                :      * segment with logid=0 logseg=1. The very first WAL segment, 0/0, is not
                               5097                 :                :      * used, so that we can use 0/0 to mean "before any valid WAL segment".
                               5098                 :                :      */
 2909 andres@anarazel.de       5099                 :             50 :     checkPoint.redo = wal_segment_size + SizeOfXLogLongPHD;
 1401 rhaas@postgresql.org     5100                 :             50 :     checkPoint.ThisTimeLineID = BootstrapTimeLineID;
                               5101                 :             50 :     checkPoint.PrevTimeLineID = BootstrapTimeLineID;
 4973 simon@2ndQuadrant.co     5102                 :             50 :     checkPoint.fullPageWrites = fullPageWrites;
  411 rhaas@postgresql.org     5103                 :             50 :     checkPoint.wal_level = wal_level;
                               5104                 :                :     checkPoint.nextXid =
 2354 tmunro@postgresql.or     5105                 :             50 :         FullTransactionIdFromEpochAndXid(0, FirstNormalTransactionId);
 1514 tgl@sss.pgh.pa.us        5106                 :             50 :     checkPoint.nextOid = FirstGenbkiObjectId;
 7436                          5107                 :             50 :     checkPoint.nextMulti = FirstMultiXactId;
 7395                          5108                 :             50 :     checkPoint.nextMultiOffset = 0;
 5850                          5109                 :             50 :     checkPoint.oldestXid = FirstNormalTransactionId;
 1234                          5110                 :             50 :     checkPoint.oldestXidDB = Template1DbOid;
 4609 alvherre@alvh.no-ip.     5111                 :             50 :     checkPoint.oldestMulti = FirstMultiXactId;
 1234 tgl@sss.pgh.pa.us        5112                 :             50 :     checkPoint.oldestMultiDB = Template1DbOid;
 3540 mail@joeconway.com       5113                 :             50 :     checkPoint.oldestCommitTsXid = InvalidTransactionId;
                               5114                 :             50 :     checkPoint.newestCommitTsXid = InvalidTransactionId;
 6411 tgl@sss.pgh.pa.us        5115                 :             50 :     checkPoint.time = (pg_time_t) time(NULL);
 5740 simon@2ndQuadrant.co     5116                 :             50 :     checkPoint.oldestActiveXid = InvalidTransactionId;
                               5117                 :                : 
  638 heikki.linnakangas@i     5118                 :             50 :     TransamVariables->nextXid = checkPoint.nextXid;
                               5119                 :             50 :     TransamVariables->nextOid = checkPoint.nextOid;
                               5120                 :             50 :     TransamVariables->oidCount = 0;
 7395 tgl@sss.pgh.pa.us        5121                 :             50 :     MultiXactSetNextMXact(checkPoint.nextMulti, checkPoint.nextMultiOffset);
 3089 rhaas@postgresql.org     5122                 :             50 :     AdvanceOldestClogXid(checkPoint.oldestXid);
 5680 tgl@sss.pgh.pa.us        5123                 :             50 :     SetTransactionIdLimit(checkPoint.oldestXid, checkPoint.oldestXidDB);
 3098                          5124                 :             50 :     SetMultiXactIdLimit(checkPoint.oldestMulti, checkPoint.oldestMultiDB, true);
 3930 alvherre@alvh.no-ip.     5125                 :             50 :     SetCommitTsLimit(InvalidTransactionId, InvalidTransactionId);
                               5126                 :                : 
                               5127                 :                :     /* Set up the XLOG page header */
 9467 vadim4o@yahoo.com        5128                 :             50 :     page->xlp_magic = XLOG_PAGE_MAGIC;
 7717 tgl@sss.pgh.pa.us        5129                 :             50 :     page->xlp_info = XLP_LONG_HEADER;
 1401 rhaas@postgresql.org     5130                 :             50 :     page->xlp_tli = BootstrapTimeLineID;
 2909 andres@anarazel.de       5131                 :             50 :     page->xlp_pageaddr = wal_segment_size;
 7717 tgl@sss.pgh.pa.us        5132                 :             50 :     longpage = (XLogLongPageHeader) page;
                               5133                 :             50 :     longpage->xlp_sysid = sysidentifier;
 2909 andres@anarazel.de       5134                 :             50 :     longpage->xlp_seg_size = wal_segment_size;
 7094 tgl@sss.pgh.pa.us        5135                 :             50 :     longpage->xlp_xlog_blcksz = XLOG_BLCKSZ;
                               5136                 :                : 
                               5137                 :                :     /* Insert the initial checkpoint record */
 3943 heikki.linnakangas@i     5138                 :             50 :     recptr = ((char *) page + SizeOfXLogLongPHD);
                               5139                 :             50 :     record = (XLogRecord *) recptr;
 4822                          5140                 :             50 :     record->xl_prev = 0;
 9467 vadim4o@yahoo.com        5141                 :             50 :     record->xl_xid = InvalidTransactionId;
 3943 heikki.linnakangas@i     5142                 :             50 :     record->xl_tot_len = SizeOfXLogRecord + SizeOfXLogRecordDataHeaderShort + sizeof(checkPoint);
 8943 tgl@sss.pgh.pa.us        5143                 :             50 :     record->xl_info = XLOG_CHECKPOINT_SHUTDOWN;
 9467 vadim4o@yahoo.com        5144                 :             50 :     record->xl_rmid = RM_XLOG_ID;
 3943 heikki.linnakangas@i     5145                 :             50 :     recptr += SizeOfXLogRecord;
                               5146                 :                :     /* fill the XLogRecordDataHeaderShort struct */
 3084 tgl@sss.pgh.pa.us        5147                 :             50 :     *(recptr++) = (char) XLR_BLOCK_ID_DATA_SHORT;
 3943 heikki.linnakangas@i     5148                 :             50 :     *(recptr++) = sizeof(checkPoint);
                               5149                 :             50 :     memcpy(recptr, &checkPoint, sizeof(checkPoint));
                               5150                 :             50 :     recptr += sizeof(checkPoint);
                               5151         [ -  + ]:             50 :     Assert(recptr - (char *) record == record->xl_tot_len);
                               5152                 :                : 
 3959                          5153                 :             50 :     INIT_CRC32C(crc);
 3943                          5154                 :             50 :     COMP_CRC32C(crc, ((char *) record) + SizeOfXLogRecord, record->xl_tot_len - SizeOfXLogRecord);
 3959                          5155                 :             50 :     COMP_CRC32C(crc, (char *) record, offsetof(XLogRecord, xl_crc));
                               5156                 :             50 :     FIN_CRC32C(crc);
 9018 vadim4o@yahoo.com        5157                 :             50 :     record->xl_crc = crc;
                               5158                 :                : 
                               5159                 :                :     /* Create first XLOG segment file */
 1401 rhaas@postgresql.org     5160                 :             50 :     openLogTLI = BootstrapTimeLineID;
                               5161                 :             50 :     openLogFile = XLogFileInit(1, BootstrapTimeLineID);
                               5162                 :                : 
                               5163                 :                :     /*
                               5164                 :                :      * We needn't bother with Reserve/ReleaseExternalFD here, since we'll
                               5165                 :                :      * close the file again in a moment.
                               5166                 :                :      */
                               5167                 :                : 
                               5168                 :                :     /* Write the first page with the initial record */
 8858 tgl@sss.pgh.pa.us        5169                 :             50 :     errno = 0;
 3094 rhaas@postgresql.org     5170                 :             50 :     pgstat_report_wait_start(WAIT_EVENT_WAL_BOOTSTRAP_WRITE);
 7096 tgl@sss.pgh.pa.us        5171         [ -  + ]:             50 :     if (write(openLogFile, page, XLOG_BLCKSZ) != XLOG_BLCKSZ)
                               5172                 :                :     {
                               5173                 :                :         /* if write didn't set errno, assume problem is no disk space */
 8858 tgl@sss.pgh.pa.us        5174         [ #  # ]:UBC           0 :         if (errno == 0)
                               5175                 :              0 :             errno = ENOSPC;
 8083                          5176         [ #  # ]:              0 :         ereport(PANIC,
                               5177                 :                :                 (errcode_for_file_access(),
                               5178                 :                :                  errmsg("could not write bootstrap write-ahead log file: %m")));
                               5179                 :                :     }
 3094 rhaas@postgresql.org     5180                 :CBC          50 :     pgstat_report_wait_end();
                               5181                 :                : 
                               5182                 :             50 :     pgstat_report_wait_start(WAIT_EVENT_WAL_BOOTSTRAP_SYNC);
 8943 tgl@sss.pgh.pa.us        5183         [ -  + ]:             50 :     if (pg_fsync(openLogFile) != 0)
 8083 tgl@sss.pgh.pa.us        5184         [ #  # ]:UBC           0 :         ereport(PANIC,
                               5185                 :                :                 (errcode_for_file_access(),
                               5186                 :                :                  errmsg("could not fsync bootstrap write-ahead log file: %m")));
 3094 rhaas@postgresql.org     5187                 :CBC          50 :     pgstat_report_wait_end();
                               5188                 :                : 
 2254 peter@eisentraut.org     5189         [ -  + ]:             50 :     if (close(openLogFile) != 0)
 7894 tgl@sss.pgh.pa.us        5190         [ #  # ]:UBC           0 :         ereport(PANIC,
                               5191                 :                :                 (errcode_for_file_access(),
                               5192                 :                :                  errmsg("could not close bootstrap write-ahead log file: %m")));
                               5193                 :                : 
 8943 tgl@sss.pgh.pa.us        5194                 :CBC          50 :     openLogFile = -1;
                               5195                 :                : 
                               5196                 :                :     /* Now create pg_control */
  410 peter@eisentraut.org     5197                 :             50 :     InitControlFile(sysidentifier, data_checksum_version);
 8943 tgl@sss.pgh.pa.us        5198                 :             50 :     ControlFile->time = checkPoint.time;
 9467 vadim4o@yahoo.com        5199                 :             50 :     ControlFile->checkPoint = checkPoint.redo;
 8943 tgl@sss.pgh.pa.us        5200                 :             50 :     ControlFile->checkPointCopy = checkPoint;
                               5201                 :                : 
                               5202                 :                :     /* some additional ControlFile fields are set in WriteControlFile() */
 9051                          5203                 :             50 :     WriteControlFile();
                               5204                 :                : 
                               5205                 :                :     /* Bootstrap the commit log, commit timestamps, subtrans and multixact, too */
 8778                          5206                 :             50 :     BootStrapCLOG();
 3930 alvherre@alvh.no-ip.     5207                 :             50 :     BootStrapCommitTs();
 7737 tgl@sss.pgh.pa.us        5208                 :             50 :     BootStrapSUBTRANS();
 7436                          5209                 :             50 :     BootStrapMultiXact();
                               5210                 :                : 
 7322                          5211                 :             50 :     pfree(buffer);
                               5212                 :                : 
                               5213                 :                :     /*
                               5214                 :                :      * Force control file to be read - in contrast to normal processing we'd
                               5215                 :                :      * otherwise never run the checks and GUC-related initializations therein.
                               5216                 :                :      */
 2915 andres@anarazel.de       5217                 :             50 :     ReadControlFile();
 9467 vadim4o@yahoo.com        5218                 :             50 : }
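
The system identifier layout described in the BootStrapXLOG comment above (tv_sec in the upper 32 bits, tv_usec shifted left by 12, and the low 12 bits of the PID) can be sketched as a pair of encode/decode helpers. This is a minimal illustration, not code from xlog.c: the names make_sysid, sysid_seconds, sysid_usec and sysid_pid_bits are hypothetical, and the sketch assumes tv_usec fits in 20 bits, as the comment states.

```c
#include <stdint.h>

/*
 * Hypothetical helpers (not part of xlog.c).  Only the bit layout is taken
 * from the BootStrapXLOG comment: seconds in the upper half, microseconds
 * in bits 12..31, and 12 bits of the PID in bits 0..11.
 */
static inline uint64_t
make_sysid(uint64_t sec, uint32_t usec, uint32_t pid)
{
    uint64_t    id;

    id = sec << 32;                 /* upper half: tv_sec */
    id |= ((uint64_t) usec) << 12;  /* assumes usec < 2^20 */
    id |= pid & 0xFFF;              /* low 12 bits of the PID */
    return id;
}

/* Recover the initialization time (whole seconds) from an identifier. */
static inline uint64_t
sysid_seconds(uint64_t id)
{
    return id >> 32;
}

/* Recover the microseconds field (20 bits). */
static inline uint32_t
sysid_usec(uint64_t id)
{
    return (uint32_t) ((id >> 12) & 0xFFFFF);
}

/* Recover the 12 PID bits that were mixed in. */
static inline uint32_t
sysid_pid_bits(uint64_t id)
{
    return (uint32_t) (id & 0xFFF);
}
```

Under these assumptions, an identifier built from seconds 1757112591, microseconds 123456 and PID 0x12345 decodes back to the same seconds and microseconds, and to PID bits 0x345.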
                               5219                 :                : 
                               5220                 :                : static char *
   35 tgl@sss.pgh.pa.us        5221                 :GNC         777 : str_time(pg_time_t tnow, char *buf, size_t bufsize)
                               5222                 :                : {
                               5223                 :            777 :     pg_strftime(buf, bufsize,
                               5224                 :                :                 "%Y-%m-%d %H:%M:%S %Z",
 6608 tgl@sss.pgh.pa.us        5225                 :CBC         777 :                 pg_localtime(&tnow, log_timezone));
                               5226                 :                : 
 9055 peter_e@gmx.net          5227                 :            777 :     return buf;
                               5228                 :                : }
                               5229                 :                : 
                               5230                 :                : /*
                               5231                 :                :  * Initialize the first WAL segment on the new timeline.
                               5232                 :                :  */
                               5233                 :                : static void
 1298 heikki.linnakangas@i     5234                 :             47 : XLogInitNewTimeline(TimeLineID endTLI, XLogRecPtr endOfLog, TimeLineID newTLI)
                               5235                 :                : {
                               5236                 :                :     char        xlogfname[MAXFNAMELEN];
                               5237                 :                :     XLogSegNo   endLogSegNo;
                               5238                 :                :     XLogSegNo   startLogSegNo;
                               5239                 :                : 
                               5240                 :                :     /* we always switch to a new timeline after archive recovery */
 1401 rhaas@postgresql.org     5241         [ -  + ]:             47 :     Assert(endTLI != newTLI);
                               5242                 :                : 
                               5243                 :                :     /*
                               5244                 :                :      * Update min recovery point one last time.
                               5245                 :                :      */
 5917 heikki.linnakangas@i     5246                 :             47 :     UpdateMinRecoveryPoint(InvalidXLogRecPtr, true);
                               5247                 :                : 
                               5248                 :                :     /*
                               5249                 :                :      * Calculate the last segment on the old timeline, and the first segment
                               5250                 :                :      * on the new timeline. If the switch happens in the middle of a segment,
                               5251                 :                :      * they are the same, but if the switch happens exactly at a segment
                               5252                 :                :      * boundary, startLogSegNo will be endLogSegNo + 1.
                               5253                 :                :      */
 2909 andres@anarazel.de       5254                 :             47 :     XLByteToPrevSeg(endOfLog, endLogSegNo, wal_segment_size);
                               5255                 :             47 :     XLByteToSeg(endOfLog, startLogSegNo, wal_segment_size);
                               5256                 :                : 
                               5257                 :                :     /*
                               5258                 :                :      * Initialize the starting WAL segment for the new timeline. If the switch
                               5259                 :                :      * happens in the middle of a segment, copy data from the last WAL segment
                               5260                 :                :      * of the old timeline up to the switch point, to the starting WAL segment
                               5261                 :                :      * on the new timeline.
                               5262                 :                :      */
 3915 heikki.linnakangas@i     5263         [ +  + ]:             47 :     if (endLogSegNo == startLogSegNo)
                               5264                 :                :     {
                               5265                 :                :         /*
                               5266                 :                :          * Make a copy of the file on the new timeline.
                               5267                 :                :          *
                               5268                 :                :          * Writing WAL isn't allowed yet, so there are no locking
                               5269                 :                :          * considerations. But we should be just as careful as XLogFileInit to
                               5270                 :                :          * avoid installing a bogus file.
                               5271                 :                :          */
 1401 rhaas@postgresql.org     5272                 :             37 :         XLogFileCopy(newTLI, endLogSegNo, endTLI, endLogSegNo,
 2909 andres@anarazel.de       5273                 :             37 :                      XLogSegmentOffset(endOfLog, wal_segment_size));
                               5274                 :                :     }
                               5275                 :                :     else
                               5276                 :                :     {
                               5277                 :                :         /*
                               5278                 :                :          * The switch happened at a segment boundary, so just create the next
                               5279                 :                :          * segment on the new timeline.
                               5280                 :                :          */
                               5281                 :                :         int         fd;
                               5282                 :                : 
 1401 rhaas@postgresql.org     5283                 :             10 :         fd = XLogFileInit(startLogSegNo, newTLI);
                               5284                 :                : 
 2254 peter@eisentraut.org     5285         [ -  + ]:             10 :         if (close(fd) != 0)
                               5286                 :                :         {
 2104 michael@paquier.xyz      5287                 :UBC           0 :             int         save_errno = errno;
                               5288                 :                : 
 1401 rhaas@postgresql.org     5289                 :              0 :             XLogFileName(xlogfname, newTLI, startLogSegNo, wal_segment_size);
 2104 michael@paquier.xyz      5290                 :              0 :             errno = save_errno;
 3912 heikki.linnakangas@i     5291         [ #  # ]:              0 :             ereport(ERROR,
                               5292                 :                :                     (errcode_for_file_access(),
                               5293                 :                :                      errmsg("could not close file \"%s\": %m", xlogfname)));
                               5294                 :                :         }
                               5295                 :                :     }
                               5296                 :                : 
                               5297                 :                :     /*
                               5298                 :                :      * Make sure there are no .ready or .done flags posted
                               5299                 :                :      * for the new segment.
                               5300                 :                :      */
 1401 rhaas@postgresql.org     5301                 :CBC          47 :     XLogFileName(xlogfname, newTLI, startLogSegNo, wal_segment_size);
 3971 fujii@postgresql.org     5302                 :             47 :     XLogArchiveCleanup(xlogfname);
 7719 tgl@sss.pgh.pa.us        5303                 :             47 : }
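
The segment arithmetic that XLogInitNewTimeline relies on can be sketched with plain functions. The helpers below are hypothetical stand-ins, assumed to mirror what the XLByteToSeg, XLByteToPrevSeg and XLogSegmentOffset macros compute; they further assume the segment size is a power of two and that the position passed to prev_seg_of is greater than zero.

```c
#include <stdint.h>

/*
 * Hypothetical stand-ins for the segment macros used above; the names are
 * illustrative only.  seg_size is assumed to be a power of two.
 */

/* Segment containing the byte at position lsn. */
static inline uint64_t
seg_of(uint64_t lsn, uint64_t seg_size)
{
    return lsn / seg_size;
}

/* Segment containing the byte just before lsn (lsn must be > 0). */
static inline uint64_t
prev_seg_of(uint64_t lsn, uint64_t seg_size)
{
    return (lsn - 1) / seg_size;
}

/* Offset of lsn within its segment. */
static inline uint64_t
seg_offset(uint64_t lsn, uint64_t seg_size)
{
    return lsn & (seg_size - 1);
}
```

With a 16 MB segment size, an end-of-log position of 3 * 16 MB + 100 yields segment 3 from both helpers (the switch is mid-segment, so the old segment is copied), while a position of exactly 3 * 16 MB yields previous segment 2 and segment 3 (the boundary case, where a fresh segment is created).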
                               5304                 :                : 
                               5305                 :                : /*
                               5306                 :                :  * Perform cleanup actions at the conclusion of archive recovery.
                               5307                 :                :  */
                               5308                 :                : static void
 1401 rhaas@postgresql.org     5309                 :             47 : CleanupAfterArchiveRecovery(TimeLineID EndOfLogTLI, XLogRecPtr EndOfLog,
                               5310                 :                :                             TimeLineID newTLI)
                               5311                 :                : {
                               5312                 :                :     /*
                               5313                 :                :      * Execute the recovery_end_command, if any.
                               5314                 :                :      */
 1424                          5315   [ +  -  +  + ]:             47 :     if (recoveryEndCommand && strcmp(recoveryEndCommand, "") != 0)
  943 michael@paquier.xyz      5316                 :              2 :         ExecuteRecoveryCommand(recoveryEndCommand,
                               5317                 :                :                                "recovery_end_command",
                               5318                 :                :                                true,
                               5319                 :                :                                WAIT_EVENT_RECOVERY_END_COMMAND);
                               5320                 :                : 
                               5321                 :                :     /*
                               5322                 :                :      * We switched to a new timeline. Clean up segments on the old timeline.
                               5323                 :                :      *
                               5324                 :                :      * If there are any higher-numbered segments on the old timeline, remove
                               5325                 :                :      * them. They might contain valid WAL, but they might also be
                               5326                 :                :      * pre-allocated files containing garbage. In any case, they are not part
                               5327                 :                :      * of the new timeline's history so we don't need them.
                               5328                 :                :      */
 1401 rhaas@postgresql.org     5329                 :             47 :     RemoveNonParentXlogFiles(EndOfLog, newTLI);
                               5330                 :                : 
                               5331                 :                :     /*
                               5332                 :                :      * If the switch happened in the middle of a segment, what should we do with the
                               5333                 :                :      * last, partial segment on the old timeline? If we don't archive it, and
                               5334                 :                :      * the server that created the WAL never archives it either (e.g. because
                               5335                 :                :      * it was hit by a meteor), it will never make it to the archive. That's
                               5336                 :                :      * OK from our point of view, because the new segment that we created with
                               5337                 :                :      * the new TLI contains all the WAL from the old timeline up to the switch
                               5338                 :                :      * point. But if you later try to do PITR to the "missing" WAL on the old
                               5339                 :                :      * timeline, recovery won't find it in the archive. It's physically
                               5340                 :                :      * present in the new file with new TLI, but recovery won't look there
                               5341                 :                :      * when it's recovering to the older timeline. On the other hand, if we
                               5342                 :                :      * archive the partial segment, and the original server on that timeline
                               5343                 :                :      * is still running and archives the completed version of the same segment
                               5344                 :                :      * later, it will fail. (We used to do that in 9.4 and below, and it
                               5345                 :                :      * caused such problems).
                               5346                 :                :      *
                               5347                 :                :      * As a compromise, we rename the last segment with the .partial suffix,
                               5348                 :                :      * and archive it. Archive recovery will never try to read .partial
                               5349                 :                :      * segments, so they will normally go unused. But in the odd PITR case,
                               5350                 :                :      * the administrator can copy them manually to the pg_wal directory
                               5351                 :                :      * (removing the suffix). They can be useful in debugging, too.
                               5352                 :                :      *
                               5353                 :                :      * If a .done or .ready file already exists for the old timeline, however,
                               5354                 :                :      * we had already determined that the segment is complete, so we can let
                               5355                 :                :      * it be archived normally. (In particular, if it was restored from the
                               5356                 :                :      * archive to begin with, it's expected to have a .done file).
                               5357                 :                :      */
 1424                          5358   [ +  +  +  + ]:             84 :     if (XLogSegmentOffset(EndOfLog, wal_segment_size) != 0 &&
                               5359   [ +  +  -  + ]:             37 :         XLogArchivingActive())
                               5360                 :                :     {
                               5361                 :                :         char        origfname[MAXFNAMELEN];
                               5362                 :                :         XLogSegNo   endLogSegNo;
                               5363                 :                : 
                               5364                 :              9 :         XLByteToPrevSeg(EndOfLog, endLogSegNo, wal_segment_size);
                               5365                 :              9 :         XLogFileName(origfname, EndOfLogTLI, endLogSegNo, wal_segment_size);
                               5366                 :                : 
                               5367         [ +  + ]:              9 :         if (!XLogArchiveIsReadyOrDone(origfname))
                               5368                 :                :         {
                               5369                 :                :             char        origpath[MAXPGPATH];
                               5370                 :                :             char        partialfname[MAXFNAMELEN];
                               5371                 :                :             char        partialpath[MAXPGPATH];
                               5372                 :                : 
                               5373                 :                :             /*
                               5374                 :                :              * If we're summarizing WAL, we can't rename the partial file
                               5375                 :                :              * until the summarizer finishes with it, else it will fail.
                               5376                 :                :              */
  407                          5377         [ +  + ]:              5 :             if (summarize_wal)
                               5378                 :              1 :                 WaitForWalSummarization(EndOfLog);
                               5379                 :                : 
 1424                          5380                 :              5 :             XLogFilePath(origpath, EndOfLogTLI, endLogSegNo, wal_segment_size);
                               5381                 :              5 :             snprintf(partialfname, MAXFNAMELEN, "%s.partial", origfname);
                               5382                 :              5 :             snprintf(partialpath, MAXPGPATH, "%s.partial", origpath);
                               5383                 :                : 
                               5384                 :                :             /*
                               5385                 :                :              * Make sure there's no .done or .ready file for the .partial
                               5386                 :                :              * file.
                               5387                 :                :              */
                               5388                 :              5 :             XLogArchiveCleanup(partialfname);
                               5389                 :                : 
                               5390                 :              5 :             durable_rename(origpath, partialpath, ERROR);
                               5391                 :              5 :             XLogArchiveNotify(partialfname);
                               5392                 :                :         }
                               5393                 :                :     }
                               5394                 :             47 : }
                               5395                 :                : 
                               5396                 :                : /*
                               5397                 :                :  * Check to see if required parameters are set high enough on this server
                               5398                 :                :  * for various aspects of recovery operation.
                               5399                 :                :  *
                               5400                 :                :  * Note that all the parameters which this function tests need to be
                               5401                 :                :  * listed in the Administrator's Overview section of high-availability.sgml.
                               5402                 :                :  * If you change them, don't forget to update the list.
                               5403                 :                :  */
                               5404                 :                : static void
 1298 heikki.linnakangas@i     5405                 :            234 : CheckRequiredParameterValues(void)
                               5406                 :                : {
                               5407                 :                :     /*
                               5408                 :                :      * For archive recovery, the WAL must be generated with at least 'replica'
                               5409                 :                :      * wal_level.
                               5410                 :                :      */
                               5411   [ +  +  +  + ]:            234 :     if (ArchiveRecoveryRequested && ControlFile->wal_level == WAL_LEVEL_MINIMAL)
                               5412                 :                :     {
                               5413         [ +  - ]:              2 :         ereport(FATAL,
                               5414                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               5415                 :                :                  errmsg("WAL was generated with \"wal_level=minimal\", cannot continue recovering"),
                               5416                 :                :                  errdetail("This happens if you temporarily set \"wal_level=minimal\" on the server."),
                               5417                 :                :                  errhint("Use a backup taken after setting \"wal_level\" to higher than \"minimal\".")));
                               5418                 :                :     }
                               5419                 :                : 
                               5420                 :                :     /*
                               5421                 :                :      * For Hot Standby, the WAL must be generated with 'replica' mode, and we
                               5422                 :                :      * must have at least as many backend slots as the primary.
                               5423                 :                :      */
 4203                          5424   [ +  +  +  + ]:            232 :     if (ArchiveRecoveryRequested && EnableHotStandby)
                               5425                 :                :     {
                               5426                 :                :         /* We ignore autovacuum_worker_slots when we make this test. */
 5610                          5427                 :            115 :         RecoveryRequiresIntParameter("max_connections",
                               5428                 :                :                                      MaxConnections,
 5609 tgl@sss.pgh.pa.us        5429                 :            115 :                                      ControlFile->MaxConnections);
 4447 rhaas@postgresql.org     5430                 :            115 :         RecoveryRequiresIntParameter("max_worker_processes",
                               5431                 :                :                                      max_worker_processes,
                               5432                 :            115 :                                      ControlFile->max_worker_processes);
 2398 michael@paquier.xyz      5433                 :            115 :         RecoveryRequiresIntParameter("max_wal_senders",
                               5434                 :                :                                      max_wal_senders,
                               5435                 :            115 :                                      ControlFile->max_wal_senders);
 4749 tgl@sss.pgh.pa.us        5436                 :            115 :         RecoveryRequiresIntParameter("max_prepared_transactions",
                               5437                 :                :                                      max_prepared_xacts,
 5609                          5438                 :            115 :                                      ControlFile->max_prepared_xacts);
 4749                          5439                 :            115 :         RecoveryRequiresIntParameter("max_locks_per_transaction",
                               5440                 :                :                                      max_locks_per_xact,
 5609                          5441                 :            115 :                                      ControlFile->max_locks_per_xact);
                               5442                 :                :     }
 5740 simon@2ndQuadrant.co     5443                 :            232 : }
                               5444                 :                : 
                               5445                 :                : /*
                               5446                 :                :  * This must be called ONCE during postmaster or standalone-backend startup
                               5447                 :                :  */
                               5448                 :                : void
 8943 tgl@sss.pgh.pa.us        5449                 :            887 : StartupXLOG(void)
                               5450                 :                : {
                               5451                 :                :     XLogCtlInsert *Insert;
                               5452                 :                :     CheckPoint  checkPoint;
                               5453                 :                :     bool        wasShutdown;
                               5454                 :                :     bool        didCrash;
                               5455                 :                :     bool        haveTblspcMap;
                               5456                 :                :     bool        haveBackupLabel;
                               5457                 :                :     XLogRecPtr  EndOfLog;
                               5458                 :                :     TimeLineID  EndOfLogTLI;
                               5459                 :                :     TimeLineID  newTLI;
                               5460                 :                :     bool        performedWalRecovery;
                               5461                 :                :     EndOfWalRecoveryInfo *endOfRecoveryInfo;
                               5462                 :                :     XLogRecPtr  abortedRecPtr;
                               5463                 :                :     XLogRecPtr  missingContrecPtr;
                               5464                 :                :     TransactionId oldestActiveXID;
 1865 fujii@postgresql.org     5465                 :            887 :     bool        promoted = false;
                               5466                 :                :     char        timebuf[128];
                               5467                 :                : 
                               5468                 :                :     /*
                               5469                 :                :      * We should have an aux process resource owner to use, and we should not
                               5470                 :                :      * be in a transaction that's installed some other resowner.
                               5471                 :                :      */
 2607 tgl@sss.pgh.pa.us        5472         [ -  + ]:            887 :     Assert(AuxProcessResourceOwner != NULL);
                               5473   [ +  -  -  + ]:            887 :     Assert(CurrentResourceOwner == NULL ||
                               5474                 :                :            CurrentResourceOwner == AuxProcessResourceOwner);
                               5475                 :            887 :     CurrentResourceOwner = AuxProcessResourceOwner;
                               5476                 :                : 
                               5477                 :                :     /*
                               5478                 :                :      * Check that contents look valid.
                               5479                 :                :      */
 2129 peter@eisentraut.org     5480         [ -  + ]:            887 :     if (!XRecOffIsValid(ControlFile->checkPoint))
 8083 tgl@sss.pgh.pa.us        5481         [ #  # ]:UBC           0 :         ereport(FATAL,
                               5482                 :                :                 (errcode(ERRCODE_DATA_CORRUPTED),
                               5483                 :                :                  errmsg("control file contains invalid checkpoint location")));
                               5484                 :                : 
 2129 peter@eisentraut.org     5485   [ +  +  -  -  :CBC         887 :     switch (ControlFile->state)
                                           +  +  - ]
                               5486                 :                :     {
                               5487                 :            690 :         case DB_SHUTDOWNED:
                               5488                 :                : 
                               5489                 :                :             /*
                               5490                 :                :              * This is the expected case, so don't be chatty in standalone
                               5491                 :                :              * mode
                               5492                 :                :              */
                               5493   [ +  +  +  + ]:            690 :             ereport(IsPostmasterEnvironment ? LOG : NOTICE,
                               5494                 :                :                     (errmsg("database system was shut down at %s",
                               5495                 :                :                             str_time(ControlFile->time,
                               5496                 :                :                                      timebuf, sizeof(timebuf)))));
                               5497                 :            690 :             break;
                               5498                 :                : 
                               5499                 :             28 :         case DB_SHUTDOWNED_IN_RECOVERY:
                               5500         [ +  - ]:             28 :             ereport(LOG,
                               5501                 :                :                     (errmsg("database system was shut down in recovery at %s",
                               5502                 :                :                             str_time(ControlFile->time,
                               5503                 :                :                                      timebuf, sizeof(timebuf)))));
                               5504                 :             28 :             break;
                               5505                 :                : 
 2129 peter@eisentraut.org     5506                 :UBC           0 :         case DB_SHUTDOWNING:
                               5507         [ #  # ]:              0 :             ereport(LOG,
                               5508                 :                :                     (errmsg("database system shutdown was interrupted; last known up at %s",
                               5509                 :                :                             str_time(ControlFile->time,
                               5510                 :                :                                      timebuf, sizeof(timebuf)))));
                               5511                 :              0 :             break;
                               5512                 :                : 
                               5513                 :              0 :         case DB_IN_CRASH_RECOVERY:
                               5514         [ #  # ]:              0 :             ereport(LOG,
                               5515                 :                :                     (errmsg("database system was interrupted while in recovery at %s",
                               5516                 :                :                             str_time(ControlFile->time,
                               5517                 :                :                                      timebuf, sizeof(timebuf))),
                               5518                 :                :                      errhint("This probably means that some data is corrupted and"
                               5519                 :                :                              " you will have to use the last backup for recovery.")));
                               5520                 :              0 :             break;
                               5521                 :                : 
 2129 peter@eisentraut.org     5522                 :CBC           6 :         case DB_IN_ARCHIVE_RECOVERY:
                               5523         [ +  - ]:              6 :             ereport(LOG,
                               5524                 :                :                     (errmsg("database system was interrupted while in recovery at log time %s",
                               5525                 :                :                             str_time(ControlFile->checkPointCopy.time,
                               5526                 :                :                                      timebuf, sizeof(timebuf))),
                               5527                 :                :                      errhint("If this has occurred more than once some data might be corrupted"
                               5528                 :                :                              " and you might need to choose an earlier recovery target.")));
                               5529                 :              6 :             break;
                               5530                 :                : 
                               5531                 :            163 :         case DB_IN_PRODUCTION:
                               5532         [ +  - ]:            163 :             ereport(LOG,
                               5533                 :                :                     (errmsg("database system was interrupted; last known up at %s",
                               5534                 :                :                             str_time(ControlFile->time,
                               5535                 :                :                                      timebuf, sizeof(timebuf)))));
                               5536                 :            163 :             break;
                               5537                 :                : 
 2129 peter@eisentraut.org     5538                 :UBC           0 :         default:
                               5539         [ #  # ]:              0 :             ereport(FATAL,
                               5540                 :                :                     (errcode(ERRCODE_DATA_CORRUPTED),
                               5541                 :                :                      errmsg("control file contains invalid database cluster state")));
                               5542                 :                :     }
                               5543                 :                : 
                               5544                 :                :     /* This is just to allow attaching to startup process with a debugger */
                               5545                 :                : #ifdef XLOG_REPLAY_DELAY
                               5546                 :                :     if (ControlFile->state != DB_SHUTDOWNED)
                               5547                 :                :         pg_usleep(60000000L);
                               5548                 :                : #endif
                               5549                 :                : 
                               5550                 :                :     /*
                               5551                 :                :      * Verify that pg_wal, pg_wal/archive_status, and pg_wal/summaries exist.
                               5552                 :                :      * In cases where someone has performed a copy for PITR, these directories
                               5553                 :                :      * may have been excluded and need to be re-created.
                               5554                 :                :      */
 6145 tgl@sss.pgh.pa.us        5555                 :CBC         887 :     ValidateXLOGDirectoryStructure();
                               5556                 :                : 
                               5557                 :                :     /* Set up timeout handler needed to report startup progress. */
 1412 rhaas@postgresql.org     5558         [ +  + ]:            887 :     if (!IsBootstrapProcessingMode())
                               5559                 :            837 :         RegisterTimeout(STARTUP_PROGRESS_TIMEOUT,
                               5560                 :                :                         startup_progress_timeout_handler);
                               5561                 :                : 
                               5562                 :                :     /*----------
                               5563                 :                :      * If we previously crashed, perform a couple of actions:
                               5564                 :                :      *
                               5565                 :                :      * - The pg_wal directory may still include some temporary WAL segments
                               5566                 :                :      *   used when creating a new segment, so clean them up to avoid
                               5567                 :                :      *   bloating this directory.  This is done first, as there is no point
                               5568                 :                :      *   in syncing this temporary data.
                               5569                 :                :      *
                               5570                 :                :      * - There might be data which we had written, intending to fsync it, but
                               5571                 :                :      *   which we had not actually fsync'd yet.  Therefore, a power failure in
                               5572                 :                :      *   the near future might cause earlier unflushed writes to be lost, even
                               5573                 :                :      *   though more recent data written to disk from here on would be
                               5574                 :                :      *   persisted.  To avoid that, fsync the entire data directory.
                               5575                 :                :      */
 1298 heikki.linnakangas@i     5576         [ +  + ]:            887 :     if (ControlFile->state != DB_SHUTDOWNED &&
                               5577         [ +  + ]:            197 :         ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
                               5578                 :                :     {
                               5579                 :            169 :         RemoveTempXlogFiles();
                               5580                 :            169 :         SyncDataDirectory();
 1249 andres@anarazel.de       5581                 :            169 :         didCrash = true;
                               5582                 :                :     }
                               5583                 :                :     else
                               5584                 :            718 :         didCrash = false;
                               5585                 :                : 
                               5586                 :                :     /*
                               5587                 :                :      * Prepare for WAL recovery if needed.
                               5588                 :                :      *
                               5589                 :                :      * InitWalRecovery analyzes the control file and the backup label file, if
                               5590                 :                :      * any.  It updates the in-memory ControlFile buffer according to the
                               5591                 :                :      * starting checkpoint, and sets InRecovery and ArchiveRecoveryRequested.
                               5592                 :                :      * It also applies the tablespace map file, if any.
                               5593                 :                :      */
 1298 heikki.linnakangas@i     5594                 :            887 :     InitWalRecovery(ControlFile, &wasShutdown,
                               5595                 :                :                     &haveBackupLabel, &haveTblspcMap);
                               5596                 :            887 :     checkPoint = ControlFile->checkPointCopy;
                               5597                 :                : 
                               5598                 :                :     /* initialize shared memory variables from the checkpoint record */
  638                          5599                 :            887 :     TransamVariables->nextXid = checkPoint.nextXid;
                               5600                 :            887 :     TransamVariables->nextOid = checkPoint.nextOid;
                               5601                 :            887 :     TransamVariables->oidCount = 0;
 7395 tgl@sss.pgh.pa.us        5602                 :            887 :     MultiXactSetNextMXact(checkPoint.nextMulti, checkPoint.nextMultiOffset);
 3089 rhaas@postgresql.org     5603                 :            887 :     AdvanceOldestClogXid(checkPoint.oldestXid);
 5680 tgl@sss.pgh.pa.us        5604                 :            887 :     SetTransactionIdLimit(checkPoint.oldestXid, checkPoint.oldestXidDB);
 3098                          5605                 :            887 :     SetMultiXactIdLimit(checkPoint.oldestMulti, checkPoint.oldestMultiDB, true);
 3540 mail@joeconway.com       5606                 :            887 :     SetCommitTsLimit(checkPoint.oldestCommitTsXid,
                               5607                 :                :                      checkPoint.newestCommitTsXid);
                               5608                 :                : 
                               5609                 :                :     /*
                               5610                 :                :      * Clear out any old relcache cache files.  This is *necessary* if we do
                               5611                 :                :      * any WAL replay, since that would probably result in the cache files
                               5612                 :                :      * being out of sync with database reality.  In theory we could leave them
                               5613                 :                :      * in place if the database had been cleanly shut down, but it seems
                               5614                 :                :      * safest to just remove them always and let them be rebuilt during the
                               5615                 :                :      * first backend startup.  These files need to be removed from all
                               5616                 :                :      * directories, including pg_tblspc.  However, in archive recovery from a
                               5617                 :                :      * backup, the symlinks are created only after reading the tablespace_map
                               5618                 :                :      * file, so the old relcache files must be cleared here, after the
                               5619                 :                :      * symlinks have been created.
                               5620                 :                :      */
 1298 heikki.linnakangas@i     5621                 :            887 :     RelationCacheInitFileRemove();
                               5622                 :                : 
                               5623                 :                :     /*
                               5624                 :                :      * Initialize replication slots, before there's a chance to remove
                               5625                 :                :      * required resources.
                               5626                 :                :      */
 4104 andres@anarazel.de       5627                 :            887 :     StartupReplicationSlots();
                               5628                 :                : 
                               5629                 :                :     /*
                               5630                 :                :      * Start up logical state.  This needs to be set up now so that we have
                               5631                 :                :      * proper data during crash recovery.
                               5632                 :                :      */
 4205 rhaas@postgresql.org     5633                 :            887 :     StartupReorderBuffer();
                               5634                 :                : 
                               5635                 :                :     /*
                               5636                 :                :      * Startup CLOG. This must be done after TransamVariables->nextXid has
                               5637                 :                :      * been initialized and before we accept connections or begin WAL replay.
                               5638                 :                :      */
 1683                          5639                 :            887 :     StartupCLOG();
                               5640                 :                : 
                               5641                 :                :     /*
                               5642                 :                :      * Startup MultiXact. We need to do this early to be able to replay
                               5643                 :                :      * truncations.
                               5644                 :                :      */
 4299 alvherre@alvh.no-ip.     5645                 :            887 :     StartupMultiXact();
                               5646                 :                : 
                               5647                 :                :     /*
                               5648                 :                :      * Ditto for commit timestamps.  Activate the facility if the setting is
                               5649                 :                :      * enabled in the control file, as there should be no tracking of commit
                               5650                 :                :      * timestamps done when the setting was disabled.  This facility can be
                               5651                 :                :      * started or stopped when replaying an XLOG_PARAMETER_CHANGE record.
                               5652                 :                :      */
 2537 michael@paquier.xyz      5653         [ +  + ]:            887 :     if (ControlFile->track_commit_timestamp)
 3557 alvherre@alvh.no-ip.     5654                 :             13 :         StartupCommitTs();
                               5655                 :                : 
                               5656                 :                :     /*
                               5657                 :                :      * Recover knowledge about replay progress of known replication partners.
                               5658                 :                :      */
 3783 andres@anarazel.de       5659                 :            887 :     StartupReplicationOrigin();
                               5660                 :                : 
                               5661                 :                :     /*
                               5662                 :                :      * Initialize unlogged LSN. On a clean shutdown, it's restored from the
                               5663                 :                :      * control file. On recovery, all unlogged relations are blown away, so
                               5664                 :                :      * the unlogged LSN counter can be reset too.
                               5665                 :                :      */
 4590 heikki.linnakangas@i     5666         [ +  + ]:            887 :     if (ControlFile->state == DB_SHUTDOWNED)
  555 nathan@postgresql.or     5667                 :            683 :         pg_atomic_write_membarrier_u64(&XLogCtl->unloggedLSN,
                               5668                 :            683 :                                        ControlFile->unloggedLSN);
                               5669                 :                :     else
                               5670                 :            204 :         pg_atomic_write_membarrier_u64(&XLogCtl->unloggedLSN,
                               5671                 :                :                                        FirstNormalUnloggedLSN);
                               5672                 :                : 
                               5673                 :                :     /*
                               5674                 :                :      * Copy any missing timeline history files between 'now' and the recovery
                               5675                 :                :      * target timeline from archive to pg_wal. While we don't need those files
                               5676                 :                :      * ourselves - the history file of the recovery target timeline covers all
                               5677                 :                :      * the previous timelines in the history too - a cascading standby server
                               5678                 :                :      * might be interested in them. Or, if you archive the WAL from this
                               5679                 :                :      * server to a different archive than the primary, it'd be good for all
                               5680                 :                :      * the history files to get archived there after failover, so that you can
                               5681                 :                :      * use one of the old timelines as a PITR target. Timeline history files
                               5682                 :                :      * are small, so it's better to copy them unnecessarily than to skip them
                               5683                 :                :      * and regret it later.
                               5684                 :                :      */
 1298 heikki.linnakangas@i     5685                 :            887 :     restoreTimeLineHistoryFiles(checkPoint.ThisTimeLineID, recoveryTargetTLI);
                               5686                 :                : 
                               5687                 :                :     /*
                               5688                 :                :      * Before running in recovery, scan pg_twophase and fill in its status to
                               5689                 :                :      * be able to work on entries generated by redo.  Doing a scan before
                               5690                 :                :      * taking any recovery action has the merit of discarding any 2PC files
                               5691                 :                :      * that are newer than the first record to replay, avoiding any conflicts
                               5692                 :                :      * at replay.  This also avoids any subsequent scans when doing recovery
                               5693                 :                :      * of the on-disk two-phase data.
                               5694                 :                :      */
 3077 simon@2ndQuadrant.co     5695                 :            887 :     restoreTwoPhaseData();
                               5696                 :                : 
                               5697                 :                :     /*
                               5698                 :                :      * When starting with crash recovery, reset pgstat data - it might not be
                               5699                 :                :      * valid. Otherwise restore pgstat data. It's safe to do this here,
                               5700                 :                :      * because postmaster will not yet have started any other processes.
                               5701                 :                :      *
                               5702                 :                :      * NB: Restoring replication slot stats relies on slot state to have
                               5703                 :                :      * already been restored from disk.
                               5704                 :                :      *
                               5705                 :                :      * TODO: With a bit of extra work we could just start with a pgstat file
                               5706                 :                :      * associated with the checkpoint redo location we're starting from.
                               5707                 :                :      */
 1249 andres@anarazel.de       5708         [ +  + ]:            887 :     if (didCrash)
                               5709                 :            169 :         pgstat_discard_stats();
                               5710                 :                :     else
  173 michael@paquier.xyz      5711                 :            718 :         pgstat_restore_stats();
                               5712                 :                : 
 4973 simon@2ndQuadrant.co     5713                 :            887 :     lastFullPageWrites = checkPoint.fullPageWrites;
                               5714                 :                : 
 4443 heikki.linnakangas@i     5715                 :            887 :     RedoRecPtr = XLogCtl->RedoRecPtr = XLogCtl->Insert.RedoRecPtr = checkPoint.redo;
 3957                          5716                 :            887 :     doPageWrites = lastFullPageWrites;
                               5717                 :                : 
                               5718                 :                :     /* REDO */
 7703 tgl@sss.pgh.pa.us        5719         [ +  + ]:            887 :     if (InRecovery)
                               5720                 :                :     {
                               5721                 :                :         /* Initialize state for RecoveryInProgress() */
 1298 heikki.linnakangas@i     5722         [ -  + ]:            204 :         SpinLockAcquire(&XLogCtl->info_lck);
                               5723         [ +  + ]:            204 :         if (InArchiveRecovery)
                               5724                 :            106 :             XLogCtl->SharedRecoveryState = RECOVERY_STATE_ARCHIVE;
                               5725                 :                :         else
                               5726                 :             98 :             XLogCtl->SharedRecoveryState = RECOVERY_STATE_CRASH;
                               5727                 :            204 :         SpinLockRelease(&XLogCtl->info_lck);
                               5728                 :                : 
                               5729                 :                :         /*
                               5730                 :                :          * Update pg_control to show that we are recovering and to show the
                               5731                 :                :          * selected checkpoint as the place we are starting from. We also mark
                               5732                 :                :          * pg_control with any minimum recovery stop point obtained from a
                               5733                 :                :          * backup history file.
                               5734                 :                :          *
                               5735                 :                :          * No need to hold ControlFileLock yet, we aren't up far enough.
                               5736                 :                :          */
                               5737                 :            204 :         UpdateControlFile();
                               5738                 :                : 
                               5739                 :                :         /*
                               5740                 :                :          * If there was a backup label file, it's done its job and the info
                               5741                 :                :          * has now been propagated into pg_control.  We must get rid of the
                               5742                 :                :          * label file so that if we crash during recovery, we'll pick up at
                               5743                 :                :          * the latest recovery restartpoint instead of going all the way back
                               5744                 :                :          * to the backup start point.  It seems prudent though to just rename
                               5745                 :                :          * the file out of the way rather than delete it completely.
                               5746                 :                :          */
                               5747         [ +  + ]:            204 :         if (haveBackupLabel)
                               5748                 :                :         {
                               5749                 :             69 :             unlink(BACKUP_LABEL_OLD);
                               5750                 :             69 :             durable_rename(BACKUP_LABEL_FILE, BACKUP_LABEL_OLD, FATAL);
                               5751                 :                :         }
                               5752                 :                : 
                               5753                 :                :         /*
                               5754                 :                :          * If there was a tablespace_map file, it's done its job and the
                               5755                 :                :          * symlinks have been created.  We must get rid of the map file so
                               5756                 :                :          * that if we crash during recovery, we don't create symlinks again.
                               5757                 :                :          * It seems prudent though to just rename the file out of the way
                               5758                 :                :          * rather than delete it completely.
                               5759                 :                :          */
                               5760         [ +  + ]:            204 :         if (haveTblspcMap)
                               5761                 :                :         {
                               5762                 :              2 :             unlink(TABLESPACE_MAP_OLD);
                               5763                 :              2 :             durable_rename(TABLESPACE_MAP, TABLESPACE_MAP_OLD, FATAL);
                               5764                 :                :         }
                               5765                 :                : 
                               5766                 :                :         /*
                               5767                 :                :          * Initialize our local copy of minRecoveryPoint.  When doing crash
                               5768                 :                :          * recovery we want to replay up to the end of WAL.  Particularly, in
                               5769                 :                :          * the case of a promoted standby minRecoveryPoint value in the
                               5770                 :                :          * control file is only updated after the first checkpoint.  However,
                               5771                 :                :          * if the instance crashes before the first post-recovery checkpoint
                               5772                 :                :          * is completed then recovery will use a stale location causing the
                               5773                 :                :          * startup process to think that there are still invalid page
                               5774                 :                :          * references when checking for data consistency.
                               5775                 :                :          */
 2620 michael@paquier.xyz      5776         [ +  + ]:            204 :         if (InArchiveRecovery)
                               5777                 :                :         {
 1298 heikki.linnakangas@i     5778                 :            106 :             LocalMinRecoveryPoint = ControlFile->minRecoveryPoint;
                               5779                 :            106 :             LocalMinRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
                               5780                 :                :         }
                               5781                 :                :         else
                               5782                 :                :         {
                               5783                 :             98 :             LocalMinRecoveryPoint = InvalidXLogRecPtr;
                               5784                 :             98 :             LocalMinRecoveryPointTLI = 0;
                               5785                 :                :         }
                               5786                 :                : 
                               5787                 :                :         /* Check that the GUCs used to generate the WAL allow recovery */
 5610                          5788                 :            204 :         CheckRequiredParameterValues();
                               5789                 :                : 
                               5790                 :                :         /*
                               5791                 :                :          * We're in recovery, so unlogged relations may be trashed and must be
                               5792                 :                :          * reset.  This should be done BEFORE allowing Hot Standby
                               5793                 :                :          * connections, so that read-only backends don't try to read whatever
                               5794                 :                :          * garbage is left over from before.
                               5795                 :                :          */
 5365 rhaas@postgresql.org     5796                 :            204 :         ResetUnloggedRelations(UNLOGGED_RELATION_CLEANUP);
                               5797                 :                : 
                               5798                 :                :         /*
                               5799                 :                :          * Likewise, delete any saved transaction snapshot files that got left
                               5800                 :                :          * behind by crashed backends.
                               5801                 :                :          */
 5068 tgl@sss.pgh.pa.us        5802                 :            204 :         DeleteAllExportedSnapshotFiles();
                               5803                 :                : 
                               5804                 :                :         /*
                               5805                 :                :          * Initialize for Hot Standby, if enabled. We won't let backends in
                               5806                 :                :          * yet, not until we've reached the min recovery point specified in
                               5807                 :                :          * control file and we've established a recovery snapshot from a
                               5808                 :                :          * running-xacts WAL record.
                               5809                 :                :          */
 4579 heikki.linnakangas@i     5810   [ +  +  +  + ]:            204 :         if (ArchiveRecoveryRequested && EnableHotStandby)
                               5811                 :                :         {
                               5812                 :                :             TransactionId *xids;
                               5813                 :                :             int         nxids;
                               5814                 :                : 
 5685                          5815         [ +  + ]:            100 :             ereport(DEBUG1,
                               5816                 :                :                     (errmsg_internal("initializing for hot standby")));
                               5817                 :                : 
 5740 simon@2ndQuadrant.co     5818                 :            100 :             InitRecoveryTransactionEnvironment();
                               5819                 :                : 
                               5820         [ +  + ]:            100 :             if (wasShutdown)
                               5821                 :             26 :                 oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
                               5822                 :                :             else
                               5823                 :             74 :                 oldestActiveXID = checkPoint.oldestActiveXid;
                               5824         [ -  + ]:            100 :             Assert(TransactionIdIsValid(oldestActiveXID));
                               5825                 :                : 
                               5826                 :                :             /* Tell procarray about the range of xids it has to deal with */
  638 heikki.linnakangas@i     5827                 :            100 :             ProcArrayInitRecovery(XidFromFullTransactionId(TransamVariables->nextXid));
                               5828                 :                : 
                               5829                 :                :             /*
                               5830                 :                :              * Startup subtrans only.  CLOG, MultiXact and commit timestamp
                               5831                 :                :              * have already been started up and other SLRUs are not maintained
                               5832                 :                :              * during recovery and need not be started yet.
                               5833                 :                :              */
 5740 simon@2ndQuadrant.co     5834                 :            100 :             StartupSUBTRANS(oldestActiveXID);
                               5835                 :                : 
                               5836                 :                :             /*
                               5837                 :                :              * If we're beginning at a shutdown checkpoint, we know that
                               5838                 :                :              * nothing was running on the primary at this point. So fake-up an
                               5839                 :                :              * empty running-xacts record and use that here and now. Recover
                               5840                 :                :              * additional standby state for prepared transactions.
                               5841                 :                :              */
 5625 heikki.linnakangas@i     5842         [ +  + ]:            100 :             if (wasShutdown)
                               5843                 :                :             {
                               5844                 :                :                 RunningTransactionsData running;
                               5845                 :                :                 TransactionId latestCompletedXid;
                               5846                 :                : 
                               5847                 :                :                 /* Update pg_subtrans entries for any prepared transactions */
  436                          5848                 :             26 :                 StandbyRecoverPreparedTransactions();
                               5849                 :                : 
                               5850                 :                :                 /*
                               5851                 :                :                  * Construct a RunningTransactions snapshot representing a
                               5852                 :                :                  * shut down server, with only prepared transactions still
                               5853                 :                :                  * alive. We're never overflowed at this point because all
                               5854                 :                :                  * subxids are listed with their parent prepared transactions.
                               5855                 :                :                  */
 5625                          5856                 :             26 :                 running.xcnt = nxids;
 4661 simon@2ndQuadrant.co     5857                 :             26 :                 running.subxcnt = 0;
  436 heikki.linnakangas@i     5858                 :             26 :                 running.subxid_status = SUBXIDS_IN_SUBTRANS;
 1852 andres@anarazel.de       5859                 :             26 :                 running.nextXid = XidFromFullTransactionId(checkPoint.nextXid);
 5625 heikki.linnakangas@i     5860                 :             26 :                 running.oldestRunningXid = oldestActiveXID;
 1852 andres@anarazel.de       5861                 :             26 :                 latestCompletedXid = XidFromFullTransactionId(checkPoint.nextXid);
 5595 simon@2ndQuadrant.co     5862         [ -  + ]:             26 :                 TransactionIdRetreat(latestCompletedXid);
 5594                          5863         [ -  + ]:             26 :                 Assert(TransactionIdIsNormal(latestCompletedXid));
 5595                          5864                 :             26 :                 running.latestCompletedXid = latestCompletedXid;
 5625 heikki.linnakangas@i     5865                 :             26 :                 running.xids = xids;
                               5866                 :                : 
                               5867                 :             26 :                 ProcArrayApplyRecoveryInfo(&running);
                               5868                 :                :             }
                               5869                 :                :         }
                               5870                 :                : 
                               5871                 :                :         /*
                               5872                 :                :          * We're all set for replaying the WAL now. Do it.
                               5873                 :                :          */
 1298                          5874                 :            204 :         PerformWalRecovery();
                               5875                 :            149 :         performedWalRecovery = true;
                               5876                 :                :     }
                               5877                 :                :     else
 1294                          5878                 :            683 :         performedWalRecovery = false;
                               5879                 :                : 
                               5880                 :                :     /*
                               5881                 :                :      * Finish WAL recovery.
                               5882                 :                :      */
 1298                          5883                 :            832 :     endOfRecoveryInfo = FinishWalRecovery();
                               5884                 :            832 :     EndOfLog = endOfRecoveryInfo->endOfLog;
                               5885                 :            832 :     EndOfLogTLI = endOfRecoveryInfo->endOfLogTLI;
                               5886                 :            832 :     abortedRecPtr = endOfRecoveryInfo->abortedRecPtr;
                               5887                 :            832 :     missingContrecPtr = endOfRecoveryInfo->missingContrecPtr;
                               5888                 :                : 
                               5889                 :                :     /*
                               5890                 :                :      * Reset ps status display, so that no information related to recovery
                               5891                 :                :      * shows up.
                               5892                 :                :      */
 1080 michael@paquier.xyz      5893                 :            832 :     set_ps_display("");
                               5894                 :                : 
                               5895                 :                :     /*
                               5896                 :                :      * When recovering from a backup (we are in recovery, and archive recovery
                               5897                 :                :      * was requested), complain if we did not roll forward far enough to reach
                               5898                 :                :      * the point where the database is consistent.  For regular online
                               5899                 :                :      * backup-from-primary, that means reaching the end-of-backup WAL record
                               5900                 :                :      * (at which point we reset backupStartPoint to be Invalid), for
                               5901                 :                :      * backup-from-replica (which can't inject records into the WAL stream),
                               5902                 :                :      * that point is when we reach the minRecoveryPoint in pg_control (which
                               5903                 :                :      * we purposefully copy last when backing up from a replica).  For
                               5904                 :                :      * pg_rewind (which creates a backup_label with a method of "pg_rewind")
                               5905                 :                :      * or snapshot-style backups (which don't), backupEndRequired will be set
                               5906                 :                :      * to false.
                               5907                 :                :      *
                               5908                 :                :      * Note: it is indeed okay to look at the local variable
                               5909                 :                :      * LocalMinRecoveryPoint here, even though ControlFile->minRecoveryPoint
                               5910                 :                :      * might be further ahead --- ControlFile->minRecoveryPoint cannot have
                               5911                 :                :      * been advanced beyond the WAL we processed.
                               5912                 :                :      */
 5274 heikki.linnakangas@i     5913         [ +  + ]:            832 :     if (InRecovery &&
 1298                          5914         [ +  - ]:            149 :         (EndOfLog < LocalMinRecoveryPoint ||
 5724                          5915         [ -  + ]:            149 :          !XLogRecPtrIsInvalid(ControlFile->backupStartPoint)))
                               5916                 :                :     {
                               5917                 :                :         /*
                               5918                 :                :          * Ran off end of WAL before reaching end-of-backup WAL record, or
                               5919                 :                :          * minRecoveryPoint. That's a bad sign, indicating that you tried to
                               5920                 :                :          * recover from an online backup but never called pg_backup_stop(), or
                               5921                 :                :          * you didn't archive all the WAL needed.
                               5922                 :                :          */
 4579 heikki.linnakangas@i     5923   [ #  #  #  # ]:UBC           0 :         if (ArchiveRecoveryRequested || ControlFile->backupEndRequired)
                               5924                 :                :         {
 1249 sfrost@snowman.net       5925   [ #  #  #  # ]:              0 :             if (!XLogRecPtrIsInvalid(ControlFile->backupStartPoint) || ControlFile->backupEndRequired)
 5141 heikki.linnakangas@i     5926         [ #  # ]:              0 :                 ereport(FATAL,
                               5927                 :                :                         (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               5928                 :                :                          errmsg("WAL ends before end of online backup"),
                               5929                 :                :                          errhint("All WAL generated while online backup was taken must be available at recovery.")));
                               5930                 :                :             else
 5260                          5931         [ #  # ]:              0 :                 ereport(FATAL,
                               5932                 :                :                         (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               5933                 :                :                          errmsg("WAL ends before consistent recovery point")));
                               5934                 :                :         }
                               5935                 :                :     }
                               5936                 :                : 
                               5937                 :                :     /*
                               5938                 :                :      * Reset unlogged relations to the contents of their INIT fork. This is
                               5939                 :                :      * done AFTER recovery is complete so as to include any unlogged relations
                               5940                 :                :      * created during recovery, but BEFORE recovery is marked as having
                               5941                 :                :      * completed successfully. Otherwise we'd not retry if any of the post
                               5942                 :                :      * end-of-recovery steps fail.
                               5943                 :                :      */
 1298 heikki.linnakangas@i     5944         [ +  + ]:CBC         832 :     if (InRecovery)
                               5945                 :            149 :         ResetUnloggedRelations(UNLOGGED_RELATION_INIT);
                               5946                 :                : 
                               5947                 :                :     /*
                               5948                 :                :      * Pre-scan prepared transactions to find out the range of XIDs present.
                               5949                 :                :      * This information is not needed quite yet, but the scan is placed here so
                               5950                 :                :      * that potential problems are detected before any on-disk change is made.
                               5951                 :                :      */
 2616 michael@paquier.xyz      5952                 :            832 :     oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
                               5953                 :                : 
                               5954                 :                :     /*
                               5955                 :                :      * Allow ordinary WAL segment creation before possibly switching to a new
                               5956                 :                :      * timeline, which creates a new segment, and after the last ReadRecord().
                               5957                 :                :      */
 1116                          5958                 :            832 :     SetInstallXLogFileSegmentActive();
                               5959                 :                : 
                               5960                 :                :     /*
                               5961                 :                :      * Consider whether we need to assign a new timeline ID.
                               5962                 :                :      *
                               5963                 :                :      * If we did archive recovery, we always assign a new ID.  This handles a
                               5964                 :                :      * couple of issues.  If we stopped short of the end of WAL during
                               5965                 :                :      * recovery, then we are clearly generating a new timeline and must assign
                               5966                 :                :      * it a unique new ID.  Even if we ran to the end, modifying the current
                               5967                 :                :      * last segment is problematic because it may result in trying to
                               5968                 :                :      * overwrite an already-archived copy of that segment, and we encourage
                               5969                 :                :      * DBAs to make their archive_commands reject that.  We can dodge the
                               5970                 :                :      * problem by making the new active segment have a new timeline ID.
                               5971                 :                :      *
                               5972                 :                :      * In a normal crash recovery, we can just extend the timeline we were in.
                               5973                 :                :      */
 1298 heikki.linnakangas@i     5974                 :            832 :     newTLI = endOfRecoveryInfo->lastRecTLI;
 4579                          5975         [ +  + ]:            832 :     if (ArchiveRecoveryRequested)
                               5976                 :                :     {
 1396 rhaas@postgresql.org     5977                 :             47 :         newTLI = findNewestTimeLine(recoveryTargetTLI) + 1;
 7717 tgl@sss.pgh.pa.us        5978         [ +  - ]:             47 :         ereport(LOG,
                               5979                 :                :                 (errmsg("selected new timeline ID: %u", newTLI)));
                               5980                 :                : 
                               5981                 :                :         /*
                               5982                 :                :          * Make a writable copy of the last WAL segment.  (Note that we also
                               5983                 :                :          * have a copy of the last block of the old WAL in
                               5984                 :                :          * endOfRecoveryInfo->lastPage; we will use that below.)
                               5985                 :                :          */
 1298 heikki.linnakangas@i     5986                 :             47 :         XLogInitNewTimeline(EndOfLogTLI, EndOfLog, newTLI);
                               5987                 :                : 
                               5988                 :                :         /*
                               5989                 :                :          * Remove the signal files, so that we don't accidentally re-enter
                               5990                 :                :          * archive recovery mode in a subsequent crash.
                               5991                 :                :          */
                               5992         [ +  + ]:             47 :         if (endOfRecoveryInfo->standby_signal_file_found)
                               5993                 :             44 :             durable_unlink(STANDBY_SIGNAL_FILE, FATAL);
                               5994                 :                : 
                               5995         [ +  + ]:             47 :         if (endOfRecoveryInfo->recovery_signal_file_found)
                               5996                 :              3 :             durable_unlink(RECOVERY_SIGNAL_FILE, FATAL);
                               5997                 :                : 
                               5998                 :                :         /*
                               5999                 :                :          * Write the timeline history file, and have it archived. After this
                               6000                 :                :          * point (or rather, as soon as the file is archived), the timeline
                               6001                 :                :          * will appear as "taken" in the WAL archive and to any standby
                               6002                 :                :          * servers.  If we crash before actually switching to the new
                               6003                 :                :          * timeline, standby servers will nevertheless think that we switched
                               6004                 :                :          * to the new timeline, and will try to connect to the new timeline.
                               6005                 :                :          * To minimize the window for that, try to do as little as possible
                               6006                 :                :          * between here and writing the end-of-recovery record.
                               6007                 :                :          */
 1396 rhaas@postgresql.org     6008                 :             47 :         writeTimeLineHistory(newTLI, recoveryTargetTLI,
                               6009                 :                :                              EndOfLog, endOfRecoveryInfo->recoveryStopReason);
                               6010                 :                : 
 1298 heikki.linnakangas@i     6011         [ +  - ]:             47 :         ereport(LOG,
                               6012                 :                :                 (errmsg("archive recovery complete")));
                               6013                 :                :     }
                               6014                 :                : 
                               6015                 :                :     /* Save the selected TimeLineID in shared memory, too */
  407 rhaas@postgresql.org     6016         [ -  + ]:            832 :     SpinLockAcquire(&XLogCtl->info_lck);
 1396                          6017                 :            832 :     XLogCtl->InsertTimeLineID = newTLI;
 1298 heikki.linnakangas@i     6018                 :            832 :     XLogCtl->PrevTimeLineID = endOfRecoveryInfo->lastRecTLI;
  407 rhaas@postgresql.org     6019                 :            832 :     SpinLockRelease(&XLogCtl->info_lck);
                               6020                 :                : 
                               6021                 :                :     /*
                               6022                 :                :      * Actually, if WAL ended in an incomplete record, skip the parts that
                               6023                 :                :      * made it through and start writing after the portion that persisted.
                               6024                 :                :      * (It's critical to first write an OVERWRITE_CONTRECORD message, which
                               6025                 :                :      * we'll do as soon as we're open for writing new WAL.)
                               6026                 :                :      */
 1438 alvherre@alvh.no-ip.     6027         [ +  + ]:            832 :     if (!XLogRecPtrIsInvalid(missingContrecPtr))
                               6028                 :                :     {
                               6029                 :                :         /*
                               6030                 :                :          * We should only have a missingContrecPtr if we're not switching to a
                               6031                 :                :          * new timeline. When a timeline switch occurs, WAL is copied from the
                               6032                 :                :          * old timeline to the new only up to the end of the last complete
                               6033                 :                :          * record, so there can't be an incomplete WAL record that we need to
                               6034                 :                :          * disregard.
                               6035                 :                :          */
 1104 rhaas@postgresql.org     6036         [ -  + ]:             11 :         Assert(newTLI == endOfRecoveryInfo->lastRecTLI);
 1438 alvherre@alvh.no-ip.     6037         [ -  + ]:             11 :         Assert(!XLogRecPtrIsInvalid(abortedRecPtr));
                               6038                 :             11 :         EndOfLog = missingContrecPtr;
                               6039                 :                :     }
                               6040                 :                : 
                               6041                 :                :     /*
                               6042                 :                :      * Prepare to write WAL starting at EndOfLog location, and init xlog
                               6043                 :                :      * buffer cache using the block containing the last record from the
                               6044                 :                :      * previous incarnation.
                               6045                 :                :      */
 9079 vadim4o@yahoo.com        6046                 :            832 :     Insert = &XLogCtl->Insert;
 1298 heikki.linnakangas@i     6047                 :            832 :     Insert->PrevBytePos = XLogRecPtrToBytePos(endOfRecoveryInfo->lastRec);
 4434                          6048                 :            832 :     Insert->CurrBytePos = XLogRecPtrToBytePos(EndOfLog);
                               6049                 :                : 
                               6050                 :                :     /*
                               6051                 :                :      * Tricky point here: lastPage contains the *last* block that the LastRec
                               6052                 :                :      * record spans, not the one it starts in.  The last block is indeed the
                               6053                 :                :      * one we want to use.
                               6054                 :                :      */
                               6055         [ +  + ]:            832 :     if (EndOfLog % XLOG_BLCKSZ != 0)
                               6056                 :                :     {
                               6057                 :                :         char       *page;
                               6058                 :                :         int         len;
                               6059                 :                :         int         firstIdx;
                               6060                 :                : 
                               6061                 :            804 :         firstIdx = XLogRecPtrToBufIdx(EndOfLog);
 1298                          6062                 :            804 :         len = EndOfLog - endOfRecoveryInfo->lastPageBeginPtr;
                               6063         [ -  + ]:            804 :         Assert(len < XLOG_BLCKSZ);
                               6064                 :                : 
                               6065                 :                :         /* Copy the valid part of the last block, and zero the rest */
 4434                          6066                 :            804 :         page = &XLogCtl->pages[firstIdx * XLOG_BLCKSZ];
 1298                          6067                 :            804 :         memcpy(page, endOfRecoveryInfo->lastPage, len);
 4434                          6068                 :            804 :         memset(page + len, 0, XLOG_BLCKSZ - len);
                               6069                 :                : 
  627 jdavis@postgresql.or     6070                 :            804 :         pg_atomic_write_u64(&XLogCtl->xlblocks[firstIdx], endOfRecoveryInfo->lastPageBeginPtr + XLOG_BLCKSZ);
   15 akorotkov@postgresql     6071                 :            804 :         XLogCtl->InitializedUpTo = endOfRecoveryInfo->lastPageBeginPtr + XLOG_BLCKSZ;
                               6072                 :                :     }
                               6073                 :                :     else
                               6074                 :                :     {
                               6075                 :                :         /*
                               6076                 :                :          * There is no partial block to copy. Just set InitializedUpTo, and
                               6077                 :                :          * let the first attempt to insert a log record initialize the next
                               6078                 :                :          * buffer.
                               6079                 :                :          */
                               6080                 :             28 :         XLogCtl->InitializedUpTo = EndOfLog;
                               6081                 :                :     }
                               6082                 :                : 
                               6083                 :                :     /*
                               6084                 :                :      * Update local and shared status.  This is OK to do without any locks
                               6085                 :                :      * because no other process can be reading or writing WAL yet.
                               6086                 :                :      */
 4434 heikki.linnakangas@i     6087                 :            832 :     LogwrtResult.Write = LogwrtResult.Flush = EndOfLog;
  517 alvherre@alvh.no-ip.     6088                 :            832 :     pg_atomic_write_u64(&XLogCtl->logInsertResult, EndOfLog);
  519                          6089                 :            832 :     pg_atomic_write_u64(&XLogCtl->logWriteResult, EndOfLog);
                               6090                 :            832 :     pg_atomic_write_u64(&XLogCtl->logFlushResult, EndOfLog);
 4434 heikki.linnakangas@i     6091                 :            832 :     XLogCtl->LogwrtRqst.Write = EndOfLog;
                               6092                 :            832 :     XLogCtl->LogwrtRqst.Flush = EndOfLog;
                               6093                 :                : 
                               6094                 :                :     /*
                               6095                 :                :      * Preallocate additional log files, if wanted.
                               6096                 :                :      */
 1396 rhaas@postgresql.org     6097                 :            832 :     PreallocXlogFiles(EndOfLog, newTLI);
                               6098                 :                : 
                               6099                 :                :     /*
                               6100                 :                :      * Okay, we're officially UP.
                               6101                 :                :      */
 9079 vadim4o@yahoo.com        6102                 :            832 :     InRecovery = false;
                               6103                 :                : 
                               6104                 :                :     /* start the archive_timeout timer and LSN running */
 4434 heikki.linnakangas@i     6105                 :            832 :     XLogCtl->lastSegSwitchTime = (pg_time_t) time(NULL);
 3180 andres@anarazel.de       6106                 :            832 :     XLogCtl->lastSegSwitchLSN = EndOfLog;
                               6107                 :                : 
                               6108                 :                :     /* also initialize latestCompletedXid, to nextXid - 1 */
 4961 tgl@sss.pgh.pa.us        6109                 :            832 :     LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
  638 heikki.linnakangas@i     6110                 :            832 :     TransamVariables->latestCompletedXid = TransamVariables->nextXid;
                               6111                 :            832 :     FullTransactionIdRetreat(&TransamVariables->latestCompletedXid);
 4961 tgl@sss.pgh.pa.us        6112                 :            832 :     LWLockRelease(ProcArrayLock);
                               6113                 :                : 
                               6114                 :                :     /*
                               6115                 :                :      * Start up subtrans, if not already done for hot standby.  (commit
                               6116                 :                :      * timestamps are started below, if necessary.)
                               6117                 :                :      */
 5740 simon@2ndQuadrant.co     6118         [ +  + ]:            832 :     if (standbyState == STANDBY_DISABLED)
                               6119                 :            785 :         StartupSUBTRANS(oldestActiveXID);
                               6120                 :                : 
                               6121                 :                :     /*
                               6122                 :                :      * Perform end of recovery actions for any SLRUs that need it.
                               6123                 :                :      */
 5057                          6124                 :            832 :     TrimCLOG();
 4299 alvherre@alvh.no-ip.     6125                 :            832 :     TrimMultiXact();
                               6126                 :                : 
                               6127                 :                :     /*
                               6128                 :                :      * Reload shared-memory state for prepared transactions.  This needs to
                               6129                 :                :      * happen before renaming the last partial segment of the old timeline as
                               6130                 :                :      * it may be possible that we have to recover some transactions from it.
                               6131                 :                :      */
 7386 tgl@sss.pgh.pa.us        6132                 :            832 :     RecoverPreparedTransactions();
                               6133                 :                : 
                               6134                 :                :     /* Shut down xlogreader */
 1298 heikki.linnakangas@i     6135                 :            832 :     ShutdownWalRecovery();
                               6136                 :                : 
                               6137                 :                :     /* Enable WAL writes for this backend only. */
 1423 rhaas@postgresql.org     6138                 :            832 :     LocalSetXLogInsertAllowed();
                               6139                 :                : 
                               6140                 :                :     /* If necessary, write overwrite-contrecord before doing anything else */
                               6141         [ +  + ]:            832 :     if (!XLogRecPtrIsInvalid(abortedRecPtr))
                               6142                 :                :     {
                               6143         [ -  + ]:             11 :         Assert(!XLogRecPtrIsInvalid(missingContrecPtr));
 1298 heikki.linnakangas@i     6144                 :             11 :         CreateOverwriteContrecordRecord(abortedRecPtr, missingContrecPtr, newTLI);
                               6145                 :                :     }
                               6146                 :                : 
                               6147                 :                :     /*
                               6148                 :                :      * Update full_page_writes in shared memory and write an XLOG_FPW_CHANGE
                               6149                 :                :      * record before resource managers write cleanup WAL records or the
                               6150                 :                :      * checkpoint record is written.
                               6151                 :                :      */
 1423 rhaas@postgresql.org     6152                 :            832 :     Insert->fullPageWrites = lastFullPageWrites;
                               6153                 :            832 :     UpdateFullPageWrites();
                               6154                 :                : 
                               6155                 :                :     /*
                               6156                 :                :      * Emit checkpoint or end-of-recovery record in XLOG, if required.
                               6157                 :                :      */
 1298 heikki.linnakangas@i     6158         [ +  + ]:            832 :     if (performedWalRecovery)
 1423 rhaas@postgresql.org     6159                 :            149 :         promoted = PerformRecoveryXLogAction();
                               6160                 :                : 
                               6161                 :                :     /*
                               6162                 :                :      * If any of the critical GUCs have changed, log them before we allow
                               6163                 :                :      * backends to write WAL.
                               6164                 :                :      */
 5610 heikki.linnakangas@i     6165                 :            832 :     XLogReportParameters();
                               6166                 :                : 
                               6167                 :                :     /* If this is archive recovery, perform post-recovery cleanup actions. */
 1412 rhaas@postgresql.org     6168         [ +  + ]:            832 :     if (ArchiveRecoveryRequested)
 1396                          6169                 :             47 :         CleanupAfterArchiveRecovery(EndOfLogTLI, EndOfLog, newTLI);
                               6170                 :                : 
                               6171                 :                :     /*
                               6172                 :                :      * Local WAL inserts enabled, so it's time to finish initialization of
                               6173                 :                :      * commit timestamp.
                               6174                 :                :      */
 3930 alvherre@alvh.no-ip.     6175                 :            832 :     CompleteCommitTsInitialization();
                               6176                 :                : 
                               6177                 :                :     /* Clean up EndOfWalRecoveryInfo data to appease Valgrind leak checking */
   35 tgl@sss.pgh.pa.us        6178         [ +  + ]:GNC         832 :     if (endOfRecoveryInfo->lastPage)
                               6179                 :            815 :         pfree(endOfRecoveryInfo->lastPage);
                               6180                 :            832 :     pfree(endOfRecoveryInfo->recoveryStopReason);
                               6181                 :            832 :     pfree(endOfRecoveryInfo);
                               6182                 :                : 
                               6183                 :                :     /*
                               6184                 :                :      * All done with end-of-recovery actions.
                               6185                 :                :      *
                               6186                 :                :      * Now allow backends to write WAL and update the control file status
                               6187                 :                :      * accordingly.  SharedRecoveryState, which controls whether backends can
                               6188                 :                :      * write WAL, is updated while holding ControlFileLock to prevent other
                               6189                 :                :      * backends from seeing an inconsistent control file state in shared memory.
                               6190                 :                :      * There is still a small window during which backends can write WAL and
                               6191                 :                :      * the control file is still referring to a system not in DB_IN_PRODUCTION
                               6192                 :                :      * state while looking at the on-disk control file.
                               6193                 :                :      *
                               6194                 :                :      * Also, we use info_lck to update SharedRecoveryState to ensure that
                               6195                 :                :      * there are no race conditions concerning visibility of other recent
                               6196                 :                :      * updates to shared memory.
                               6197                 :                :      */
 3329 peter_e@gmx.net          6198                 :CBC         832 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               6199                 :            832 :     ControlFile->state = DB_IN_PRODUCTION;
                               6200                 :                : 
 4002 andres@anarazel.de       6201         [ -  + ]:            832 :     SpinLockAcquire(&XLogCtl->info_lck);
 1961 michael@paquier.xyz      6202                 :            832 :     XLogCtl->SharedRecoveryState = RECOVERY_STATE_DONE;
 4002 andres@anarazel.de       6203                 :            832 :     SpinLockRelease(&XLogCtl->info_lck);
                               6204                 :                : 
 3329 peter_e@gmx.net          6205                 :            832 :     UpdateControlFile();
                               6206                 :            832 :     LWLockRelease(ControlFileLock);
                               6207                 :                : 
                               6208                 :                :     /*
                               6209                 :                :      * Shut down the recovery environment.  This must occur after
                               6210                 :                :      * RecoverPreparedTransactions() (see notes in lock_twophase_recover())
                               6211                 :                :      * and after switching SharedRecoveryState to RECOVERY_STATE_DONE so that
                               6212                 :                :      * any session building a snapshot will not rely on KnownAssignedXids, as
                               6213                 :                :      * RecoveryInProgress() would return false at this stage.  This is
                               6214                 :                :      * particularly critical for prepared 2PC transactions, which would still
                               6215                 :                :      * need to be included in snapshots once recovery has ended.
                               6216                 :                :      */
 1433 michael@paquier.xyz      6217         [ +  + ]:            832 :     if (standbyState != STANDBY_DISABLED)
                               6218                 :             47 :         ShutdownRecoveryTransactionEnvironment();
                               6219                 :                : 
                               6220                 :                :     /*
                               6221                 :                :      * If there were cascading standby servers connected to us, nudge any wal
                               6222                 :                :      * sender processes to notice that we've been promoted.
                               6223                 :                :      */
  882 andres@anarazel.de       6224                 :            832 :     WalSndWakeup(true, true);
                               6225                 :                : 
                               6226                 :                :     /*
                               6227                 :                :      * If this was a promotion, request an (online) checkpoint now. This isn't
                               6228                 :                :      * required for consistency, but the last restartpoint might be far back,
                               6229                 :                :      * and in case of a crash, recovering from it might take longer than is
                               6230                 :                :      * appropriate now that we're not in standby mode anymore.
                               6231                 :                :      */
 1865 fujii@postgresql.org     6232         [ +  + ]:            832 :     if (promoted)
 4491 simon@2ndQuadrant.co     6233                 :             40 :         RequestCheckpoint(CHECKPOINT_FORCE);
 6044 heikki.linnakangas@i     6234                 :            832 : }
                               6235                 :                : 
                               6236                 :                : /*
                               6237                 :                :  * Callback from PerformWalRecovery(), called when we switch from crash
                               6238                 :                :  * recovery to archive recovery mode.  Updates the control file accordingly.
                               6239                 :                :  */
                               6240                 :                : void
 1298                          6241                 :              2 : SwitchIntoArchiveRecovery(XLogRecPtr EndRecPtr, TimeLineID replayTLI)
                               6242                 :                : {
                               6243                 :                :     /* initialize minRecoveryPoint to this record */
                               6244                 :              2 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               6245                 :              2 :     ControlFile->state = DB_IN_ARCHIVE_RECOVERY;
                               6246         [ +  - ]:              2 :     if (ControlFile->minRecoveryPoint < EndRecPtr)
                               6247                 :                :     {
                               6248                 :              2 :         ControlFile->minRecoveryPoint = EndRecPtr;
                               6249                 :              2 :         ControlFile->minRecoveryPointTLI = replayTLI;
                               6250                 :                :     }
                               6251                 :                :     /* update local copy */
                               6252                 :              2 :     LocalMinRecoveryPoint = ControlFile->minRecoveryPoint;
                               6253                 :              2 :     LocalMinRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
                               6254                 :                : 
                               6255                 :                :     /*
                               6256                 :                :      * The startup process can update its local copy of minRecoveryPoint from
                               6257                 :                :      * this point.
                               6258                 :                :      */
                               6259                 :              2 :     updateMinRecoveryPoint = true;
                               6260                 :                : 
                               6261                 :              2 :     UpdateControlFile();
                               6262                 :                : 
                               6263                 :                :     /*
                               6264                 :                :      * We update SharedRecoveryState while holding the lock on ControlFileLock
                               6265                 :                :      * so both states are consistent in shared memory.
                               6266                 :                :      */
                               6267         [ -  + ]:              2 :     SpinLockAcquire(&XLogCtl->info_lck);
                               6268                 :              2 :     XLogCtl->SharedRecoveryState = RECOVERY_STATE_ARCHIVE;
                               6269                 :              2 :     SpinLockRelease(&XLogCtl->info_lck);
                               6270                 :                : 
                               6271                 :              2 :     LWLockRelease(ControlFileLock);
                               6272                 :              2 : }
                               6273                 :                : 
                               6274                 :                : /*
                               6275                 :                :  * Callback from PerformWalRecovery(), called when we reach the end of backup.
                               6276                 :                :  * Updates the control file accordingly.
                               6277                 :                :  */
                               6278                 :                : void
                               6279                 :             69 : ReachedEndOfBackup(XLogRecPtr EndRecPtr, TimeLineID tli)
                               6280                 :                : {
                               6281                 :                :     /*
                               6282                 :                :      * We have reached the end of base backup, as indicated by pg_control. The
                               6283                 :                :      * data on disk is now consistent (unless minRecoveryPoint is further
                               6284                 :                :      * ahead, which can happen if we crashed during previous recovery).  Reset
                               6285                 :                :      * backupStartPoint and backupEndPoint, and update minRecoveryPoint to
                               6286                 :                :      * make sure we don't allow starting up at an earlier point even if
                               6287                 :                :      * recovery is stopped and restarted soon after this.
                               6288                 :                :      */
                               6289                 :             69 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               6290                 :                : 
                               6291         [ +  + ]:             69 :     if (ControlFile->minRecoveryPoint < EndRecPtr)
                               6292                 :                :     {
                               6293                 :             65 :         ControlFile->minRecoveryPoint = EndRecPtr;
                               6294                 :             65 :         ControlFile->minRecoveryPointTLI = tli;
                               6295                 :                :     }
                               6296                 :                : 
                               6297                 :             69 :     ControlFile->backupStartPoint = InvalidXLogRecPtr;
                               6298                 :             69 :     ControlFile->backupEndPoint = InvalidXLogRecPtr;
                               6299                 :             69 :     ControlFile->backupEndRequired = false;
                               6300                 :             69 :     UpdateControlFile();
                               6301                 :                : 
                               6302                 :             69 :     LWLockRelease(ControlFileLock);
 5625                          6303                 :             69 : }
                               6304                 :                : 
                               6305                 :                : /*
                               6306                 :                :  * Perform whatever XLOG actions are necessary at end of REDO.
                               6307                 :                :  *
                               6308                 :                :  * The goal here is to make sure that we'll be able to recover properly if
                               6309                 :                :  * we crash again. If we choose to write a checkpoint, we'll write a shutdown
                               6310                 :                :  * checkpoint rather than an on-line one. This is not particularly critical,
                               6311                 :                :  * but since we may be assigning a new TLI, using a shutdown checkpoint allows
                               6312                 :                :  * us to have the rule that TLI only changes in shutdown checkpoints, which
                               6313                 :                :  * allows some extra error checking in xlog_redo.
                               6314                 :                :  */
                               6315                 :                : static bool
 1424 rhaas@postgresql.org     6316                 :            149 : PerformRecoveryXLogAction(void)
                               6317                 :                : {
                               6318                 :            149 :     bool        promoted = false;
                               6319                 :                : 
                               6320                 :                :     /*
                               6321                 :                :      * Perform a checkpoint to flush all our recovery activity to disk.
                               6322                 :                :      *
                               6323                 :                :      * Note that we write a shutdown checkpoint rather than an on-line one.
                               6324                 :                :      * This is not particularly critical, but since we may be assigning a new
                               6325                 :                :      * TLI, using a shutdown checkpoint allows us to have the rule that TLI
                               6326                 :                :      * only changes in shutdown checkpoints, which allows some extra error
                               6327                 :                :      * checking in xlog_redo.
                               6328                 :                :      *
                               6329                 :                :      * In promotion, only create a lightweight end-of-recovery record instead
                               6330                 :                :      * of a full checkpoint. A checkpoint is requested later, after we're
                               6331                 :                :      * fully out of recovery mode and already accepting queries.
                               6332                 :                :      */
                               6333   [ +  +  +  -  :            196 :     if (ArchiveRecoveryRequested && IsUnderPostmaster &&
                                              +  + ]
 1298 heikki.linnakangas@i     6334                 :             47 :         PromoteIsTriggered())
                               6335                 :                :     {
 1424 rhaas@postgresql.org     6336                 :             40 :         promoted = true;
                               6337                 :                : 
                               6338                 :                :         /*
                               6339                 :                :          * Insert a special WAL record to mark the end of recovery, since we
                               6340                 :                :          * aren't doing a checkpoint. That means that the checkpointer process
                               6341                 :                :          * may well be in the middle of a time-smoothed restartpoint and
                               6342                 :                :          * could continue to be for minutes after this.  That sounds strange,
                               6343                 :                :          * but the effect is roughly the same and it would be stranger to try
                               6344                 :                :          * to come out of the restartpoint and then checkpoint. We request a
                               6345                 :                :          * checkpoint later anyway, just for safety.
                               6346                 :                :          */
                               6347                 :             40 :         CreateEndOfRecoveryRecord();
                               6348                 :                :     }
                               6349                 :                :     else
                               6350                 :                :     {
                               6351                 :            109 :         RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY |
                               6352                 :                :                           CHECKPOINT_FAST |
                               6353                 :                :                           CHECKPOINT_WAIT);
                               6354                 :                :     }
                               6355                 :                : 
                               6356                 :            149 :     return promoted;
                               6357                 :                : }
                               6358                 :                : 
                               6359                 :                : /*
                               6360                 :                :  * Is the system still in recovery?
                               6361                 :                :  *
                               6362                 :                :  * Unlike testing InRecovery, this works in any process that's connected to
                               6363                 :                :  * shared memory.
                               6364                 :                :  */
                               6365                 :                : bool
 6044 heikki.linnakangas@i     6366                 :       59737666 : RecoveryInProgress(void)
                               6367                 :                : {
                               6368                 :                :     /*
                               6369                 :                :      * We check shared state each time only until we leave recovery mode. We
                               6370                 :                :      * can't re-enter recovery, so there's no need to keep checking after the
                               6371                 :                :      * shared variable has once been seen false.
                               6372                 :                :      */
                               6373         [ +  + ]:       59737666 :     if (!LocalRecoveryInProgress)
                               6374                 :       57572943 :         return false;
                               6375                 :                :     else
                               6376                 :                :     {
                               6377                 :                :         /*
                               6378                 :                :          * Use a volatile pointer to make sure we make a fresh read of the
                               6379                 :                :          * shared variable.
                               6380                 :                :          */
                               6381                 :        2164723 :         volatile XLogCtlData *xlogctl = XLogCtl;
                               6382                 :                : 
 1961 michael@paquier.xyz      6383                 :        2164723 :         LocalRecoveryInProgress = (xlogctl->SharedRecoveryState != RECOVERY_STATE_DONE);
                               6384                 :                : 
                               6385                 :                :         /*
                               6386                 :                :          * Note: We don't need a memory barrier when we're still in recovery.
                               6387                 :                :          * We might exit recovery immediately after return, so the caller
                               6388                 :                :          * can't rely on 'true' meaning that we're still in recovery anyway.
                               6389                 :                :          */
                               6390                 :                : 
 6044 heikki.linnakangas@i     6391                 :        2164723 :         return LocalRecoveryInProgress;
                               6392                 :                :     }
                               6393                 :                : }
                               6394                 :                : 
                               6395                 :                : /*
                               6396                 :                :  * Returns current recovery state from shared memory.
                               6397                 :                :  *
                               6398                 :                :  * The returned state is kept consistent with the contents of the control
                               6399                 :                :  * file.  See details about the possible values of RecoveryState in xlog.h.
                               6400                 :                :  */
                               6401                 :                : RecoveryState
 1961 michael@paquier.xyz      6402                 :          28117 : GetRecoveryState(void)
                               6403                 :                : {
                               6404                 :                :     RecoveryState retval;
                               6405                 :                : 
                               6406         [ -  + ]:          28117 :     SpinLockAcquire(&XLogCtl->info_lck);
                               6407                 :          28117 :     retval = XLogCtl->SharedRecoveryState;
                               6408                 :          28117 :     SpinLockRelease(&XLogCtl->info_lck);
                               6409                 :                : 
                               6410                 :          28117 :     return retval;
                               6411                 :                : }
                               6412                 :                : 
                               6413                 :                : /*
                               6414                 :                :  * Is this process allowed to insert new WAL records?
                               6415                 :                :  *
                               6416                 :                :  * Ordinarily this is essentially equivalent to !RecoveryInProgress().
                               6417                 :                :  * But we also have provisions for forcing the result "true" or "false"
                               6418                 :                :  * within specific processes regardless of the global state.
                               6419                 :                :  */
                               6420                 :                : bool
 5916 tgl@sss.pgh.pa.us        6421                 :       29005830 : XLogInsertAllowed(void)
                               6422                 :                : {
                               6423                 :                :     /*
                               6424                 :                :      * If value is "unconditionally true" or "unconditionally false", just
                               6425                 :                :      * return it.  This provides the normal fast path once recovery is known
                               6426                 :                :      * done.
                               6427                 :                :      */
                               6428         [ +  + ]:       29005830 :     if (LocalXLogInsertAllowed >= 0)
                               6429                 :       28893864 :         return (bool) LocalXLogInsertAllowed;
                               6430                 :                : 
                               6431                 :                :     /*
                               6432                 :                :      * Else, must check to see if we're still in recovery.
                               6433                 :                :      */
                               6434         [ +  + ]:         111966 :     if (RecoveryInProgress())
                               6435                 :         104452 :         return false;
                               6436                 :                : 
                               6437                 :                :     /*
                               6438                 :                :      * On exit from recovery, reset to "unconditionally true", since there is
                               6439                 :                :      * no need to keep checking.
                               6440                 :                :      */
                               6441                 :           7514 :     LocalXLogInsertAllowed = 1;
                               6442                 :           7514 :     return true;
                               6443                 :                : }
                               6444                 :                : 
                               6445                 :                : /*
                               6446                 :                :  * Make XLogInsertAllowed() return true in the current process only.
                               6447                 :                :  *
                               6448                 :                :  * Note: it is allowed to switch LocalXLogInsertAllowed back to -1 later,
                               6449                 :                :  * and even call LocalSetXLogInsertAllowed() again after that.
                               6450                 :                :  *
                               6451                 :                :  * Returns the previous value of LocalXLogInsertAllowed.
                               6452                 :                :  */
                               6453                 :                : static int
                               6454                 :            860 : LocalSetXLogInsertAllowed(void)
                               6455                 :                : {
 1298 heikki.linnakangas@i     6456                 :            860 :     int         oldXLogAllowed = LocalXLogInsertAllowed;
                               6457                 :                : 
 5916 tgl@sss.pgh.pa.us        6458                 :            860 :     LocalXLogInsertAllowed = 1;
                               6459                 :                : 
 1412 rhaas@postgresql.org     6460                 :            860 :     return oldXLogAllowed;
                               6461                 :                : }
                               6462                 :                : 
                               6463                 :                : /*
                               6464                 :                :  * Return the current Redo pointer from shared memory.
                               6465                 :                :  *
                               6466                 :                :  * As a side-effect, the local RedoRecPtr copy is updated.
                               6467                 :                :  */
                               6468                 :                : XLogRecPtr
 9018 vadim4o@yahoo.com        6469                 :         207582 : GetRedoRecPtr(void)
                               6470                 :                : {
                               6471                 :                :     XLogRecPtr  ptr;
                               6472                 :                : 
                               6473                 :                :     /*
                               6474                 :                :      * The possibly not up-to-date copy in XLogCtl is enough. Even if we
                               6475                 :                :      * grabbed a WAL insertion lock to read the authoritative value in
                               6476                 :                :      * Insert->RedoRecPtr, someone might update it just after we've released
                               6477                 :                :      * the lock.
                               6478                 :                :      */
 4002 andres@anarazel.de       6479         [ +  + ]:         207582 :     SpinLockAcquire(&XLogCtl->info_lck);
                               6480                 :         207582 :     ptr = XLogCtl->RedoRecPtr;
                               6481                 :         207582 :     SpinLockRelease(&XLogCtl->info_lck);
                               6482                 :                : 
 4443 heikki.linnakangas@i     6483         [ +  + ]:         207582 :     if (RedoRecPtr < ptr)
                               6484                 :           1521 :         RedoRecPtr = ptr;
                               6485                 :                : 
 8576 tgl@sss.pgh.pa.us        6486                 :         207582 :     return RedoRecPtr;
                               6487                 :                : }
                               6488                 :                : 
                               6489                 :                : /*
                               6490                 :                :  * Return information needed to decide whether a modified block needs a
                               6491                 :                :  * full-page image to be included in the WAL record.
                               6492                 :                :  *
                               6493                 :                :  * The returned values are cached copies from backend-private memory, and
                               6494                 :                :  * possibly out-of-date or, indeed, uninitialized, in which case they will
                               6495                 :                :  * be InvalidXLogRecPtr and false, respectively.  XLogInsertRecord will
                               6496                 :                :  * re-check them against up-to-date values, while holding the WAL insert lock.
                               6497                 :                :  */
                               6498                 :                : void
 3957 heikki.linnakangas@i     6499                 :       13996795 : GetFullPageWriteInfo(XLogRecPtr *RedoRecPtr_p, bool *doPageWrites_p)
                               6500                 :                : {
                               6501                 :       13996795 :     *RedoRecPtr_p = RedoRecPtr;
                               6502                 :       13996795 :     *doPageWrites_p = doPageWrites;
                               6503                 :       13996795 : }
                               6504                 :                : 
                               6505                 :                : /*
                               6506                 :                :  * GetInsertRecPtr -- Returns the current insert position.
                               6507                 :                :  *
                               6508                 :                :  * NOTE: The value *actually* returned is the position of the last full
                               6509                 :                :  * xlog page. It lags behind the real insert position by at most 1 page.
                               6510                 :                :  * For that, we don't need to scan through WAL insertion locks, and an
                               6511                 :                :  * approximation is enough for the current usage of this function.
                               6512                 :                :  */
                               6513                 :                : XLogRecPtr
 6645 tgl@sss.pgh.pa.us        6514                 :           7007 : GetInsertRecPtr(void)
                               6515                 :                : {
                               6516                 :                :     XLogRecPtr  recptr;
                               6517                 :                : 
 4002 andres@anarazel.de       6518         [ +  + ]:           7007 :     SpinLockAcquire(&XLogCtl->info_lck);
                               6519                 :           7007 :     recptr = XLogCtl->LogwrtRqst.Write;
                               6520                 :           7007 :     SpinLockRelease(&XLogCtl->info_lck);
                               6521                 :                : 
 6645 tgl@sss.pgh.pa.us        6522                 :           7007 :     return recptr;
                               6523                 :                : }
                               6524                 :                : 
                               6525                 :                : /*
                               6526                 :                :  * GetFlushRecPtr -- Returns the current flush position, i.e., the last WAL
                               6527                 :                :  * position known to be fsync'd to disk. This should only be used on a
                               6528                 :                :  * system that is known not to be in recovery.
                               6529                 :                :  */
                               6530                 :                : XLogRecPtr
 1401 rhaas@postgresql.org     6531                 :         178816 : GetFlushRecPtr(TimeLineID *insertTLI)
                               6532                 :                : {
 1396                          6533         [ -  + ]:         178816 :     Assert(XLogCtl->SharedRecoveryState == RECOVERY_STATE_DONE);
                               6534                 :                : 
  521 alvherre@alvh.no-ip.     6535                 :         178816 :     RefreshXLogWriteResult(LogwrtResult);
                               6536                 :                : 
                               6537                 :                :     /*
                               6538                 :                :      * If we're writing and flushing WAL, the timeline can't be changing, so
                               6539                 :                :      * no lock is required.
                               6540                 :                :      */
 1401 rhaas@postgresql.org     6541         [ +  + ]:         178816 :     if (insertTLI)
 1396                          6542                 :          22981 :         *insertTLI = XLogCtl->InsertTimeLineID;
                               6543                 :                : 
 3525 simon@2ndQuadrant.co     6544                 :         178816 :     return LogwrtResult.Flush;
                               6545                 :                : }
                               6546                 :                : 
                               6547                 :                : /*
                               6548                 :                :  * GetWALInsertionTimeLine -- Returns the current timeline of a system that
                               6549                 :                :  * is not in recovery.
                               6550                 :                :  */
                               6551                 :                : TimeLineID
 1401 rhaas@postgresql.org     6552                 :         110439 : GetWALInsertionTimeLine(void)
                               6553                 :                : {
                               6554         [ -  + ]:         110439 :     Assert(XLogCtl->SharedRecoveryState == RECOVERY_STATE_DONE);
                               6555                 :                : 
                               6556                 :                :     /* Since the value can't be changing, no lock is required. */
 1396                          6557                 :         110439 :     return XLogCtl->InsertTimeLineID;
                               6558                 :                : }
                               6559                 :                : 
                               6560                 :                : /*
                               6561                 :                :  * GetWALInsertionTimeLineIfSet -- If the system is not in recovery, returns
                               6562                 :                :  * the WAL insertion timeline; else, returns 0. Wherever possible, use
                               6563                 :                :  * GetWALInsertionTimeLine() instead, since it's cheaper. Note that this
                               6564                 :                :  * function decides recovery has ended as soon as the insert TLI is set, which
                               6565                 :                :  * happens before we set XLogCtl->SharedRecoveryState to RECOVERY_STATE_DONE.
                               6566                 :                :  */
                               6567                 :                : TimeLineID
  407 rhaas@postgresql.org     6568                 :UBC           0 : GetWALInsertionTimeLineIfSet(void)
                               6569                 :                : {
                               6570                 :                :     TimeLineID  insertTLI;
                               6571                 :                : 
                               6572         [ #  # ]:              0 :     SpinLockAcquire(&XLogCtl->info_lck);
                               6573                 :              0 :     insertTLI = XLogCtl->InsertTimeLineID;
                               6574                 :              0 :     SpinLockRelease(&XLogCtl->info_lck);
                               6575                 :                : 
                               6576                 :              0 :     return insertTLI;
                               6577                 :                : }
                               6578                 :                : 
                               6579                 :                : /*
                               6580                 :                :  * GetLastImportantRecPtr -- Returns the LSN of the last important record
                               6581                 :                :  * inserted. All records not explicitly marked as unimportant are considered
                               6582                 :                :  * important.
                               6583                 :                :  *
                               6584                 :                :  * The LSN is determined by computing the maximum of
                               6585                 :                :  * WALInsertLocks[i].lastImportantAt.
                               6586                 :                :  */
                               6587                 :                : XLogRecPtr
 3180 andres@anarazel.de       6588                 :CBC        1524 : GetLastImportantRecPtr(void)
                               6589                 :                : {
                               6590                 :           1524 :     XLogRecPtr  res = InvalidXLogRecPtr;
                               6591                 :                :     int         i;
                               6592                 :                : 
                               6593         [ +  + ]:          13716 :     for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
                               6594                 :                :     {
                               6595                 :                :         XLogRecPtr  last_important;
                               6596                 :                : 
                               6597                 :                :         /*
                               6598                 :                :          * Need to take a lock to prevent torn reads of the LSN, which are
                               6599                 :                :          * possible on some of the supported platforms. WAL insert locks only
                               6600                 :                :          * support exclusive mode, so we have to use that.
                               6601                 :                :          */
                               6602                 :          12192 :         LWLockAcquire(&WALInsertLocks[i].l.lock, LW_EXCLUSIVE);
                               6603                 :          12192 :         last_important = WALInsertLocks[i].l.lastImportantAt;
                               6604                 :          12192 :         LWLockRelease(&WALInsertLocks[i].l.lock);
                               6605                 :                : 
                               6606         [ +  + ]:          12192 :         if (res < last_important)
                               6607                 :           2635 :             res = last_important;
                               6608                 :                :     }
                               6609                 :                : 
                               6610                 :           1524 :     return res;
                               6611                 :                : }
                               6612                 :                : 
                               6613                 :                : /*
                               6614                 :                :  * Get the time and LSN of the last xlog segment switch
                               6615                 :                :  */
                               6616                 :                : pg_time_t
 3180 andres@anarazel.de       6617                 :UBC           0 : GetLastSegSwitchData(XLogRecPtr *lastSwitchLSN)
                               6618                 :                : {
                               6619                 :                :     pg_time_t   result;
                               6620                 :                : 
                               6621                 :                :     /* Need WALWriteLock, but shared lock is sufficient */
 6960 tgl@sss.pgh.pa.us        6622                 :              0 :     LWLockAcquire(WALWriteLock, LW_SHARED);
 4434 heikki.linnakangas@i     6623                 :              0 :     result = XLogCtl->lastSegSwitchTime;
 3180 andres@anarazel.de       6624                 :              0 :     *lastSwitchLSN = XLogCtl->lastSegSwitchLSN;
 6960 tgl@sss.pgh.pa.us        6625                 :              0 :     LWLockRelease(WALWriteLock);
                               6626                 :                : 
                               6627                 :              0 :     return result;
                               6628                 :                : }
                               6629                 :                : 
                               6630                 :                : /*
                               6631                 :                :  * This must be called ONCE during postmaster or standalone-backend shutdown
                               6632                 :                :  */
                               6633                 :                : void
 7939 peter_e@gmx.net          6634                 :CBC         607 : ShutdownXLOG(int code, Datum arg)
                               6635                 :                : {
                               6636                 :                :     /*
                               6637                 :                :      * We should have an aux process resource owner to use, and we should not
                               6638                 :                :      * be in a transaction that's installed some other resowner.
                               6639                 :                :      */
 2607 tgl@sss.pgh.pa.us        6640         [ -  + ]:            607 :     Assert(AuxProcessResourceOwner != NULL);
                               6641   [ +  +  -  + ]:            607 :     Assert(CurrentResourceOwner == NULL ||
                               6642                 :                :            CurrentResourceOwner == AuxProcessResourceOwner);
                               6643                 :            607 :     CurrentResourceOwner = AuxProcessResourceOwner;
                               6644                 :                : 
                               6645                 :                :     /* Don't be chatty in standalone mode */
 4468                          6646   [ +  +  +  + ]:            607 :     ereport(IsPostmasterEnvironment ? LOG : NOTICE,
                               6647                 :                :             (errmsg("shutting down")));
                               6648                 :                : 
                               6649                 :                :     /*
                               6650                 :                :      * Signal walsenders to move to stopping state.
                               6651                 :                :      */
 3015 andres@anarazel.de       6652                 :            607 :     WalSndInitStopping();
                               6653                 :                : 
                               6654                 :                :     /*
                               6655                 :                :      * Wait for WAL senders to be in stopping state.  This prevents commands
                               6656                 :                :      * from writing new WAL.
                               6657                 :                :      */
                               6658                 :            607 :     WalSndWaitStopping();
                               6659                 :                : 
 6044 heikki.linnakangas@i     6660         [ +  + ]:            607 :     if (RecoveryInProgress())
   57 nathan@postgresql.or     6661                 :GNC          52 :         CreateRestartPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_FAST);
                               6662                 :                :     else
                               6663                 :                :     {
                               6664                 :                :         /*
                               6665                 :                :          * If archiving is enabled, rotate the last XLOG file so that all the
                               6666                 :                :          * remaining records are archived (postmaster wakes up the archiver
                               6667                 :                :          * process one more time at the end of shutdown). The checkpoint
                               6668                 :                :          * record will go to the next XLOG file and won't be archived (yet).
                               6669                 :                :          */
 1311 rhaas@postgresql.org     6670   [ +  +  -  +  :CBC         555 :         if (XLogArchivingActive())
                                              +  + ]
 3180 andres@anarazel.de       6671                 :             12 :             RequestXLogSwitch(false);
                               6672                 :                : 
   57 nathan@postgresql.or     6673                 :GNC         555 :         CreateCheckPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_FAST);
                               6674                 :                :     }
 9467 vadim4o@yahoo.com        6675                 :CBC         607 : }
                               6676                 :                : 
                               6677                 :                : /*
                               6678                 :                :  * Log start of a checkpoint.
                               6679                 :                :  */
                               6680                 :                : static void
 6044 heikki.linnakangas@i     6681                 :           1385 : LogCheckpointStart(int flags, bool restartpoint)
                               6682                 :                : {
 1737 peter@eisentraut.org     6683         [ +  + ]:           1385 :     if (restartpoint)
                               6684   [ +  -  -  +  :            198 :         ereport(LOG,
                                     -  +  +  +  +  
                                     +  +  +  +  +  
                                        -  +  +  + ]
                               6685                 :                :         /* translator: the placeholders show checkpoint options */
                               6686                 :                :                 (errmsg("restartpoint starting:%s%s%s%s%s%s%s%s",
                               6687                 :                :                         (flags & CHECKPOINT_IS_SHUTDOWN) ? " shutdown" : "",
                               6688                 :                :                         (flags & CHECKPOINT_END_OF_RECOVERY) ? " end-of-recovery" : "",
                               6689                 :                :                         (flags & CHECKPOINT_FAST) ? " fast" : "",
                               6690                 :                :                         (flags & CHECKPOINT_FORCE) ? " force" : "",
                               6691                 :                :                         (flags & CHECKPOINT_WAIT) ? " wait" : "",
                               6692                 :                :                         (flags & CHECKPOINT_CAUSE_XLOG) ? " wal" : "",
                               6693                 :                :                         (flags & CHECKPOINT_CAUSE_TIME) ? " time" : "",
                               6694                 :                :                         (flags & CHECKPOINT_FLUSH_UNLOGGED) ? " flush-unlogged" : "")));
                               6695                 :                :     else
                               6696   [ +  -  +  +  :           1187 :         ereport(LOG,
                                     -  +  +  +  +  
                                     +  +  +  +  +  
                                        +  +  +  + ]
                               6697                 :                :         /* translator: the placeholders show checkpoint options */
                               6698                 :                :                 (errmsg("checkpoint starting:%s%s%s%s%s%s%s%s",
                               6699                 :                :                         (flags & CHECKPOINT_IS_SHUTDOWN) ? " shutdown" : "",
                               6700                 :                :                         (flags & CHECKPOINT_END_OF_RECOVERY) ? " end-of-recovery" : "",
                               6701                 :                :                         (flags & CHECKPOINT_FAST) ? " fast" : "",
                               6702                 :                :                         (flags & CHECKPOINT_FORCE) ? " force" : "",
                               6703                 :                :                         (flags & CHECKPOINT_WAIT) ? " wait" : "",
                               6704                 :                :                         (flags & CHECKPOINT_CAUSE_XLOG) ? " wal" : "",
                               6705                 :                :                         (flags & CHECKPOINT_CAUSE_TIME) ? " time" : "",
                               6706                 :                :                         (flags & CHECKPOINT_FLUSH_UNLOGGED) ? " flush-unlogged" : "")));
 6643 tgl@sss.pgh.pa.us        6707                 :           1385 : }
                               6708                 :                : 
                               6709                 :                : /*
                               6710                 :                :  * Log end of a checkpoint.
                               6711                 :                :  */
                               6712                 :                : static void
 6044 heikki.linnakangas@i     6713                 :           1677 : LogCheckpointEnd(bool restartpoint)
                               6714                 :                : {
                               6715                 :                :     long        write_msecs,
                               6716                 :                :                 sync_msecs,
                               6717                 :                :                 total_msecs,
                               6718                 :                :                 longest_msecs,
                               6719                 :                :                 average_msecs;
                               6720                 :                :     uint64      average_sync_time;
                               6721                 :                : 
 6643 tgl@sss.pgh.pa.us        6722                 :           1677 :     CheckpointStats.ckpt_end_t = GetCurrentTimestamp();
                               6723                 :                : 
 1761                          6724                 :           1677 :     write_msecs = TimestampDifferenceMilliseconds(CheckpointStats.ckpt_write_t,
                               6725                 :                :                                                   CheckpointStats.ckpt_sync_t);
                               6726                 :                : 
                               6727                 :           1677 :     sync_msecs = TimestampDifferenceMilliseconds(CheckpointStats.ckpt_sync_t,
                               6728                 :                :                                                  CheckpointStats.ckpt_sync_end_t);
                               6729                 :                : 
                               6730                 :                :     /* Accumulate checkpoint timing summary data, in milliseconds. */
  677 michael@paquier.xyz      6731                 :           1677 :     PendingCheckpointerStats.write_time += write_msecs;
                               6732                 :           1677 :     PendingCheckpointerStats.sync_time += sync_msecs;
                               6733                 :                : 
                               6734                 :                :     /*
                               6735                 :                :      * All of the published timing statistics are accounted for.  Only
                               6736                 :                :      * continue if a log message is to be written.
                               6737                 :                :      */
 4902 rhaas@postgresql.org     6738         [ +  + ]:           1677 :     if (!log_checkpoints)
                               6739                 :            292 :         return;
                               6740                 :                : 
 1761 tgl@sss.pgh.pa.us        6741                 :           1385 :     total_msecs = TimestampDifferenceMilliseconds(CheckpointStats.ckpt_start_t,
                               6742                 :                :                                                   CheckpointStats.ckpt_end_t);
                               6743                 :                : 
                               6744                 :                :     /*
                               6745                 :                :      * Timing values returned from CheckpointStats are in microseconds.
                               6746                 :                :      * Convert to milliseconds for consistent printing.
                               6747                 :                :      */
                               6748                 :           1385 :     longest_msecs = (long) ((CheckpointStats.ckpt_longest_sync + 999) / 1000);
                               6749                 :                : 
 5380 rhaas@postgresql.org     6750                 :           1385 :     average_sync_time = 0;
 5263 bruce@momjian.us         6751         [ -  + ]:           1385 :     if (CheckpointStats.ckpt_sync_rels > 0)
 5380 rhaas@postgresql.org     6752                 :UBC           0 :         average_sync_time = CheckpointStats.ckpt_agg_sync_time /
                               6753                 :              0 :             CheckpointStats.ckpt_sync_rels;
 1761 tgl@sss.pgh.pa.us        6754                 :CBC        1385 :     average_msecs = (long) ((average_sync_time + 999) / 1000);
                               6755                 :                : 
                               6756                 :                :     /*
                               6757                 :                :      * ControlFileLock is not required to see ControlFile->checkPoint and
                               6758                 :                :      * ->checkPointCopy here as we are the only updater of those variables at
                               6759                 :                :      * this moment.
                               6760                 :                :      */
 1737 peter@eisentraut.org     6761         [ +  + ]:           1385 :     if (restartpoint)
                               6762         [ +  - ]:            198 :         ereport(LOG,
                               6763                 :                :                 (errmsg("restartpoint complete: wrote %d buffers (%.1f%%), "
                               6764                 :                :                         "wrote %d SLRU buffers; %d WAL file(s) added, "
                               6765                 :                :                         "%d removed, %d recycled; write=%ld.%03d s, "
                               6766                 :                :                         "sync=%ld.%03d s, total=%ld.%03d s; sync files=%d, "
                               6767                 :                :                         "longest=%ld.%03d s, average=%ld.%03d s; distance=%d kB, "
                               6768                 :                :                         "estimate=%d kB; lsn=%X/%08X, redo lsn=%X/%08X",
                               6769                 :                :                         CheckpointStats.ckpt_bufs_written,
                               6770                 :                :                         (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
                               6771                 :                :                         CheckpointStats.ckpt_slru_written,
                               6772                 :                :                         CheckpointStats.ckpt_segs_added,
                               6773                 :                :                         CheckpointStats.ckpt_segs_removed,
                               6774                 :                :                         CheckpointStats.ckpt_segs_recycled,
                               6775                 :                :                         write_msecs / 1000, (int) (write_msecs % 1000),
                               6776                 :                :                         sync_msecs / 1000, (int) (sync_msecs % 1000),
                               6777                 :                :                         total_msecs / 1000, (int) (total_msecs % 1000),
                               6778                 :                :                         CheckpointStats.ckpt_sync_rels,
                               6779                 :                :                         longest_msecs / 1000, (int) (longest_msecs % 1000),
                               6780                 :                :                         average_msecs / 1000, (int) (average_msecs % 1000),
                               6781                 :                :                         (int) (PrevCheckPointDistance / 1024.0),
                               6782                 :                :                         (int) (CheckPointDistanceEstimate / 1024.0),
                               6783                 :                :                         LSN_FORMAT_ARGS(ControlFile->checkPoint),
                               6784                 :                :                         LSN_FORMAT_ARGS(ControlFile->checkPointCopy.redo))));
                               6785                 :                :     else
                               6786         [ +  - ]:           1187 :         ereport(LOG,
                               6787                 :                :                 (errmsg("checkpoint complete: wrote %d buffers (%.1f%%), "
                               6788                 :                :                         "wrote %d SLRU buffers; %d WAL file(s) added, "
                               6789                 :                :                         "%d removed, %d recycled; write=%ld.%03d s, "
                               6790                 :                :                         "sync=%ld.%03d s, total=%ld.%03d s; sync files=%d, "
                               6791                 :                :                         "longest=%ld.%03d s, average=%ld.%03d s; distance=%d kB, "
                               6792                 :                :                         "estimate=%d kB; lsn=%X/%08X, redo lsn=%X/%08X",
                               6793                 :                :                         CheckpointStats.ckpt_bufs_written,
                               6794                 :                :                         (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
                               6795                 :                :                         CheckpointStats.ckpt_slru_written,
                               6796                 :                :                         CheckpointStats.ckpt_segs_added,
                               6797                 :                :                         CheckpointStats.ckpt_segs_removed,
                               6798                 :                :                         CheckpointStats.ckpt_segs_recycled,
                               6799                 :                :                         write_msecs / 1000, (int) (write_msecs % 1000),
                               6800                 :                :                         sync_msecs / 1000, (int) (sync_msecs % 1000),
                               6801                 :                :                         total_msecs / 1000, (int) (total_msecs % 1000),
                               6802                 :                :                         CheckpointStats.ckpt_sync_rels,
                               6803                 :                :                         longest_msecs / 1000, (int) (longest_msecs % 1000),
                               6804                 :                :                         average_msecs / 1000, (int) (average_msecs % 1000),
                               6805                 :                :                         (int) (PrevCheckPointDistance / 1024.0),
                               6806                 :                :                         (int) (CheckPointDistanceEstimate / 1024.0),
                               6807                 :                :                         LSN_FORMAT_ARGS(ControlFile->checkPoint),
                               6808                 :                :                         LSN_FORMAT_ARGS(ControlFile->checkPointCopy.redo))));
                               6809                 :                : }
                               6810                 :                : 
                               6811                 :                : /*
                               6812                 :                :  * Update the estimate of distance between checkpoints.
                               6813                 :                :  *
                               6814                 :                :  * The estimate is used to calculate the number of WAL segments to keep
                               6815                 :                :  * preallocated, see XLOGfileslop().
                               6816                 :                :  */
                               6817                 :                : static void
 3848 heikki.linnakangas@i     6818                 :           1677 : UpdateCheckPointDistanceEstimate(uint64 nbytes)
                               6819                 :                : {
                               6820                 :                :     /*
                               6821                 :                :      * To estimate the number of segments consumed between checkpoints, keep a
                               6822                 :                :      * moving average of the amount of WAL generated in previous checkpoint
                               6823                 :                :      * cycles. However, if the load is bursty, with quiet periods and busy
                               6824                 :                :      * periods, we want to cater for the peak load. So instead of a plain
                               6825                 :                :      * moving average, let the average decline slowly if the previous cycle
                               6826                 :                :      * used less WAL than estimated, but bump it up immediately if it used
                               6827                 :                :      * more.
                               6828                 :                :      *
                               6829                 :                :      * When checkpoints are triggered by max_wal_size, this should converge to
                               6830                 :                :      * CheckpointSegments * wal_segment_size.
                               6831                 :                :      *
                               6832                 :                :      * Note: This doesn't pay any attention to what caused the checkpoint.
                               6833                 :                :      * Checkpoints triggered manually with the CHECKPOINT command, or by e.g.
                               6834                 :                :      * starting a base backup, are counted the same as those created
                               6835                 :                :      * automatically. The slow-decline will largely mask them out, if they are
                               6836                 :                :      * not frequent. If they are frequent, it seems reasonable to count them
                               6837                 :                :      * in as any others; if you issue a manual checkpoint every 5 minutes and
                               6838                 :                :      * never let a timed checkpoint happen, it makes sense to base the
                               6839                 :                :      * preallocation on that 5 minute interval rather than whatever
                               6840                 :                :      * checkpoint_timeout is set to.
                               6841                 :                :      */
                               6842                 :           1677 :     PrevCheckPointDistance = nbytes;
                               6843         [ +  + ]:           1677 :     if (CheckPointDistanceEstimate < nbytes)
                               6844                 :            695 :         CheckPointDistanceEstimate = nbytes;
                               6845                 :                :     else
                               6846                 :            982 :         CheckPointDistanceEstimate =
                               6847                 :            982 :             (0.90 * CheckPointDistanceEstimate + 0.10 * (double) nbytes);
 6643 tgl@sss.pgh.pa.us        6848                 :           1677 : }
                               6849                 :                : 
                               6850                 :                : /*
                               6851                 :                :  * Update the ps display for a process running a checkpoint.  Note that
                               6852                 :                :  * this routine should not do any allocations so that it can be called
                               6853                 :                :  * from a critical section.
                               6854                 :                :  */
                               6855                 :                : static void
 1727 michael@paquier.xyz      6856                 :           3354 : update_checkpoint_display(int flags, bool restartpoint, bool reset)
                               6857                 :                : {
                               6858                 :                :     /*
                               6859                 :                :      * The status is reported only for end-of-recovery and shutdown
                               6860                 :                :      * checkpoints or shutdown restartpoints.  Updating the ps display is
                               6861                 :                :      * useful in those situations as it may not be possible to rely on
                               6862                 :                :      * pg_stat_activity to see the status of the checkpointer or the startup
                               6863                 :                :      * process.
                               6864                 :                :      */
                               6865         [ +  + ]:           3354 :     if ((flags & (CHECKPOINT_END_OF_RECOVERY | CHECKPOINT_IS_SHUTDOWN)) == 0)
                               6866                 :           2144 :         return;
                               6867                 :                : 
                               6868         [ +  + ]:           1210 :     if (reset)
                               6869                 :            605 :         set_ps_display("");
                               6870                 :                :     else
                               6871                 :                :     {
                               6872                 :                :         char        activitymsg[128];
                               6873                 :                : 
                               6874         [ +  + ]:           1815 :         snprintf(activitymsg, sizeof(activitymsg), "performing %s%s%s",
                               6875         [ +  + ]:            605 :                  (flags & CHECKPOINT_END_OF_RECOVERY) ? "end-of-recovery " : "",
                               6876         [ +  + ]:            605 :                  (flags & CHECKPOINT_IS_SHUTDOWN) ? "shutdown " : "",
                               6877                 :                :                  restartpoint ? "restartpoint" : "checkpoint");
                               6878                 :            605 :         set_ps_display(activitymsg);
                               6879                 :                :     }
                               6880                 :                : }
                               6881                 :                : 
                               6882                 :                : 
                               6883                 :                : /*
                               6884                 :                :  * Perform a checkpoint --- either during shutdown, or on-the-fly
                               6885                 :                :  *
                               6886                 :                :  * flags is a bitwise OR of the following:
                               6887                 :                :  *  CHECKPOINT_IS_SHUTDOWN: checkpoint is for database shutdown.
                               6888                 :                :  *  CHECKPOINT_END_OF_RECOVERY: checkpoint is for end of WAL recovery.
                               6889                 :                :  *  CHECKPOINT_FAST: finish the checkpoint ASAP, ignoring
                               6890                 :                :  *      checkpoint_completion_target parameter.
                               6891                 :                :  *  CHECKPOINT_FORCE: force a checkpoint even if no XLOG activity has occurred
                               6892                 :                :  *      since the last one (implied by CHECKPOINT_IS_SHUTDOWN or
                               6893                 :                :  *      CHECKPOINT_END_OF_RECOVERY).
                               6894                 :                :  *  CHECKPOINT_FLUSH_UNLOGGED: also flush buffers of unlogged tables.
                               6895                 :                :  *
                               6896                 :                :  * Note: flags contains other bits, of interest here only for logging purposes.
                               6897                 :                :  * In particular note that this routine is synchronous and does not pay
                               6898                 :                :  * attention to CHECKPOINT_WAIT.
                               6899                 :                :  *
                               6900                 :                :  * If !shutdown then we are writing an online checkpoint. An XLOG_CHECKPOINT_REDO
                               6901                 :                :  * record is inserted into WAL at the logical location of the checkpoint, before
                               6902                 :                :  * flushing anything to disk. When the checkpoint is eventually completed, it
                               6903                 :                :  * is from this point that WAL replay will begin in the case of a recovery
                               6904                 :                :  * from this checkpoint. Once everything is written to disk, an
                               6905                 :                :  * XLOG_CHECKPOINT_ONLINE record is written to complete the checkpoint, and
                               6906                 :                :  * points back to the earlier XLOG_CHECKPOINT_REDO record. This mechanism allows
                               6907                 :                :  * other write-ahead log records to be written while the checkpoint is in
                               6908                 :                :  * progress, but we must be very careful about order of operations. This function
                               6909                 :                :  * may take many minutes to execute on a busy system.
                               6910                 :                :  *
                               6911                 :                :  * On the other hand, when shutdown is true, concurrent insertion into the
                               6912                 :                :  * write-ahead log is impossible, so there is no need for two separate records.
                               6913                 :                :  * In this case, we only insert an XLOG_CHECKPOINT_SHUTDOWN record, and it's
                               6914                 :                :  * both the record marking the completion of the checkpoint and the location
                               6915                 :                :  * from which WAL replay would begin if needed.
                               6916                 :                :  *
                               6917                 :                :  * Returns true if a new checkpoint was performed, or false if it was skipped
                               6918                 :                :  * because the system was idle.
                               6919                 :                :  */
                               6920                 :                : bool
 6645 tgl@sss.pgh.pa.us        6921                 :           1479 : CreateCheckPoint(int flags)
                               6922                 :                : {
                               6923                 :                :     bool        shutdown;
                               6924                 :                :     CheckPoint  checkPoint;
                               6925                 :                :     XLogRecPtr  recptr;
                               6926                 :                :     XLogSegNo   _logSegNo;
 9278 bruce@momjian.us         6927                 :           1479 :     XLogCtlInsert *Insert = &XLogCtl->Insert;
                               6928                 :                :     uint32      freespace;
                               6929                 :                :     XLogRecPtr  PriorRedoPtr;
                               6930                 :                :     XLogRecPtr  last_important_lsn;
                               6931                 :                :     VirtualTransactionId *vxids;
                               6932                 :                :     int         nvxids;
 1412 rhaas@postgresql.org     6933                 :           1479 :     int         oldXLogAllowed = 0;
                               6934                 :                : 
                               6935                 :                :     /*
                               6936                 :                :      * An end-of-recovery checkpoint is really a shutdown checkpoint, just
                               6937                 :                :      * issued at a different time.
                               6938                 :                :      */
 5916 tgl@sss.pgh.pa.us        6939         [ +  + ]:           1479 :     if (flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY))
 5917 heikki.linnakangas@i     6940                 :            583 :         shutdown = true;
                               6941                 :                :     else
                               6942                 :            896 :         shutdown = false;
                               6943                 :                : 
                               6944                 :                :     /* sanity check */
 5916 tgl@sss.pgh.pa.us        6945   [ +  +  -  + ]:           1479 :     if (RecoveryInProgress() && (flags & CHECKPOINT_END_OF_RECOVERY) == 0)
 5916 tgl@sss.pgh.pa.us        6946         [ #  # ]:UBC           0 :         elog(ERROR, "can't create a checkpoint during recovery");
                               6947                 :                : 
                               6948                 :                :     /*
                               6949                 :                :      * Prepare to accumulate statistics.
                               6950                 :                :      *
                               6951                 :                :      * Note: because it is possible for log_checkpoints to change while a
                               6952                 :                :      * checkpoint proceeds, we always accumulate stats, even if
                               6953                 :                :      * log_checkpoints is currently off.
                               6954                 :                :      */
 6643 tgl@sss.pgh.pa.us        6955   [ +  -  +  -  :CBC       16269 :     MemSet(&CheckpointStats, 0, sizeof(CheckpointStats));
                                     +  -  +  -  +  
                                                 + ]
                               6956                 :           1479 :     CheckpointStats.ckpt_start_t = GetCurrentTimestamp();
                               6957                 :                : 
                               6958                 :                :     /*
                               6959                 :                :      * Let smgr prepare for checkpoint; this has to happen outside the
                               6960                 :                :      * critical section and before we determine the REDO pointer.  Note that
                               6961                 :                :      * smgr must not do anything that'd have to be undone if we decide no
                               6962                 :                :      * checkpoint is needed.
                               6963                 :                :      */
 1270 tmunro@postgresql.or     6964                 :           1479 :     SyncPreCheckpoint();
                               6965                 :                : 
                               6966                 :                :     /*
                               6967                 :                :      * Use a critical section to force system panic if we have trouble.
                               6968                 :                :      */
 8743 tgl@sss.pgh.pa.us        6969                 :           1479 :     START_CRIT_SECTION();
                               6970                 :                : 
 9476 vadim4o@yahoo.com        6971         [ +  + ]:           1479 :     if (shutdown)
                               6972                 :                :     {
 6044 heikki.linnakangas@i     6973                 :            583 :         LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 9476 vadim4o@yahoo.com        6974                 :            583 :         ControlFile->state = DB_SHUTDOWNING;
                               6975                 :            583 :         UpdateControlFile();
 6044 heikki.linnakangas@i     6976                 :            583 :         LWLockRelease(ControlFileLock);
                               6977                 :                :     }
                               6978                 :                : 
                               6979                 :                :     /* Begin filling in the checkpoint WAL record */
 8143 tgl@sss.pgh.pa.us        6980   [ +  -  +  -  :          17748 :     MemSet(&checkPoint, 0, sizeof(checkPoint));
                                     +  -  +  -  +  
                                                 + ]
 6411                          6981                 :           1479 :     checkPoint.time = (pg_time_t) time(NULL);
                               6982                 :                : 
                               6983                 :                :     /*
                               6984                 :                :      * For Hot Standby, derive the oldestActiveXid before we fix the redo
                               6985                 :                :      * pointer. This allows us to begin accumulating changes to assemble our
                               6986                 :                :      * starting snapshot of locks and transactions.
                               6987                 :                :      */
 5057 simon@2ndQuadrant.co     6988   [ +  +  +  + ]:           1479 :     if (!shutdown && XLogStandbyInfoActive())
   45 akapila@postgresql.o     6989                 :GNC         850 :         checkPoint.oldestActiveXid = GetOldestActiveTransactionId(false, true);
                               6990                 :                :     else
 5057 simon@2ndQuadrant.co     6991                 :CBC         629 :         checkPoint.oldestActiveXid = InvalidTransactionId;
                               6992                 :                : 
                               6993                 :                :     /*
                               6994                 :                :      * Get location of last important record before acquiring insert locks (as
                               6995                 :                :      * GetLastImportantRecPtr() also acquires WAL locks).
                               6996                 :                :      */
 3180 andres@anarazel.de       6997                 :           1479 :     last_important_lsn = GetLastImportantRecPtr();
                               6998                 :                : 
                               6999                 :                :     /*
                               7000                 :                :      * If this isn't a shutdown or forced checkpoint, and if there has been no
                               7001                 :                :      * WAL activity requiring a checkpoint, skip it.  The idea here is to
                               7002                 :                :      * avoid inserting duplicate checkpoints when the system is idle.
                               7003                 :                :      */
 5917 heikki.linnakangas@i     7004         [ +  + ]:           1479 :     if ((flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY |
                               7005                 :                :                   CHECKPOINT_FORCE)) == 0)
                               7006                 :                :     {
 3180 andres@anarazel.de       7007         [ -  + ]:            189 :         if (last_important_lsn == ControlFile->checkPoint)
                               7008                 :                :         {
 8943 tgl@sss.pgh.pa.us        7009         [ #  # ]:LBC         (1) :             END_CRIT_SECTION();
 3180 andres@anarazel.de       7010         [ #  # ]:            (1) :             ereport(DEBUG1,
                               7011                 :                :                     (errmsg_internal("checkpoint skipped because system is idle")));
  341 fujii@postgresql.org     7012                 :            (1) :             return false;
                               7013                 :                :         }
                               7014                 :                :     }
                               7015                 :                : 
                               7016                 :                :     /*
                               7017                 :                :      * An end-of-recovery checkpoint is created before anyone is allowed to
                               7018                 :                :      * write WAL. To allow us to write the checkpoint record, temporarily
                               7019                 :                :      * enable XLogInsertAllowed.
                               7020                 :                :      */
 5854 heikki.linnakangas@i     7021         [ +  + ]:CBC        1479 :     if (flags & CHECKPOINT_END_OF_RECOVERY)
 1412 rhaas@postgresql.org     7022                 :             28 :         oldXLogAllowed = LocalSetXLogInsertAllowed();
                               7023                 :                : 
 1396                          7024                 :           1479 :     checkPoint.ThisTimeLineID = XLogCtl->InsertTimeLineID;
 4590 heikki.linnakangas@i     7025         [ +  + ]:           1479 :     if (flags & CHECKPOINT_END_OF_RECOVERY)
                               7026                 :             28 :         checkPoint.PrevTimeLineID = XLogCtl->PrevTimeLineID;
                               7027                 :                :     else
 1401 rhaas@postgresql.org     7028                 :           1451 :         checkPoint.PrevTimeLineID = checkPoint.ThisTimeLineID;
                               7029                 :                : 
                               7030                 :                :     /*
                               7031                 :                :      * We must block concurrent insertions while examining insert state.
                               7032                 :                :      */
  688                          7033                 :           1479 :     WALInsertLockAcquireExclusive();
                               7034                 :                : 
                               7035                 :           1479 :     checkPoint.fullPageWrites = Insert->fullPageWrites;
  415                          7036                 :           1479 :     checkPoint.wal_level = wal_level;
                               7037                 :                : 
  688                          7038         [ +  + ]:           1479 :     if (shutdown)
                               7039                 :                :     {
                               7040                 :            583 :         XLogRecPtr  curInsert = XLogBytePosToRecPtr(Insert->CurrBytePos);
                               7041                 :                : 
                               7042                 :                :         /*
                               7043                 :                :          * Compute new REDO record ptr = location of next XLOG record.
                               7044                 :                :          *
                               7045                 :                :          * Since this is a shutdown checkpoint, there can't be any concurrent
                               7046                 :                :          * WAL insertion.
                               7047                 :                :          */
                               7048         [ +  - ]:            583 :         freespace = INSERT_FREESPACE(curInsert);
                               7049         [ -  + ]:            583 :         if (freespace == 0)
                               7050                 :                :         {
  688 rhaas@postgresql.org     7051         [ #  # ]:UBC           0 :             if (XLogSegmentOffset(curInsert, wal_segment_size) == 0)
                               7052                 :              0 :                 curInsert += SizeOfXLogLongPHD;
                               7053                 :                :             else
                               7054                 :              0 :                 curInsert += SizeOfXLogShortPHD;
                               7055                 :                :         }
  688 rhaas@postgresql.org     7056                 :CBC         583 :         checkPoint.redo = curInsert;
                               7057                 :                : 
                               7058                 :                :         /*
                               7059                 :                :          * Here we update the shared RedoRecPtr for future XLogInsert calls;
                               7060                 :                :          * this must be done while holding all the insertion locks.
                               7061                 :                :          *
                               7062                 :                :          * Note: if we fail to complete the checkpoint, RedoRecPtr will be
                               7063                 :                :          * left pointing past where it really needs to point.  This is okay;
                               7064                 :                :          * the only consequence is that XLogInsert might back up whole buffers
                               7065                 :                :          * that it didn't really need to.  We can't postpone advancing
                               7066                 :                :          * RedoRecPtr because XLogInserts that happen while we are dumping
                               7067                 :                :          * buffers must assume that their buffer changes are not included in
                               7068                 :                :          * the checkpoint.
                               7069                 :                :          */
                               7070                 :            583 :         RedoRecPtr = XLogCtl->Insert.RedoRecPtr = checkPoint.redo;
                               7071                 :                :     }
                               7072                 :                : 
                               7073                 :                :     /*
                               7074                 :                :      * Now we can release the WAL insertion locks, allowing other xacts to
                               7075                 :                :      * proceed while we are flushing disk buffers.
                               7076                 :                :      */
 4187 heikki.linnakangas@i     7077                 :           1479 :     WALInsertLockRelease();
                               7078                 :                : 
                               7079                 :                :     /*
                               7080                 :                :      * If this is an online checkpoint, we have not yet determined the redo
                               7081                 :                :      * point. We do so now by inserting the special XLOG_CHECKPOINT_REDO
                               7082                 :                :      * record; the LSN at which it starts becomes the new redo pointer. We
                               7083                 :                :      * don't do this for a shutdown checkpoint, because in that case no WAL
                               7084                 :                :      * can be written between the redo point and the insertion of the
                               7085                 :                :      * checkpoint record itself, so the checkpoint record itself serves to
                               7086                 :                :      * mark the redo point.
                               7087                 :                :      */
  688 rhaas@postgresql.org     7088         [ +  + ]:           1479 :     if (!shutdown)
                               7089                 :                :     {
                               7090                 :                :         /* Include WAL level in record for WAL summarizer's benefit. */
                               7091                 :            896 :         XLogBeginInsert();
  207 peter@eisentraut.org     7092                 :            896 :         XLogRegisterData(&wal_level, sizeof(wal_level));
  688 rhaas@postgresql.org     7093                 :            896 :         (void) XLogInsert(RM_XLOG_ID, XLOG_CHECKPOINT_REDO);
                               7094                 :                : 
                               7095                 :                :         /*
                               7096                 :                :          * XLogInsertRecord will have updated XLogCtl->Insert.RedoRecPtr in
                               7097                 :                :          * shared memory and RedoRecPtr in backend-local memory, but we need
                               7098                 :                :          * to copy that into the record that will be inserted when the
                               7099                 :                :          * checkpoint is complete.
                               7100                 :                :          */
                               7101                 :            896 :         checkPoint.redo = RedoRecPtr;
                               7102                 :                :     }
                               7103                 :                : 
                               7104                 :                :     /* Update the info_lck-protected copy of RedoRecPtr as well */
 4002 andres@anarazel.de       7105         [ -  + ]:           1479 :     SpinLockAcquire(&XLogCtl->info_lck);
                               7106                 :           1479 :     XLogCtl->RedoRecPtr = checkPoint.redo;
                               7107                 :           1479 :     SpinLockRelease(&XLogCtl->info_lck);
                               7108                 :                : 
                               7109                 :                :     /*
                               7110                 :                :      * If enabled, log checkpoint start.  We postpone this until now so as not
                               7111                 :                :      * to log anything if we decided to skip the checkpoint.
                               7112                 :                :      */
 6643 tgl@sss.pgh.pa.us        7113         [ +  + ]:           1479 :     if (log_checkpoints)
 6044 heikki.linnakangas@i     7114                 :           1187 :         LogCheckpointStart(flags, false);
                               7115                 :                : 
                               7116                 :                :     /* Update the process title */
 1727 michael@paquier.xyz      7117                 :           1479 :     update_checkpoint_display(flags, false, false);
                               7118                 :                : 
                               7119                 :                :     TRACE_POSTGRESQL_CHECKPOINT_START(flags);
                               7120                 :                : 
                               7121                 :                :     /*
                               7122                 :                :      * Get the other info we need for the checkpoint record.
                               7123                 :                :      *
                               7124                 :                :      * We don't need to save oldestClogXid in the checkpoint; it only matters
                               7125                 :                :      * for the short period in which clog is being truncated, and if we crash
                               7126                 :                :      * during that we'll redo the clog truncation and fix up oldestClogXid
                               7127                 :                :      * there.
                               7128                 :                :      */
 4173 heikki.linnakangas@i     7129                 :           1479 :     LWLockAcquire(XidGenLock, LW_SHARED);
  638                          7130                 :           1479 :     checkPoint.nextXid = TransamVariables->nextXid;
                               7131                 :           1479 :     checkPoint.oldestXid = TransamVariables->oldestXid;
                               7132                 :           1479 :     checkPoint.oldestXidDB = TransamVariables->oldestXidDB;
 4173                          7133                 :           1479 :     LWLockRelease(XidGenLock);
                               7134                 :                : 
 3930 alvherre@alvh.no-ip.     7135                 :           1479 :     LWLockAcquire(CommitTsLock, LW_SHARED);
  638 heikki.linnakangas@i     7136                 :           1479 :     checkPoint.oldestCommitTsXid = TransamVariables->oldestCommitTsXid;
                               7137                 :           1479 :     checkPoint.newestCommitTsXid = TransamVariables->newestCommitTsXid;
 3930 alvherre@alvh.no-ip.     7138                 :           1479 :     LWLockRelease(CommitTsLock);
                               7139                 :                : 
 4173 heikki.linnakangas@i     7140                 :           1479 :     LWLockAcquire(OidGenLock, LW_SHARED);
  638                          7141                 :           1479 :     checkPoint.nextOid = TransamVariables->nextOid;
 4173                          7142         [ +  + ]:           1479 :     if (!shutdown)
  638                          7143                 :            896 :         checkPoint.nextOid += TransamVariables->oidCount;
 4173                          7144                 :           1479 :     LWLockRelease(OidGenLock);
                               7145                 :                : 
                               7146                 :           1479 :     MultiXactGetCheckptMulti(shutdown,
                               7147                 :                :                              &checkPoint.nextMulti,
                               7148                 :                :                              &checkPoint.nextMultiOffset,
                               7149                 :                :                              &checkPoint.oldestMulti,
                               7150                 :                :                              &checkPoint.oldestMultiDB);
                               7151                 :                : 
                               7152                 :                :     /*
                               7153                 :                :      * Having constructed the checkpoint record, ensure all shmem disk buffers
                               7154                 :                :      * and commit-log buffers are flushed to disk.
                               7155                 :                :      *
                               7156                 :                :      * This I/O could fail for various reasons.  If so, we will fail to
                               7157                 :                :      * complete the checkpoint, but there is no reason to force a system
                               7158                 :                :      * panic. Accordingly, exit critical section while doing it.
                               7159                 :                :      */
                               7160         [ -  + ]:           1479 :     END_CRIT_SECTION();
                               7161                 :                : 
                               7162                 :                :     /*
                               7163                 :                :      * In some cases there are groups of actions that must all occur on one
                               7164                 :                :      * side or the other of a checkpoint record. Before flushing the
                               7165                 :                :      * checkpoint record we must explicitly wait for any backend currently
                               7166                 :                :      * performing those groups of actions.
                               7167                 :                :      *
                               7168                 :                :      * One example is end of transaction, so we must wait for any transactions
                               7169                 :                :      * that are currently in commit critical sections.  If an xact inserted
                               7170                 :                :      * its commit record into XLOG just before the REDO point, then a crash
                               7171                 :                :      * restart from the REDO point would not replay that record, which means
                               7172                 :                :      * that our flushing had better include the xact's update of pg_xact.  So
                               7173                 :                :      * we wait until it is out of its commit critical section before proceeding.
                               7174                 :                :      * See notes in RecordTransactionCommit().
                               7175                 :                :      *
                               7176                 :                :      * Because we've already released the insertion locks, this test is a bit
                               7177                 :                :      * fuzzy: it is possible that we will wait for xacts we didn't really need
                               7178                 :                :      * to wait for.  But the delay should be short and it seems better to make
                               7179                 :                :      * checkpoint take a bit longer than to hold off insertions longer than
                               7180                 :                :      * necessary. (In fact, the whole reason we have this issue is that xact.c
                               7181                 :                :      * does commit record XLOG insertion and clog update as two separate steps
                               7182                 :                :      * protected by different locks, but again that seems best on grounds of
                               7183                 :                :      * minimizing lock contention.)
                               7184                 :                :      *
                               7185                 :                :      * A transaction that has not yet set delayChkptFlags when we look cannot
                               7186                 :                :      * be at risk, since it has not inserted its commit record yet; and one
                               7187                 :                :      * that's already cleared it is not at risk either, since it's done fixing
                               7188                 :                :      * clog and we will correctly flush the update below.  So we cannot miss
                               7189                 :                :      * any xacts we need to wait for.
                               7190                 :                :      */
 1262 rhaas@postgresql.org     7191                 :           1479 :     vxids = GetVirtualXIDsDelayingChkpt(&nvxids, DELAY_CHKPT_START);
 4660 simon@2ndQuadrant.co     7192         [ +  + ]:           1479 :     if (nvxids > 0)
                               7193                 :                :     {
                               7194                 :                :         do
                               7195                 :                :         {
                               7196                 :                :             /*
                               7197                 :                :              * Keep absorbing fsync requests while we wait. There could even
                               7198                 :                :              * be a deadlock if we don't, if the process that prevents the
                               7199                 :                :              * checkpoint is trying to add a request to the queue.
                               7200                 :                :              */
  442 heikki.linnakangas@i     7201                 :             28 :             AbsorbSyncRequests();
                               7202                 :                : 
  694 tmunro@postgresql.or     7203                 :             28 :             pgstat_report_wait_start(WAIT_EVENT_CHECKPOINT_DELAY_START);
 6505 bruce@momjian.us         7204                 :             28 :             pg_usleep(10000L);  /* wait for 10 msec */
  694 tmunro@postgresql.or     7205                 :             28 :             pgstat_report_wait_end();
 1262 rhaas@postgresql.org     7206         [ +  + ]:             28 :         } while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids,
                               7207                 :                :                                               DELAY_CHKPT_START));
                               7208                 :                :     }
 4660 simon@2ndQuadrant.co     7209                 :           1479 :     pfree(vxids);
                               7210                 :                : 
 6645 tgl@sss.pgh.pa.us        7211                 :           1479 :     CheckPointGuts(checkPoint.redo, flags);
                               7212                 :                : 
 1262 rhaas@postgresql.org     7213                 :           1479 :     vxids = GetVirtualXIDsDelayingChkpt(&nvxids, DELAY_CHKPT_COMPLETE);
                               7214         [ -  + ]:           1479 :     if (nvxids > 0)
                               7215                 :                :     {
                               7216                 :                :         do
                               7217                 :                :         {
  442 heikki.linnakangas@i     7218                 :UBC           0 :             AbsorbSyncRequests();
                               7219                 :                : 
  694 tmunro@postgresql.or     7220                 :              0 :             pgstat_report_wait_start(WAIT_EVENT_CHECKPOINT_DELAY_COMPLETE);
 1262 rhaas@postgresql.org     7221                 :              0 :             pg_usleep(10000L);  /* wait for 10 msec */
  694 tmunro@postgresql.or     7222                 :              0 :             pgstat_report_wait_end();
 1262 rhaas@postgresql.org     7223         [ #  # ]:              0 :         } while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids,
                               7224                 :                :                                               DELAY_CHKPT_COMPLETE));
                               7225                 :                :     }
 1262 rhaas@postgresql.org     7226                 :CBC        1479 :     pfree(vxids);
                               7227                 :                : 
                               7228                 :                :     /*
                               7229                 :                :      * Take a snapshot of running transactions and write this to WAL. This
                               7230                 :                :      * allows us to reconstruct the state of running transactions during
                               7231                 :                :      * archive recovery, if required. Skip this if the info is disabled.
                               7232                 :                :      *
                               7233                 :                :      * If we are shutting down, or the Startup process is completing crash
                               7234                 :                :      * recovery, we don't need to write running xact data.
                               7235                 :                :      */
 5740 simon@2ndQuadrant.co     7236   [ +  +  +  + ]:           1479 :     if (!shutdown && XLogStandbyInfoActive())
 4661 tgl@sss.pgh.pa.us        7237                 :            850 :         LogStandbySnapshot();
                               7238                 :                : 
 8155                          7239                 :           1479 :     START_CRIT_SECTION();
                               7240                 :                : 
                               7241                 :                :     /*
                               7242                 :                :      * Now insert the checkpoint record into XLOG.
                               7243                 :                :      */
 3943 heikki.linnakangas@i     7244                 :           1479 :     XLogBeginInsert();
  207 peter@eisentraut.org     7245                 :           1479 :     XLogRegisterData(&checkPoint, sizeof(checkPoint));
 8943 tgl@sss.pgh.pa.us        7246         [ +  + ]:           1479 :     recptr = XLogInsert(RM_XLOG_ID,
                               7247                 :                :                         shutdown ? XLOG_CHECKPOINT_SHUTDOWN :
                               7248                 :                :                         XLOG_CHECKPOINT_ONLINE);
                               7249                 :                : 
                               7250                 :           1479 :     XLogFlush(recptr);
                               7251                 :                : 
                               7252                 :                :     /*
                               7253                 :                :      * We mustn't write any new WAL after a shutdown checkpoint, or it will be
                               7254                 :                :      * overwritten at next startup.  No one should even try; this just allows
                               7255                 :                :      * sanity-checking.  In the case of an end-of-recovery checkpoint, we want
                               7256                 :                :      * to just temporarily disable writing until the system has exited
                               7257                 :                :      * recovery.
                               7258                 :                :      */
 5916                          7259         [ +  + ]:           1479 :     if (shutdown)
                               7260                 :                :     {
                               7261         [ +  + ]:            583 :         if (flags & CHECKPOINT_END_OF_RECOVERY)
 1412 rhaas@postgresql.org     7262                 :             28 :             LocalXLogInsertAllowed = oldXLogAllowed;
                               7263                 :                :         else
 5671 bruce@momjian.us         7264                 :            555 :             LocalXLogInsertAllowed = 0; /* never again write WAL */
                               7265                 :                :     }
                               7266                 :                : 
                               7267                 :                :     /*
                               7268                 :                :      * We now have ProcLastRecPtr = start of actual checkpoint record, recptr
                               7269                 :                :      * = end of actual checkpoint record.
                               7270                 :                :      */
 4635 alvherre@alvh.no-ip.     7271   [ +  +  -  + ]:           1479 :     if (shutdown && checkPoint.redo != ProcLastRecPtr)
 8083 tgl@sss.pgh.pa.us        7272         [ #  # ]:UBC           0 :         ereport(PANIC,
                               7273                 :                :                 (errmsg("concurrent write-ahead log activity while database system is shutting down")));
                               7274                 :                : 
                               7275                 :                :     /*
                               7276                 :                :      * Remember the prior checkpoint's redo ptr for
                               7277                 :                :      * UpdateCheckPointDistanceEstimate()
                               7278                 :                :      */
 3848 heikki.linnakangas@i     7279                 :CBC        1479 :     PriorRedoPtr = ControlFile->checkPointCopy.redo;
                               7280                 :                : 
                               7281                 :                :     /*
                               7282                 :                :      * Update the control file.
                               7283                 :                :      */
 8743 tgl@sss.pgh.pa.us        7284                 :           1479 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 9476 vadim4o@yahoo.com        7285         [ +  + ]:           1479 :     if (shutdown)
                               7286                 :            583 :         ControlFile->state = DB_SHUTDOWNED;
 8943 tgl@sss.pgh.pa.us        7287                 :           1479 :     ControlFile->checkPoint = ProcLastRecPtr;
                               7288                 :           1479 :     ControlFile->checkPointCopy = checkPoint;
                               7289                 :                :     /* crash recovery should always recover to the end of WAL */
 4636 alvherre@alvh.no-ip.     7290                 :           1479 :     ControlFile->minRecoveryPoint = InvalidXLogRecPtr;
 4659 heikki.linnakangas@i     7291                 :           1479 :     ControlFile->minRecoveryPointTLI = 0;
                               7292                 :                : 
                               7293                 :                :     /*
                               7294                 :                :      * Persist unloggedLSN value. It's reset on crash recovery, so this goes
                               7295                 :                :      * unused on non-shutdown checkpoints, but seems useful to store it always
                               7296                 :                :      * for debugging purposes.
                               7297                 :                :      */
  555 nathan@postgresql.or     7298                 :           1479 :     ControlFile->unloggedLSN = pg_atomic_read_membarrier_u64(&XLogCtl->unloggedLSN);
                               7299                 :                : 
 9476 vadim4o@yahoo.com        7300                 :           1479 :     UpdateControlFile();
 8743 tgl@sss.pgh.pa.us        7301                 :           1479 :     LWLockRelease(ControlFileLock);
                               7302                 :                : 
                               7303                 :                :     /*
                               7304                 :                :      * We are now done with critical updates; no need for system panic if we
                               7305                 :                :      * have trouble while fooling with old log segments.
                               7306                 :                :      */
 8155                          7307         [ -  + ]:           1479 :     END_CRIT_SECTION();
                               7308                 :                : 
                               7309                 :                :     /*
                               7310                 :                :      * WAL summaries end when the next XLOG_CHECKPOINT_REDO or
                               7311                 :                :      * XLOG_CHECKPOINT_SHUTDOWN record is reached. This is the first point
                               7312                 :                :      * where (a) we're not inside of a critical section and (b) we can be
                               7313                 :                :      * certain that the relevant record has been flushed to disk, which must
                               7314                 :                :      * happen before it can be summarized.
                               7315                 :                :      *
                               7316                 :                :      * If this is a shutdown checkpoint, then this happens reasonably
                               7317                 :                :      * promptly: we've only just inserted and flushed the
                               7318                 :                :      * XLOG_CHECKPOINT_SHUTDOWN record. If this is not a shutdown checkpoint,
                               7319                 :                :      * then this might not be very prompt at all: the XLOG_CHECKPOINT_REDO
                               7320                 :                :      * record was written before we began flushing data to disk, and that
                               7321                 :                :      * could be many minutes ago at this point. However, we don't XLogFlush()
                               7322                 :                :      * after inserting that record, so we're not guaranteed that it's on disk
                               7323                 :                :      * until after the above call that flushes the XLOG_CHECKPOINT_ONLINE
                               7324                 :                :      * record.
                               7325                 :                :      */
  309 heikki.linnakangas@i     7326                 :           1479 :     WakeupWalSummarizer();
                               7327                 :                : 
                               7328                 :                :     /*
                               7329                 :                :      * Let smgr do post-checkpoint cleanup (eg, deleting old files).
                               7330                 :                :      */
 2347 tmunro@postgresql.or     7331                 :           1479 :     SyncPostCheckpoint();
                               7332                 :                : 
                               7333                 :                :     /*
                               7334                 :                :      * Update the average distance between checkpoints if the prior checkpoint
                               7335                 :                :      * exists.
                               7336                 :                :      */
 3848 heikki.linnakangas@i     7337         [ +  - ]:           1479 :     if (PriorRedoPtr != InvalidXLogRecPtr)
                               7338                 :           1479 :         UpdateCheckPointDistanceEstimate(RedoRecPtr - PriorRedoPtr);
                               7339                 :                : 
                               7340                 :                : #ifdef USE_INJECTION_POINTS
                               7341                 :                :     INJECTION_POINT("checkpoint-before-old-wal-removal", NULL);
                               7342                 :                : #endif
                               7343                 :                : 
                               7344                 :                :     /*
                               7345                 :                :      * Delete old log files that are no longer needed for the last checkpoint,
                               7346                 :                :      * to prevent the disk holding the xlog from filling up.
                               7347                 :                :      */
 2601 michael@paquier.xyz      7348                 :           1479 :     XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
                               7349                 :           1479 :     KeepLogSeg(recptr, &_logSegNo);
  199 akapila@postgresql.o     7350         [ +  + ]:           1479 :     if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT,
                               7351                 :                :                                            _logSegNo, InvalidOid,
                               7352                 :                :                                            InvalidTransactionId))
                               7353                 :                :     {
                               7354                 :                :         /*
                               7355                 :                :          * Some slots have been invalidated; recalculate the old-segment
                               7356                 :                :          * horizon, starting again from RedoRecPtr.
                               7357                 :                :          */
 1513 alvherre@alvh.no-ip.     7358                 :              3 :         XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
                               7359                 :              3 :         KeepLogSeg(recptr, &_logSegNo);
                               7360                 :                :     }
 2601 michael@paquier.xyz      7361                 :           1479 :     _logSegNo--;
 1401 rhaas@postgresql.org     7362                 :           1479 :     RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr,
                               7363                 :                :                        checkPoint.ThisTimeLineID);
                               7364                 :                : 
                               7365                 :                :     /*
                               7366                 :                :      * Make more log segments if needed.  (Do this after recycling old log
                               7367                 :                :      * segments, since that may supply some of the needed files.)
                               7368                 :                :      */
 8943 tgl@sss.pgh.pa.us        7369         [ +  + ]:           1479 :     if (!shutdown)
 1401 rhaas@postgresql.org     7370                 :            896 :         PreallocXlogFiles(recptr, checkPoint.ThisTimeLineID);
                               7371                 :                : 
                               7372                 :                :     /*
                               7373                 :                :      * Truncate pg_subtrans if possible.  We can throw away all data before
                               7374                 :                :      * the oldest XMIN of any running transaction.  No future transaction will
                               7375                 :                :      * attempt to reference any pg_subtrans entry older than that (see Asserts
                               7376                 :                :      * in subtrans.c).  During recovery, though, we mustn't do this because
                               7377                 :                :      * StartupSUBTRANS hasn't been called yet.
                               7378                 :                :      */
 5916 tgl@sss.pgh.pa.us        7379         [ +  + ]:           1479 :     if (!RecoveryInProgress())
 1851 andres@anarazel.de       7380                 :           1451 :         TruncateSUBTRANS(GetOldestTransactionIdConsideredRunning());
                               7381                 :                : 
                               7382                 :                :     /* Real work is done; log and update stats. */
 4902 rhaas@postgresql.org     7383                 :           1479 :     LogCheckpointEnd(false);
                               7384                 :                : 
                               7385                 :                :     /* Reset the process title */
 1727 michael@paquier.xyz      7386                 :           1479 :     update_checkpoint_display(flags, false, true);
                               7387                 :                : 
                               7388                 :                :     TRACE_POSTGRESQL_CHECKPOINT_DONE(CheckpointStats.ckpt_bufs_written,
                               7389                 :                :                                      NBuffers,
                               7390                 :                :                                      CheckpointStats.ckpt_segs_added,
                               7391                 :                :                                      CheckpointStats.ckpt_segs_removed,
                               7392                 :                :                                      CheckpointStats.ckpt_segs_recycled);
                               7393                 :                : 
  341 fujii@postgresql.org     7394                 :           1479 :     return true;
                               7395                 :                : }
                               7396                 :                : 
                               7397                 :                : /*
                               7398                 :                :  * Mark the end of recovery in WAL, though without running a full checkpoint.
                               7399                 :                :  * We can expect that a restartpoint is likely to be in progress as we
                               7400                 :                :  * do this, though we are unwilling to wait for it to complete.
                               7401                 :                :  *
                               7402                 :                :  * CreateRestartPoint() allows for the case where recovery may end before
                               7403                 :                :  * the restartpoint completes, so there is no concern of concurrent behaviour.
                               7404                 :                :  */
                               7405                 :                : static void
 4603 simon@2ndQuadrant.co     7406                 :             40 : CreateEndOfRecoveryRecord(void)
                               7407                 :                : {
                               7408                 :                :     xl_end_of_recovery xlrec;
                               7409                 :                :     XLogRecPtr  recptr;
                               7410                 :                : 
                               7411                 :                :     /* sanity check */
                               7412         [ -  + ]:             40 :     if (!RecoveryInProgress())
 4603 simon@2ndQuadrant.co     7413         [ #  # ]:UBC           0 :         elog(ERROR, "can only be used to end recovery");
                               7414                 :                : 
 3914 heikki.linnakangas@i     7415                 :CBC          40 :     xlrec.end_time = GetCurrentTimestamp();
  415 rhaas@postgresql.org     7416                 :             40 :     xlrec.wal_level = wal_level;
                               7417                 :                : 
 4187 heikki.linnakangas@i     7418                 :             40 :     WALInsertLockAcquireExclusive();
 1396 rhaas@postgresql.org     7419                 :             40 :     xlrec.ThisTimeLineID = XLogCtl->InsertTimeLineID;
 4590 heikki.linnakangas@i     7420                 :             40 :     xlrec.PrevTimeLineID = XLogCtl->PrevTimeLineID;
 4187                          7421                 :             40 :     WALInsertLockRelease();
                               7422                 :                : 
 4603 simon@2ndQuadrant.co     7423                 :             40 :     START_CRIT_SECTION();
                               7424                 :                : 
 3943 heikki.linnakangas@i     7425                 :             40 :     XLogBeginInsert();
  207 peter@eisentraut.org     7426                 :             40 :     XLogRegisterData(&xlrec, sizeof(xl_end_of_recovery));
 3943 heikki.linnakangas@i     7427                 :             40 :     recptr = XLogInsert(RM_XLOG_ID, XLOG_END_OF_RECOVERY);
                               7428                 :                : 
 4601 simon@2ndQuadrant.co     7429                 :             40 :     XLogFlush(recptr);
                               7430                 :                : 
                               7431                 :                :     /*
                               7432                 :                :      * Update the control file so that crash recovery can follow the timeline
                               7433                 :                :      * changes to this point.
                               7434                 :                :      */
                               7435                 :             40 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               7436                 :             40 :     ControlFile->minRecoveryPoint = recptr;
 1401 rhaas@postgresql.org     7437                 :             40 :     ControlFile->minRecoveryPointTLI = xlrec.ThisTimeLineID;
 4601 simon@2ndQuadrant.co     7438                 :             40 :     UpdateControlFile();
                               7439                 :             40 :     LWLockRelease(ControlFileLock);
                               7440                 :                : 
 4603                          7441         [ -  + ]:             40 :     END_CRIT_SECTION();
                               7442                 :             40 : }
                               7443                 :                : 
                               7444                 :                : /*
                               7445                 :                :  * Write an OVERWRITE_CONTRECORD message.
                               7446                 :                :  *
                               7447                 :                :  * When WAL replay expects a continuation record at the start of a page
                               7448                 :                :  * but finds none, recovery ends and WAL writing resumes at that point.
                               7449                 :                :  * But it's wrong to resume writing new WAL back at the start of the record
                               7450                 :                :  * that was broken, because downstream consumers of that WAL (physical
                               7451                 :                :  * replicas) are not prepared to "rewind".  So the first action after
                               7452                 :                :  * finishing replay of all valid WAL must be to write a record of this type
                               7453                 :                :  * at the point where the contrecord was missing; to support xlogreader
                               7454                 :                :  * detecting the special case, XLP_FIRST_IS_OVERWRITE_CONTRECORD is also added
                               7455                 :                :  * to the page header where the record occurs.  xlogreader has an ad-hoc
                               7456                 :                :  * mechanism to report metadata about the broken record, which is what we
                               7457                 :                :  * use here.
                               7458                 :                :  *
                               7459                 :                :  * At replay time, XLP_FIRST_IS_OVERWRITE_CONTRECORD instructs xlogreader to
                               7460                 :                :  * skip the record it was reading, and pass back the LSN of the skipped
                               7461                 :                :  * record, so that its caller can verify (on "replay" of that record) that the
                               7462                 :                :  * XLOG_OVERWRITE_CONTRECORD matches what was effectively overwritten.
                               7463                 :                :  *
                               7464                 :                :  * 'aborted_lsn' is the beginning position of the record that was incomplete.
                               7465                 :                :  * It is included in the WAL record.  'pagePtr' and 'newTLI' point to the
                               7466                 :                :  * beginning of the XLOG page where the record is to be inserted.  They must
                               7467                 :                :  * match the current WAL insert position; they're passed here just so that we
                               7468                 :                :  * can verify that.
                               7469                 :                :  */
                               7470                 :                : static XLogRecPtr
 1298 heikki.linnakangas@i     7471                 :             11 : CreateOverwriteContrecordRecord(XLogRecPtr aborted_lsn, XLogRecPtr pagePtr,
                               7472                 :                :                                 TimeLineID newTLI)
                               7473                 :                : {
                               7474                 :                :     xl_overwrite_contrecord xlrec;
                               7475                 :                :     XLogRecPtr  recptr;
                               7476                 :                :     XLogPageHeader pagehdr;
                               7477                 :                :     XLogRecPtr  startPos;
                               7478                 :                : 
                               7479                 :                :     /* sanity checks */
 1438 alvherre@alvh.no-ip.     7480         [ -  + ]:             11 :     if (!RecoveryInProgress())
 1438 alvherre@alvh.no-ip.     7481         [ #  # ]:UBC           0 :         elog(ERROR, "can only be used at end of recovery");
 1298 heikki.linnakangas@i     7482         [ -  + ]:CBC          11 :     if (pagePtr % XLOG_BLCKSZ != 0)
   61 alvherre@kurilemu.de     7483         [ #  # ]:UNC           0 :         elog(ERROR, "invalid position for missing continuation record %X/%08X",
                               7484                 :                :              LSN_FORMAT_ARGS(pagePtr));
                               7485                 :                : 
                               7486                 :                :     /* The current WAL insert position should be right after the page header */
 1298 heikki.linnakangas@i     7487                 :CBC          11 :     startPos = pagePtr;
                               7488         [ +  + ]:             11 :     if (XLogSegmentOffset(startPos, wal_segment_size) == 0)
                               7489                 :              1 :         startPos += SizeOfXLogLongPHD;
                               7490                 :                :     else
                               7491                 :             10 :         startPos += SizeOfXLogShortPHD;
                               7492                 :             11 :     recptr = GetXLogInsertRecPtr();
                               7493         [ -  + ]:             11 :     if (recptr != startPos)
   61 alvherre@kurilemu.de     7494         [ #  # ]:UNC           0 :         elog(ERROR, "invalid WAL insert position %X/%08X for OVERWRITE_CONTRECORD",
                               7495                 :                :              LSN_FORMAT_ARGS(recptr));
                               7496                 :                : 
 1438 alvherre@alvh.no-ip.     7497                 :CBC          11 :     START_CRIT_SECTION();
                               7498                 :                : 
                               7499                 :                :     /*
                               7500                 :                :      * Initialize the XLOG page header (by GetXLogBuffer), and set the
                               7501                 :                :      * XLP_FIRST_IS_OVERWRITE_CONTRECORD flag.
                               7502                 :                :      *
                               7503                 :                :      * No other backend is allowed to write WAL yet, so acquiring the WAL
                               7504                 :                :      * insertion lock is just pro forma.
                               7505                 :                :      */
 1298 heikki.linnakangas@i     7506                 :             11 :     WALInsertLockAcquire();
                               7507                 :             11 :     pagehdr = (XLogPageHeader) GetXLogBuffer(pagePtr, newTLI);
                               7508                 :             11 :     pagehdr->xlp_info |= XLP_FIRST_IS_OVERWRITE_CONTRECORD;
                               7509                 :             11 :     WALInsertLockRelease();
                               7510                 :                : 
                               7511                 :                :     /*
                               7512                 :                :      * Insert the XLOG_OVERWRITE_CONTRECORD record as the first record on the
                               7513                 :                :      * page.  We know it becomes the first record, because no other backend is
                               7514                 :                :      * allowed to write WAL yet.
                               7515                 :                :      */
 1438 alvherre@alvh.no-ip.     7516                 :             11 :     XLogBeginInsert();
 1298 heikki.linnakangas@i     7517                 :             11 :     xlrec.overwritten_lsn = aborted_lsn;
                               7518                 :             11 :     xlrec.overwrite_time = GetCurrentTimestamp();
  207 peter@eisentraut.org     7519                 :             11 :     XLogRegisterData(&xlrec, sizeof(xl_overwrite_contrecord));
 1438 alvherre@alvh.no-ip.     7520                 :             11 :     recptr = XLogInsert(RM_XLOG_ID, XLOG_OVERWRITE_CONTRECORD);
                               7521                 :                : 
                               7522                 :                :     /* check that the record was inserted at the right place */
 1298 heikki.linnakangas@i     7523         [ -  + ]:             11 :     if (ProcLastRecPtr != startPos)
   61 alvherre@kurilemu.de     7524         [ #  # ]:UNC           0 :         elog(ERROR, "OVERWRITE_CONTRECORD was inserted to unexpected position %X/%08X",
                               7525                 :                :              LSN_FORMAT_ARGS(ProcLastRecPtr));
                               7526                 :                : 
 1438 alvherre@alvh.no-ip.     7527                 :CBC          11 :     XLogFlush(recptr);
                               7528                 :                : 
                               7529         [ -  + ]:             11 :     END_CRIT_SECTION();
                               7530                 :                : 
                               7531                 :             11 :     return recptr;
                               7532                 :                : }
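
The insert-position check performed by CreateOverwriteContrecordRecord() can be illustrated in isolation. What follows is a minimal standalone sketch, not PostgreSQL code: the constants (8 kB pages, 16 MB segments, 24-byte short and 40-byte long page headers) are assumed typical values, and the names SketchRecPtr, page_ptr_is_aligned() and expected_insert_pos() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration only; a real server derives them from
 * its build and initdb settings. */
#define SKETCH_XLOG_BLCKSZ      8192u
#define SKETCH_WAL_SEGMENT_SIZE (16u * 1024u * 1024u)
#define SKETCH_SHORT_PHD        24u     /* short page header size (assumed) */
#define SKETCH_LONG_PHD         40u     /* long page header size (assumed) */

typedef uint64_t SketchRecPtr;          /* stand-in for XLogRecPtr */

/* The page pointer must sit exactly on a page boundary. */
static bool
page_ptr_is_aligned(SketchRecPtr page_ptr)
{
    return page_ptr % SKETCH_XLOG_BLCKSZ == 0;
}

/*
 * The first record on a page starts right after the page header.  The
 * first page of a segment carries the long header; every other page
 * carries the short one.
 */
static SketchRecPtr
expected_insert_pos(SketchRecPtr page_ptr)
{
    assert(page_ptr_is_aligned(page_ptr));

    if (page_ptr % SKETCH_WAL_SEGMENT_SIZE == 0)
        return page_ptr + SKETCH_LONG_PHD;
    return page_ptr + SKETCH_SHORT_PHD;
}
```

In the listing above, the value computed this way (startPos) is compared against GetXLogInsertRecPtr() before the insert and against ProcLastRecPtr after it.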
                               7533                 :                : 
                               7534                 :                : /*
                               7535                 :                :  * Flush all data in shared memory to disk, and fsync
                               7536                 :                :  *
                               7537                 :                :  * This is the common code shared between regular checkpoints and
                               7538                 :                :  * recovery restartpoints.
                               7539                 :                :  */
                               7540                 :                : static void
 6645 tgl@sss.pgh.pa.us        7541                 :           1677 : CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
                               7542                 :                : {
 5690                          7543                 :           1677 :     CheckPointRelationMap();
  723 akapila@postgresql.o     7544                 :           1677 :     CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 4205 rhaas@postgresql.org     7545                 :           1677 :     CheckPointSnapBuild();
                               7546                 :           1677 :     CheckPointLogicalRewriteHeap();
 3783 andres@anarazel.de       7547                 :           1677 :     CheckPointReplicationOrigin();
                               7548                 :                : 
                               7549                 :                :     /* Write out all dirty data in SLRUs and the main buffer pool */
                               7550                 :                :     TRACE_POSTGRESQL_BUFFER_CHECKPOINT_START(flags);
 1807 tmunro@postgresql.or     7551                 :           1677 :     CheckpointStats.ckpt_write_t = GetCurrentTimestamp();
                               7552                 :           1677 :     CheckPointCLOG();
                               7553                 :           1677 :     CheckPointCommitTs();
                               7554                 :           1677 :     CheckPointSUBTRANS();
                               7555                 :           1677 :     CheckPointMultiXact();
                               7556                 :           1677 :     CheckPointPredicate();
                               7557                 :           1677 :     CheckPointBuffers(flags);
                               7558                 :                : 
                               7559                 :                :     /* Perform all queued up fsyncs */
                               7560                 :                :     TRACE_POSTGRESQL_BUFFER_CHECKPOINT_SYNC_START();
                               7561                 :           1677 :     CheckpointStats.ckpt_sync_t = GetCurrentTimestamp();
                               7562                 :           1677 :     ProcessSyncRequests();
                               7563                 :           1677 :     CheckpointStats.ckpt_sync_end_t = GetCurrentTimestamp();
                               7564                 :                :     TRACE_POSTGRESQL_BUFFER_CHECKPOINT_DONE();
                               7565                 :                : 
                               7566                 :                :     /* We deliberately delay 2PC checkpointing as long as possible */
 6970 tgl@sss.pgh.pa.us        7567                 :           1677 :     CheckPointTwoPhase(checkPointRedo);
                               7568                 :           1677 : }
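
CheckPointGuts() records three timestamps: ckpt_write_t before the SLRU and buffer writes, ckpt_sync_t before ProcessSyncRequests(), and ckpt_sync_end_t after it. The sketch below shows one way such timestamps could be turned into per-phase durations. It is a standalone illustration under stated assumptions: timestamps are taken to be microsecond counts, rounding up to whole milliseconds is a choice made here, and all Sketch* and *_ms names are hypothetical.

```c
#include <stdint.h>

typedef int64_t SketchTimestamp;        /* microseconds (assumed unit) */

/* Hypothetical subset of the checkpoint statistics, phase markers only. */
typedef struct SketchCheckpointStats
{
    SketchTimestamp write_t;            /* start of the write phase */
    SketchTimestamp sync_t;             /* start of the sync phase */
    SketchTimestamp sync_end_t;         /* end of the sync phase */
} SketchCheckpointStats;

/* Elapsed time in milliseconds, rounded up; never negative. */
static int64_t
elapsed_ms(SketchTimestamp start, SketchTimestamp stop)
{
    if (stop <= start)
        return 0;
    return (stop - start + 999) / 1000;
}

/* The write phase runs from write_t until the sync phase begins. */
static int64_t
write_phase_ms(const SketchCheckpointStats *stats)
{
    return elapsed_ms(stats->write_t, stats->sync_t);
}

/* The sync phase runs from sync_t to sync_end_t. */
static int64_t
sync_phase_ms(const SketchCheckpointStats *stats)
{
    return elapsed_ms(stats->sync_t, stats->sync_end_t);
}
```

Because CheckPointTwoPhase() runs after ckpt_sync_end_t is recorded, the time it takes falls outside both phases in this sketch.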
                               7569                 :                : 
                               7570                 :                : /*
                               7571                 :                :  * Save a checkpoint for recovery restart if appropriate
                               7572                 :                :  *
                               7573                 :                :  * This function is called each time a checkpoint record is read from XLOG.
                               7574                 :                :  * It must determine whether the checkpoint represents a safe restartpoint or
                               7575                 :                :  * not.  If so, the checkpoint record is stashed in shared memory so that
                               7576                 :                :  * CreateRestartPoint can consult it.  (Note that the latter function is
                               7577                 :                :  * executed by the checkpointer, while this one will be executed by the
                               7578                 :                :  * startup process.)
                               7579                 :                :  */
                               7580                 :                : static void
 1382 rhaas@postgresql.org     7581                 :            694 : RecoveryRestartPoint(const CheckPoint *checkPoint, XLogReaderState *record)
                               7582                 :                : {
                               7583                 :                :     /*
                               7584                 :                :      * Refrain from creating a restartpoint if we have seen any
                               7585                 :                :      * references to non-existent pages. Restarting recovery from the
                               7586                 :                :      * restartpoint would not see the references, so we would lose the
                               7587                 :                :      * cross-check that the pages belonged to a relation that was dropped
                               7588                 :                :      * later.
                               7589                 :                :      */
 5027 heikki.linnakangas@i     7590         [ -  + ]:            694 :     if (XLogHaveInvalidPages())
                               7591                 :                :     {
  635 michael@paquier.xyz      7592         [ #  # ]:UBC           0 :         elog(DEBUG2,
                               7593                 :                :              "could not record restart point at %X/%08X because there are unresolved references to invalid pages",
                               7594                 :                :              LSN_FORMAT_ARGS(checkPoint->redo));
 5027 heikki.linnakangas@i     7595                 :              0 :         return;
                               7596                 :                :     }
                               7597                 :                : 
                               7598                 :                :     /*
                               7599                 :                :      * Copy the checkpoint record to shared memory, so that checkpointer can
                               7600                 :                :      * work out the next time it wants to perform a restartpoint.
                               7601                 :                :      */
 4002 andres@anarazel.de       7602         [ -  + ]:CBC         694 :     SpinLockAcquire(&XLogCtl->info_lck);
 1382 rhaas@postgresql.org     7603                 :            694 :     XLogCtl->lastCheckPointRecPtr = record->ReadRecPtr;
                               7604                 :            694 :     XLogCtl->lastCheckPointEndPtr = record->EndRecPtr;
 4002 andres@anarazel.de       7605                 :            694 :     XLogCtl->lastCheckPoint = *checkPoint;
                               7606                 :            694 :     SpinLockRelease(&XLogCtl->info_lck);
                               7607                 :                : }
                               7608                 :                : 
                               7609                 :                : /*
                               7610                 :                :  * Establish a restartpoint if possible.
                               7611                 :                :  *
                               7612                 :                :  * This is similar to CreateCheckPoint, but is used during WAL recovery
                               7613                 :                :  * to establish a point from which recovery can roll forward without
                               7614                 :                :  * replaying the entire recovery log.
                               7615                 :                :  *
                               7616                 :                :  * Returns true if a new restartpoint was established. We can only establish
                               7617                 :                :  * a restartpoint if we have replayed a safe checkpoint record since the last
                               7618                 :                :  * restartpoint.
                               7619                 :                :  */
                               7620                 :                : bool
 6044 heikki.linnakangas@i     7621                 :            588 : CreateRestartPoint(int flags)
                               7622                 :                : {
                               7623                 :                :     XLogRecPtr  lastCheckPointRecPtr;
                               7624                 :                :     XLogRecPtr  lastCheckPointEndPtr;
                               7625                 :                :     CheckPoint  lastCheckPoint;
                               7626                 :                :     XLogRecPtr  PriorRedoPtr;
                               7627                 :                :     XLogRecPtr  receivePtr;
                               7628                 :                :     XLogRecPtr  replayPtr;
                               7629                 :                :     TimeLineID  replayTLI;
                               7630                 :                :     XLogRecPtr  endptr;
                               7631                 :                :     XLogSegNo   _logSegNo;
                               7632                 :                :     TimestampTz xtime;
                               7633                 :                : 
                               7634                 :                :     /* Concurrent checkpoint/restartpoint cannot happen */
 1216 michael@paquier.xyz      7635   [ +  -  -  + ]:            588 :     Assert(!IsUnderPostmaster || MyBackendType == B_CHECKPOINTER);
                               7636                 :                : 
                               7637                 :                :     /* Get a local copy of the last safe checkpoint record. */
 4002 andres@anarazel.de       7638         [ -  + ]:            588 :     SpinLockAcquire(&XLogCtl->info_lck);
                               7639                 :            588 :     lastCheckPointRecPtr = XLogCtl->lastCheckPointRecPtr;
 3236 rhaas@postgresql.org     7640                 :            588 :     lastCheckPointEndPtr = XLogCtl->lastCheckPointEndPtr;
 4002 andres@anarazel.de       7641                 :            588 :     lastCheckPoint = XLogCtl->lastCheckPoint;
                               7642                 :            588 :     SpinLockRelease(&XLogCtl->info_lck);
                               7643                 :                : 
                               7644                 :                :     /*
                               7645                 :                :      * Check that we're still in recovery mode. It's ok if we exit recovery
                               7646                 :                :      * mode after this check; the restart point is valid anyway.
                               7647                 :                :      */
 6044 heikki.linnakangas@i     7648         [ -  + ]:            588 :     if (!RecoveryInProgress())
                               7649                 :                :     {
 6044 heikki.linnakangas@i     7650         [ #  # ]:UBC           0 :         ereport(DEBUG2,
                               7651                 :                :                 (errmsg_internal("skipping restartpoint, recovery has already ended")));
                               7652                 :              0 :         return false;
                               7653                 :                :     }
                               7654                 :                : 
                               7655                 :                :     /*
                               7656                 :                :      * If the last checkpoint record we've replayed is already our last
                               7657                 :                :      * restartpoint, we can't perform a new restart point. We still update
                               7658                 :                :      * minRecoveryPoint in that case, so that if this is a shutdown restart
                               7659                 :                :      * point, we won't start up earlier than before. That's not strictly
                               7660                 :                :      * necessary, but when hot standby is enabled, it would be rather weird if
                               7661                 :                :      * the database opened up for read-only connections at a point-in-time
                               7662                 :                :      * before the last shutdown. Such time travel is still possible in case of
                               7663                 :                :      * immediate shutdown, though.
                               7664                 :                :      *
                               7665                 :                :      * We don't explicitly advance minRecoveryPoint when we do create a
                               7666                 :                :      * restartpoint. It's assumed that flushing the buffers will do that as a
                               7667                 :                :      * side-effect.
                               7668                 :                :      */
 6044 heikki.linnakangas@i     7669         [ +  + ]:CBC         588 :     if (XLogRecPtrIsInvalid(lastCheckPointRecPtr) ||
 4635 alvherre@alvh.no-ip.     7670         [ +  + ]:            256 :         lastCheckPoint.redo <= ControlFile->checkPointCopy.redo)
                               7671                 :                :     {
 6044 heikki.linnakangas@i     7672         [ -  + ]:            390 :         ereport(DEBUG2,
                               7673                 :                :                 errmsg_internal("skipping restartpoint, already performed at %X/%08X",
                               7674                 :                :                                 LSN_FORMAT_ARGS(lastCheckPoint.redo)));
                               7675                 :                : 
                               7676                 :            390 :         UpdateMinRecoveryPoint(InvalidXLogRecPtr, true);
 5574 rhaas@postgresql.org     7677         [ +  + ]:            390 :         if (flags & CHECKPOINT_IS_SHUTDOWN)
                               7678                 :                :         {
                               7679                 :             30 :             LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               7680                 :             30 :             ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY;
                               7681                 :             30 :             UpdateControlFile();
                               7682                 :             30 :             LWLockRelease(ControlFileLock);
                               7683                 :                :         }
 6044 heikki.linnakangas@i     7684                 :            390 :         return false;
                               7685                 :                :     }
                               7686                 :                : 
                               7687                 :                :     /*
                               7688                 :                :      * Update the shared RedoRecPtr so that the startup process can calculate
                               7689                 :                :      * the number of segments replayed since last restartpoint, and request a
                               7690                 :                :      * restartpoint if it exceeds CheckPointSegments.
                               7691                 :                :      *
                               7692                 :                :      * Like in CreateCheckPoint(), hold off insertions to update it, although
                               7693                 :                :      * during recovery this is just pro forma, because no WAL insertions are
                               7694                 :                :      * happening.
                               7695                 :                :      */
 4187                          7696                 :            198 :     WALInsertLockAcquireExclusive();
 3848                          7697                 :            198 :     RedoRecPtr = XLogCtl->Insert.RedoRecPtr = lastCheckPoint.redo;
 4187                          7698                 :            198 :     WALInsertLockRelease();
                               7699                 :                : 
                               7700                 :                :     /* Also update the info_lck-protected copy */
 4002 andres@anarazel.de       7701         [ -  + ]:            198 :     SpinLockAcquire(&XLogCtl->info_lck);
                               7702                 :            198 :     XLogCtl->RedoRecPtr = lastCheckPoint.redo;
                               7703                 :            198 :     SpinLockRelease(&XLogCtl->info_lck);
                               7704                 :                : 
                               7705                 :                :     /*
                               7706                 :                :      * Prepare to accumulate statistics.
                               7707                 :                :      *
                               7708                 :                :      * Note: because it is possible for log_checkpoints to change while a
                               7709                 :                :      * checkpoint proceeds, we always accumulate stats, even if
                               7710                 :                :      * log_checkpoints is currently off.
                               7711                 :                :      */
 5330 rhaas@postgresql.org     7712   [ +  -  +  -  :           2178 :     MemSet(&CheckpointStats, 0, sizeof(CheckpointStats));
                                     +  -  +  -  +  
                                                 + ]
                               7713                 :            198 :     CheckpointStats.ckpt_start_t = GetCurrentTimestamp();
                               7714                 :                : 
                               7715         [ +  - ]:            198 :     if (log_checkpoints)
 6044 heikki.linnakangas@i     7716                 :            198 :         LogCheckpointStart(flags, true);
                               7717                 :                : 
                               7718                 :                :     /* Update the process title */
 1727 michael@paquier.xyz      7719                 :            198 :     update_checkpoint_display(flags, true, false);
                               7720                 :                : 
 6044 heikki.linnakangas@i     7721                 :            198 :     CheckPointGuts(lastCheckPoint.redo, flags);
                               7722                 :                : 
                               7723                 :                :     /*
                               7724                 :                :      * This location needs to be after CheckPointGuts() to ensure that some
                               7725                 :                :      * work has already happened during this checkpoint.
                               7726                 :                :      */
                               7727                 :                :     INJECTION_POINT("create-restart-point", NULL);
                               7728                 :                : 
                               7729                 :                :     /*
                               7730                 :                :      * Remember the prior checkpoint's redo ptr for
                               7731                 :                :      * UpdateCheckPointDistanceEstimate()
                               7732                 :                :      */
 3848                          7733                 :            198 :     PriorRedoPtr = ControlFile->checkPointCopy.redo;
                               7734                 :                : 
                               7735                 :                :     /*
                               7736                 :                :      * Update pg_control, using current time.  Check that it still shows an
                               7737                 :                :      * older checkpoint, else do nothing; this is a quick hack to make sure
                               7738                 :                :      * nothing really bad happens if somehow we get here after the
                               7739                 :                :      * end-of-recovery checkpoint.
                               7740                 :                :      */
 6044                          7741                 :            198 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 1216 michael@paquier.xyz      7742         [ +  - ]:            198 :     if (ControlFile->checkPointCopy.redo < lastCheckPoint.redo)
                               7743                 :                :     {
                               7744                 :                :         /*
                               7745                 :                :          * Update the checkpoint information.  We do this even if the cluster
                               7746                 :                :          * does not show DB_IN_ARCHIVE_RECOVERY to match with the set of WAL
                               7747                 :                :          * segments recycled below.
                               7748                 :                :          */
 5916 tgl@sss.pgh.pa.us        7749                 :            198 :         ControlFile->checkPoint = lastCheckPointRecPtr;
                               7750                 :            198 :         ControlFile->checkPointCopy = lastCheckPoint;
                               7751                 :                : 
                               7752                 :                :         /*
                               7753                 :                :          * Ensure minRecoveryPoint is past the checkpoint record and update it
                               7754                 :                :          * if the control file still shows DB_IN_ARCHIVE_RECOVERY.  Normally,
                               7755                 :                :          * this will have happened already while writing out dirty buffers,
                               7756                 :                :          * but not necessarily - e.g. because no buffers were dirtied.  We do
                               7757                 :                :          * this because a backup performed in recovery uses minRecoveryPoint
                               7758                 :                :          * to determine which WAL files must be included in the backup, and
                               7759                 :                :          * the file (or files) containing the checkpoint record must be
                               7760                 :                :          * included, at a minimum.  Note that for an ordinary restart of
                               7761                 :                :          * recovery there's no value in having the minimum recovery point any
                               7762                 :                :          * earlier than this anyway, because redo will begin just after the
                               7763                 :                :          * checkpoint record.
                               7764                 :                :          */
 1216 michael@paquier.xyz      7765         [ +  - ]:            198 :         if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY)
                               7766                 :                :         {
                               7767         [ +  + ]:            198 :             if (ControlFile->minRecoveryPoint < lastCheckPointEndPtr)
                               7768                 :                :             {
                               7769                 :             19 :                 ControlFile->minRecoveryPoint = lastCheckPointEndPtr;
                               7770                 :             19 :                 ControlFile->minRecoveryPointTLI = lastCheckPoint.ThisTimeLineID;
                               7771                 :                : 
                               7772                 :                :                 /* update local copy */
                               7773                 :             19 :                 LocalMinRecoveryPoint = ControlFile->minRecoveryPoint;
                               7774                 :             19 :                 LocalMinRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
                               7775                 :                :             }
                               7776         [ +  + ]:            198 :             if (flags & CHECKPOINT_IS_SHUTDOWN)
                               7777                 :             22 :                 ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY;
                               7778                 :                :         }
 5916 tgl@sss.pgh.pa.us        7779                 :            198 :         UpdateControlFile();
                               7780                 :                :     }
 6044 heikki.linnakangas@i     7781                 :            198 :     LWLockRelease(ControlFileLock);
                               7782                 :                : 
                               7783                 :                :     /*
                               7784                 :                :      * Update the average distance between checkpoints/restartpoints if the
                               7785                 :                :      * prior checkpoint exists.
                               7786                 :                :      */
 3848                          7787         [ +  - ]:            198 :     if (PriorRedoPtr != InvalidXLogRecPtr)
                               7788                 :            198 :         UpdateCheckPointDistanceEstimate(RedoRecPtr - PriorRedoPtr);
                               7789                 :                : 
                               7790                 :                :     /*
                               7791                 :                :      * Delete old log files that are no longer needed for the last
                               7792                 :                :      * restartpoint, to prevent the disk holding the xlog from filling up.
                               7793                 :                :      */
 2601 michael@paquier.xyz      7794                 :            198 :     XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
                               7795                 :                : 
                               7796                 :                :     /*
                               7797                 :                :      * Retreat _logSegNo using the current end of xlog replayed or received,
                               7798                 :                :      * whichever is later.
                               7799                 :                :      */
 1977 tmunro@postgresql.or     7800                 :            198 :     receivePtr = GetWalRcvFlushRecPtr(NULL, NULL);
 2601 michael@paquier.xyz      7801                 :            198 :     replayPtr = GetXLogReplayRecPtr(&replayTLI);
                               7802                 :            198 :     endptr = (receivePtr < replayPtr) ? replayPtr : receivePtr;
                               7803                 :            198 :     KeepLogSeg(endptr, &_logSegNo);
  199 akapila@postgresql.o     7804         [ +  + ]:            198 :     if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT,
                               7805                 :                :                                            _logSegNo, InvalidOid,
                               7806                 :                :                                            InvalidTransactionId))
                               7807                 :                :     {
                               7808                 :                :         /*
                               7809                 :                :          * Some slots have been invalidated; recalculate the old-segment
                               7810                 :                :          * horizon, starting again from RedoRecPtr.
                               7811                 :                :          */
 1513 alvherre@alvh.no-ip.     7812                 :              1 :         XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
                               7813                 :              1 :         KeepLogSeg(endptr, &_logSegNo);
                               7814                 :                :     }
 2601 michael@paquier.xyz      7815                 :            198 :     _logSegNo--;
                               7816                 :                : 
                               7817                 :                :     /*
                               7818                 :                :      * Try to recycle segments on a useful timeline. If we've been promoted
                               7819                 :                :      * since the beginning of this restartpoint, use the new timeline chosen
                               7820                 :                :      * at end of recovery.  If we're still in recovery, use the timeline we're
                               7821                 :                :      * currently replaying.
                               7822                 :                :      *
                               7823                 :                :      * There is no guarantee that the WAL segments will be useful on the
                               7824                 :                :      * current timeline; if recovery proceeds to a new timeline right after
                               7825                 :                :      * this, the pre-allocated WAL segments on this timeline will not be used,
                               7826                 :                :      * and will be wasted until recycled on the next restartpoint. We'll live
                               7827                 :                :      * with that.
                               7828                 :                :      */
 1401 rhaas@postgresql.org     7829         [ -  + ]:            198 :     if (!RecoveryInProgress())
 1396 rhaas@postgresql.org     7830                 :UBC           0 :         replayTLI = XLogCtl->InsertTimeLineID;
                               7831                 :                : 
 1401 rhaas@postgresql.org     7832                 :CBC         198 :     RemoveOldXlogFiles(_logSegNo, RedoRecPtr, endptr, replayTLI);
                               7833                 :                : 
                               7834                 :                :     /*
                               7835                 :                :      * Make more log segments if needed.  (Do this after recycling old log
                               7836                 :                :      * segments, since that may supply some of the needed files.)
                               7837                 :                :      */
                               7838                 :            198 :     PreallocXlogFiles(endptr, replayTLI);
                               7839                 :                : 
                               7840                 :                :     /*
                               7841                 :                :      * Truncate pg_subtrans if possible.  We can throw away all data before
                               7842                 :                :      * the oldest XMIN of any running transaction.  No future transaction will
                               7843                 :                :      * attempt to reference any pg_subtrans entry older than that (see Asserts
                               7844                 :                :      * in subtrans.c).  When hot standby is disabled, though, we mustn't do
                               7845                 :                :      * this because StartupSUBTRANS hasn't been called yet.
                               7846                 :                :      */
 5486 simon@2ndQuadrant.co     7847         [ +  - ]:            198 :     if (EnableHotStandby)
 1851 andres@anarazel.de       7848                 :            198 :         TruncateSUBTRANS(GetOldestTransactionIdConsideredRunning());
                               7849                 :                : 
                               7850                 :                :     /* Real work is done; log and update stats. */
 4902 rhaas@postgresql.org     7851                 :            198 :     LogCheckpointEnd(true);
                               7852                 :                : 
                               7853                 :                :     /* Reset the process title */
 1727 michael@paquier.xyz      7854                 :            198 :     update_checkpoint_display(flags, true, true);
                               7855                 :                : 
 5544 tgl@sss.pgh.pa.us        7856                 :            198 :     xtime = GetLatestXTime();
 6044 heikki.linnakangas@i     7857   [ +  -  +  -  :            198 :     ereport((log_checkpoints ? LOG : DEBUG2),
                                              +  + ]
                               7858                 :                :             errmsg("recovery restart point at %X/%08X",
                               7859                 :                :                    LSN_FORMAT_ARGS(lastCheckPoint.redo)),
                               7860                 :                :             xtime ? errdetail("Last completed transaction was at log time %s.",
                               7861                 :                :                               timestamptz_to_str(xtime)) : 0);
                               7862                 :                : 
                               7863                 :                :     /*
                               7864                 :                :      * Finally, execute archive_cleanup_command, if any.
                               7865                 :                :      */
 2477 peter_e@gmx.net          7866   [ +  -  -  + ]:            198 :     if (archiveCleanupCommand && strcmp(archiveCleanupCommand, "") != 0)
  943 michael@paquier.xyz      7867                 :UBC           0 :         ExecuteRecoveryCommand(archiveCleanupCommand,
                               7868                 :                :                                "archive_cleanup_command",
                               7869                 :                :                                false,
                               7870                 :                :                                WAIT_EVENT_ARCHIVE_CLEANUP_COMMAND);
                               7871                 :                : 
 6044 heikki.linnakangas@i     7872                 :CBC         198 :     return true;
                               7873                 :                : }
                               7874                 :                : 
                               7875                 :                : /*
                               7876                 :                :  * Report availability of WAL for the given target LSN
                               7877                 :                :  *      (typically a slot's restart_lsn)
                               7878                 :                :  *
                               7879                 :                :  * Returns one of the following enum values:
                               7880                 :                :  *
                               7881                 :                :  * * WALAVAIL_RESERVED means targetLSN is available and it is in the range of
                               7882                 :                :  *   max_wal_size.
                               7883                 :                :  *
                               7884                 :                :  * * WALAVAIL_EXTENDED means it is still available by preserving extra
                               7885                 :                :  *   segments beyond max_wal_size. If max_slot_wal_keep_size is smaller
                               7886                 :                :  *   than max_wal_size, this state is not returned.
                               7887                 :                :  *
                               7888                 :                :  * * WALAVAIL_UNRESERVED means it is no longer retained and the next
                               7889                 :                :  *   checkpoint will remove the segments. The walsender using this slot may
                               7890                 :                :  *   still return it to one of the states above.
                               7891                 :                :  *
                               7892                 :                :  * * WALAVAIL_REMOVED means it has been removed. A replication stream on
                               7893                 :                :  *   a slot with this LSN cannot continue.  (Any associated walsender
                               7894                 :                :  *   processes should have been terminated already.)
                               7895                 :                :  *
                               7896                 :                :  * * WALAVAIL_INVALID_LSN means the slot hasn't been set to reserve WAL.
                               7897                 :                :  */
                               7898                 :                : WALAvailability
 1978 alvherre@alvh.no-ip.     7899                 :            381 : GetWALAvailability(XLogRecPtr targetLSN)
                               7900                 :                : {
                               7901                 :                :     XLogRecPtr  currpos;        /* current write LSN */
                               7902                 :                :     XLogSegNo   currSeg;        /* segid of currpos */
                               7903                 :                :     XLogSegNo   targetSeg;      /* segid of targetLSN */
                               7904                 :                :     XLogSegNo   oldestSeg;      /* actual oldest segid */
                               7905                 :                :     XLogSegNo   oldestSegMaxWalSize;    /* oldest segid kept by max_wal_size */
                               7906                 :                :     XLogSegNo   oldestSlotSeg;  /* oldest segid kept by slot */
                               7907                 :                :     uint64      keepSegs;
                               7908                 :                : 
                               7909                 :                :     /*
                               7910                 :                :      * The slot does not reserve WAL: either deactivated, or never active.
                               7911                 :                :      */
                               7912         [ +  + ]:            381 :     if (XLogRecPtrIsInvalid(targetLSN))
                               7913                 :             20 :         return WALAVAIL_INVALID_LSN;
                               7914                 :                : 
                               7915                 :                :     /*
                               7916                 :                :      * Calculate the oldest segment currently reserved by all slots,
                               7917                 :                :      * considering wal_keep_size and max_slot_wal_keep_size.  Initialize
                               7918                 :                :      * oldestSlotSeg to the current segment.
                               7919                 :                :      */
 1881                          7920                 :            361 :     currpos = GetXLogWriteRecPtr();
                               7921                 :            361 :     XLByteToSeg(currpos, oldestSlotSeg, wal_segment_size);
 1978                          7922                 :            361 :     KeepLogSeg(currpos, &oldestSlotSeg);
                               7923                 :                : 
                               7924                 :                :     /*
                               7925                 :                :      * Find the oldest extant segment file. We get 1 until a checkpoint removes
                               7926                 :                :      * the first WAL segment file since startup, which can make the status
                               7927                 :                :      * wrong under certain abnormal conditions, but that is harmless.
                               7928                 :                :      */
                               7929                 :            361 :     oldestSeg = XLogGetLastRemovedSegno() + 1;
                               7930                 :                : 
                               7931                 :                :     /* calculate oldest segment by max_wal_size */
                               7932                 :            361 :     XLByteToSeg(currpos, currSeg, wal_segment_size);
 1900                          7933                 :            361 :     keepSegs = ConvertToXSegs(max_wal_size_mb, wal_segment_size) + 1;
                               7934                 :                : 
 1978                          7935         [ +  + ]:            361 :     if (currSeg > keepSegs)
                               7936                 :              8 :         oldestSegMaxWalSize = currSeg - keepSegs;
                               7937                 :                :     else
                               7938                 :            353 :         oldestSegMaxWalSize = 1;
                               7939                 :                : 
                               7940                 :                :     /* the segment we care about */
 1881                          7941                 :            361 :     XLByteToSeg(targetLSN, targetSeg, wal_segment_size);
                               7942                 :                : 
                               7943                 :                :     /*
                               7944                 :                :      * No point in returning reserved or extended status values if the
                               7945                 :                :      * targetSeg is known to be lost.
                               7946                 :                :      */
 1900                          7947         [ +  + ]:            361 :     if (targetSeg >= oldestSlotSeg)
                               7948                 :                :     {
                               7949                 :                :         /* show "reserved" when targetSeg is within max_wal_size */
                               7950         [ +  + ]:            360 :         if (targetSeg >= oldestSegMaxWalSize)
 1978                          7951                 :            358 :             return WALAVAIL_RESERVED;
                               7952                 :                : 
                               7953                 :                :         /* being retained by slots exceeding max_wal_size */
 1900                          7954                 :              2 :         return WALAVAIL_EXTENDED;
                               7955                 :                :     }
                               7956                 :                : 
                               7957                 :                :     /* WAL segments are no longer retained but haven't been removed yet */
                               7958         [ +  - ]:              1 :     if (targetSeg >= oldestSeg)
                               7959                 :              1 :         return WALAVAIL_UNRESERVED;
                               7960                 :                : 
                               7961                 :                :     /* Definitely lost */
 1978 alvherre@alvh.no-ip.     7962                 :UBC           0 :     return WALAVAIL_REMOVED;
                               7963                 :                : }
                               7964                 :                : 
                               7965                 :                : 
                               7966                 :                : /*
                               7967                 :                :  * Retreat *logSegNo to the last segment that we need to retain because of
                               7968                 :                :  * either wal_keep_size or replication slots.
                               7969                 :                :  *
                               7970                 :                :  * This is calculated by subtracting wal_keep_size from the given xlog
                               7971                 :                :  * location, recptr, and by making sure that the result is below the
                               7972                 :                :  * requirement of replication slots.  For the latter criterion we do consider
                               7973                 :                :  * the effects of max_slot_wal_keep_size: reserve at most that much space back
                               7974                 :                :  * from recptr.
                               7975                 :                :  *
                               7976                 :                :  * Note about replication slots: if this function calculates a value
                               7977                 :                :  * that's further ahead than what slots need reserved, then affected
                               7978                 :                :  * slots need to be invalidated and this function invoked again.
                               7979                 :                :  * XXX it might be a good idea to rewrite this function so that
                               7980                 :                :  * invalidation is optionally done here, instead.
                               7981                 :                :  */
                               7982                 :                : static void
 4822 heikki.linnakangas@i     7983                 :CBC        2042 : KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
                               7984                 :                : {
                               7985                 :                :     XLogSegNo   currSegNo;
                               7986                 :                :     XLogSegNo   segno;
                               7987                 :                :     XLogRecPtr  keep;
                               7988                 :                : 
 1978 alvherre@alvh.no-ip.     7989                 :           2042 :     XLByteToSeg(recptr, currSegNo, wal_segment_size);
                               7990                 :           2042 :     segno = currSegNo;
                               7991                 :                : 
                               7992                 :                :     /* Calculate how many segments are kept by slots. */
                               7993                 :           2042 :     keep = XLogGetReplicationSlotMinimumLSN();
  863 nathan@postgresql.or     7994   [ +  +  +  + ]:           2042 :     if (keep != InvalidXLogRecPtr && keep < recptr)
                               7995                 :                :     {
 1978 alvherre@alvh.no-ip.     7996                 :            621 :         XLByteToSeg(keep, segno, wal_segment_size);
                               7997                 :                : 
                               7998                 :                :         /*
                               7999                 :                :          * Account for max_slot_wal_keep_size to avoid keeping more than
                               8000                 :                :          * configured.  However, don't do that during a binary upgrade: if
                               8001                 :                :          * slots were to be invalidated because of this, it would not be
                               8002                 :                :          * possible to preserve logical ones during the upgrade.
                               8003                 :                :          */
   57 akapila@postgresql.o     8004   [ +  +  +  - ]:            621 :         if (max_slot_wal_keep_size_mb >= 0 && !IsBinaryUpgrade)
                               8005                 :                :         {
                               8006                 :                :             uint64      slot_keep_segs;
                               8007                 :                : 
 1978 alvherre@alvh.no-ip.     8008                 :             21 :             slot_keep_segs =
                               8009                 :             21 :                 ConvertToXSegs(max_slot_wal_keep_size_mb, wal_segment_size);
                               8010                 :                : 
                               8011         [ +  + ]:             21 :             if (currSegNo - segno > slot_keep_segs)
                               8012                 :              5 :                 segno = currSegNo - slot_keep_segs;
                               8013                 :                :         }
                               8014                 :                :     }
                               8015                 :                : 
                               8016                 :                :     /*
                               8017                 :                :      * If WAL summarization is in use, don't remove WAL that has yet to be
                               8018                 :                :      * summarized.
                               8019                 :                :      */
  438 rhaas@postgresql.org     8020                 :           2042 :     keep = GetOldestUnsummarizedLSN(NULL, NULL);
  626                          8021         [ +  + ]:           2042 :     if (keep != InvalidXLogRecPtr)
                               8022                 :                :     {
                               8023                 :                :         XLogSegNo   unsummarized_segno;
                               8024                 :                : 
                               8025                 :              2 :         XLByteToSeg(keep, unsummarized_segno, wal_segment_size);
                               8026         [ +  - ]:              2 :         if (unsummarized_segno < segno)
                               8027                 :              2 :             segno = unsummarized_segno;
                               8028                 :                :     }
                               8029                 :                : 
                               8030                 :                :     /* but, keep at least wal_keep_size if that's set */
 1874 fujii@postgresql.org     8031         [ +  + ]:           2042 :     if (wal_keep_size_mb > 0)
                               8032                 :                :     {
                               8033                 :                :         uint64      keep_segs;
                               8034                 :                : 
                               8035                 :             69 :         keep_segs = ConvertToXSegs(wal_keep_size_mb, wal_segment_size);
                               8036         [ +  - ]:             69 :         if (currSegNo - segno < keep_segs)
                               8037                 :                :         {
                               8038                 :                :             /* avoid underflow, don't go below 1 */
                               8039         [ +  + ]:             69 :             if (currSegNo <= keep_segs)
                               8040                 :             65 :                 segno = 1;
                               8041                 :                :             else
                               8042                 :              4 :                 segno = currSegNo - keep_segs;
                               8043                 :                :         }
                               8044                 :                :     }
                               8045                 :                : 
                               8046                 :                :     /* don't delete WAL segments newer than the calculated segment */
 1881 alvherre@alvh.no-ip.     8047         [ +  + ]:           2042 :     if (segno < *logSegNo)
 4822 heikki.linnakangas@i     8048                 :            354 :         *logSegNo = segno;
 5163 simon@2ndQuadrant.co     8049                 :           2042 : }
                               8050                 :                : 
                               8051                 :                : /*
                               8052                 :                :  * Write a NEXTOID log record
                               8053                 :                :  */
                               8054                 :                : void
 9073 vadim4o@yahoo.com        8055                 :            582 : XLogPutNextOid(Oid nextOid)
                               8056                 :                : {
 3943 heikki.linnakangas@i     8057                 :            582 :     XLogBeginInsert();
  207 peter@eisentraut.org     8058                 :            582 :     XLogRegisterData(&nextOid, sizeof(Oid));
 3943 heikki.linnakangas@i     8059                 :            582 :     (void) XLogInsert(RM_XLOG_ID, XLOG_NEXTOID);
                               8060                 :                : 
                               8061                 :                :     /*
                               8062                 :                :      * We need not flush the NEXTOID record immediately, because any of the
                               8063                 :                :      * just-allocated OIDs could only reach disk as part of a tuple insert or
                               8064                 :                :      * update that would have its own XLOG record that must follow the NEXTOID
                               8065                 :                :      * record.  Therefore, the standard buffer LSN interlock applied to those
                               8066                 :                :      * records will ensure no such OID reaches disk before the NEXTOID record
                               8067                 :                :      * does.
                               8068                 :                :      *
                               8069                 :                :      * Note, however, that the above statement only covers state "within" the
                               8070                 :                :      * database.  When we use a generated OID as a file or directory name, we
                               8071                 :                :      * are in a sense violating the basic WAL rule, because that filesystem
                               8072                 :                :      * change may reach disk before the NEXTOID WAL record does.  The impact
                               8073                 :                :      * of this is that if a database crash occurs immediately afterward, we
                               8074                 :                :      * might after restart re-generate the same OID and find that it conflicts
                               8075                 :                :      * with the leftover file or directory.  But since for safety's sake we
                               8076                 :                :      * always loop until finding a nonconflicting filename, this poses no real
                               8077                 :                :      * problem in practice. See pgsql-hackers discussion 27-Sep-2006.
                               8078                 :                :      */
 7436 tgl@sss.pgh.pa.us        8079                 :            582 : }
                               8080                 :                : 
                               8081                 :                : /*
                               8082                 :                :  * Write an XLOG SWITCH record.
                               8083                 :                :  *
                               8084                 :                :  * Here we just blindly issue an XLogInsert request for the record.
                               8085                 :                :  * All the magic happens inside XLogInsert.
                               8086                 :                :  *
                               8087                 :                :  * The return value is either the end+1 address of the switch record,
                               8088                 :                :  * or the end+1 address of the prior segment if we did not need to
                               8089                 :                :  * write a switch record because we are already at segment start.
                               8090                 :                :  */
                               8091                 :                : XLogRecPtr
 3180 andres@anarazel.de       8092                 :            707 : RequestXLogSwitch(bool mark_unimportant)
                               8093                 :                : {
                               8094                 :                :     XLogRecPtr  RecPtr;
                               8095                 :                : 
                               8096                 :                :     /* XLOG SWITCH has no data */
 3943 heikki.linnakangas@i     8097                 :            707 :     XLogBeginInsert();
                               8098                 :                : 
 3180 andres@anarazel.de       8099         [ -  + ]:            707 :     if (mark_unimportant)
 3180 andres@anarazel.de       8100                 :UBC           0 :         XLogSetRecordFlags(XLOG_MARK_UNIMPORTANT);
 3943 heikki.linnakangas@i     8101                 :CBC         707 :     RecPtr = XLogInsert(RM_XLOG_ID, XLOG_SWITCH);
                               8102                 :                : 
 6971 tgl@sss.pgh.pa.us        8103                 :            707 :     return RecPtr;
                               8104                 :                : }
                               8105                 :                : 
                               8106                 :                : /*
                               8107                 :                :  * Write a RESTORE POINT record
                               8108                 :                :  */
                               8109                 :                : XLogRecPtr
 5324 simon@2ndQuadrant.co     8110                 :              3 : XLogRestorePoint(const char *rpName)
                               8111                 :                : {
                               8112                 :                :     XLogRecPtr  RecPtr;
                               8113                 :                :     xl_restore_point xlrec;
                               8114                 :                : 
                               8115                 :              3 :     xlrec.rp_time = GetCurrentTimestamp();
 4219 tgl@sss.pgh.pa.us        8116                 :              3 :     strlcpy(xlrec.rp_name, rpName, MAXFNAMELEN);
                               8117                 :                : 
 3943 heikki.linnakangas@i     8118                 :              3 :     XLogBeginInsert();
  207 peter@eisentraut.org     8119                 :              3 :     XLogRegisterData(&xlrec, sizeof(xl_restore_point));
                               8120                 :                : 
 3943 heikki.linnakangas@i     8121                 :              3 :     RecPtr = XLogInsert(RM_XLOG_ID, XLOG_RESTORE_POINT);
                               8122                 :                : 
 5308 rhaas@postgresql.org     8123         [ +  - ]:              3 :     ereport(LOG,
                               8124                 :                :             errmsg("restore point \"%s\" created at %X/%08X",
                               8125                 :                :                    rpName, LSN_FORMAT_ARGS(RecPtr)));
                               8126                 :                : 
 5324 simon@2ndQuadrant.co     8127                 :              3 :     return RecPtr;
                               8128                 :                : }
                               8129                 :                : 
                               8130                 :                : /*
                               8131                 :                :  * Check if any of the GUC parameters that are critical for hot standby
                               8132                 :                :  * have changed, and update the value in pg_control file if necessary.
                               8133                 :                :  */
                               8134                 :                : static void
 5610 heikki.linnakangas@i     8135                 :            832 : XLogReportParameters(void)
                               8136                 :                : {
                               8137         [ +  + ]:            832 :     if (wal_level != ControlFile->wal_level ||
 4266 rhaas@postgresql.org     8138         [ +  + ]:            616 :         wal_log_hints != ControlFile->wal_log_hints ||
 5610 heikki.linnakangas@i     8139         [ +  + ]:            534 :         MaxConnections != ControlFile->MaxConnections ||
 4447 rhaas@postgresql.org     8140         [ +  + ]:            533 :         max_worker_processes != ControlFile->max_worker_processes ||
 2398 michael@paquier.xyz      8141         [ +  + ]:            532 :         max_wal_senders != ControlFile->max_wal_senders ||
 5610 heikki.linnakangas@i     8142         [ +  + ]:            512 :         max_prepared_xacts != ControlFile->max_prepared_xacts ||
 3930 alvherre@alvh.no-ip.     8143         [ +  - ]:            426 :         max_locks_per_xact != ControlFile->max_locks_per_xact ||
                               8144         [ +  + ]:            426 :         track_commit_timestamp != ControlFile->track_commit_timestamp)
                               8145                 :                :     {
                               8146                 :                :         /*
                               8147                 :                :          * The change in number of backend slots doesn't need to be WAL-logged
                               8148                 :                :          * if archiving is not enabled, as you can't start archive recovery
                               8149                 :                :          * with wal_level=minimal anyway. We don't really care about the
                               8150                 :                :          * values in pg_control either if wal_level=minimal, but it seems better
                               8151                 :                :          * to keep them up-to-date to avoid confusion.
                               8152                 :                :          */
 5610 heikki.linnakangas@i     8153   [ +  +  +  + ]:            418 :         if (wal_level != ControlFile->wal_level || XLogIsNeeded())
                               8154                 :                :         {
                               8155                 :                :             xl_parameter_change xlrec;
                               8156                 :                :             XLogRecPtr  recptr;
                               8157                 :                : 
                               8158                 :            398 :             xlrec.MaxConnections = MaxConnections;
 4447 rhaas@postgresql.org     8159                 :            398 :             xlrec.max_worker_processes = max_worker_processes;
 2398 michael@paquier.xyz      8160                 :            398 :             xlrec.max_wal_senders = max_wal_senders;
 5610 heikki.linnakangas@i     8161                 :            398 :             xlrec.max_prepared_xacts = max_prepared_xacts;
                               8162                 :            398 :             xlrec.max_locks_per_xact = max_locks_per_xact;
                               8163                 :            398 :             xlrec.wal_level = wal_level;
 4266 rhaas@postgresql.org     8164                 :            398 :             xlrec.wal_log_hints = wal_log_hints;
 3930 alvherre@alvh.no-ip.     8165                 :            398 :             xlrec.track_commit_timestamp = track_commit_timestamp;
                               8166                 :                : 
 3943 heikki.linnakangas@i     8167                 :            398 :             XLogBeginInsert();
  207 peter@eisentraut.org     8168                 :            398 :             XLogRegisterData(&xlrec, sizeof(xlrec));
                               8169                 :                : 
 3943 heikki.linnakangas@i     8170                 :            398 :             recptr = XLogInsert(RM_XLOG_ID, XLOG_PARAMETER_CHANGE);
 4182 fujii@postgresql.org     8171                 :            398 :             XLogFlush(recptr);
                               8172                 :                :         }
                               8173                 :                : 
 1916 tmunro@postgresql.or     8174                 :            418 :         LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               8175                 :                : 
 5610 heikki.linnakangas@i     8176                 :            418 :         ControlFile->MaxConnections = MaxConnections;
 4447 rhaas@postgresql.org     8177                 :            418 :         ControlFile->max_worker_processes = max_worker_processes;
 2398 michael@paquier.xyz      8178                 :            418 :         ControlFile->max_wal_senders = max_wal_senders;
 5610 heikki.linnakangas@i     8179                 :            418 :         ControlFile->max_prepared_xacts = max_prepared_xacts;
                               8180                 :            418 :         ControlFile->max_locks_per_xact = max_locks_per_xact;
                               8181                 :            418 :         ControlFile->wal_level = wal_level;
 4266 rhaas@postgresql.org     8182                 :            418 :         ControlFile->wal_log_hints = wal_log_hints;
 3930 alvherre@alvh.no-ip.     8183                 :            418 :         ControlFile->track_commit_timestamp = track_commit_timestamp;
 5610 heikki.linnakangas@i     8184                 :            418 :         UpdateControlFile();
                               8185                 :                : 
 1916 tmunro@postgresql.or     8186                 :            418 :         LWLockRelease(ControlFileLock);
                               8187                 :                :     }
 5708 heikki.linnakangas@i     8188                 :            832 : }
                               8189                 :                : 
                               8190                 :                : /*
                               8191                 :                :  * Update full_page_writes in shared memory, and write an
                               8192                 :                :  * XLOG_FPW_CHANGE record if necessary.
                               8193                 :                :  *
                               8194                 :                :  * Note: this function assumes there is no other process running
                               8195                 :                :  * concurrently that could update it.
                               8196                 :                :  */
                               8197                 :                : void
 4973 simon@2ndQuadrant.co     8198                 :           1380 : UpdateFullPageWrites(void)
                               8199                 :                : {
                               8200                 :           1380 :     XLogCtlInsert *Insert = &XLogCtl->Insert;
                               8201                 :                :     bool        recoveryInProgress;
                               8202                 :                : 
                               8203                 :                :     /*
                               8204                 :                :      * Do nothing if full_page_writes has not been changed.
                               8205                 :                :      *
                               8206                 :                :      * It's safe to check the shared full_page_writes without the lock,
                               8207                 :                :      * because we assume that there is no concurrently running process which
                               8208                 :                :      * can update it.
                               8209                 :                :      */
                               8210         [ +  + ]:           1380 :     if (fullPageWrites == Insert->fullPageWrites)
                               8211                 :            940 :         return;
                               8212                 :                : 
                               8213                 :                :     /*
                               8214                 :                :      * Perform this outside critical section so that the WAL insert
                               8215                 :                :      * initialization done by RecoveryInProgress() doesn't trigger an
                               8216                 :                :      * assertion failure.
                               8217                 :                :      */
 2535 akapila@postgresql.o     8218                 :            440 :     recoveryInProgress = RecoveryInProgress();
                               8219                 :                : 
 4932 heikki.linnakangas@i     8220                 :            440 :     START_CRIT_SECTION();
                               8221                 :                : 
                               8222                 :                :     /*
                               8223                 :                :      * It's always safe to take full page images, even when not strictly
                               8224                 :                :      * required, but not the other way round. So if we're setting full_page_writes
                               8225                 :                :      * to true, first set it true and then write the WAL record. If we're
                               8226                 :                :      * setting it to false, first write the WAL record and then set the global
                               8227                 :                :      * flag.
                               8228                 :                :      */
                               8229         [ +  + ]:            440 :     if (fullPageWrites)
                               8230                 :                :     {
 4187                          8231                 :            430 :         WALInsertLockAcquireExclusive();
 4932                          8232                 :            430 :         Insert->fullPageWrites = true;
 4187                          8233                 :            430 :         WALInsertLockRelease();
                               8234                 :                :     }
                               8235                 :                : 
                               8236                 :                :     /*
                               8237                 :                :      * Write an XLOG_FPW_CHANGE record. This allows us to keep track of
                               8238                 :                :      * full_page_writes during archive recovery, if required.
                               8239                 :                :      */
 2535 akapila@postgresql.o     8240   [ +  +  -  + ]:            440 :     if (XLogStandbyInfoActive() && !recoveryInProgress)
                               8241                 :                :     {
 3943 heikki.linnakangas@i     8242                 :UBC           0 :         XLogBeginInsert();
  207 peter@eisentraut.org     8243                 :              0 :         XLogRegisterData(&fullPageWrites, sizeof(bool));
                               8244                 :                : 
 3943 heikki.linnakangas@i     8245                 :              0 :         XLogInsert(RM_XLOG_ID, XLOG_FPW_CHANGE);
                               8246                 :                :     }
                               8247                 :                : 
 4932 heikki.linnakangas@i     8248         [ +  + ]:CBC         440 :     if (!fullPageWrites)
                               8249                 :                :     {
 4187                          8250                 :             10 :         WALInsertLockAcquireExclusive();
 4932                          8251                 :             10 :         Insert->fullPageWrites = false;
 4187                          8252                 :             10 :         WALInsertLockRelease();
                               8253                 :                :     }
 4932                          8254         [ -  + ]:            440 :     END_CRIT_SECTION();
                               8255                 :                : }
                               8256                 :                : 
                               8257                 :                : /*
                               8258                 :                :  * XLOG resource manager's routines
                               8259                 :                :  *
                               8260                 :                :  * Definitions of info values are in include/catalog/pg_control.h, though
                               8261                 :                :  * not all record types are related to control file updates.
                               8262                 :                :  *
                               8263                 :                :  * NOTE: Some XLOG record types that are directly related to WAL recovery
                               8264                 :                :  * are handled in xlogrecovery_redo().
                               8265                 :                :  */
                               8266                 :                : void
 3943                          8267                 :          41435 : xlog_redo(XLogReaderState *record)
                               8268                 :                : {
                               8269                 :          41435 :     uint8       info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
                               8270                 :          41435 :     XLogRecPtr  lsn = record->EndRecPtr;
                               8271                 :                : 
                               8272                 :                :     /*
                               8273                 :                :      * In XLOG rmgr, backup blocks are only used by XLOG_FPI and
                               8274                 :                :      * XLOG_FPI_FOR_HINT records.
                               8275                 :                :      */
 3939                          8276   [ +  +  +  +  :          41435 :     Assert(info == XLOG_FPI || info == XLOG_FPI_FOR_HINT ||
                                              -  + ]
                               8277                 :                :            !XLogRecHasAnyBlockRefs(record));
                               8278                 :                : 
 8938 tgl@sss.pgh.pa.us        8279         [ +  + ]:          41435 :     if (info == XLOG_NEXTOID)
                               8280                 :                :     {
                               8281                 :                :         Oid         nextOid;
                               8282                 :                : 
                               8283                 :                :         /*
                               8284                 :                :          * We used to try to take the maximum of TransamVariables->nextOid and
                               8285                 :                :          * the recorded nextOid, but that fails if the OID counter wraps
                               8286                 :                :          * around.  Since no OID allocation should be happening during replay
                               8287                 :                :          * anyway, better to just believe the record exactly.  We still take
                               8288                 :                :          * OidGenLock while setting the variable, just in case.
                               8289                 :                :          */
 9073 vadim4o@yahoo.com        8290                 :             90 :         memcpy(&nextOid, XLogRecGetData(record), sizeof(Oid));
 4961 tgl@sss.pgh.pa.us        8291                 :             90 :         LWLockAcquire(OidGenLock, LW_EXCLUSIVE);
  638 heikki.linnakangas@i     8292                 :             90 :         TransamVariables->nextOid = nextOid;
                               8293                 :             90 :         TransamVariables->oidCount = 0;
 4961 tgl@sss.pgh.pa.us        8294                 :             90 :         LWLockRelease(OidGenLock);
                               8295                 :                :     }
 8943                          8296         [ +  + ]:          41345 :     else if (info == XLOG_CHECKPOINT_SHUTDOWN)
                               8297                 :                :     {
                               8298                 :                :         CheckPoint  checkPoint;
                               8299                 :                :         TimeLineID  replayTLI;
                               8300                 :                : 
                               8301                 :             30 :         memcpy(&checkPoint, XLogRecGetData(record), sizeof(CheckPoint));
                               8302                 :                :         /* In a SHUTDOWN checkpoint, believe the counters exactly */
 4961                          8303                 :             30 :         LWLockAcquire(XidGenLock, LW_EXCLUSIVE);
  638 heikki.linnakangas@i     8304                 :             30 :         TransamVariables->nextXid = checkPoint.nextXid;
 4961 tgl@sss.pgh.pa.us        8305                 :             30 :         LWLockRelease(XidGenLock);
                               8306                 :             30 :         LWLockAcquire(OidGenLock, LW_EXCLUSIVE);
  638 heikki.linnakangas@i     8307                 :             30 :         TransamVariables->nextOid = checkPoint.nextOid;
                               8308                 :             30 :         TransamVariables->oidCount = 0;
 4961 tgl@sss.pgh.pa.us        8309                 :             30 :         LWLockRelease(OidGenLock);
 7395                          8310                 :             30 :         MultiXactSetNextMXact(checkPoint.nextMulti,
                               8311                 :                :                               checkPoint.nextMultiOffset);
                               8312                 :                : 
 3633 andres@anarazel.de       8313                 :             30 :         MultiXactAdvanceOldest(checkPoint.oldestMulti,
                               8314                 :                :                                checkPoint.oldestMultiDB);
                               8315                 :                : 
                               8316                 :                :         /*
                               8317                 :                :          * No need to set oldestClogXid here as well; it'll be set when we
                               8318                 :                :          * redo an xl_clog_truncate if it changed since initialization.
                               8319                 :                :          */
 5680 tgl@sss.pgh.pa.us        8320                 :             30 :         SetTransactionIdLimit(checkPoint.oldestXid, checkPoint.oldestXidDB);
                               8321                 :                : 
                               8322                 :                :         /*
                               8323                 :                :          * If we see a shutdown checkpoint while waiting for an end-of-backup
                               8324                 :                :          * record, the backup was canceled and the end-of-backup record will
                               8325                 :                :          * never arrive.
                               8326                 :                :          */
 4579 heikki.linnakangas@i     8327         [ +  - ]:             30 :         if (ArchiveRecoveryRequested &&
 4973 simon@2ndQuadrant.co     8328         [ -  + ]:             30 :             !XLogRecPtrIsInvalid(ControlFile->backupStartPoint) &&
 4973 simon@2ndQuadrant.co     8329         [ #  # ]:UBC           0 :             XLogRecPtrIsInvalid(ControlFile->backupEndPoint))
 4961 tgl@sss.pgh.pa.us        8330         [ #  # ]:              0 :             ereport(PANIC,
                               8331                 :                :                     (errmsg("online backup was canceled, recovery cannot continue")));
                               8332                 :                : 
                               8333                 :                :         /*
                               8334                 :                :          * If we see a shutdown checkpoint, we know that nothing was running
                               8335                 :                :          * on the primary at this point. So fake-up an empty running-xacts
                               8336                 :                :          * record and use that here and now. Recover additional standby state
                               8337                 :                :          * for prepared transactions.
                               8338                 :                :          */
 5740 simon@2ndQuadrant.co     8339         [ +  + ]:CBC          30 :         if (standbyState >= STANDBY_INITIALIZED)
                               8340                 :                :         {
                               8341                 :                :             TransactionId *xids;
                               8342                 :                :             int         nxids;
                               8343                 :                :             TransactionId oldestActiveXID;
                               8344                 :                :             TransactionId latestCompletedXid;
                               8345                 :                :             RunningTransactionsData running;
                               8346                 :                : 
 5625 heikki.linnakangas@i     8347                 :             28 :             oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
                               8348                 :                : 
                               8349                 :                :             /* Update pg_subtrans entries for any prepared transactions */
  436                          8350                 :             28 :             StandbyRecoverPreparedTransactions();
                               8351                 :                : 
                               8352                 :                :             /*
                               8353                 :                :              * Construct a RunningTransactions snapshot representing a shut
                               8354                 :                :              * down server, with only prepared transactions still alive. We're
                               8355                 :                :              * never overflowed at this point because all subxids are listed
                               8356                 :                :              * with their parent prepared transactions.
                               8357                 :                :              */
 5625                          8358                 :             28 :             running.xcnt = nxids;
 4661 simon@2ndQuadrant.co     8359                 :             28 :             running.subxcnt = 0;
  436 heikki.linnakangas@i     8360                 :             28 :             running.subxid_status = SUBXIDS_IN_SUBTRANS;
 1852 andres@anarazel.de       8361                 :             28 :             running.nextXid = XidFromFullTransactionId(checkPoint.nextXid);
 5625 heikki.linnakangas@i     8362                 :             28 :             running.oldestRunningXid = oldestActiveXID;
 1852 andres@anarazel.de       8363                 :             28 :             latestCompletedXid = XidFromFullTransactionId(checkPoint.nextXid);
 5595 simon@2ndQuadrant.co     8364         [ -  + ]:             28 :             TransactionIdRetreat(latestCompletedXid);
 5594                          8365         [ -  + ]:             28 :             Assert(TransactionIdIsNormal(latestCompletedXid));
 5595                          8366                 :             28 :             running.latestCompletedXid = latestCompletedXid;
 5625 heikki.linnakangas@i     8367                 :             28 :             running.xids = xids;
                               8368                 :                : 
                               8369                 :             28 :             ProcArrayApplyRecoveryInfo(&running);
                               8370                 :                :         }
                               8371                 :                : 
                               8372                 :                :         /* ControlFile->checkPointCopy always tracks the latest ckpt XID */
 1916 tmunro@postgresql.or     8373                 :             30 :         LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 1852 andres@anarazel.de       8374                 :             30 :         ControlFile->checkPointCopy.nextXid = checkPoint.nextXid;
 1916 tmunro@postgresql.or     8375                 :             30 :         LWLockRelease(ControlFileLock);
                               8376                 :                : 
                               8377                 :                :         /*
                               8378                 :                :          * We should've already switched to the new TLI before replaying this
                               8379                 :                :          * record.
                               8380                 :                :          */
 1298 heikki.linnakangas@i     8381                 :             30 :         (void) GetCurrentReplayRecPtr(&replayTLI);
 1401 rhaas@postgresql.org     8382         [ -  + ]:             30 :         if (checkPoint.ThisTimeLineID != replayTLI)
 4656 heikki.linnakangas@i     8383         [ #  # ]:UBC           0 :             ereport(PANIC,
                               8384                 :                :                     (errmsg("unexpected timeline ID %u (should be %u) in shutdown checkpoint record",
                               8385                 :                :                             checkPoint.ThisTimeLineID, replayTLI)));
                               8386                 :                : 
 1382 rhaas@postgresql.org     8387                 :CBC          30 :         RecoveryRestartPoint(&checkPoint, record);
                               8388                 :                :     }
 8943 tgl@sss.pgh.pa.us        8389         [ +  + ]:          41315 :     else if (info == XLOG_CHECKPOINT_ONLINE)
                               8390                 :                :     {
                               8391                 :                :         CheckPoint  checkPoint;
                               8392                 :                :         TimeLineID  replayTLI;
                               8393                 :                : 
                               8394                 :            664 :         memcpy(&checkPoint, XLogRecGetData(record), sizeof(CheckPoint));
                               8395                 :                :         /* In an ONLINE checkpoint, treat the XID counter as a minimum */
 4961                          8396                 :            664 :         LWLockAcquire(XidGenLock, LW_EXCLUSIVE);
  638 heikki.linnakangas@i     8397         [ -  + ]:            664 :         if (FullTransactionIdPrecedes(TransamVariables->nextXid,
                               8398                 :                :                                       checkPoint.nextXid))
  638 heikki.linnakangas@i     8399                 :UBC           0 :             TransamVariables->nextXid = checkPoint.nextXid;
 4961 tgl@sss.pgh.pa.us        8400                 :CBC         664 :         LWLockRelease(XidGenLock);
                               8401                 :                : 
                               8402                 :                :         /*
                               8403                 :                :          * We ignore the nextOid counter in an ONLINE checkpoint, preferring
                               8404                 :                :          * to track OID assignment through XLOG_NEXTOID records.  The nextOid
                               8405                 :                :          * counter is from the start of the checkpoint and might well be stale
                               8406                 :                :          * compared to later XLOG_NEXTOID records.  We could try to take the
                               8407                 :                :          * maximum of the nextOid counter and our latest value, but since
                               8408                 :                :          * there's no particular guarantee about the speed with which the OID
                               8409                 :                :          * counter wraps around, that's a risky thing to do.  In any case,
                               8410                 :                :          * users of the nextOid counter are required to avoid assignment of
                               8411                 :                :          * duplicates, so that a somewhat out-of-date value should be safe.
                               8412                 :                :          */
                               8413                 :                : 
                               8414                 :                :         /* Handle multixact */
 7395                          8415                 :            664 :         MultiXactAdvanceNextMXact(checkPoint.nextMulti,
                               8416                 :                :                                   checkPoint.nextMultiOffset);
                               8417                 :                : 
                               8418                 :                :         /*
                               8419                 :                :          * NB: This may perform multixact truncation when replaying WAL
                               8420                 :                :          * generated by an older primary.
                               8421                 :                :          */
 3633 andres@anarazel.de       8422                 :            664 :         MultiXactAdvanceOldest(checkPoint.oldestMulti,
                               8423                 :                :                                checkPoint.oldestMultiDB);
  638 heikki.linnakangas@i     8424         [ -  + ]:            664 :         if (TransactionIdPrecedes(TransamVariables->oldestXid,
                               8425                 :                :                                   checkPoint.oldestXid))
 5680 tgl@sss.pgh.pa.us        8426                 :UBC           0 :             SetTransactionIdLimit(checkPoint.oldestXid,
                               8427                 :                :                                   checkPoint.oldestXidDB);
                               8428                 :                :         /* ControlFile->checkPointCopy always tracks the latest ckpt XID */
 1916 tmunro@postgresql.or     8429                 :CBC         664 :         LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 1852 andres@anarazel.de       8430                 :            664 :         ControlFile->checkPointCopy.nextXid = checkPoint.nextXid;
 1916 tmunro@postgresql.or     8431                 :            664 :         LWLockRelease(ControlFileLock);
                               8432                 :                : 
                               8433                 :                :         /* TLI should not change in an on-line checkpoint */
 1298 heikki.linnakangas@i     8434                 :            664 :         (void) GetCurrentReplayRecPtr(&replayTLI);
 1401 rhaas@postgresql.org     8435         [ -  + ]:            664 :         if (checkPoint.ThisTimeLineID != replayTLI)
 7878 tgl@sss.pgh.pa.us        8436         [ #  # ]:UBC           0 :             ereport(PANIC,
                               8437                 :                :                     (errmsg("unexpected timeline ID %u (should be %u) in online checkpoint record",
                               8438                 :                :                             checkPoint.ThisTimeLineID, replayTLI)));
                               8439                 :                : 
 1382 rhaas@postgresql.org     8440                 :CBC         664 :         RecoveryRestartPoint(&checkPoint, record);
                               8441                 :                :     }
 1438 alvherre@alvh.no-ip.     8442         [ +  + ]:          40651 :     else if (info == XLOG_OVERWRITE_CONTRECORD)
                               8443                 :                :     {
                               8444                 :                :         /* nothing to do here, handled in xlogrecovery_redo() */
                               8445                 :                :     }
 4603 simon@2ndQuadrant.co     8446         [ +  + ]:          40650 :     else if (info == XLOG_END_OF_RECOVERY)
                               8447                 :                :     {
                               8448                 :                :         xl_end_of_recovery xlrec;
                               8449                 :                :         TimeLineID  replayTLI;
                               8450                 :                : 
                               8451                 :              9 :         memcpy(&xlrec, XLogRecGetData(record), sizeof(xl_end_of_recovery));
                               8452                 :                : 
                               8453                 :                :         /*
                               8454                 :                :          * For Hot Standby, we could treat this like a Shutdown Checkpoint,
                               8455                 :                :          * but this case is rarer and harder to test, so the benefit doesn't
                               8456                 :                :          * outweigh the potential extra cost of maintenance.
                               8457                 :                :          */
                               8458                 :                : 
                               8459                 :                :         /*
                               8460                 :                :          * We should've already switched to the new TLI before replaying this
                               8461                 :                :          * record.
                               8462                 :                :          */
 1298 heikki.linnakangas@i     8463                 :              9 :         (void) GetCurrentReplayRecPtr(&replayTLI);
 1401 rhaas@postgresql.org     8464         [ -  + ]:              9 :         if (xlrec.ThisTimeLineID != replayTLI)
 4603 simon@2ndQuadrant.co     8465         [ #  # ]:UBC           0 :             ereport(PANIC,
                               8466                 :                :                     (errmsg("unexpected timeline ID %u (should be %u) in end-of-recovery record",
                               8467                 :                :                             xlrec.ThisTimeLineID, replayTLI)));
                               8468                 :                :     }
 6684 tgl@sss.pgh.pa.us        8469         [ +  - ]:CBC       40641 :     else if (info == XLOG_NOOP)
                               8470                 :                :     {
                               8471                 :                :         /* nothing to do here */
                               8472                 :                :     }
 6971                          8473         [ +  + ]:          40641 :     else if (info == XLOG_SWITCH)
                               8474                 :                :     {
                               8475                 :                :         /* nothing to do here */
                               8476                 :                :     }
 5324 simon@2ndQuadrant.co     8477         [ +  + ]:          40209 :     else if (info == XLOG_RESTORE_POINT)
                               8478                 :                :     {
                               8479                 :                :         /* nothing to do here, handled in xlogrecovery.c */
                               8480                 :                :     }
 3939 heikki.linnakangas@i     8481   [ +  +  +  + ]:          40204 :     else if (info == XLOG_FPI || info == XLOG_FPI_FOR_HINT)
                               8482                 :                :     {
                               8483                 :                :         /*
                               8484                 :                :          * XLOG_FPI records contain nothing else but one or more block
                               8485                 :                :          * references. Every block reference must include a full-page image
                               8486                 :                :          * even if full_page_writes was disabled when the record was generated
                               8487                 :                :          * - otherwise there would be no point in this record.
                               8488                 :                :          *
                               8489                 :                :          * XLOG_FPI_FOR_HINT records are generated when a page needs to be
                               8490                 :                :          * WAL-logged because of a hint bit update. They are only generated
                               8491                 :                :          * when checksums and/or wal_log_hints are enabled. They may include
                               8492                 :                :          * no full-page images if full_page_writes was disabled when they were
                               8493                 :                :          * generated. In this case there is nothing to do here.
                               8494                 :                :          *
                               8495                 :                :          * No recovery conflicts are generated by these generic records - if a
                               8496                 :                :          * resource manager needs to generate conflicts, it has to define a
                               8497                 :                :          * separate WAL record type and redo routine.
                               8498                 :                :          */
 1268 tmunro@postgresql.or     8499         [ +  + ]:          83637 :         for (uint8 block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
                               8500                 :                :         {
                               8501                 :                :             Buffer      buffer;
                               8502                 :                : 
 1508 fujii@postgresql.org     8503         [ +  + ]:          44209 :             if (!XLogRecHasBlockImage(record, block_id))
                               8504                 :                :             {
                               8505         [ -  + ]:             66 :                 if (info == XLOG_FPI)
 1508 fujii@postgresql.org     8506         [ #  # ]:UBC           0 :                     elog(ERROR, "XLOG_FPI record did not contain a full-page image");
 1508 fujii@postgresql.org     8507                 :CBC          66 :                 continue;
                               8508                 :                :             }
                               8509                 :                : 
 2348 heikki.linnakangas@i     8510         [ -  + ]:          44143 :             if (XLogReadBufferForRedo(record, block_id, &buffer) != BLK_RESTORED)
 2348 heikki.linnakangas@i     8511         [ #  # ]:UBC           0 :                 elog(ERROR, "unexpected XLogReadBufferForRedo result when restoring backup block");
 2348 heikki.linnakangas@i     8512                 :CBC       44143 :             UnlockReleaseBuffer(buffer);
                               8513                 :                :         }
                               8514                 :                :     }
 5724                          8515         [ +  + ]:            776 :     else if (info == XLOG_BACKUP_END)
                               8516                 :                :     {
                               8517                 :                :         /* nothing to do here, handled in xlogrecovery_redo() */
                               8518                 :                :     }
 5610                          8519         [ +  + ]:            695 :     else if (info == XLOG_PARAMETER_CHANGE)
                               8520                 :                :     {
                               8521                 :                :         xl_parameter_change xlrec;
                               8522                 :                : 
                               8523                 :                :         /* Update our copy of the parameters in pg_control */
                               8524                 :             30 :         memcpy(&xlrec, XLogRecGetData(record), sizeof(xl_parameter_change));
                               8525                 :                : 
                               8526                 :                :         /*
                               8527                 :                :          * Invalidate logical slots if we are in hot standby and the primary
                               8528                 :                :          * does not have a WAL level sufficient for logical decoding. No need
                               8529                 :                :          * to search for potentially conflicting logical slots if standby is
                               8530                 :                :          * running with wal_level lower than logical, because in that case, we
                               8531                 :                :          * would have either disallowed creation of logical slots or
                               8532                 :                :          * invalidated existing ones.
                               8533                 :                :          */
  883 andres@anarazel.de       8534   [ +  -  +  + ]:             30 :         if (InRecovery && InHotStandby &&
                               8535         [ +  + ]:             15 :             xlrec.wal_level < WAL_LEVEL_LOGICAL &&
                               8536         [ +  + ]:              5 :             wal_level >= WAL_LEVEL_LOGICAL)
                               8537                 :              3 :             InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_LEVEL,
                               8538                 :                :                                                0, InvalidOid,
                               8539                 :                :                                                InvalidTransactionId);
                               8540                 :                : 
 5605 heikki.linnakangas@i     8541                 :             30 :         LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 5610                          8542                 :             30 :         ControlFile->MaxConnections = xlrec.MaxConnections;
 4447 rhaas@postgresql.org     8543                 :             30 :         ControlFile->max_worker_processes = xlrec.max_worker_processes;
 2398 michael@paquier.xyz      8544                 :             30 :         ControlFile->max_wal_senders = xlrec.max_wal_senders;
 5610 heikki.linnakangas@i     8545                 :             30 :         ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
                               8546                 :             30 :         ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
                               8547                 :             30 :         ControlFile->wal_level = xlrec.wal_level;
 3887                          8548                 :             30 :         ControlFile->wal_log_hints = xlrec.wal_log_hints;
                               8549                 :                : 
                               8550                 :                :         /*
                               8551                 :                :          * Update minRecoveryPoint to ensure that if recovery is aborted, we
                               8552                 :                :          * recover back up to this point before allowing hot standby again.
                               8553                 :                :          * This is important if the max_* settings are decreased, to ensure
                               8554                 :                :          * you don't run queries against the WAL preceding the change. The
                               8555                 :                :          * local copies cannot be updated as long as crash recovery is
                               8556                 :                :          * happening and we expect all the WAL to be replayed.
                               8557                 :                :          */
 2620 michael@paquier.xyz      8558         [ +  + ]:             30 :         if (InArchiveRecovery)
                               8559                 :                :         {
 1298 heikki.linnakangas@i     8560                 :             16 :             LocalMinRecoveryPoint = ControlFile->minRecoveryPoint;
                               8561                 :             16 :             LocalMinRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
                               8562                 :                :         }
                               8563   [ +  +  +  + ]:             30 :         if (LocalMinRecoveryPoint != InvalidXLogRecPtr && LocalMinRecoveryPoint < lsn)
                               8564                 :                :         {
                               8565                 :                :             TimeLineID  replayTLI;
                               8566                 :                : 
                               8567                 :              5 :             (void) GetCurrentReplayRecPtr(&replayTLI);
 5605                          8568                 :              5 :             ControlFile->minRecoveryPoint = lsn;
 1401 rhaas@postgresql.org     8569                 :              5 :             ControlFile->minRecoveryPointTLI = replayTLI;
                               8570                 :                :         }
                               8571                 :                : 
 3628 alvherre@alvh.no-ip.     8572                 :             30 :         CommitTsParameterChange(xlrec.track_commit_timestamp,
                               8573                 :             30 :                                 ControlFile->track_commit_timestamp);
                               8574                 :             30 :         ControlFile->track_commit_timestamp = xlrec.track_commit_timestamp;
                               8575                 :                : 
 5610 heikki.linnakangas@i     8576                 :             30 :         UpdateControlFile();
 5605                          8577                 :             30 :         LWLockRelease(ControlFileLock);
                               8578                 :                : 
                               8579                 :                :         /* Check to see if any parameter change gives a problem on recovery */
 5610                          8580                 :             30 :         CheckRequiredParameterValues();
                               8581                 :                :     }
 4973 simon@2ndQuadrant.co     8582         [ -  + ]:            665 :     else if (info == XLOG_FPW_CHANGE)
                               8583                 :                :     {
                               8584                 :                :         bool        fpw;
                               8585                 :                : 
 4973 simon@2ndQuadrant.co     8586                 :UBC           0 :         memcpy(&fpw, XLogRecGetData(record), sizeof(bool));
                               8587                 :                : 
                               8588                 :                :         /*
                               8589                 :                :          * Update the LSN of the last replayed XLOG_FPW_CHANGE record so that
                               8590                 :                :          * do_pg_backup_start() and do_pg_backup_stop() can check whether
                               8591                 :                :          * full_page_writes has been disabled during online backup.
                               8592                 :                :          */
                               8593         [ #  # ]:              0 :         if (!fpw)
                               8594                 :                :         {
 4002 andres@anarazel.de       8595         [ #  # ]:              0 :             SpinLockAcquire(&XLogCtl->info_lck);
 1382 rhaas@postgresql.org     8596         [ #  # ]:              0 :             if (XLogCtl->lastFpwDisableRecPtr < record->ReadRecPtr)
                               8597                 :              0 :                 XLogCtl->lastFpwDisableRecPtr = record->ReadRecPtr;
 4002 andres@anarazel.de       8598                 :              0 :             SpinLockRelease(&XLogCtl->info_lck);
                               8599                 :                :         }
                               8600                 :                : 
                               8601                 :                :         /* Keep track of full_page_writes */
 4973 simon@2ndQuadrant.co     8602                 :              0 :         lastFullPageWrites = fpw;
                               8603                 :                :     }
                               8604                 :                :     else if (info == XLOG_CHECKPOINT_REDO)
                               8605                 :                :     {
                               8606                 :                :         /* nothing to do here, just for informational purposes */
                               8607                 :                :     }
 9086 vadim4o@yahoo.com        8608                 :CBC       41433 : }
                               8609                 :                : 
                               8610                 :                : /*
                               8611                 :                :  * Return the extra open flags used for opening a file, depending on the
                               8612                 :                :  * value of the GUCs wal_sync_method, fsync and debug_io_direct.
                               8613                 :                :  */
                               8614                 :                : static int
 6324 magnus@hagander.net      8615                 :          15257 : get_sync_bit(int method)
                               8616                 :                : {
 5671 bruce@momjian.us         8617                 :          15257 :     int         o_direct_flag = 0;
                               8618                 :                : 
                               8619                 :                :     /*
                               8620                 :                :      * Use O_DIRECT if requested, except in walreceiver process.  The WAL
                               8621                 :                :      * written by walreceiver is normally read by the startup process soon
                               8622                 :                :      * after it's written.  Also, walreceiver performs unaligned writes, which
                               8623                 :                :      * don't work with O_DIRECT, so it is required for correctness too.
                               8624                 :                :      */
  882 tmunro@postgresql.or     8625   [ +  +  +  - ]:          15257 :     if ((io_direct_flags & IO_DIRECT_WAL) && !AmWalReceiverProcess())
 5678 heikki.linnakangas@i     8626                 :              9 :         o_direct_flag = PG_O_DIRECT;
                               8627                 :                : 
                               8628                 :                :     /* If fsync is disabled, never open in sync mode */
  882 tmunro@postgresql.or     8629         [ +  - ]:          15257 :     if (!enableFsync)
                               8630                 :          15257 :         return o_direct_flag;
                               8631                 :                : 
 6324 magnus@hagander.net      8632   [ #  #  #  # ]:UBC           0 :     switch (method)
                               8633                 :                :     {
                               8634                 :                :             /*
                               8635                 :                :              * enum values for all sync options are defined even if they are
                               8636                 :                :              * not supported on the current platform.  But if not, they are
                               8637                 :                :              * not included in the enum option array, and therefore will never
                               8638                 :                :              * be seen here.
                               8639                 :                :              */
  694 nathan@postgresql.or     8640                 :              0 :         case WAL_SYNC_METHOD_FSYNC:
                               8641                 :                :         case WAL_SYNC_METHOD_FSYNC_WRITETHROUGH:
                               8642                 :                :         case WAL_SYNC_METHOD_FDATASYNC:
  882 tmunro@postgresql.or     8643                 :              0 :             return o_direct_flag;
                               8644                 :                : #ifdef O_SYNC
  694 nathan@postgresql.or     8645                 :              0 :         case WAL_SYNC_METHOD_OPEN:
 1142 tmunro@postgresql.or     8646                 :              0 :             return O_SYNC | o_direct_flag;
                               8647                 :                : #endif
                               8648                 :                : #ifdef O_DSYNC
  694 nathan@postgresql.or     8649                 :              0 :         case WAL_SYNC_METHOD_OPEN_DSYNC:
 1142 tmunro@postgresql.or     8650                 :              0 :             return O_DSYNC | o_direct_flag;
                               8651                 :                : #endif
 6326 magnus@hagander.net      8652                 :              0 :         default:
                               8653                 :                :             /* can't happen (unless we are out of sync with option array) */
  477 peter@eisentraut.org     8654         [ #  # ]:              0 :             elog(ERROR, "unrecognized \"wal_sync_method\": %d", method);
                               8655                 :                :             return 0;           /* silence warning */
                               8656                 :                :     }
                               8657                 :                : }
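The flag selection in get_sync_bit() can be sketched in isolation. This is a minimal sketch, not the PostgreSQL implementation: the enum `SketchSyncMethod`, the function `sketch_sync_bit`, and its `fsync_enabled` and `direct_flag` parameters are hypothetical stand-ins for the `WAL_SYNC_METHOD_*` values, `enableFsync`, and `o_direct_flag`. It assumes a POSIX `<fcntl.h>` and returns -1 where the real code raises an error.

```c
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the WAL_SYNC_METHOD_* values; not the real enum. */
typedef enum
{
    SKETCH_SYNC_FSYNC,
    SKETCH_SYNC_FDATASYNC,
    SKETCH_SYNC_OPEN,
    SKETCH_SYNC_OPEN_DSYNC
} SketchSyncMethod;

/*
 * Return the extra open(2) flags for a sync method.  direct_flag is passed
 * through unchanged in every successful case, mirroring o_direct_flag above.
 * Returns -1 if the platform lacks the requested flag (an assumption of this
 * sketch; the real function reports an error instead).
 */
static int
sketch_sync_bit(SketchSyncMethod method, bool fsync_enabled, int direct_flag)
{
    /* With fsync disabled, never open in sync mode. */
    if (!fsync_enabled)
        return direct_flag;

    switch (method)
    {
        case SKETCH_SYNC_FSYNC:
        case SKETCH_SYNC_FDATASYNC:
            /* Synced by an explicit call after write(), so no open flag. */
            return direct_flag;
        case SKETCH_SYNC_OPEN:
#ifdef O_SYNC
            return O_SYNC | direct_flag;
#else
            return -1;
#endif
        case SKETCH_SYNC_OPEN_DSYNC:
#ifdef O_DSYNC
            return O_DSYNC | direct_flag;
#else
            return -1;
#endif
    }
    return -1;                  /* unrecognized method */
}
```

Because the fsync-based methods add no open flag, switching between them never changes the result, which is why assign_wal_sync_method() only closes the log file when the two get_sync_bit() values differ.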
                               8658                 :                : 
                               8659                 :                : /*
                               8660                 :                :  * GUC support
                               8661                 :                :  */
                               8662                 :                : void
  694 nathan@postgresql.or     8663                 :CBC        1067 : assign_wal_sync_method(int new_wal_sync_method, void *extra)
                               8664                 :                : {
                               8665         [ -  + ]:           1067 :     if (wal_sync_method != new_wal_sync_method)
                               8666                 :                :     {
                               8667                 :                :         /*
                               8668                 :                :          * To ensure that no blocks escape unsynced, force an fsync on the
                               8669                 :                :          * currently open log segment (if any).  Also, if the open flag is
                               8670                 :                :          * changing, close the log file so it will be reopened (with new flag
                               8671                 :                :          * bit) at next use.
                               8672                 :                :          */
 8940 tgl@sss.pgh.pa.us        8673         [ #  # ]:UBC           0 :         if (openLogFile >= 0)
                               8674                 :                :         {
 3094 rhaas@postgresql.org     8675                 :              0 :             pgstat_report_wait_start(WAIT_EVENT_WAL_SYNC_METHOD_ASSIGN);
 8940 tgl@sss.pgh.pa.us        8676         [ #  # ]:              0 :             if (pg_fsync(openLogFile) != 0)
                               8677                 :                :             {
                               8678                 :                :                 char        xlogfname[MAXFNAMELEN];
                               8679                 :                :                 int         save_errno;
                               8680                 :                : 
 2104 michael@paquier.xyz      8681                 :              0 :                 save_errno = errno;
 1401 rhaas@postgresql.org     8682                 :              0 :                 XLogFileName(xlogfname, openLogTLI, openLogSegNo,
                               8683                 :                :                              wal_segment_size);
 2104 michael@paquier.xyz      8684                 :              0 :                 errno = save_errno;
 8083 tgl@sss.pgh.pa.us        8685         [ #  # ]:              0 :                 ereport(PANIC,
                               8686                 :                :                         (errcode_for_file_access(),
                               8687                 :                :                          errmsg("could not fsync file \"%s\": %m", xlogfname)));
                               8688                 :                :             }
                               8689                 :                : 
 3094 rhaas@postgresql.org     8690                 :              0 :             pgstat_report_wait_end();
  694 nathan@postgresql.or     8691         [ #  # ]:              0 :             if (get_sync_bit(wal_sync_method) != get_sync_bit(new_wal_sync_method))
 7023 bruce@momjian.us         8692                 :              0 :                 XLogFileClose();
                               8693                 :                :         }
                               8694                 :                :     }
 8940 tgl@sss.pgh.pa.us        8695                 :CBC        1067 : }
                               8696                 :                : 
                               8697                 :                : 
                               8698                 :                : /*
                               8699                 :                :  * Issue appropriate kind of fsync (if any) for an XLOG output file.
                               8700                 :                :  *
                               8701                 :                :  * 'fd' is a file descriptor for the XLOG file to be fsync'd.
                               8702                 :                :  * 'segno' and 'tli' are for error reporting purposes.
                               8703                 :                :  */
                               8704                 :                : void
 1401 rhaas@postgresql.org     8705                 :         191240 : issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
                               8706                 :                : {
 2104 michael@paquier.xyz      8707                 :         191240 :     char       *msg = NULL;
                               8708                 :                :     instr_time  start;
                               8709                 :                : 
 1401 rhaas@postgresql.org     8710         [ -  + ]:         191240 :     Assert(tli != 0);
                               8711                 :                : 
                               8712                 :                :     /*
                               8713                 :                :      * Quick exit if fsync is disabled or write() has already synced the WAL
                               8714                 :                :      * file.
                               8715                 :                :      */
 1642 fujii@postgresql.org     8716         [ -  + ]:         191240 :     if (!enableFsync ||
  694 nathan@postgresql.or     8717         [ #  # ]:UBC           0 :         wal_sync_method == WAL_SYNC_METHOD_OPEN ||
                               8718         [ #  # ]:              0 :         wal_sync_method == WAL_SYNC_METHOD_OPEN_DSYNC)
 1642 fujii@postgresql.org     8719                 :CBC      191240 :         return;
                               8720                 :                : 
                               8721                 :                :     /*
                               8722                 :                :      * Measure I/O timing to sync the WAL file for pg_stat_io.
                               8723                 :                :      */
  192 michael@paquier.xyz      8724                 :UBC           0 :     start = pgstat_prepare_io_time(track_wal_io_timing);
                               8725                 :                : 
 2623                          8726                 :              0 :     pgstat_report_wait_start(WAIT_EVENT_WAL_SYNC);
  694 nathan@postgresql.or     8727   [ #  #  #  # ]:              0 :     switch (wal_sync_method)
                               8728                 :                :     {
                               8729                 :              0 :         case WAL_SYNC_METHOD_FSYNC:
 5713 heikki.linnakangas@i     8730         [ #  # ]:              0 :             if (pg_fsync_no_writethrough(fd) != 0)
 2104 michael@paquier.xyz      8731                 :              0 :                 msg = _("could not fsync file \"%s\": %m");
 8940 tgl@sss.pgh.pa.us        8732                 :              0 :             break;
                               8733                 :                : #ifdef HAVE_FSYNC_WRITETHROUGH
                               8734                 :                :         case WAL_SYNC_METHOD_FSYNC_WRITETHROUGH:
                               8735                 :                :             if (pg_fsync_writethrough(fd) != 0)
                               8736                 :                :                 msg = _("could not fsync write-through file \"%s\": %m");
                               8737                 :                :             break;
                               8738                 :                : #endif
  694 nathan@postgresql.or     8739                 :              0 :         case WAL_SYNC_METHOD_FDATASYNC:
 5713 heikki.linnakangas@i     8740         [ #  # ]:              0 :             if (pg_fdatasync(fd) != 0)
 2104 michael@paquier.xyz      8741                 :              0 :                 msg = _("could not fdatasync file \"%s\": %m");
 8940 tgl@sss.pgh.pa.us        8742                 :              0 :             break;
  694 nathan@postgresql.or     8743                 :              0 :         case WAL_SYNC_METHOD_OPEN:
                               8744                 :                :         case WAL_SYNC_METHOD_OPEN_DSYNC:
                               8745                 :                :             /* not reachable */
 1642 fujii@postgresql.org     8746                 :              0 :             Assert(false);
                               8747                 :                :             break;
 8940 tgl@sss.pgh.pa.us        8748                 :              0 :         default:
  521 dgustafsson@postgres     8749         [ #  # ]:              0 :             ereport(PANIC,
                               8750                 :                :                     errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                               8751                 :                :                     errmsg_internal("unrecognized \"wal_sync_method\": %d", wal_sync_method));
                               8752                 :                :             break;
                               8753                 :                :     }
                               8754                 :                : 
                               8755                 :                :     /* PANIC if failed to fsync */
 2104 michael@paquier.xyz      8756         [ #  # ]:              0 :     if (msg)
                               8757                 :                :     {
                               8758                 :                :         char        xlogfname[MAXFNAMELEN];
                               8759                 :              0 :         int         save_errno = errno;
                               8760                 :                : 
 1401 rhaas@postgresql.org     8761                 :              0 :         XLogFileName(xlogfname, tli, segno, wal_segment_size);
 2104 michael@paquier.xyz      8762                 :              0 :         errno = save_errno;
                               8763         [ #  # ]:              0 :         ereport(PANIC,
                               8764                 :                :                 (errcode_for_file_access(),
                               8765                 :                :                  errmsg(msg, xlogfname)));
                               8766                 :                :     }
                               8767                 :                : 
                               8768                 :              0 :     pgstat_report_wait_end();
                               8769                 :                : 
  214                          8770                 :              0 :     pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL, IOOP_FSYNC,
                               8771                 :                :                             start, 1, 0);
                               8772                 :                : }
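The save_errno handling in the failure path above can be shown on its own. This is a minimal sketch under stated assumptions, not PostgreSQL code: `sketch_fsync_with_message` is a hypothetical helper, the `%08lX` file name format is invented for illustration, and the message is written to a caller-supplied buffer instead of being raised through ereport(PANIC). It assumes a POSIX system with `fsync()`.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: fsync fd and, on failure, write a message into buf.
 * Returns 0 on success or the errno value left by fsync() on failure.
 */
static int
sketch_fsync_with_message(int fd, unsigned long segno, char *buf, size_t buflen)
{
    char        fname[64];
    int         save_errno;

    if (fsync(fd) == 0)
        return 0;

    /*
     * Building the file name may call library routines that overwrite errno,
     * so save it first and restore it before formatting the error text.
     */
    save_errno = errno;
    snprintf(fname, sizeof(fname), "%08lX", segno);    /* illustrative name */
    errno = save_errno;

    snprintf(buf, buflen, "could not fsync file \"%s\": %s",
             fname, strerror(errno));
    return save_errno;
}
```

Calling it with an invalid descriptor such as -1 exercises the failure path, since POSIX specifies EBADF for `fsync()` on a descriptor that is not valid.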
                               8773                 :                : 
                               8774                 :                : /*
                               8775                 :                :  * do_pg_backup_start is the workhorse of the user-visible pg_backup_start()
                               8776                 :                :  * function. It creates the necessary starting checkpoint and constructs the
                               8777                 :                :  * backup state and tablespace map.
                               8778                 :                :  *
                               8779                 :                :  * Input parameters are "state" (the backup state), "fast" (if true, we do
                               8780                 :                :  * the checkpoint in fast mode), and "tablespaces" (if non-NULL, indicates a
                               8781                 :                :  * list of tablespaceinfo structs describing the cluster's tablespaces).
                               8782                 :                :  *
                               8783                 :                :  * The tablespace map contents are appended to the passed-in parameter
                               8784                 :                :  * tblspcmapfile, and the caller is responsible for including it in the backup
                               8785                 :                :  * archive as 'tablespace_map'. The tablespace_map file is required mainly for
                               8786                 :                :  * tar format on Windows, as native Windows utilities are not able to create
                               8787                 :                :  * symlinks while extracting files from tar. However, for consistency and
                               8788                 :                :  * platform-independence, we do it the same way everywhere.
                               8789                 :                :  *
                               8790                 :                :  * It fills in "state" with the information required for the backup, such
                               8791                 :                :  * as the minimum WAL location that must be present to restore from this
                               8792                 :                :  * backup (startpoint) and the corresponding timeline ID (starttli).
                               8793                 :                :  *
                               8794                 :                :  * Every successfully started backup must be stopped by calling
                               8795                 :                :  * do_pg_backup_stop() or do_pg_abort_backup(). There can be many
                               8796                 :                :  * backups active at the same time.
                               8797                 :                :  *
                               8798                 :                :  * It is the responsibility of the caller of this function to verify the
                               8799                 :                :  * permissions of the calling user!
                               8800                 :                :  */
                               8801                 :                : void
 1076 michael@paquier.xyz      8802                 :CBC         168 : do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces,
                               8803                 :                :                    BackupState *state, StringInfo tblspcmapfile)
                               8804                 :                : {
                               8805                 :                :     bool        backup_started_in_recovery;
                               8806                 :                : 
                               8807         [ -  + ]:            168 :     Assert(state != NULL);
 4973 simon@2ndQuadrant.co     8808                 :            168 :     backup_started_in_recovery = RecoveryInProgress();
                               8809                 :                : 
                               8810                 :                :     /*
                               8811                 :                :      * During recovery, we don't need to check WAL level, because if WAL
                               8812                 :                :      * level is not sufficient, it's impossible to get here during recovery.
                               8813                 :                :      */
                               8814   [ +  +  -  + ]:            168 :     if (!backup_started_in_recovery && !XLogIsNeeded())
 6555 tgl@sss.pgh.pa.us        8815         [ #  # ]:UBC           0 :         ereport(ERROR,
                               8816                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               8817                 :                :                  errmsg("WAL level not sufficient for making an online backup"),
                               8818                 :                :                  errhint("\"wal_level\" must be set to \"replica\" or \"logical\" at server start.")));
                               8819                 :                : 
 5332 heikki.linnakangas@i     8820         [ +  + ]:CBC         168 :     if (strlen(backupidstr) > MAXPGPATH)
                               8821         [ +  - ]:              1 :         ereport(ERROR,
                               8822                 :                :                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                               8823                 :                :                  errmsg("backup label too long (max %d bytes)",
                               8824                 :                :                         MAXPGPATH)));
                               8825                 :                : 
  431 dgustafsson@postgres     8826                 :            167 :     strlcpy(state->name, backupidstr, sizeof(state->name));
                               8827                 :                : 
                               8828                 :                :     /*
                               8829                 :                :      * Mark backup active in shared memory.  We must do full-page WAL writes
                               8830                 :                :      * during an on-line backup even if not doing so at other times, because
                               8831                 :                :      * it's quite possible for the backup dump to obtain a "torn" (partially
                               8832                 :                :      * written) copy of a database page if it reads the page concurrently with
                               8833                 :                :      * our write to the same page.  This can be fixed as long as the first
                               8834                 :                :      * write to the page in the WAL sequence is a full-page write. Hence, we
                               8835                 :                :      * increment runningBackups then force a CHECKPOINT, to ensure there are
                               8836                 :                :      * no dirty pages in shared memory that might get dumped while the backup
                               8837                 :                :      * is in progress without having a corresponding WAL record.  (Once the
                               8838                 :                :      * backup is complete, we need not force full-page writes anymore, since
                               8839                 :                :      * we expect that any pages not modified during the backup interval must
                               8840                 :                :      * have been correctly captured by the backup.)
                               8841                 :                :      *
                               8842                 :                :      * Note that forcing full-page writes has no effect during an online
                               8843                 :                :      * backup from the standby.
                               8844                 :                :      *
                               8845                 :                :      * We must hold all the insertion locks to change the value of
                               8846                 :                :      * runningBackups, to ensure adequate interlocking against
                               8847                 :                :      * XLogInsertRecord().
                               8848                 :                :      */
 4187 heikki.linnakangas@i     8849                 :            167 :     WALInsertLockAcquireExclusive();
 1249 sfrost@snowman.net       8850                 :            167 :     XLogCtl->Insert.runningBackups++;
 4187 heikki.linnakangas@i     8851                 :            167 :     WALInsertLockRelease();
                               8852                 :                : 
                               8853                 :                :     /*
                               8854                 :                :      * Ensure we decrement runningBackups if we fail below. NB -- for this to
                               8855                 :                :      * work correctly, it is critical that sessionBackupState is only updated
                               8856                 :                :      * after this block is over.
                               8857                 :                :      */
   32 peter@eisentraut.org     8858         [ +  - ]:GNC         167 :     PG_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, BoolGetDatum(true));
                               8859                 :                :     {
 5263 bruce@momjian.us         8860                 :CBC         167 :         bool        gotUniqueStartpoint = false;
                               8861                 :                :         DIR        *tblspcdir;
                               8862                 :                :         struct dirent *de;
                               8863                 :                :         tablespaceinfo *ti;
                               8864                 :                :         int         datadirpathlen;
                               8865                 :                : 
                               8866                 :                :         /*
                               8867                 :                :          * Force an XLOG file switch before the checkpoint, to ensure that the
                               8868                 :                :          * WAL segment the checkpoint is written to doesn't contain pages with
                               8869                 :                :          * old timeline IDs.  That would otherwise happen if you called
                               8870                 :                :          * pg_backup_start() right after restoring from a PITR archive: the
                               8871                 :                :          * first WAL segment containing the startup checkpoint has pages in
                               8872                 :                :          * the beginning with the old timeline ID.  That can cause trouble at
                               8873                 :                :          * recovery: we won't have a history file covering the old timeline if
                               8874                 :                :          * pg_wal directory was not included in the base backup and the WAL
                               8875                 :                :          * archive was cleared too before starting the backup.
                               8876                 :                :          *
                               8877                 :                :          * This also ensures that we have emitted a WAL page header that has
                               8878                 :                :          * XLP_BKP_REMOVABLE off before we emit the checkpoint record.
                               8879                 :                :          * Therefore, if a WAL archiver (such as pglesslog) is trying to
                               8880                 :                :          * compress out removable backup blocks, it won't remove any that
                               8881                 :                :          * occur after this point.
                               8882                 :                :          *
                               8883                 :                :          * During recovery, we skip forcing an XLOG file switch, which means
                               8884                 :                :          * that the backup taken during recovery is not available for the
                               8885                 :                :          * special recovery case described above.
                               8886                 :                :          */
 4973 simon@2ndQuadrant.co     8887         [ +  + ]:            167 :         if (!backup_started_in_recovery)
 3180 andres@anarazel.de       8888                 :            161 :             RequestXLogSwitch(false);
                               8889                 :                : 
                               8890                 :                :         do
                               8891                 :                :         {
                               8892                 :                :             bool        checkpointfpw;
                               8893                 :                : 
                               8894                 :                :             /*
                               8895                 :                :              * Force a CHECKPOINT.  Aside from being necessary to prevent torn
                               8896                 :                :              * page problems, this guarantees that two successive backup runs
                               8897                 :                :              * will have different checkpoint positions and hence different
                               8898                 :                :              * history file names, even if nothing happened in between.
                               8899                 :                :              *
                               8900                 :                :              * During recovery, establish a restartpoint if possible. We use
                               8901                 :                :              * the last restartpoint as the backup starting checkpoint. This
                               8902                 :                :              * means that two successive backup runs can have the same
                               8903                 :                :              * checkpoint positions.
                               8904                 :                :              *
                               8905                 :                :              * Executing do_pg_backup_start() during recovery implies that the
                               8906                 :                :              * checkpointer is running, so we can use RequestCheckpoint() to
                               8907                 :                :              * establish a restartpoint.
                               8908                 :                :              *
                               8909                 :                :              * We use CHECKPOINT_FAST only if requested by user (via passing
                               8910                 :                :              * fast = true).  Otherwise this can take a while.
                               8911                 :                :              */
 5283 heikki.linnakangas@i     8912         [ +  + ]:            167 :             RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT |
                               8913                 :                :                               (fast ? CHECKPOINT_FAST : 0));
                               8914                 :                : 
                               8915                 :                :             /*
                               8916                 :                :              * Now we need to fetch the checkpoint record location, and also
                               8917                 :                :              * its REDO pointer.  The oldest point in WAL that would be needed
                               8918                 :                :              * to restore starting from the checkpoint is precisely the REDO
                               8919                 :                :              * pointer.
                               8920                 :                :              */
                               8921                 :            167 :             LWLockAcquire(ControlFileLock, LW_SHARED);
 1076 michael@paquier.xyz      8922                 :            167 :             state->checkpointloc = ControlFile->checkPoint;
                               8923                 :            167 :             state->startpoint = ControlFile->checkPointCopy.redo;
                               8924                 :            167 :             state->starttli = ControlFile->checkPointCopy.ThisTimeLineID;
 4973 simon@2ndQuadrant.co     8925                 :            167 :             checkpointfpw = ControlFile->checkPointCopy.fullPageWrites;
 5283 heikki.linnakangas@i     8926                 :            167 :             LWLockRelease(ControlFileLock);
                               8927                 :                : 
 4973 simon@2ndQuadrant.co     8928         [ +  + ]:            167 :             if (backup_started_in_recovery)
                               8929                 :                :             {
                               8930                 :                :                 XLogRecPtr  recptr;
                               8931                 :                : 
                               8932                 :                :                 /*
                               8933                 :                :                  * Check to see if all WAL replayed during online backup
                               8934                 :                :                  * (i.e., since last restartpoint used as backup starting
                               8935                 :                :                  * checkpoint) contains full-page writes.
                               8936                 :                :                  */
 4002 andres@anarazel.de       8937         [ -  + ]:              6 :                 SpinLockAcquire(&XLogCtl->info_lck);
                               8938                 :              6 :                 recptr = XLogCtl->lastFpwDisableRecPtr;
                               8939                 :              6 :                 SpinLockRelease(&XLogCtl->info_lck);
                               8940                 :                : 
 1076 michael@paquier.xyz      8941   [ +  -  -  + ]:              6 :                 if (!checkpointfpw || state->startpoint <= recptr)
 4973 simon@2ndQuadrant.co     8942         [ #  # ]:UBC           0 :                     ereport(ERROR,
                               8943                 :                :                             (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               8944                 :                :                              errmsg("WAL generated with \"full_page_writes=off\" was replayed "
                               8945                 :                :                                     "since last restartpoint"),
                               8946                 :                :                              errhint("This means that the backup being taken on the standby "
                               8947                 :                :                                      "is corrupt and should not be used. "
                               8948                 :                :                                      "Enable \"full_page_writes\" and run CHECKPOINT on the primary, "
                               8949                 :                :                                      "and then try an online backup again.")));
                               8950                 :                : 
                               8951                 :                :                 /*
                               8952                 :                :                  * During recovery, since we don't use the end-of-backup WAL
                               8953                 :                :                  * record and don't write the backup history file, the
                               8954                 :                :                  * starting WAL location doesn't need to be unique. This means
                               8955                 :                :                  * that two base backups started at the same time might use
                               8956                 :                :                  * the same checkpoint as starting locations.
                               8957                 :                :                  */
 4973 simon@2ndQuadrant.co     8958                 :CBC           6 :                 gotUniqueStartpoint = true;
                               8959                 :                :             }
                               8960                 :                : 
                               8961                 :                :             /*
                               8962                 :                :              * If two base backups are started at the same time (in WAL sender
                               8963                 :                :              * processes), we need to make sure that they use different
                               8964                 :                :              * checkpoints as starting locations, because we use the starting
                               8965                 :                :              * WAL location as a unique identifier for the base backup in the
                               8966                 :                :              * end-of-backup WAL record and when we write the backup history
                               8967                 :                :              * file. Perhaps it would be better to generate a separate unique
                               8968                 :                :              * ID for each backup instead of forcing another checkpoint, but
                               8969                 :                :              * taking a checkpoint right after another is not that expensive
                               8970                 :                :              * either because only a few buffers have been dirtied yet.
                               8971                 :                :              */
 4187 heikki.linnakangas@i     8972                 :            167 :             WALInsertLockAcquireExclusive();
 1076 michael@paquier.xyz      8973         [ +  - ]:            167 :             if (XLogCtl->Insert.lastBackupStart < state->startpoint)
                               8974                 :                :             {
                               8975                 :            167 :                 XLogCtl->Insert.lastBackupStart = state->startpoint;
 5283 heikki.linnakangas@i     8976                 :            167 :                 gotUniqueStartpoint = true;
                               8977                 :                :             }
 4187                          8978                 :            167 :             WALInsertLockRelease();
 5263 bruce@momjian.us         8979         [ -  + ]:            167 :         } while (!gotUniqueStartpoint);
                               8980                 :                : 
                               8981                 :                :         /*
                               8982                 :                :          * Construct tablespace_map file.
                               8983                 :                :          */
 3770 andrew@dunslane.net      8984                 :            167 :         datadirpathlen = strlen(DataDir);
                               8985                 :                : 
                               8986                 :                :         /* Collect information about all tablespaces */
  368 michael@paquier.xyz      8987                 :            167 :         tblspcdir = AllocateDir(PG_TBLSPC_DIR);
                               8988         [ +  + ]:            540 :         while ((de = ReadDir(tblspcdir, PG_TBLSPC_DIR)) != NULL)
                               8989                 :                :         {
                               8990                 :                :             char        fullpath[MAXPGPATH + sizeof(PG_TBLSPC_DIR)];
                               8991                 :                :             char        linkpath[MAXPGPATH];
 3770 andrew@dunslane.net      8992                 :            373 :             char       *relpath = NULL;
                               8993                 :                :             char       *s;
                               8994                 :                :             PGFileType  de_type;
                               8995                 :                :             char       *badp;
                               8996                 :                :             Oid         tsoid;
                               8997                 :                : 
                               8998                 :                :             /*
                               8999                 :                :              * Try to parse the directory name as an unsigned integer.
                               9000                 :                :              *
                               9001                 :                :              * Tablespace directories should be positive integers that can be
                               9002                 :                :              * represented in 32 bits, with no leading zeroes or trailing
                               9003                 :                :              * garbage. If we come across a name that doesn't meet those
                               9004                 :                :              * criteria, skip it.
                               9005                 :                :              */
  684 rhaas@postgresql.org     9006   [ +  +  -  + ]:            373 :             if (de->d_name[0] < '1' || de->d_name[0] > '9')
                               9007                 :            334 :                 continue;
                               9008                 :             39 :             errno = 0;
                               9009                 :             39 :             tsoid = strtoul(de->d_name, &badp, 10);
                               9010   [ +  -  +  -  :             39 :             if (*badp != '\0' || errno == EINVAL || errno == ERANGE)
                                              -  + ]
 3770 andrew@dunslane.net      9011                 :UBC           0 :                 continue;
                               9012                 :                : 
  368 michael@paquier.xyz      9013                 :CBC          39 :             snprintf(fullpath, sizeof(fullpath), "%s/%s", PG_TBLSPC_DIR, de->d_name);
                               9014                 :                : 
  872 rhaas@postgresql.org     9015                 :             39 :             de_type = get_dirent_type(fullpath, de, false, ERROR);
                               9016                 :                : 
                               9017         [ +  + ]:             39 :             if (de_type == PGFILETYPE_LNK)
                               9018                 :                :             {
                               9019                 :                :                 StringInfoData escapedpath;
                               9020                 :                :                 int         rllen;
                               9021                 :                : 
                               9022                 :             25 :                 rllen = readlink(fullpath, linkpath, sizeof(linkpath));
                               9023         [ -  + ]:             25 :                 if (rllen < 0)
                               9024                 :                :                 {
  872 rhaas@postgresql.org     9025         [ #  # ]:UBC           0 :                     ereport(WARNING,
                               9026                 :                :                             (errmsg("could not read symbolic link \"%s\": %m",
                               9027                 :                :                                     fullpath)));
                               9028                 :              0 :                     continue;
                               9029                 :                :                 }
  872 rhaas@postgresql.org     9030         [ -  + ]:CBC          25 :                 else if (rllen >= sizeof(linkpath))
                               9031                 :                :                 {
  872 rhaas@postgresql.org     9032         [ #  # ]:UBC           0 :                     ereport(WARNING,
                               9033                 :                :                             (errmsg("symbolic link \"%s\" target is too long",
                               9034                 :                :                                     fullpath)));
                               9035                 :              0 :                     continue;
                               9036                 :                :                 }
  872 rhaas@postgresql.org     9037                 :CBC          25 :                 linkpath[rllen] = '\0';
                               9038                 :                : 
                               9039                 :                :                 /*
                               9040                 :                :                  * Relpath holds the relative path of the tablespace directory
                               9041                 :                :                  * when it's located within PGDATA, or NULL if it's located
                               9042                 :                :                  * elsewhere.
                               9043                 :                :                  */
                               9044         [ -  + ]:             25 :                 if (rllen > datadirpathlen &&
  872 rhaas@postgresql.org     9045         [ #  # ]:UBC           0 :                     strncmp(linkpath, DataDir, datadirpathlen) == 0 &&
  841 tgl@sss.pgh.pa.us        9046         [ #  # ]:              0 :                     IS_DIR_SEP(linkpath[datadirpathlen]))
  872 rhaas@postgresql.org     9047                 :              0 :                     relpath = pstrdup(linkpath + datadirpathlen + 1);
                               9048                 :                : 
                               9049                 :                :                 /*
                               9050                 :                :                  * Add a backslash-escaped version of the link path to the
                               9051                 :                :                  * tablespace map file.
                               9052                 :                :                  */
  872 rhaas@postgresql.org     9053                 :CBC          25 :                 initStringInfo(&escapedpath);
                               9054         [ +  + ]:            594 :                 for (s = linkpath; *s; s++)
                               9055                 :                :                 {
                               9056   [ +  -  +  -  :            569 :                     if (*s == '\n' || *s == '\r' || *s == '\\')
                                              -  + ]
  872 rhaas@postgresql.org     9057                 :UBC           0 :                         appendStringInfoChar(&escapedpath, '\\');
  872 rhaas@postgresql.org     9058                 :CBC         569 :                     appendStringInfoChar(&escapedpath, *s);
                               9059                 :                :                 }
                               9060                 :             25 :                 appendStringInfo(tblspcmapfile, "%s %s\n",
                               9061                 :             25 :                                  de->d_name, escapedpath.data);
                               9062                 :             25 :                 pfree(escapedpath.data);
                               9063                 :                :             }
                               9064         [ +  - ]:             14 :             else if (de_type == PGFILETYPE_DIR)
                               9065                 :                :             {
                               9066                 :                :                 /*
                               9067                 :                :                  * It's possible to use allow_in_place_tablespaces to create
                               9068                 :                :                  * directories directly under pg_tblspc, for testing purposes
                               9069                 :                :                  * only.
                               9070                 :                :                  *
                               9071                 :                :                  * In this case, we store a relative path rather than an
                               9072                 :                :                  * absolute path into the tablespaceinfo.
                               9073                 :                :                  */
  368 michael@paquier.xyz      9074                 :             14 :                 snprintf(linkpath, sizeof(linkpath), "%s/%s",
                               9075                 :             14 :                          PG_TBLSPC_DIR, de->d_name);
  872 rhaas@postgresql.org     9076                 :             14 :                 relpath = pstrdup(linkpath);
                               9077                 :                :             }
                               9078                 :                :             else
                               9079                 :                :             {
                               9080                 :                :                 /* Skip any other file type that appears here. */
  872 rhaas@postgresql.org     9081                 :UBC           0 :                 continue;
                               9082                 :                :             }
                               9083                 :                : 
 3770 andrew@dunslane.net      9084                 :CBC          39 :             ti = palloc(sizeof(tablespaceinfo));
  684 rhaas@postgresql.org     9085                 :             39 :             ti->oid = tsoid;
 1634 tgl@sss.pgh.pa.us        9086                 :             39 :             ti->path = pstrdup(linkpath);
  872 rhaas@postgresql.org     9087                 :             39 :             ti->rpath = relpath;
 1907                          9088                 :             39 :             ti->size = -1;
                               9089                 :                : 
 3759 bruce@momjian.us         9090         [ +  - ]:             39 :             if (tablespaces)
                               9091                 :             39 :                 *tablespaces = lappend(*tablespaces, ti);
                               9092                 :                :         }
 2833 tgl@sss.pgh.pa.us        9093                 :            167 :         FreeDir(tblspcdir);
                               9094                 :                : 
 1076 michael@paquier.xyz      9095                 :            167 :         state->starttime = (pg_time_t) time(NULL);
                               9096                 :                :     }
   32 peter@eisentraut.org     9097         [ -  + ]:GNC         167 :     PG_END_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, BoolGetDatum(true));
                               9098                 :                : 
 1076 michael@paquier.xyz      9099                 :CBC         167 :     state->started_in_recovery = backup_started_in_recovery;
                               9100                 :                : 
                               9101                 :                :     /*
                               9102                 :                :      * Mark that the start phase has correctly finished for the backup.
                               9103                 :                :      */
 1249 sfrost@snowman.net       9104                 :            167 :     sessionBackupState = SESSION_BACKUP_RUNNING;
 7704 tgl@sss.pgh.pa.us        9105                 :            167 : }
                               9106                 :                : 
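
The tablespace loop in do_pg_backup_start() above does two things that are easy to get subtly wrong: it accepts only directory names that are plain positive decimal integers, and it backslash-escapes newline, carriage return, and backslash in the link target before writing a tablespace_map line. The following is a minimal standalone sketch of both steps, not the PostgreSQL implementation. The helper names parse_tablespace_dirname() and escape_link_path() are hypothetical, the fixed-size output buffer replaces StringInfo, and the explicit 32-bit range check is an assumption added here to match the comment's "representable in 32 bits" wording.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return true and set *oid when "name" is a decimal
 * integer with no leading zero and no trailing garbage.  The 32-bit range
 * check is an assumption of this sketch, not taken from the listing.
 */
static bool
parse_tablespace_dirname(const char *name, unsigned long *oid)
{
    char       *end;
    unsigned long val;

    /* Reject empty names, leading zeroes, signs, and non-digits up front. */
    if (name[0] < '1' || name[0] > '9')
        return false;

    errno = 0;
    val = strtoul(name, &end, 10);
    if (*end != '\0' || errno == EINVAL || errno == ERANGE)
        return false;
    if (val > 0xFFFFFFFFUL)
        return false;

    *oid = val;
    return true;
}

/*
 * Hypothetical helper: copy "src" into "dst", prefixing each newline,
 * carriage return, and backslash with a backslash.  The caller must
 * provide at least 2 * strlen(src) + 1 bytes in "dst".
 */
static void
escape_link_path(const char *src, char *dst)
{
    for (; *src; src++)
    {
        if (*src == '\n' || *src == '\r' || *src == '\\')
            *dst++ = '\\';
        *dst++ = *src;
    }
    *dst = '\0';
}
```

A caller would typically skip any directory entry for which parse_tablespace_dirname() returns false, and write the escaped path after the OID and a single space, one entry per line.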
                               9107                 :                : /*
                               9108                 :                :  * Utility routine to fetch the session-level status of a running backup.
                               9109                 :                :  */
                               9110                 :                : SessionBackupState
 3088 teodor@sigaev.ru         9111                 :            188 : get_backup_status(void)
                               9112                 :                : {
                               9113                 :            188 :     return sessionBackupState;
                               9114                 :                : }
                               9115                 :                : 
                               9116                 :                : /*
                               9117                 :                :  * do_pg_backup_stop
                               9118                 :                :  *
                               9119                 :                :  * Utility function called at the end of an online backup.  It creates the
                               9120                 :                :  * backup history file (if required), resets sessionBackupState and so on.  It
                               9121                 :                :  * can optionally wait for WAL segments to be archived.
                               9122                 :                :  *
                               9123                 :                :  * "state" is filled with the information necessary to restore from this
                               9124                 :                :  * backup with its stop LSN (stoppoint), its timeline ID (stoptli), etc.
                               9125                 :                :  *
                               9126                 :                :  * It is the responsibility of the caller of this function to verify the
                               9127                 :                :  * permissions of the calling user!
                               9128                 :                :  */
                               9129                 :                : void
 1076 michael@paquier.xyz      9130                 :            161 : do_pg_backup_stop(BackupState *state, bool waitforarchive)
                               9131                 :                : {
                               9132                 :            161 :     bool        backup_stopped_in_recovery = false;
                               9133                 :                :     char        histfilepath[MAXPGPATH];
                               9134                 :                :     char        lastxlogfilename[MAXFNAMELEN];
                               9135                 :                :     char        histfilename[MAXFNAMELEN];
                               9136                 :                :     XLogSegNo   _logSegNo;
                               9137                 :                :     FILE       *fp;
                               9138                 :                :     int         seconds_before_warning;
 6363 bruce@momjian.us         9139                 :            161 :     int         waits = 0;
 5620 simon@2ndQuadrant.co     9140                 :            161 :     bool        reported_waiting = false;
                               9141                 :                : 
 1076 michael@paquier.xyz      9142         [ -  + ]:            161 :     Assert(state != NULL);
                               9143                 :                : 
                               9144                 :            161 :     backup_stopped_in_recovery = RecoveryInProgress();
                               9145                 :                : 
                               9146                 :                :     /*
                               9147                 :                :      * During recovery, we don't need to check WAL level, because if WAL
                               9148                 :                :      * level is not sufficient, it's impossible to get here during recovery.
                               9149                 :                :      */
                               9150   [ +  +  -  + ]:            161 :     if (!backup_stopped_in_recovery && !XLogIsNeeded())
 6207 tgl@sss.pgh.pa.us        9151         [ #  # ]:UBC           0 :         ereport(ERROR,
                               9152                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               9153                 :                :                  errmsg("WAL level not sufficient for making an online backup"),
                               9154                 :                :                  errhint("\"wal_level\" must be set to \"replica\" or \"logical\" at server start.")));
                               9155                 :                : 
                               9156                 :                :     /*
                               9157                 :                :      * OK to update backup counter and session-level lock.
                               9158                 :                :      *
                               9159                 :                :      * Note that CHECK_FOR_INTERRUPTS() must not occur while updating them,
                               9160                 :                :      * otherwise they can be updated inconsistently, which might cause
                               9161                 :                :      * do_pg_abort_backup() to fail.
                               9162                 :                :      */
 3154 fujii@postgresql.org     9163                 :CBC         161 :     WALInsertLockAcquireExclusive();
                               9164                 :                : 
                               9165                 :                :     /*
                               9166                 :                :      * It is expected that each do_pg_backup_start() call is matched by
                               9167                 :                :      * exactly one do_pg_backup_stop() call.
                               9168                 :                :      */
 1249 sfrost@snowman.net       9169         [ -  + ]:            161 :     Assert(XLogCtl->Insert.runningBackups > 0);
                               9170                 :            161 :     XLogCtl->Insert.runningBackups--;
                               9171                 :                : 
                               9172                 :                :     /*
                               9173                 :                :      * Clean up session-level lock.
                               9174                 :                :      *
                               9175                 :                :      * You might think that WALInsertLockRelease() can be called before
                               9176                 :                :      * cleaning up session-level lock because session-level lock doesn't need
                               9177                 :                :      * to be protected with WAL insertion lock. But since
                               9178                 :                :      * CHECK_FOR_INTERRUPTS() can occur in it, session-level lock must be
                               9179                 :                :      * cleaned up before it.
                               9180                 :                :      */
 3088 teodor@sigaev.ru         9181                 :            161 :     sessionBackupState = SESSION_BACKUP_NONE;
                               9182                 :                : 
 2818 fujii@postgresql.org     9183                 :            161 :     WALInsertLockRelease();
                               9184                 :                : 
                               9185                 :                :     /*
                               9186                 :                :      * If we are taking an online backup from the standby, we confirm that the
                               9187                 :                :      * standby has not been promoted during the backup.
                               9188                 :                :      */
 1076 michael@paquier.xyz      9189   [ +  +  -  + ]:            161 :     if (state->started_in_recovery && !backup_stopped_in_recovery)
 4973 simon@2ndQuadrant.co     9190         [ #  # ]:UBC           0 :         ereport(ERROR,
                               9191                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               9192                 :                :                  errmsg("the standby was promoted during online backup"),
                               9193                 :                :                  errhint("This means that the backup being taken is corrupt "
                               9194                 :                :                          "and should not be used. "
                               9195                 :                :                          "Try taking another online backup.")));
                               9196                 :                : 
                               9197                 :                :     /*
                               9198                 :                :      * During recovery, we don't write an end-of-backup record. We assume that
                               9199                 :                :      * pg_control was backed up last and its minimum recovery point can be
                               9200                 :                :      * available as the backup end location. Since we don't have an
                               9201                 :                :      * end-of-backup record, we use the pg_control value to check whether
                               9202                 :                :      * we've reached the end of backup when starting recovery from this
                               9203                 :                :      * backup. However, we have no way of checking whether pg_control was
                               9204                 :                :      * actually backed up last.
                               9205                 :                :      *
                               9206                 :                :      * We don't force a switch to new WAL file but it is still possible to
                               9207                 :                :      * wait for all the required files to be archived if waitforarchive is
                               9208                 :                :      * true. This is okay if we use the backup to start a standby and fetch
                               9209                 :                :      * the missing WAL using streaming replication. But in the case of an
                               9210                 :                :      * archive recovery, a user should set waitforarchive to true and wait for
                               9211                 :                :      * them to be archived to ensure that all the required files are
                               9212                 :                :      * available.
                               9213                 :                :      *
                               9214                 :                :      * We return the current minimum recovery point as the backup end
                               9215                 :                :      * location. Note that it can be greater than the exact backup end
                               9216                 :                :      * location if the minimum recovery point is updated after the backup of
                               9217                 :                :      * pg_control. This is harmless for current uses.
                               9218                 :                :      *
                               9219                 :                :      * XXX currently a backup history file is for informational and debug
                               9220                 :                :      * purposes only. It's not essential for an online backup. Furthermore,
                               9221                 :                :      * even if it's created, it will not be archived during recovery because
                               9222                 :                :      * an archiver is not invoked. So it doesn't seem worthwhile to write a
                               9223                 :                :      * backup history file during recovery.
                               9224                 :                :      */
 1076 michael@paquier.xyz      9225         [ +  + ]:CBC         161 :     if (backup_stopped_in_recovery)
                               9226                 :                :     {
                               9227                 :                :         XLogRecPtr  recptr;
                               9228                 :                : 
                               9229                 :                :         /*
                               9230                 :                :          * Check to see if all WAL replayed during online backup contain
                               9231                 :                :          * full-page writes.
                               9232                 :                :          */
 4002 andres@anarazel.de       9233         [ -  + ]:              6 :         SpinLockAcquire(&XLogCtl->info_lck);
                               9234                 :              6 :         recptr = XLogCtl->lastFpwDisableRecPtr;
                               9235                 :              6 :         SpinLockRelease(&XLogCtl->info_lck);
                               9236                 :                : 
 1076 michael@paquier.xyz      9237         [ -  + ]:              6 :         if (state->startpoint <= recptr)
 4973 simon@2ndQuadrant.co     9238         [ #  # ]:UBC           0 :             ereport(ERROR,
                               9239                 :                :                     (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               9240                 :                :                      errmsg("WAL generated with \"full_page_writes=off\" was replayed "
                               9241                 :                :                             "during online backup"),
                               9242                 :                :                      errhint("This means that the backup being taken on the standby "
                               9243                 :                :                              "is corrupt and should not be used. "
                               9244                 :                :                              "Enable \"full_page_writes\" and run CHECKPOINT on the primary, "
                               9245                 :                :                              "and then try an online backup again.")));
                               9246                 :                : 
                               9247                 :                : 
 4973 simon@2ndQuadrant.co     9248                 :CBC           6 :         LWLockAcquire(ControlFileLock, LW_SHARED);
 1076 michael@paquier.xyz      9249                 :              6 :         state->stoppoint = ControlFile->minRecoveryPoint;
                               9250                 :              6 :         state->stoptli = ControlFile->minRecoveryPointTLI;
 4973 simon@2ndQuadrant.co     9251                 :              6 :         LWLockRelease(ControlFileLock);
                               9252                 :                :     }
                               9253                 :                :     else
                               9254                 :                :     {
                               9255                 :                :         char       *history_file;
                               9256                 :                : 
                               9257                 :                :         /*
                               9258                 :                :          * Write the backup-end xlog record
                               9259                 :                :          */
 2954 rhaas@postgresql.org     9260                 :            155 :         XLogBeginInsert();
  207 peter@eisentraut.org     9261                 :            155 :         XLogRegisterData(&state->startpoint,
                               9262                 :                :                          sizeof(state->startpoint));
 1076 michael@paquier.xyz      9263                 :            155 :         state->stoppoint = XLogInsert(RM_XLOG_ID, XLOG_BACKUP_END);
                               9264                 :                : 
                               9265                 :                :         /*
                               9266                 :                :          * Given that we're not in recovery, InsertTimeLineID is set and can't
                               9267                 :                :          * change, so we can read it without a lock.
                               9268                 :                :          */
                               9269                 :            155 :         state->stoptli = XLogCtl->InsertTimeLineID;
                               9270                 :                : 
                               9271                 :                :         /*
                               9272                 :                :          * Force a switch to a new xlog segment file, so that the backup is
                               9273                 :                :          * valid as soon as the archiver moves out the current segment file.
                               9274                 :                :          */
 2954 rhaas@postgresql.org     9275                 :            155 :         RequestXLogSwitch(false);
                               9276                 :                : 
 1076 michael@paquier.xyz      9277                 :            155 :         state->stoptime = (pg_time_t) time(NULL);
                               9278                 :                : 
                               9279                 :                :         /*
                               9280                 :                :          * Write the backup history file
                               9281                 :                :          */
                               9282                 :            155 :         XLByteToSeg(state->startpoint, _logSegNo, wal_segment_size);
                               9283                 :            155 :         BackupHistoryFilePath(histfilepath, state->stoptli, _logSegNo,
                               9284                 :                :                               state->startpoint, wal_segment_size);
 2954 rhaas@postgresql.org     9285                 :            155 :         fp = AllocateFile(histfilepath, "w");
                               9286         [ -  + ]:            155 :         if (!fp)
 2954 rhaas@postgresql.org     9287         [ #  # ]:UBC           0 :             ereport(ERROR,
                               9288                 :                :                     (errcode_for_file_access(),
                               9289                 :                :                      errmsg("could not create file \"%s\": %m",
                               9290                 :                :                             histfilepath)));
                               9291                 :                : 
                               9292                 :                :         /* Build and save the contents of the backup history file */
 1076 michael@paquier.xyz      9293                 :CBC         155 :         history_file = build_backup_content(state, true);
 1075                          9294                 :            155 :         fprintf(fp, "%s", history_file);
 1076                          9295                 :            155 :         pfree(history_file);
                               9296                 :                : 
 2954 rhaas@postgresql.org     9297   [ +  -  +  -  :            155 :         if (fflush(fp) || ferror(fp) || FreeFile(fp))
                                              -  + ]
 2954 rhaas@postgresql.org     9298         [ #  # ]:UBC           0 :             ereport(ERROR,
                               9299                 :                :                     (errcode_for_file_access(),
                               9300                 :                :                      errmsg("could not write file \"%s\": %m",
                               9301                 :                :                             histfilepath)));
                               9302                 :                : 
                               9303                 :                :         /*
                               9304                 :                :          * Clean out any no-longer-needed history files.  As a side effect,
                               9305                 :                :          * this will post a .ready file for the newly created history file,
                               9306                 :                :          * notifying the archiver that the history file may be archived
                               9307                 :                :          * immediately.
                               9308                 :                :          */
 2954 rhaas@postgresql.org     9309                 :CBC         155 :         CleanupBackupHistory();
                               9310                 :                :     }
                               9311                 :                : 
                               9312                 :                :     /*
                               9313                 :                :      * If archiving is enabled, wait for all the required WAL files to be
                               9314                 :                :      * archived before returning. If archiving isn't enabled, the required WAL
                               9315                 :                :      * needs to be transported via streaming replication (hopefully with
                               9316                 :                :      * wal_keep_size set high enough), or some more exotic mechanism like
                               9317                 :                :      * polling and copying files from pg_wal with a script. We have no
                               9318                 :                :      * knowledge of those mechanisms, so it's up to the user to ensure that
                               9319                 :                :      * all the required WAL is obtained.
                               9320                 :                :      *
                               9321                 :                :      * We wait until both the last WAL file filled during backup and the
                               9322                 :                :      * history file have been archived, and assume that the alphabetic sorting
                               9323                 :                :      * property of the WAL files ensures any earlier WAL files are safely
                               9324                 :                :      * archived as well.
                               9325                 :                :      *
                               9326                 :                :      * We wait forever, since archive_command is supposed to work and we
                               9327                 :                :      * assume the admin wanted the backup to work completely. If you don't
                               9328                 :                :      * wish to wait, then either waitforarchive should be passed in as false,
                               9329                 :                :      * or you can set statement_timeout.  Also, some notices are issued to
                               9330                 :                :      * clue in anyone who might be doing this interactively.
                               9331                 :                :      */
                               9332                 :                : 
                               9333         [ +  + ]:            161 :     if (waitforarchive &&
 1076 michael@paquier.xyz      9334   [ +  +  +  +  :             10 :         ((!backup_stopped_in_recovery && XLogArchivingActive()) ||
                                     -  +  +  +  +  
                                                 + ]
                               9335   [ +  -  -  +  :              1 :          (backup_stopped_in_recovery && XLogArchivingAlways())))
                                              -  + ]
                               9336                 :                :     {
                               9337                 :              4 :         XLByteToPrevSeg(state->stoppoint, _logSegNo, wal_segment_size);
                               9338                 :              4 :         XLogFileName(lastxlogfilename, state->stoptli, _logSegNo,
                               9339                 :                :                      wal_segment_size);
                               9340                 :                : 
                               9341                 :              4 :         XLByteToSeg(state->startpoint, _logSegNo, wal_segment_size);
                               9342                 :              4 :         BackupHistoryFileName(histfilename, state->stoptli, _logSegNo,
                               9343                 :                :                               state->startpoint, wal_segment_size);
                               9344                 :                : 
 5541 bruce@momjian.us         9345                 :              4 :         seconds_before_warning = 60;
                               9346                 :              4 :         waits = 0;
                               9347                 :                : 
                               9348   [ +  +  -  + ]:             12 :         while (XLogArchiveIsBusy(lastxlogfilename) ||
                               9349                 :              4 :                XLogArchiveIsBusy(histfilename))
                               9350                 :                :         {
                               9351         [ -  + ]:              4 :             CHECK_FOR_INTERRUPTS();
                               9352                 :                : 
                               9353   [ +  -  -  + ]:              4 :             if (!reported_waiting && waits > 5)
                               9354                 :                :             {
 5541 bruce@momjian.us         9355         [ #  # ]:UBC           0 :                 ereport(NOTICE,
                               9356                 :                :                         (errmsg("base backup done, waiting for required WAL segments to be archived")));
                               9357                 :              0 :                 reported_waiting = true;
                               9358                 :                :             }
                               9359                 :                : 
 1523 michael@paquier.xyz      9360                 :CBC           4 :             (void) WaitLatch(MyLatch,
                               9361                 :                :                              WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                               9362                 :                :                              1000L,
                               9363                 :                :                              WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE);
                               9364                 :              4 :             ResetLatch(MyLatch);
                               9365                 :                : 
 5541 bruce@momjian.us         9366         [ -  + ]:              4 :             if (++waits >= seconds_before_warning)
                               9367                 :                :             {
 5541 bruce@momjian.us         9368                 :UBC           0 :                 seconds_before_warning *= 2;    /* This wraps in >10 years... */
                               9369         [ #  # ]:              0 :                 ereport(WARNING,
                               9370                 :                :                         (errmsg("still waiting for all required WAL segments to be archived (%d seconds elapsed)",
                               9371                 :                :                                 waits),
                               9372                 :                :                          errhint("Check that your \"archive_command\" is executing properly.  "
                               9373                 :                :                                  "You can safely cancel this backup, "
                               9374                 :                :                                  "but the database backup will not be usable without all the WAL segments.")));
                               9375                 :                :             }
                               9376                 :                :         }
                               9377                 :                : 
 5541 bruce@momjian.us         9378         [ +  + ]:CBC           4 :         ereport(NOTICE,
                               9379                 :                :                 (errmsg("all required WAL segments have been archived")));
                               9380                 :                :     }
 5323 magnus@hagander.net      9381         [ +  + ]:            157 :     else if (waitforarchive)
 5609 tgl@sss.pgh.pa.us        9382         [ +  - ]:              6 :         ereport(NOTICE,
                               9383                 :                :                 (errmsg("WAL archiving is not enabled; you must ensure that all required WAL segments are copied through other means to complete the backup")));
 5354 magnus@hagander.net      9384                 :            161 : }
                               9385                 :                : 
                               9386                 :                : 
                               9387                 :                : /*
                               9388                 :                :  * do_pg_abort_backup: abort a running backup
                               9389                 :                :  *
                               9390                 :                :  * This does just the most basic steps of do_pg_backup_stop(), by taking the
                               9391                 :                :  * system out of backup mode, thus making it a lot more safe to call from
                               9392                 :                :  * an error handler.
                               9393                 :                :  *
                               9394                 :                :  * If 'arg' is true, it's being called during backup setup, so
                               9395                 :                :  * sessionBackupState has not been modified yet, but runningBackups has
                               9396                 :                :  * already been incremented.  When it's false, then it's invoked as a
                               9397                 :                :  * before_shmem_exit handler, and therefore we must not change state
                               9398                 :                :  * unless sessionBackupState indicates that a backup is actually running.
                               9399                 :                :  *
                               9400                 :                :  * NB: This gets used as a PG_ENSURE_ERROR_CLEANUP callback and
                               9401                 :                :  * before_shmem_exit handler, hence the odd-looking signature.
                               9402                 :                :  */
                               9403                 :                : void
 2088 rhaas@postgresql.org     9404                 :              8 : do_pg_abort_backup(int code, Datum arg)
                               9405                 :                : {
 1053 alvherre@alvh.no-ip.     9406                 :              8 :     bool        during_backup_start = DatumGetBool(arg);
                               9407                 :                : 
                               9408                 :                :     /* If called during backup start, there shouldn't be one already running */
 1048                          9409   [ -  +  -  - ]:              8 :     Assert(!during_backup_start || sessionBackupState == SESSION_BACKUP_NONE);
                               9410                 :                : 
 1053                          9411   [ +  -  +  + ]:              8 :     if (during_backup_start || sessionBackupState != SESSION_BACKUP_NONE)
                               9412                 :                :     {
                               9413                 :              6 :         WALInsertLockAcquireExclusive();
                               9414         [ -  + ]:              6 :         Assert(XLogCtl->Insert.runningBackups > 0);
                               9415                 :              6 :         XLogCtl->Insert.runningBackups--;
                               9416                 :                : 
                               9417                 :              6 :         sessionBackupState = SESSION_BACKUP_NONE;
                               9418                 :              6 :         WALInsertLockRelease();
                               9419                 :                : 
                               9420         [ +  - ]:              6 :         if (!during_backup_start)
                               9421         [ +  - ]:              6 :             ereport(WARNING,
                               9422                 :                :                     errmsg("aborting backup due to backend exiting before pg_backup_stop was called"));
                               9423                 :                :     }
 2088 rhaas@postgresql.org     9424                 :              8 : }
                               9425                 :                : 
                               9426                 :                : /*
                               9427                 :                :  * Register a handler that will warn about unterminated backups at end of
                               9428                 :                :  * session, unless this has already been done.
                               9429                 :                :  */
                               9430                 :                : void
                               9431                 :              4 : register_persistent_abort_backup_handler(void)
                               9432                 :                : {
                               9433                 :                :     static bool already_done = false;
                               9434                 :                : 
                               9435         [ +  + ]:              4 :     if (already_done)
                               9436                 :              1 :         return;
   32 peter@eisentraut.org     9437                 :GNC           3 :     before_shmem_exit(do_pg_abort_backup, BoolGetDatum(false));
 2088 rhaas@postgresql.org     9438                 :CBC           3 :     already_done = true;
                               9439                 :                : }
                               9440                 :                : 
                               9441                 :                : /*
                               9442                 :                :  * Get latest WAL insert pointer
                               9443                 :                :  */
                               9444                 :                : XLogRecPtr
 4987 heikki.linnakangas@i     9445                 :           1940 : GetXLogInsertRecPtr(void)
                               9446                 :                : {
 4002 andres@anarazel.de       9447                 :           1940 :     XLogCtlInsert *Insert = &XLogCtl->Insert;
                               9448                 :                :     uint64      current_bytepos;
                               9449                 :                : 
 4443 heikki.linnakangas@i     9450         [ -  + ]:           1940 :     SpinLockAcquire(&Insert->insertpos_lck);
                               9451                 :           1940 :     current_bytepos = Insert->CurrBytePos;
                               9452                 :           1940 :     SpinLockRelease(&Insert->insertpos_lck);
                               9453                 :                : 
                               9454                 :           1940 :     return XLogBytePosToRecPtr(current_bytepos);
                               9455                 :                : }
                               9456                 :                : 
                               9457                 :                : /*
                               9458                 :                :  * Get latest WAL write pointer
                               9459                 :                :  */
                               9460                 :                : XLogRecPtr
 1298                          9461                 :           1494 : GetXLogWriteRecPtr(void)
                               9462                 :                : {
  521 alvherre@alvh.no-ip.     9463                 :           1494 :     RefreshXLogWriteResult(LogwrtResult);
                               9464                 :                : 
 1298 heikki.linnakangas@i     9465                 :           1494 :     return LogwrtResult.Write;
                               9466                 :                : }
                               9467                 :                : 
                               9468                 :                : /*
                               9469                 :                :  * Returns the redo pointer of the last checkpoint or restartpoint. This is
                               9470                 :                :  * the oldest point in WAL that we still need, if we have to restart recovery.
                               9471                 :                :  */
                               9472                 :                : void
                               9473                 :            381 : GetOldestRestartPoint(XLogRecPtr *oldrecptr, TimeLineID *oldtli)
                               9474                 :                : {
                               9475                 :            381 :     LWLockAcquire(ControlFileLock, LW_SHARED);
                               9476                 :            381 :     *oldrecptr = ControlFile->checkPointCopy.redo;
                               9477                 :            381 :     *oldtli = ControlFile->checkPointCopy.ThisTimeLineID;
                               9478                 :            381 :     LWLockRelease(ControlFileLock);
 7106 tgl@sss.pgh.pa.us        9479                 :            381 : }
                               9480                 :                : 
                               9481                 :                : /* Shut down the WAL receiver, then disable WAL file recycling and preallocation. */
                               9482                 :                : void
 1531 noah@leadboat.com        9483                 :            998 : XLogShutdownWalRcv(void)
                               9484                 :                : {
                               9485                 :            998 :     ShutdownWalRcv();
                               9486                 :                : 
                               9487                 :            998 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               9488                 :            998 :     XLogCtl->InstallXLogFileSegmentActive = false;
                               9489                 :            998 :     LWLockRelease(ControlFileLock);
                               9490                 :            998 : }
                               9491                 :                : 
                               9492                 :                : /* Enable WAL file recycling and preallocation. */
                               9493                 :                : void
 1298 heikki.linnakangas@i     9494                 :           1042 : SetInstallXLogFileSegmentActive(void)
                               9495                 :                : {
                               9496                 :           1042 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               9497                 :           1042 :     XLogCtl->InstallXLogFileSegmentActive = true;
                               9498                 :           1042 :     LWLockRelease(ControlFileLock);
 3650 fujii@postgresql.org     9499                 :           1042 : }
                               9500                 :                : 
                               9501                 :                : bool
 1298 heikki.linnakangas@i     9502                 :            358 : IsInstallXLogFileSegmentActive(void)
                               9503                 :                : {
                               9504                 :                :     bool        result;
                               9505                 :                : 
                               9506                 :            358 :     LWLockAcquire(ControlFileLock, LW_SHARED);
                               9507                 :            358 :     result = XLogCtl->InstallXLogFileSegmentActive;
                               9508                 :            358 :     LWLockRelease(ControlFileLock);
                               9509                 :                : 
                               9510                 :            358 :     return result;
                               9511                 :                : }
                               9512                 :                : 
                               9513                 :                : /*
                               9514                 :                :  * Update the WalWriterSleeping flag.
                               9515                 :                :  */
                               9516                 :                : void
 4869 tgl@sss.pgh.pa.us        9517                 :            449 : SetWalWriterSleeping(bool sleeping)
                               9518                 :                : {
 4002 andres@anarazel.de       9519         [ -  + ]:            449 :     SpinLockAcquire(&XLogCtl->info_lck);
                               9520                 :            449 :     XLogCtl->WalWriterSleeping = sleeping;
                               9521                 :            449 :     SpinLockRelease(&XLogCtl->info_lck);
 4869 tgl@sss.pgh.pa.us        9522                 :            449 : }
        

Generated by: LCOV version 2.4-beta