LCOV - differential code coverage report

Current view:   top level - src/backend/access/transam - xlog.c (source / functions)
Current:        c3df85756ceb0246958ef2b72c04aba51e52de13 vs 167cb26718e3eae4fef470900b4cd1d434f15649
Current Date:   2025-12-18 07:33:40 +0900
Baseline:       lcov-20251218-005734-baseline
Baseline Date:  2025-12-17 11:55:04 -0800
Legend:         Lines: hit, not hit; Branches: + taken, - not taken, # not executed

                    Coverage  Total   Hit  UNC  LBC  UBC  GBC  GNC   CBC  DUB  DCB
Lines:                88.7 %   2531  2246   10    4  271    1   53  2192    8   55
Functions:            98.4 %    122   120              2        25    95         1
Branches:             63.9 %   1811  1158   37    6  610    3   59  1096

Line coverage date bins:
  (7,30] days:        80.0 %     10     8    2                   8
  (30,360] days:      89.0 %    109    97    8         4        45    52
  (360..) days:       88.8 %   2412  2141         4  267    1       2140

Function coverage date bins:
  (30,360] days:     100.0 %      3     3                        1     2
  (360..) days:       98.3 %    119   117              2        24    93

Branch coverage date bins:
  (7,30] days:        50.0 %      2     1    1                   1
  (30,360] days:      63.5 %    126    80   36        10        58    22
  (360..) days:       64.0 %   1683  1077         6  600    3       1074

 Age         Owner                    Branch data    TLA  Line data    Source code

                                  1                 :                : /*-------------------------------------------------------------------------
                                  2                 :                :  *
                                  3                 :                :  * xlog.c
                                  4                 :                :  *      PostgreSQL write-ahead log manager
                                  5                 :                :  *
                                  6                 :                :  * The Write-Ahead Log (WAL) functionality is split into several source
                                  7                 :                :  * files, in addition to this one:
                                  8                 :                :  *
                                  9                 :                :  * xloginsert.c - Functions for constructing WAL records
                                 10                 :                :  * xlogrecovery.c - WAL recovery and standby code
                                 11                 :                :  * xlogreader.c - Facility for reading WAL files and parsing WAL records
                                 12                 :                :  * xlogutils.c - Helper functions for WAL redo routines
                                 13                 :                :  *
                                 14                 :                :  * This file contains functions for coordinating database startup and
                                 15                 :                :  * checkpointing, and managing the write-ahead log buffers when the
                                 16                 :                :  * system is running.
                                 17                 :                :  *
                                 18                 :                :  * StartupXLOG() is the main entry point of the startup process.  It
                                 19                 :                :  * coordinates database startup, performing WAL recovery, and the
                                 20                 :                :  * transition from WAL recovery into normal operations.
                                 21                 :                :  *
                                 22                 :                :  * XLogInsertRecord() inserts a WAL record into the WAL buffers.  Most
                                 23                 :                :  * callers should not call this directly, but use the functions in
                                 24                 :                :  * xloginsert.c to construct the WAL record.  XLogFlush() can be used
                                 25                 :                :  * to force the WAL to disk.
                                 26                 :                :  *
                                 27                 :                :  * In addition to those, there are many other functions for interrogating
                                 28                 :                :  * the current system state, and for starting/stopping backups.
                                 29                 :                :  *
                                 30                 :                :  *
                                 31                 :                :  * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
                                 32                 :                :  * Portions Copyright (c) 1994, Regents of the University of California
                                 33                 :                :  *
                                 34                 :                :  * src/backend/access/transam/xlog.c
                                 35                 :                :  *
                                 36                 :                :  *-------------------------------------------------------------------------
                                 37                 :                :  */
                                 38                 :                : 
                                 39                 :                : #include "postgres.h"
                                 40                 :                : 
                                 41                 :                : #include <ctype.h>
                                 42                 :                : #include <math.h>
                                 43                 :                : #include <time.h>
                                 44                 :                : #include <fcntl.h>
                                 45                 :                : #include <sys/stat.h>
                                 46                 :                : #include <sys/time.h>
                                 47                 :                : #include <unistd.h>
                                 48                 :                : 
                                 49                 :                : #include "access/clog.h"
                                 50                 :                : #include "access/commit_ts.h"
                                 51                 :                : #include "access/heaptoast.h"
                                 52                 :                : #include "access/multixact.h"
                                 53                 :                : #include "access/rewriteheap.h"
                                 54                 :                : #include "access/subtrans.h"
                                 55                 :                : #include "access/timeline.h"
                                 56                 :                : #include "access/transam.h"
                                 57                 :                : #include "access/twophase.h"
                                 58                 :                : #include "access/xact.h"
                                 59                 :                : #include "access/xlog_internal.h"
                                 60                 :                : #include "access/xlogarchive.h"
                                 61                 :                : #include "access/xloginsert.h"
                                 62                 :                : #include "access/xlogreader.h"
                                 63                 :                : #include "access/xlogrecovery.h"
                                 64                 :                : #include "access/xlogutils.h"
                                 65                 :                : #include "access/xlogwait.h"
                                 66                 :                : #include "backup/basebackup.h"
                                 67                 :                : #include "catalog/catversion.h"
                                 68                 :                : #include "catalog/pg_control.h"
                                 69                 :                : #include "catalog/pg_database.h"
                                 70                 :                : #include "common/controldata_utils.h"
                                 71                 :                : #include "common/file_utils.h"
                                 72                 :                : #include "executor/instrument.h"
                                 73                 :                : #include "miscadmin.h"
                                 74                 :                : #include "pg_trace.h"
                                 75                 :                : #include "pgstat.h"
                                 76                 :                : #include "port/atomics.h"
                                 77                 :                : #include "postmaster/bgwriter.h"
                                 78                 :                : #include "postmaster/startup.h"
                                 79                 :                : #include "postmaster/walsummarizer.h"
                                 80                 :                : #include "postmaster/walwriter.h"
                                 81                 :                : #include "replication/origin.h"
                                 82                 :                : #include "replication/slot.h"
                                 83                 :                : #include "replication/snapbuild.h"
                                 84                 :                : #include "replication/walreceiver.h"
                                 85                 :                : #include "replication/walsender.h"
                                 86                 :                : #include "storage/bufmgr.h"
                                 87                 :                : #include "storage/fd.h"
                                 88                 :                : #include "storage/ipc.h"
                                 89                 :                : #include "storage/large_object.h"
                                 90                 :                : #include "storage/latch.h"
                                 91                 :                : #include "storage/predicate.h"
                                 92                 :                : #include "storage/proc.h"
                                 93                 :                : #include "storage/procarray.h"
                                 94                 :                : #include "storage/reinit.h"
                                 95                 :                : #include "storage/spin.h"
                                 96                 :                : #include "storage/sync.h"
                                 97                 :                : #include "utils/guc_hooks.h"
                                 98                 :                : #include "utils/guc_tables.h"
                                 99                 :                : #include "utils/injection_point.h"
                                100                 :                : #include "utils/pgstat_internal.h"
                                101                 :                : #include "utils/ps_status.h"
                                102                 :                : #include "utils/relmapper.h"
                                103                 :                : #include "utils/snapmgr.h"
                                104                 :                : #include "utils/timeout.h"
                                105                 :                : #include "utils/timestamp.h"
                                106                 :                : #include "utils/varlena.h"
                                107                 :                : 
                                108                 :                : #ifdef WAL_DEBUG
                                109                 :                : #include "utils/memutils.h"
                                110                 :                : #endif
                                111                 :                : 
                                112                 :                : /* timeline ID to be used when bootstrapping */
                                113                 :                : #define BootstrapTimeLineID     1
                                114                 :                : 
                                115                 :                : /* User-settable parameters */
                                116                 :                : int         max_wal_size_mb = 1024; /* 1 GB */
                                117                 :                : int         min_wal_size_mb = 80;   /* 80 MB */
                                118                 :                : int         wal_keep_size_mb = 0;
                                119                 :                : int         XLOGbuffers = -1;
                                120                 :                : int         XLogArchiveTimeout = 0;
                                121                 :                : int         XLogArchiveMode = ARCHIVE_MODE_OFF;
                                122                 :                : char       *XLogArchiveCommand = NULL;
                                123                 :                : bool        EnableHotStandby = false;
                                124                 :                : bool        fullPageWrites = true;
                                125                 :                : bool        wal_log_hints = false;
                                126                 :                : int         wal_compression = WAL_COMPRESSION_NONE;
                                127                 :                : char       *wal_consistency_checking_string = NULL;
                                128                 :                : bool       *wal_consistency_checking = NULL;
                                129                 :                : bool        wal_init_zero = true;
                                130                 :                : bool        wal_recycle = true;
                                131                 :                : bool        log_checkpoints = true;
                                132                 :                : int         wal_sync_method = DEFAULT_WAL_SYNC_METHOD;
                                133                 :                : int         wal_level = WAL_LEVEL_REPLICA;
                                134                 :                : int         CommitDelay = 0;    /* precommit delay in microseconds */
                                135                 :                : int         CommitSiblings = 5; /* # concurrent xacts needed to sleep */
                                136                 :                : int         wal_retrieve_retry_interval = 5000;
                                137                 :                : int         max_slot_wal_keep_size_mb = -1;
                                138                 :                : int         wal_decode_buffer_size = 512 * 1024;
                                139                 :                : bool        track_wal_io_timing = false;
                                140                 :                : 
                                141                 :                : #ifdef WAL_DEBUG
                                142                 :                : bool        XLOG_DEBUG = false;
                                143                 :                : #endif
                                144                 :                : 
                                145                 :                : int         wal_segment_size = DEFAULT_XLOG_SEG_SIZE;
                                146                 :                : 
                                147                 :                : /*
                                148                 :                :  * Number of WAL insertion locks to use. A higher value allows more insertions
                                149                 :                :  * to happen concurrently, but adds some CPU overhead to flushing the WAL,
                                150                 :                :  * which needs to iterate all the locks.
                                151                 :                :  */
                                152                 :                : #define NUM_XLOGINSERT_LOCKS  8
                                153                 :                : 
                                154                 :                : /*
                                155                 :                :  * Max distance from last checkpoint, before triggering a new xlog-based
                                156                 :                :  * checkpoint.
                                157                 :                :  */
                                158                 :                : int         CheckPointSegments;
                                159                 :                : 
                                160                 :                : /* Estimated distance between checkpoints, in bytes */
                                161                 :                : static double CheckPointDistanceEstimate = 0;
                                162                 :                : static double PrevCheckPointDistance = 0;
                                163                 :                : 
                                164                 :                : /*
                                165                 :                :  * Track whether there were any deferred checks for custom resource managers
                                166                 :                :  * specified in wal_consistency_checking.
                                167                 :                :  */
                                168                 :                : static bool check_wal_consistency_checking_deferred = false;
                                169                 :                : 
                                170                 :                : /*
                                171                 :                :  * GUC support
                                172                 :                :  */
                                173                 :                : const struct config_enum_entry wal_sync_method_options[] = {
                                174                 :                :     {"fsync", WAL_SYNC_METHOD_FSYNC, false},
                                175                 :                : #ifdef HAVE_FSYNC_WRITETHROUGH
                                176                 :                :     {"fsync_writethrough", WAL_SYNC_METHOD_FSYNC_WRITETHROUGH, false},
                                177                 :                : #endif
                                178                 :                :     {"fdatasync", WAL_SYNC_METHOD_FDATASYNC, false},
                                179                 :                : #ifdef O_SYNC
                                180                 :                :     {"open_sync", WAL_SYNC_METHOD_OPEN, false},
                                181                 :                : #endif
                                182                 :                : #ifdef O_DSYNC
                                183                 :                :     {"open_datasync", WAL_SYNC_METHOD_OPEN_DSYNC, false},
                                184                 :                : #endif
                                185                 :                :     {NULL, 0, false}
                                186                 :                : };
                                187                 :                : 
                                188                 :                : 
                                189                 :                : /*
                                190                 :                :  * Although only "on", "off", and "always" are documented,
                                191                 :                :  * we accept all the likely variants of "on" and "off".
                                192                 :                :  */
                                193                 :                : const struct config_enum_entry archive_mode_options[] = {
                                194                 :                :     {"always", ARCHIVE_MODE_ALWAYS, false},
                                195                 :                :     {"on", ARCHIVE_MODE_ON, false},
                                196                 :                :     {"off", ARCHIVE_MODE_OFF, false},
                                197                 :                :     {"true", ARCHIVE_MODE_ON, true},
                                198                 :                :     {"false", ARCHIVE_MODE_OFF, true},
                                199                 :                :     {"yes", ARCHIVE_MODE_ON, true},
                                200                 :                :     {"no", ARCHIVE_MODE_OFF, true},
                                201                 :                :     {"1", ARCHIVE_MODE_ON, true},
                                202                 :                :     {"0", ARCHIVE_MODE_OFF, true},
                                203                 :                :     {NULL, 0, false}
                                204                 :                : };
                                205                 :                : 
                                206                 :                : /*
                                207                 :                :  * Statistics for current checkpoint are collected in this global struct.
                                208                 :                :  * Because only the checkpointer or a stand-alone backend can perform
                                209                 :                :  * checkpoints, this will be unused in normal backends.
                                210                 :                :  */
                                211                 :                : CheckpointStatsData CheckpointStats;
                                212                 :                : 
                                213                 :                : /*
                                214                 :                :  * During recovery, lastFullPageWrites keeps track of full_page_writes that
                                215                 :                :  * the replayed WAL records indicate. It's initialized with full_page_writes
                                216                 :                :  * that the recovery starting checkpoint record indicates, and then updated
                                 217                 :                :  * each time an XLOG_FPW_CHANGE record is replayed.
                                218                 :                :  */
                                219                 :                : static bool lastFullPageWrites;
                                220                 :                : 
                                221                 :                : /*
                                 222                 :                :  * Local copy of the state tracked by SharedRecoveryState in shared memory.
                                223                 :                :  * It is false if SharedRecoveryState is RECOVERY_STATE_DONE.  True actually
                                224                 :                :  * means "not known, need to check the shared state".
                                225                 :                :  */
                                226                 :                : static bool LocalRecoveryInProgress = true;
                                227                 :                : 
                                228                 :                : /*
                                229                 :                :  * Local state for XLogInsertAllowed():
                                230                 :                :  *      1: unconditionally allowed to insert XLOG
                                231                 :                :  *      0: unconditionally not allowed to insert XLOG
                                232                 :                :  *      -1: must check RecoveryInProgress(); disallow until it is false
                                233                 :                :  * Most processes start with -1 and transition to 1 after seeing that recovery
                                234                 :                :  * is not in progress.  But we can also force the value for special cases.
                                235                 :                :  * The coding in XLogInsertAllowed() depends on the first two of these states
                                236                 :                :  * being numerically the same as bool true and false.
                                237                 :                :  */
                                238                 :                : static int  LocalXLogInsertAllowed = -1;
                                239                 :                : 
                                240                 :                : /*
                                241                 :                :  * ProcLastRecPtr points to the start of the last XLOG record inserted by the
                                242                 :                :  * current backend.  It is updated for all inserts.  XactLastRecEnd points to
                                243                 :                :  * end+1 of the last record, and is reset when we end a top-level transaction,
                                244                 :                :  * or start a new one; so it can be used to tell if the current transaction has
                                245                 :                :  * created any XLOG records.
                                246                 :                :  *
                                247                 :                :  * While in parallel mode, this may not be fully up to date.  When committing,
                                248                 :                :  * a transaction can assume this covers all xlog records written either by the
                                249                 :                :  * user backend or by any parallel worker which was present at any point during
                                250                 :                :  * the transaction.  But when aborting, or when still in parallel mode, other
                                251                 :                :  * parallel backends may have written WAL records at later LSNs than the value
                                252                 :                :  * stored here.  The parallel leader advances its own copy, when necessary,
                                253                 :                :  * in WaitForParallelWorkersToFinish.
                                254                 :                :  */
                                255                 :                : XLogRecPtr  ProcLastRecPtr = InvalidXLogRecPtr;
                                256                 :                : XLogRecPtr  XactLastRecEnd = InvalidXLogRecPtr;
                                257                 :                : XLogRecPtr  XactLastCommitEnd = InvalidXLogRecPtr;
                                258                 :                : 
                                259                 :                : /*
                                260                 :                :  * RedoRecPtr is this backend's local copy of the REDO record pointer
                                261                 :                :  * (which is almost but not quite the same as a pointer to the most recent
                                262                 :                :  * CHECKPOINT record).  We update this from the shared-memory copy,
                                263                 :                :  * XLogCtl->Insert.RedoRecPtr, whenever we can safely do so (ie, when we
                                264                 :                :  * hold an insertion lock).  See XLogInsertRecord for details.  We are also
                                265                 :                :  * allowed to update from XLogCtl->RedoRecPtr if we hold the info_lck;
                                266                 :                :  * see GetRedoRecPtr.
                                267                 :                :  *
                                268                 :                :  * NB: Code that uses this variable must be prepared not only for the
                                269                 :                :  * possibility that it may be arbitrarily out of date, but also for the
                                270                 :                :  * possibility that it might be set to InvalidXLogRecPtr. We used to
                                271                 :                :  * initialize it as a side effect of the first call to RecoveryInProgress(),
                                272                 :                :  * which meant that most code that might use it could assume that it had a
                                273                 :                :  * real if perhaps stale value. That's no longer the case.
                                274                 :                :  */
                                275                 :                : static XLogRecPtr RedoRecPtr;
                                276                 :                : 
                                277                 :                : /*
                                278                 :                :  * doPageWrites is this backend's local copy of (fullPageWrites ||
                                279                 :                :  * runningBackups > 0).  It is used together with RedoRecPtr to decide whether
                                 280                 :                :  * a full-page image of a page needs to be taken.
                                281                 :                :  *
                                282                 :                :  * NB: Initially this is false, and there's no guarantee that it will be
                                283                 :                :  * initialized to any other value before it is first used. Any code that
                                284                 :                :  * makes use of it must recheck the value after obtaining a WALInsertLock,
                                285                 :                :  * and respond appropriately if it turns out that the previous value wasn't
                                286                 :                :  * accurate.
                                287                 :                :  */
                                288                 :                : static bool doPageWrites;
                                289                 :                : 
                                290                 :                : /*----------
                                291                 :                :  * Shared-memory data structures for XLOG control
                                292                 :                :  *
                                293                 :                :  * LogwrtRqst indicates a byte position that we need to write and/or fsync
                                294                 :                :  * the log up to (all records before that point must be written or fsynced).
                                295                 :                :  * The positions already written/fsynced are maintained in logWriteResult
                                296                 :                :  * and logFlushResult using atomic access.
                                297                 :                :  * In addition to the shared variable, each backend has a private copy of
                                298                 :                :  * both in LogwrtResult, which is updated when convenient.
                                299                 :                :  *
                                300                 :                :  * The request bookkeeping is simpler: there is a shared XLogCtl->LogwrtRqst
                                301                 :                :  * (protected by info_lck), but we don't need to cache any copies of it.
                                302                 :                :  *
                                303                 :                :  * info_lck is only held long enough to read/update the protected variables,
                                304                 :                :  * so it's a plain spinlock.  The other locks are held longer (potentially
                                305                 :                :  * over I/O operations), so we use LWLocks for them.  These locks are:
                                306                 :                :  *
                                307                 :                :  * WALBufMappingLock: must be held to replace a page in the WAL buffer cache.
                                308                 :                :  * It is only held while initializing and changing the mapping.  If the
                                309                 :                :  * contents of the buffer being replaced haven't been written yet, the mapping
                                310                 :                :  * lock is released while the write is done, and reacquired afterwards.
                                311                 :                :  *
                                312                 :                :  * WALWriteLock: must be held to write WAL buffers to disk (XLogWrite or
                                313                 :                :  * XLogFlush).
                                314                 :                :  *
                                315                 :                :  * ControlFileLock: must be held to read/update control file or create
                                316                 :                :  * new log file.
                                317                 :                :  *
                                318                 :                :  *----------
                                319                 :                :  */
                                320                 :                : 
                                321                 :                : typedef struct XLogwrtRqst
                                322                 :                : {
                                323                 :                :     XLogRecPtr  Write;          /* last byte + 1 to write out */
                                324                 :                :     XLogRecPtr  Flush;          /* last byte + 1 to flush */
                                325                 :                : } XLogwrtRqst;
                                326                 :                : 
                                327                 :                : typedef struct XLogwrtResult
                                328                 :                : {
                                329                 :                :     XLogRecPtr  Write;          /* last byte + 1 written out */
                                330                 :                :     XLogRecPtr  Flush;          /* last byte + 1 flushed */
                                331                 :                : } XLogwrtResult;
                                332                 :                : 
                                333                 :                : /*
                                334                 :                :  * Inserting to WAL is protected by a small fixed number of WAL insertion
                                335                 :                :  * locks. To insert to the WAL, you must hold one of the locks - it doesn't
                                336                 :                :  * matter which one. To lock out other concurrent insertions, you must hold
                                337                 :                :  * all of them. Each WAL insertion lock consists of a lightweight lock, plus an
                                338                 :                :  * indicator of how far the insertion has progressed (insertingAt).
                                339                 :                :  *
                                340                 :                :  * The insertingAt values are read when a process wants to flush WAL from
                                341                 :                :  * the in-memory buffers to disk, to check that all the insertions to the
                                342                 :                :  * region the process is about to write out have finished. You could simply
                                343                 :                :  * wait for all currently in-progress insertions to finish, but the
                                344                 :                :  * insertingAt indicator allows you to ignore insertions to later in the WAL,
                                345                 :                :  * so that you only wait for the insertions that are modifying the buffers
                                346                 :                :  * you're about to write out.
                                347                 :                :  *
                                348                 :                :  * This isn't just an optimization. If all the WAL buffers are dirty, an
                                349                 :                :  * inserter that's holding a WAL insert lock might need to evict an old WAL
                                350                 :                :  * buffer, which requires flushing the WAL. If it's possible for an inserter
                                351                 :                :  * to block on another inserter unnecessarily, deadlock can arise when two
                                352                 :                :  * inserters holding a WAL insert lock wait for each other to finish their
                                353                 :                :  * insertion.
                                354                 :                :  *
                                355                 :                :  * Small WAL records that don't cross a page boundary never update the value;
                                356                 :                :  * the WAL record is just copied to the page and the lock is released. But
                                357                 :                :  * to avoid the deadlock-scenario explained above, the indicator is always
                                358                 :                :  * updated before sleeping while holding an insertion lock.
                                359                 :                :  *
                                360                 :                :  * lastImportantAt contains the LSN of the last important WAL record inserted
                                361                 :                :  * using a given lock. This value is used to detect if there has been
                                362                 :                :  * important WAL activity since the last time some action, like a checkpoint,
                                363                 :                :  * was performed, so that the action can be skipped if there was none. The LSN is
                                364                 :                :  * updated for all insertions, unless the XLOG_MARK_UNIMPORTANT flag was
                                365                 :                :  * set. lastImportantAt is never cleared, only overwritten by the LSN of newer
                                366                 :                :  * records.  Tracking the WAL activity directly in WALInsertLock has the
                                367                 :                :  * advantage of not needing any additional locks to update the value.
                                368                 :                :  */
                                369                 :                : typedef struct
                                370                 :                : {
                                371                 :                :     LWLock      lock;
                                372                 :                :     pg_atomic_uint64 insertingAt;
                                373                 :                :     XLogRecPtr  lastImportantAt;
                                374                 :                : } WALInsertLock;
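
The insertingAt discussion above can be illustrated with a simplified, single-threaded sketch. It only computes how far WAL is known to be completely inserted for a requested flush position. It does not wait or lock, which real code would have to do. The function name is hypothetical, and the sketch assumes that a value of 0 means "this lock has no advertised in-progress insertion".

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the PostgreSQL typedef, assumed here to be a 64-bit integer. */
typedef uint64_t XLogRecPtr;

/*
 * Hypothetical helper: given a snapshot of the insertingAt values of all
 * insertion locks, return the position up to which every insertion is known
 * to be finished, capped at 'upto'.
 *
 * Insertions advertised at or beyond 'upto' are ignored, because they do not
 * touch the region the caller is about to write out. An insertion advertised
 * before 'upto' pulls the result back to its position.
 */
XLogRecPtr
sketch_finished_upto(const XLogRecPtr *inserting_at, size_t nlocks,
                     XLogRecPtr upto)
{
    XLogRecPtr  finished = upto;

    for (size_t i = 0; i < nlocks; i++)
    {
        XLogRecPtr  at = inserting_at[i];

        /* Assumption of this sketch: 0 means nothing is advertised. */
        if (at != 0 && at < finished)
            finished = at;
    }
    return finished;
}
```

With a snapshot of {0, 1500, 2000} and a request of 1000, neither advertised insertion matters and the result is 1000. With {0, 500, 2000} the inserter at 500 limits the result to 500.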
                                375                 :                : 
                                376                 :                : /*
                                377                 :                :  * All the WAL insertion locks are allocated as an array in shared memory. We
                                378                 :                :  * force the array stride to be a power of 2, which saves a few cycles in
                                379                 :                :  * indexing, but more importantly also ensures that individual slots don't
                                380                 :                :  * cross cache line boundaries. (Of course, we have to also ensure that the
                                381                 :                :  * array start address is suitably aligned.)
                                382                 :                :  */
                                383                 :                : typedef union WALInsertLockPadded
                                384                 :                : {
                                385                 :                :     WALInsertLock l;
                                386                 :                :     char        pad[PG_CACHE_LINE_SIZE];
                                387                 :                : } WALInsertLockPadded;
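
The padding technique described above can be checked at compile time. The sketch below uses stand-in types and an assumed cache line size of 128 bytes. Every name prefixed with Sketch or SKETCH_ is hypothetical and is not part of xlog.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size for this sketch; it must be a power of 2. */
#define SKETCH_CACHE_LINE_SIZE 128

/* Stand-in for the lock payload; the real struct holds an LWLock. */
typedef struct SketchInsertLock
{
    uint64_t    lock_word;
    uint64_t    inserting_at;
    uint64_t    last_important_at;
} SketchInsertLock;

/* Pad each slot to one full cache line, as the union above does. */
typedef union SketchInsertLockPadded
{
    SketchInsertLock l;
    char        pad[SKETCH_CACHE_LINE_SIZE];
} SketchInsertLockPadded;

/* The stride is exactly one cache line, and therefore a power of 2. */
_Static_assert(sizeof(SketchInsertLockPadded) == SKETCH_CACHE_LINE_SIZE,
               "payload must fit within one cache line");
_Static_assert((SKETCH_CACHE_LINE_SIZE & (SKETCH_CACHE_LINE_SIZE - 1)) == 0,
               "cache line size must be a power of 2");

/* Byte offset of slot i from the start of the array. */
size_t
sketch_slot_offset(size_t i)
{
    return i * sizeof(SketchInsertLockPadded);
}
```

If the array's start address is itself aligned to the cache line size, every slot offset is a multiple of that size, so no slot straddles two cache lines.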
                                388                 :                : 
                                389                 :                : /*
                                390                 :                :  * Session status of running backup, used for sanity checks in SQL-callable
                                391                 :                :  * functions to start and stop backups.
                                392                 :                :  */
                                393                 :                : static SessionBackupState sessionBackupState = SESSION_BACKUP_NONE;
                                394                 :                : 
                                395                 :                : /*
                                396                 :                :  * Shared state data for WAL insertion.
                                397                 :                :  */
                                398                 :                : typedef struct XLogCtlInsert
                                399                 :                : {
                                400                 :                :     slock_t     insertpos_lck;  /* protects CurrBytePos and PrevBytePos */
                                401                 :                : 
                                402                 :                :     /*
                                403                 :                :      * CurrBytePos is the end of reserved WAL. The next record will be
                                404                 :                :      * inserted at that position. PrevBytePos is the start position of the
                                405                 :                :      * previously inserted (or rather, reserved) record - it is copied to the
                                406                 :                :      * prev-link of the next record. These are stored as "usable byte
                                407                 :                :      * positions" rather than XLogRecPtrs (see XLogBytePosToRecPtr()).
                                408                 :                :      */
                                409                 :                :     uint64      CurrBytePos;
                                410                 :                :     uint64      PrevBytePos;
                                411                 :                : 
                                412                 :                :     /*
                                413                 :                :      * Make sure the above heavily-contended spinlock and byte positions are
                                414                 :                :      * on their own cache line. In particular, the RedoRecPtr and full page
                                415                 :                :      * write variables below should be on a different cache line. They are
                                416                 :                :      * read on every WAL insertion, but updated rarely, and we don't want
                                417                 :                :      * those reads to steal the cache line containing Curr/PrevBytePos.
                                418                 :                :      */
                                419                 :                :     char        pad[PG_CACHE_LINE_SIZE];
                                420                 :                : 
                                421                 :                :     /*
                                422                 :                :      * fullPageWrites is the authoritative value used by all backends to
                                423                 :                :      * determine whether to write full-page images to WAL. This shared value,
                                424                 :                :      * instead of the process-local fullPageWrites, is required because, when
                                425                 :                :      * full_page_writes is changed by SIGHUP, we must WAL-log it before it
                                426                 :                :      * actually affects WAL-logging by backends.  The checkpointer sets it at
                                427                 :                :      * startup or after SIGHUP.
                                428                 :                :      *
                                429                 :                :      * To read these fields, you must hold an insertion lock. To modify them,
                                430                 :                :      * you must hold ALL the locks.
                                431                 :                :      */
                                432                 :                :     XLogRecPtr  RedoRecPtr;     /* current redo point for insertions */
                                433                 :                :     bool        fullPageWrites;
                                434                 :                : 
                                435                 :                :     /*
                                436                 :                :      * runningBackups is a counter indicating the number of backups currently
                                437                 :                :      * in progress. lastBackupStart is the latest checkpoint redo location
                                438                 :                :      * used as a starting point for an online backup.
                                439                 :                :      */
                                440                 :                :     int         runningBackups;
                                441                 :                :     XLogRecPtr  lastBackupStart;
                                442                 :                : 
                                443                 :                :     /*
                                444                 :                :      * WAL insertion locks.
                                445                 :                :      */
                                446                 :                :     WALInsertLockPadded *WALInsertLocks;
                                447                 :                : } XLogCtlInsert;
                                448                 :                : 
                                449                 :                : /*
                                450                 :                :  * Total shared-memory state for XLOG.
                                451                 :                :  */
                                452                 :                : typedef struct XLogCtlData
                                453                 :                : {
                                454                 :                :     XLogCtlInsert Insert;
                                455                 :                : 
                                456                 :                :     /* Protected by info_lck: */
                                457                 :                :     XLogwrtRqst LogwrtRqst;
                                458                 :                :     XLogRecPtr  RedoRecPtr;     /* a recent copy of Insert->RedoRecPtr */
                                459                 :                :     XLogRecPtr  asyncXactLSN;   /* LSN of newest async commit/abort */
                                460                 :                :     XLogRecPtr  replicationSlotMinLSN;  /* oldest LSN needed by any slot */
                                461                 :                : 
                                462                 :                :     XLogSegNo   lastRemovedSegNo;   /* latest removed/recycled XLOG segment */
                                463                 :                : 
                                464                 :                :     /* Fake LSN counter, for unlogged relations. */
                                465                 :                :     pg_atomic_uint64 unloggedLSN;
                                466                 :                : 
                                467                 :                :     /* Time and LSN of last xlog segment switch. Protected by WALWriteLock. */
                                468                 :                :     pg_time_t   lastSegSwitchTime;
                                469                 :                :     XLogRecPtr  lastSegSwitchLSN;
                                470                 :                : 
                                471                 :                :     /* These are accessed using atomics -- info_lck not needed */
                                472                 :                :     pg_atomic_uint64 logInsertResult;   /* last byte + 1 inserted to buffers */
                                473                 :                :     pg_atomic_uint64 logWriteResult;    /* last byte + 1 written out */
                                474                 :                :     pg_atomic_uint64 logFlushResult;    /* last byte + 1 flushed */
                                475                 :                : 
                                476                 :                :     /*
                                477                 :                :      * Latest initialized page in the cache (last byte position + 1).
                                478                 :                :      *
                                479                 :                :      * To change the identity of a buffer (and InitializedUpTo), you need to
                                480                 :                :      * hold WALBufMappingLock.  To change the identity of a buffer that's
                                481                 :                :      * still dirty, the old page needs to be written out first, and for that
                                482                 :                :      * you need WALWriteLock, and you need to ensure that there are no
                                483                 :                :      * in-progress insertions to the page by calling
                                484                 :                :      * WaitXLogInsertionsToFinish().
                                485                 :                :      */
                                486                 :                :     XLogRecPtr  InitializedUpTo;
                                487                 :                : 
                                488                 :                :     /*
                                489                 :                :      * These values do not change after startup, although the pointed-to pages
                                490                 :                :      * and xlblocks values certainly do.  xlblocks values are protected by
                                491                 :                :      * WALBufMappingLock.
                                492                 :                :      */
                                493                 :                :     char       *pages;          /* buffers for unwritten XLOG pages */
                                494                 :                :     pg_atomic_uint64 *xlblocks; /* 1st byte ptr-s + XLOG_BLCKSZ */
                                495                 :                :     int         XLogCacheBlck;  /* highest allocated xlog buffer index */
                                496                 :                : 
                                497                 :                :     /*
                                498                 :                :      * InsertTimeLineID is the timeline into which new WAL is being inserted
                                499                 :                :      * and flushed. It is zero during recovery, and does not change once set.
                                500                 :                :      *
                                501                 :                :      * If we create a new timeline when the system was started up,
                                502                 :                :      * PrevTimeLineID is the old timeline's ID that we forked off from.
                                503                 :                :      * Otherwise it's equal to InsertTimeLineID.
                                504                 :                :      *
                                505                 :                :      * We set these fields while holding info_lck. Most code that reads these
                                506                 :                :      * values knows that recovery is no longer in progress and so can safely
                                507                 :                :      * read the value without a lock, but code that could be run either during
                                508                 :                :      * or after recovery can take info_lck while reading these values.
                                509                 :                :      */
                                510                 :                :     TimeLineID  InsertTimeLineID;
                                511                 :                :     TimeLineID  PrevTimeLineID;
                                512                 :                : 
                                513                 :                :     /*
                                514                 :                :      * SharedRecoveryState indicates if we're still in crash or archive
                                515                 :                :      * recovery.  Protected by info_lck.
                                516                 :                :      */
                                517                 :                :     RecoveryState SharedRecoveryState;
                                518                 :                : 
                                519                 :                :     /*
                                520                 :                :      * InstallXLogFileSegmentActive indicates whether the checkpointer should
                                521                 :                :      * arrange for future segments by recycling and/or PreallocXlogFiles().
                                522                 :                :      * Protected by ControlFileLock.  Only the startup process changes it.  If
                                523                 :                :      * true, anyone can use InstallXLogFileSegment().  If false, the startup
                                524                 :                :      * process owns the exclusive right to install segments, by reading from
                                525                 :                :      * the archive and possibly replacing existing files.
                                526                 :                :      */
                                527                 :                :     bool        InstallXLogFileSegmentActive;
                                528                 :                : 
                                529                 :                :     /*
                                530                 :                :      * WalWriterSleeping indicates whether the WAL writer is currently in
                                531                 :                :      * low-power mode (and hence should be nudged if an async commit occurs).
                                532                 :                :      * Protected by info_lck.
                                533                 :                :      */
                                534                 :                :     bool        WalWriterSleeping;
                                535                 :                : 
                                536                 :                :     /*
                                537                 :                :      * During recovery, we keep a copy of the latest checkpoint record here.
                                538                 :                :      * lastCheckPointRecPtr points to start of checkpoint record and
                                539                 :                :      * lastCheckPointEndPtr points to end+1 of checkpoint record.  Used by the
                                540                 :                :      * checkpointer when it wants to create a restartpoint.
                                541                 :                :      *
                                542                 :                :      * Protected by info_lck.
                                543                 :                :      */
                                544                 :                :     XLogRecPtr  lastCheckPointRecPtr;
                                545                 :                :     XLogRecPtr  lastCheckPointEndPtr;
                                546                 :                :     CheckPoint  lastCheckPoint;
                                547                 :                : 
                                548                 :                :     /*
                                549                 :                :      * lastFpwDisableRecPtr points to the start of the last replayed
                                550                 :                :      * XLOG_FPW_CHANGE record indicating that full_page_writes is disabled.
                                551                 :                :      */
                                552                 :                :     XLogRecPtr  lastFpwDisableRecPtr;
                                553                 :                : 
                                554                 :                :     slock_t     info_lck;       /* locks shared variables shown above */
                                555                 :                : } XLogCtlData;
                                556                 :                : 
                                557                 :                : /*
                                558                 :                :  * Classification of XLogInsertRecord operations.
                                559                 :                :  */
                                560                 :                : typedef enum
                                561                 :                : {
                                562                 :                :     WALINSERT_NORMAL,
                                563                 :                :     WALINSERT_SPECIAL_SWITCH,
                                564                 :                :     WALINSERT_SPECIAL_CHECKPOINT
                                565                 :                : } WalInsertClass;
                                566                 :                : 
                                567                 :                : static XLogCtlData *XLogCtl = NULL;
                                568                 :                : 
                                569                 :                : /* a private copy of XLogCtl->Insert.WALInsertLocks, for convenience */
                                570                 :                : static WALInsertLockPadded *WALInsertLocks = NULL;
                                571                 :                : 
                                572                 :                : /*
                                573                 :                :  * We maintain an image of pg_control in shared memory.
                                574                 :                :  */
                                575                 :                : static ControlFileData *ControlFile = NULL;
                                576                 :                : 
                                577                 :                : /*
                                578                 :                :  * Calculate the amount of space left on the page after 'endptr'. Beware
                                579                 :                :  * multiple evaluation!
                                580                 :                :  */
                                581                 :                : #define INSERT_FREESPACE(endptr)    \
                                582                 :                :     (((endptr) % XLOG_BLCKSZ == 0) ? 0 : (XLOG_BLCKSZ - (endptr) % XLOG_BLCKSZ))
                                583                 :                : 
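The INSERT_FREESPACE arithmetic above can be checked in isolation. What follows is a minimal sketch, not part of xlog.c: the name sketch_insert_freespace and the constant SKETCH_XLOG_BLCKSZ are hypothetical, and 8192 is assumed as the block size (the real XLOG_BLCKSZ is a build-time setting). Writing it as an inline function also sidesteps the multiple-evaluation hazard the comment warns about.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 8192-byte WAL blocks; the real XLOG_BLCKSZ is set at build time. */
#define SKETCH_XLOG_BLCKSZ 8192

/*
 * Hypothetical function form of INSERT_FREESPACE.  Unlike the macro, it
 * evaluates its argument exactly once.
 */
static inline uint64_t
sketch_insert_freespace(uint64_t endptr)
{
    uint64_t offset = endptr % SKETCH_XLOG_BLCKSZ;

    /* A pointer exactly on a page boundary has no space left on "its" page. */
    return (offset == 0) ? 0 : (SKETCH_XLOG_BLCKSZ - offset);
}
```

With these assumptions, an end pointer of 8000 leaves 192 bytes on the page, while 8192 (a page boundary) leaves none.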
                                584                 :                : /* Macro to advance to next buffer index. */
                                585                 :                : #define NextBufIdx(idx)     \
                                586                 :                :         (((idx) == XLogCtl->XLogCacheBlck) ? 0 : ((idx) + 1))
                                587                 :                : 
                                588                 :                : /*
                                589                 :                :  * XLogRecPtrToBufIdx returns the index of the WAL buffer that holds, or
                                590                 :                :  * would hold if it were in cache, the page containing 'recptr'.
                                591                 :                :  */
                                592                 :                : #define XLogRecPtrToBufIdx(recptr)  \
                                593                 :                :     (((recptr) / XLOG_BLCKSZ) % (XLogCtl->XLogCacheBlck + 1))
                                594                 :                : 
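The two buffer-index macros above treat the WAL buffers as a ring. The following is a minimal sketch of the same arithmetic with the shared state passed in explicitly. The function names and the cache_blck parameter (standing in for XLogCtl->XLogCacheBlck, the highest valid buffer index) are hypothetical, and an 8192-byte block size is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 8192-byte WAL blocks; the real XLOG_BLCKSZ is set at build time. */
#define SKETCH_XLOG_BLCKSZ 8192

/*
 * Hypothetical stand-in for XLogRecPtrToBufIdx.  cache_blck plays the role of
 * XLogCtl->XLogCacheBlck, so there are cache_blck + 1 buffers in total.
 */
static inline uint64_t
sketch_recptr_to_bufidx(uint64_t recptr, uint64_t cache_blck)
{
    return (recptr / SKETCH_XLOG_BLCKSZ) % (cache_blck + 1);
}

/* Hypothetical stand-in for NextBufIdx: wrap to 0 after the last buffer. */
static inline uint64_t
sketch_next_bufidx(uint64_t idx, uint64_t cache_blck)
{
    return (idx == cache_blck) ? 0 : (idx + 1);
}
```

With four buffers (cache_blck = 3), pages 0 and 4 both map to buffer 0, which is why a buffer must be written out before a later page can reuse its slot.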
                                595                 :                : /*
                                596                 :                :  * This is the number of bytes in a WAL page usable for WAL data.
                                597                 :                :  */
                                598                 :                : #define UsableBytesInPage (XLOG_BLCKSZ - SizeOfXLogShortPHD)
                                599                 :                : 
                                600                 :                : /*
                                601                 :                :  * Convert values of GUCs measured in megabytes to equiv. segment count.
                                602                 :                :  * Rounds down.
                                603                 :                :  */
                                604                 :                : #define ConvertToXSegs(x, segsize)  XLogMBVarToSegs((x), (segsize))
                                605                 :                : 
                                606                 :                : /* The number of bytes in a WAL segment usable for WAL data. */
                                607                 :                : static int  UsableBytesInSegment;
                                608                 :                : 
                                609                 :                : /*
                                610                 :                :  * Private, possibly out-of-date copy of shared LogwrtResult.
                                611                 :                :  * See discussion above.
                                612                 :                :  */
                                613                 :                : static XLogwrtResult LogwrtResult = {0, 0};
                                614                 :                : 
                                615                 :                : /*
                                616                 :                :  * Update local copy of shared XLogCtl->log{Write,Flush}Result
                                617                 :                :  *
                                618                 :                :  * It's critical that Flush always trails Write, so the order of the reads is
                                619                 :                :  * important, as is the barrier.  See also XLogWrite.
                                620                 :                :  */
                                621                 :                : #define RefreshXLogWriteResult(_target) \
                                622                 :                :     do { \
                                623                 :                :         _target.Flush = pg_atomic_read_u64(&XLogCtl->logFlushResult); \
                                624                 :                :         pg_read_barrier(); \
                                625                 :                :         _target.Write = pg_atomic_read_u64(&XLogCtl->logWriteResult); \
                                626                 :                :     } while (0)
                                627                 :                : 
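The read ordering in RefreshXLogWriteResult can be illustrated with C11 atomics. This is a sketch under stated assumptions, not the PostgreSQL implementation: it swaps in <stdatomic.h> fences for pg_read_barrier(), all names are hypothetical, and the writer side is an assumption (it publishes Write before Flush with a release fence in between, which is what the reader ordering relies on). Both values are assumed to only ever advance.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical local snapshot, mirroring the Write/Flush pair. */
typedef struct
{
    uint64_t Write;
    uint64_t Flush;
} SketchWrtResult;

/* Hypothetical shared state standing in for logWriteResult/logFlushResult. */
static _Atomic uint64_t sketch_write_result;
static _Atomic uint64_t sketch_flush_result;

/* Assumed writer side: publish Write first, then Flush. */
static void
sketch_publish(uint64_t write, uint64_t flush)
{
    atomic_store_explicit(&sketch_write_result, write, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&sketch_flush_result, flush, memory_order_relaxed);
}

/*
 * Reader side: read Flush first, then Write.  Any Flush value observed was
 * published after a Write value at least as large, so the snapshot never
 * shows Flush ahead of Write.
 */
static SketchWrtResult
sketch_refresh(void)
{
    SketchWrtResult r;

    r.Flush = atomic_load_explicit(&sketch_flush_result, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);
    r.Write = atomic_load_explicit(&sketch_write_result, memory_order_relaxed);
    return r;
}
```

Reading in the opposite order would allow a snapshot in which a stale Write is paired with a newer Flush, which is the inconsistency the comment above rules out.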
                                628                 :                : /*
                                629                 :                :  * openLogFile is -1 or a kernel FD for an open log file segment.
                                630                 :                :  * openLogSegNo identifies the segment, and openLogTLI the corresponding TLI.
                                631                 :                :  * These variables are only used to write the XLOG, and so will normally refer
                                632                 :                :  * to the active segment.
                                633                 :                :  *
                                634                 :                :  * Note: call Reserve/ReleaseExternalFD to track consumption of this FD.
                                635                 :                :  */
                                636                 :                : static int  openLogFile = -1;
                                637                 :                : static XLogSegNo openLogSegNo = 0;
                                638                 :                : static TimeLineID openLogTLI = 0;
                                639                 :                : 
                                640                 :                : /*
                                641                 :                :  * Local copies of equivalent fields in the control file.  When running
                                642                 :                :  * crash recovery, LocalMinRecoveryPoint is set to InvalidXLogRecPtr as we
                                643                 :                :  * expect to replay all the WAL available, and updateMinRecoveryPoint is
                                644                 :                :  * switched to false to prevent any updates while replaying records.
                                645                 :                :  * Those values are kept consistent as long as crash recovery runs.
                                646                 :                :  */
                                647                 :                : static XLogRecPtr LocalMinRecoveryPoint;
                                648                 :                : static TimeLineID LocalMinRecoveryPointTLI;
                                649                 :                : static bool updateMinRecoveryPoint = true;
                                650                 :                : 
                                651                 :                : /* For WALInsertLockAcquire/Release functions */
                                652                 :                : static int  MyLockNo = 0;
                                653                 :                : static bool holdingAllLocks = false;
                                654                 :                : 
                                655                 :                : #ifdef WAL_DEBUG
                                656                 :                : static MemoryContext walDebugCxt = NULL;
                                657                 :                : #endif
                                658                 :                : 
                                659                 :                : static void CleanupAfterArchiveRecovery(TimeLineID EndOfLogTLI,
                                660                 :                :                                         XLogRecPtr EndOfLog,
                                661                 :                :                                         TimeLineID newTLI);
                                662                 :                : static void CheckRequiredParameterValues(void);
                                663                 :                : static void XLogReportParameters(void);
                                664                 :                : static int  LocalSetXLogInsertAllowed(void);
                                665                 :                : static void CreateEndOfRecoveryRecord(void);
                                666                 :                : static XLogRecPtr CreateOverwriteContrecordRecord(XLogRecPtr aborted_lsn,
                                667                 :                :                                                   XLogRecPtr pagePtr,
                                668                 :                :                                                   TimeLineID newTLI);
                                669                 :                : static void CheckPointGuts(XLogRecPtr checkPointRedo, int flags);
                                670                 :                : static void KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo);
                                671                 :                : static XLogRecPtr XLogGetReplicationSlotMinimumLSN(void);
                                672                 :                : 
                                673                 :                : static void AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli,
                                674                 :                :                                   bool opportunistic);
                                675                 :                : static void XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible);
                                676                 :                : static bool InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
                                677                 :                :                                    bool find_free, XLogSegNo max_segno,
                                678                 :                :                                    TimeLineID tli);
                                679                 :                : static void XLogFileClose(void);
                                680                 :                : static void PreallocXlogFiles(XLogRecPtr endptr, TimeLineID tli);
                                681                 :                : static void RemoveTempXlogFiles(void);
                                682                 :                : static void RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr lastredoptr,
                                683                 :                :                                XLogRecPtr endptr, TimeLineID insertTLI);
                                684                 :                : static void RemoveXlogFile(const struct dirent *segment_de,
                                685                 :                :                            XLogSegNo recycleSegNo, XLogSegNo *endlogSegNo,
                                686                 :                :                            TimeLineID insertTLI);
                                687                 :                : static void UpdateLastRemovedPtr(char *filename);
                                688                 :                : static void ValidateXLOGDirectoryStructure(void);
                                689                 :                : static void CleanupBackupHistory(void);
                                690                 :                : static void UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force);
                                691                 :                : static bool PerformRecoveryXLogAction(void);
                                692                 :                : static void InitControlFile(uint64 sysidentifier, uint32 data_checksum_version);
                                693                 :                : static void WriteControlFile(void);
                                694                 :                : static void ReadControlFile(void);
                                695                 :                : static void UpdateControlFile(void);
                                696                 :                : static char *str_time(pg_time_t tnow, char *buf, size_t bufsize);
                                697                 :                : 
                                698                 :                : static int  get_sync_bit(int method);
                                699                 :                : 
                                700                 :                : static void CopyXLogRecordToWAL(int write_len, bool isLogSwitch,
                                701                 :                :                                 XLogRecData *rdata,
                                702                 :                :                                 XLogRecPtr StartPos, XLogRecPtr EndPos,
                                703                 :                :                                 TimeLineID tli);
                                704                 :                : static void ReserveXLogInsertLocation(int size, XLogRecPtr *StartPos,
                                705                 :                :                                       XLogRecPtr *EndPos, XLogRecPtr *PrevPtr);
                                706                 :                : static bool ReserveXLogSwitch(XLogRecPtr *StartPos, XLogRecPtr *EndPos,
                                707                 :                :                               XLogRecPtr *PrevPtr);
                                708                 :                : static XLogRecPtr WaitXLogInsertionsToFinish(XLogRecPtr upto);
                                709                 :                : static char *GetXLogBuffer(XLogRecPtr ptr, TimeLineID tli);
                                710                 :                : static XLogRecPtr XLogBytePosToRecPtr(uint64 bytepos);
                                711                 :                : static XLogRecPtr XLogBytePosToEndRecPtr(uint64 bytepos);
                                712                 :                : static uint64 XLogRecPtrToBytePos(XLogRecPtr ptr);
                                713                 :                : 
                                714                 :                : static void WALInsertLockAcquire(void);
                                715                 :                : static void WALInsertLockAcquireExclusive(void);
                                716                 :                : static void WALInsertLockRelease(void);
                                717                 :                : static void WALInsertLockUpdateInsertingAt(XLogRecPtr insertingAt);
                                718                 :                : 
                                719                 :                : /*
                                720                 :                :  * Insert an XLOG record represented by an already-constructed chain of data
                                721                 :                :  * chunks.  This is a low-level routine; to construct the WAL record header
                                722                 :                :  * and data, use the higher-level routines in xloginsert.c.
                                723                 :                :  *
                                724                 :                :  * If 'fpw_lsn' is valid, it is the oldest LSN among the pages that this
                                725                 :                :  * WAL record applies to, that were not included in the record as full page
                                726                 :                :  * images.  If fpw_lsn <= RedoRecPtr, the function does not perform the
                                727                 :                :  * insertion and returns InvalidXLogRecPtr.  The caller can then recalculate
                                728                 :                :  * which pages need a full-page image, and retry.  If fpw_lsn is invalid, the
                                729                 :                :  * record is always inserted.
                                730                 :                :  *
                                731                 :                :  * 'flags' gives more in-depth control on the record being inserted. See
                                732                 :                :  * XLogSetRecordFlags() for details.
                                733                 :                :  *
                                734                 :                :  * 'topxid_included' tells whether the top-transaction id is logged along with
                                735                 :                :  * current subtransaction. See XLogRecordAssemble().
                                736                 :                :  *
                                737                 :                :  * The first XLogRecData in the chain must be for the record header, and its
                                738                 :                :  * data must be MAXALIGNed.  XLogInsertRecord fills in the xl_prev and
                                739                 :                :  * xl_crc fields in the header, the rest of the header must already be filled
                                740                 :                :  * by the caller.
                                741                 :                :  *
                                742                 :                :  * Returns XLOG pointer to end of record (beginning of next record).
                                743                 :                :  * This can be used as LSN for data pages affected by the logged action.
                                744                 :                :  * (LSN is the XLOG point up to which the XLOG must be flushed to disk
                                745                 :                :  * before the data page can be written out.  This implements the basic
                                746                 :                :  * WAL rule "write the log before the data".)
                                747                 :                :  */
                                748                 :                : XLogRecPtr
 3283 andres@anarazel.de        749                 :CBC    14169133 : XLogInsertRecord(XLogRecData *rdata,
                                750                 :                :                  XLogRecPtr fpw_lsn,
                                751                 :                :                  uint8 flags,
                                752                 :                :                  int num_fpi,
                                753                 :                :                  uint64 fpi_bytes,
                                754                 :                :                  bool topxid_included)
                                755                 :                : {
 9037 bruce@momjian.us          756                 :       14169133 :     XLogCtlInsert *Insert = &XLogCtl->Insert;
                                757                 :                :     pg_crc32c   rdata_crc;
                                758                 :                :     bool        inserted;
 4060 heikki.linnakangas@i      759                 :       14169133 :     XLogRecord *rechdr = (XLogRecord *) rdata->data;
 3331 tgl@sss.pgh.pa.us         760                 :       14169133 :     uint8       info = rechdr->xl_info & ~XLR_INFO_MASK;
  791 rhaas@postgresql.org      761                 :       14169133 :     WalInsertClass class = WALINSERT_NORMAL;
                                762                 :                :     XLogRecPtr  StartPos;
                                763                 :                :     XLogRecPtr  EndPos;
 2653 akapila@postgresql.o      764                 :       14169133 :     bool        prevDoPageWrites = doPageWrites;
                                765                 :                :     TimeLineID  insertTLI;
                                766                 :                : 
                                767                 :                :     /* Does this record type require special handling? */
  791 rhaas@postgresql.org      768         [ +  + ]:       14169133 :     if (unlikely(rechdr->xl_rmid == RM_XLOG_ID))
                                769                 :                :     {
                                770         [ +  + ]:         220343 :         if (info == XLOG_SWITCH)
                                771                 :            710 :             class = WALINSERT_SPECIAL_SWITCH;
                                772         [ +  + ]:         219633 :         else if (info == XLOG_CHECKPOINT_REDO)
                                773                 :            915 :             class = WALINSERT_SPECIAL_CHECKPOINT;
                                774                 :                :     }
                                775                 :                : 
                                776                 :                :     /* we assume that all of the record header is in the first chunk */
 4046 heikki.linnakangas@i      777         [ -  + ]:       14169133 :     Assert(rdata->len >= SizeOfXLogRecord);
                                778                 :                : 
                                779                 :                :     /* cross-check on whether we should be here or not */
 6019 tgl@sss.pgh.pa.us         780         [ -  + ]:       14169133 :     if (!XLogInsertAllowed())
 6019 tgl@sss.pgh.pa.us         781         [ #  # ]:UBC           0 :         elog(ERROR, "cannot make new WAL entries during recovery");
                                782                 :                : 
                                783                 :                :     /*
                                784                 :                :      * Given that we're not in recovery, InsertTimeLineID is set and can't
                                785                 :                :      * change, so we can read it without a lock.
                                786                 :                :      */
 1499 rhaas@postgresql.org      787                 :CBC    14169133 :     insertTLI = XLogCtl->InsertTimeLineID;
                                788                 :                : 
                                789                 :                :     /*----------
                                790                 :                :      *
                                791                 :                :      * We have now done all the preparatory work we can without holding a
                                792                 :                :      * lock or modifying shared state. From here on, inserting the new WAL
                                793                 :                :      * record to the shared WAL buffer cache is a two-step process:
                                794                 :                :      *
                                795                 :                :      * 1. Reserve the right amount of space from the WAL. The current head of
                                796                 :                :      *    reserved space is kept in Insert->CurrBytePos, and is protected by
                                797                 :                :      *    insertpos_lck.
                                798                 :                :      *
                                799                 :                :      * 2. Copy the record to the reserved WAL space. This involves finding the
                                800                 :                :      *    correct WAL buffer containing the reserved space, and copying the
                                801                 :                :      *    record in place. This can be done concurrently in multiple processes.
                                802                 :                :      *
                                803                 :                :      * To keep track of which insertions are still in-progress, each concurrent
                                804                 :                :      * inserter acquires an insertion lock. In addition to just indicating that
                                805                 :                :      * an insertion is in progress, the lock tells others how far the inserter
                                806                 :                :      * has progressed. There is a small fixed number of insertion locks,
                                807                 :                :      * determined by NUM_XLOGINSERT_LOCKS. When an inserter crosses a page
                                808                 :                :      * boundary, it updates the value stored in the lock to how far it has
                                809                 :                :      * inserted, to allow the previous buffer to be flushed.
                                810                 :                :      *
                                811                 :                :      * Holding onto an insertion lock also protects RedoRecPtr and
                                812                 :                :      * fullPageWrites from changing until the insertion is finished.
                                813                 :                :      *
                                814                 :                :      * Step 2 can usually be done completely in parallel. If the required WAL
                                815                 :                :      * page is not initialized yet, you have to grab WALBufMappingLock to
                                816                 :                :      * initialize it, but the WAL writer tries to do that ahead of insertions
                                817                 :                :      * to keep that from happening in the critical path.
                                818                 :                :      *
                                819                 :                :      *----------
                                820                 :                :      */
 5090 heikki.linnakangas@i      821                 :       14169133 :     START_CRIT_SECTION();
                                822                 :                : 
  791 rhaas@postgresql.org      823         [ +  + ]:       14169133 :     if (likely(class == WALINSERT_NORMAL))
                                824                 :                :     {
  800                           825                 :       14167508 :         WALInsertLockAcquire();
                                826                 :                : 
                                827                 :                :         /*
                                828                 :                :          * Check to see if my copy of RedoRecPtr is out of date. If so, may
                                829                 :                :          * have to go back and have the caller recompute everything. This can
                                830                 :                :          * only happen just after a checkpoint, so it's better to be slow in
                                831                 :                :          * this case and fast otherwise.
                                832                 :                :          *
                                833                 :                :          * Also check to see if fullPageWrites was just turned on or there's a
                                834                 :                :          * running backup (which forces full-page writes); if we weren't
                                835                 :                :          * already doing full-page writes then go back and recompute.
                                836                 :                :          *
                                837                 :                :          * If we aren't doing full-page writes then RedoRecPtr doesn't
                                838                 :                :          * actually affect the contents of the XLOG record, so we'll update
                                839                 :                :          * our local copy but not force a recomputation.  (If doPageWrites was
                                840                 :                :          * just turned off, we could recompute the record without full pages,
                                841                 :                :          * but we choose not to bother.)
                                842                 :                :          */
                                843         [ +  + ]:       14167508 :         if (RedoRecPtr != Insert->RedoRecPtr)
                                844                 :                :         {
                                845         [ -  + ]:           6386 :             Assert(RedoRecPtr < Insert->RedoRecPtr);
                                846                 :           6386 :             RedoRecPtr = Insert->RedoRecPtr;
                                847                 :                :         }
                                848   [ +  +  +  + ]:       14167508 :         doPageWrites = (Insert->fullPageWrites || Insert->runningBackups > 0);
                                849                 :                : 
                                850         [ +  + ]:       14167508 :         if (doPageWrites &&
                                851   [ +  +  +  + ]:       13918652 :             (!prevDoPageWrites ||
   42 alvherre@kurilemu.de      852         [ +  + ]:GNC    13167940 :              (XLogRecPtrIsValid(fpw_lsn) && fpw_lsn <= RedoRecPtr)))
                                853                 :                :         {
                                854                 :                :             /*
                                855                 :                :              * Oops, some buffer now needs to be backed up that the caller
                                856                 :                :              * didn't back up.  Start over.
                                857                 :                :              */
  800 rhaas@postgresql.org      858                 :CBC        7106 :             WALInsertLockRelease();
                                859         [ -  + ]:           7106 :             END_CRIT_SECTION();
                                860                 :           7106 :             return InvalidXLogRecPtr;
                                861                 :                :         }
                                862                 :                : 
                                863                 :                :         /*
                                864                 :                :          * Reserve space for the record in the WAL. This also sets the xl_prev
                                865                 :                :          * pointer.
                                866                 :                :          */
 4060 heikki.linnakangas@i      867                 :       14160402 :         ReserveXLogInsertLocation(rechdr->xl_tot_len, &StartPos, &EndPos,
                                868                 :                :                                   &rechdr->xl_prev);
                                869                 :                : 
                                870                 :                :         /* Normal records are always inserted. */
 4546                           871                 :       14160402 :         inserted = true;
                                872                 :                :     }
  791 rhaas@postgresql.org      873         [ +  + ]:           1625 :     else if (class == WALINSERT_SPECIAL_SWITCH)
                                874                 :                :     {
                                875                 :                :         /*
                                876                 :                :          * In order to insert an XLOG_SWITCH record, we need to hold all of
                                877                 :                :          * the WAL insertion locks, not just one, so that no one else can
                                878                 :                :          * begin inserting a record until we've figured out how much space
                                879                 :                :          * remains in the current WAL segment and claimed all of it.
                                880                 :                :          *
                                 881                 :                :          * Nonetheless, this case is simpler than the normal case handled
                                 882                 :                :          * above, which must check for changes in doPageWrites and RedoRecPtr.
                                883                 :                :          * Those checks are only needed for records that can contain buffer
                                884                 :                :          * references, and an XLOG_SWITCH record never does.
                                885                 :                :          */
   42 alvherre@kurilemu.de      886         [ -  + ]:GNC         710 :         Assert(!XLogRecPtrIsValid(fpw_lsn));
  800 rhaas@postgresql.org      887                 :CBC         710 :         WALInsertLockAcquireExclusive();
                                888                 :            710 :         inserted = ReserveXLogSwitch(&StartPos, &EndPos, &rechdr->xl_prev);
                                889                 :                :     }
                                890                 :                :     else
                                891                 :                :     {
  791                           892         [ -  + ]:            915 :         Assert(class == WALINSERT_SPECIAL_CHECKPOINT);
                                893                 :                : 
                                894                 :                :         /*
                                895                 :                :          * We need to update both the local and shared copies of RedoRecPtr,
                                896                 :                :          * which means that we need to hold all the WAL insertion locks.
                                897                 :                :          * However, there can't be any buffer references, so as above, we need
                                898                 :                :          * not check RedoRecPtr before inserting the record; we just need to
                                899                 :                :          * update it afterwards.
                                900                 :                :          */
   42 alvherre@kurilemu.de      901         [ -  + ]:GNC         915 :         Assert(!XLogRecPtrIsValid(fpw_lsn));
  791 rhaas@postgresql.org      902                 :CBC         915 :         WALInsertLockAcquireExclusive();
                                903                 :            915 :         ReserveXLogInsertLocation(rechdr->xl_tot_len, &StartPos, &EndPos,
                                904                 :                :                                   &rechdr->xl_prev);
                                905                 :            915 :         RedoRecPtr = Insert->RedoRecPtr = StartPos;
                                906                 :            915 :         inserted = true;
                                907                 :                :     }
                                908                 :                : 
 4546 heikki.linnakangas@i      909         [ +  + ]:       14162027 :     if (inserted)
                                910                 :                :     {
                                911                 :                :         /*
                                912                 :                :          * Now that xl_prev has been filled in, calculate CRC of the record
                                913                 :                :          * header.
                                914                 :                :          */
 4046                           915                 :       14161962 :         rdata_crc = rechdr->xl_crc;
                                916                 :       14161962 :         COMP_CRC32C(rdata_crc, rechdr, offsetof(XLogRecord, xl_crc));
 4062                           917                 :       14161962 :         FIN_CRC32C(rdata_crc);
 4546                           918                 :       14161962 :         rechdr->xl_crc = rdata_crc;
                                919                 :                : 
                                920                 :                :         /*
                                921                 :                :          * All the record data, including the header, is now ready to be
                                922                 :                :          * inserted. Copy the record in the space reserved.
                                923                 :                :          */
  791 rhaas@postgresql.org      924                 :       14161962 :         CopyXLogRecordToWAL(rechdr->xl_tot_len,
                                925                 :                :                             class == WALINSERT_SPECIAL_SWITCH, rdata,
                                926                 :                :                             StartPos, EndPos, insertTLI);
                                927                 :                : 
                                928                 :                :         /*
                                929                 :                :          * Unless record is flagged as not important, update LSN of last
                                930                 :                :          * important record in the current slot. When holding all locks, just
                                931                 :                :          * update the first one.
                                932                 :                :          */
 3283 andres@anarazel.de        933         [ +  + ]:       14161962 :         if ((flags & XLOG_MARK_UNIMPORTANT) == 0)
                                934                 :                :         {
 3137 bruce@momjian.us          935         [ +  + ]:       14071071 :             int         lockno = holdingAllLocks ? 0 : MyLockNo;
                                936                 :                : 
 3283 andres@anarazel.de        937                 :       14071071 :             WALInsertLocks[lockno].l.lastImportantAt = StartPos;
                                938                 :                :         }
                                939                 :                :     }
                                940                 :                :     else
                                941                 :                :     {
                                942                 :                :         /*
                                943                 :                :          * This was an xlog-switch record, but the current insert location was
                                944                 :                :          * already exactly at the beginning of a segment, so there was no need
                                945                 :                :          * to do anything.
                                946                 :                :          */
                                947                 :                :     }
                                948                 :                : 
                                949                 :                :     /*
                                950                 :                :      * Done! Let others know that we're finished.
                                951                 :                :      */
 4290 heikki.linnakangas@i      952                 :       14162027 :     WALInsertLockRelease();
                                953                 :                : 
 4546                           954         [ -  + ]:       14162027 :     END_CRIT_SECTION();
                                955                 :                : 
 1507 akapila@postgresql.o      956                 :       14162027 :     MarkCurrentTransactionIdLoggedIfAny();
                                957                 :                : 
                                958                 :                :     /*
                                 959                 :                :      * Mark the top transaction id as logged (if needed) so that we do not try
                                960                 :                :      * to log it again with the next WAL record in the current subtransaction.
                                961                 :                :      */
                                962         [ +  + ]:       14162027 :     if (topxid_included)
                                963                 :            219 :         MarkSubxactTopXidLogged();
                                964                 :                : 
                                965                 :                :     /*
                                 966                 :                :      * Update shared LogwrtRqst.Write, if we crossed a page boundary.
                                967                 :                :      */
 4546 heikki.linnakangas@i      968         [ +  + ]:       14162027 :     if (StartPos / XLOG_BLCKSZ != EndPos / XLOG_BLCKSZ)
                                969                 :                :     {
 4105 andres@anarazel.de        970         [ +  + ]:        1685581 :         SpinLockAcquire(&XLogCtl->info_lck);
                                971                 :                :         /* advance global request to include new block(s) */
                                972         [ +  + ]:        1685581 :         if (XLogCtl->LogwrtRqst.Write < EndPos)
                                973                 :        1615672 :             XLogCtl->LogwrtRqst.Write = EndPos;
                                974                 :        1685581 :         SpinLockRelease(&XLogCtl->info_lck);
  622 alvherre@alvh.no-ip.      975                 :        1685581 :         RefreshXLogWriteResult(LogwrtResult);
                                976                 :                :     }
                                977                 :                : 
                                978                 :                :     /*
                                979                 :                :      * If this was an XLOG_SWITCH record, flush the record and the empty
                                980                 :                :      * padding space that fills the rest of the segment, and perform
                                 981                 :                :      * end-of-segment actions (e.g., notifying the archiver).
                                982                 :                :      */
  791 rhaas@postgresql.org      983         [ +  + ]:       14162027 :     if (class == WALINSERT_SPECIAL_SWITCH)
                                984                 :                :     {
                                985                 :                :         TRACE_POSTGRESQL_WAL_SWITCH();
 4546 heikki.linnakangas@i      986                 :            710 :         XLogFlush(EndPos);
                                987                 :                : 
                                988                 :                :         /*
                                989                 :                :          * Even though we reserved the rest of the segment for us, which is
                                990                 :                :          * reflected in EndPos, we return a pointer to just the end of the
                                991                 :                :          * xlog-switch record.
                                992                 :                :          */
                                993         [ +  + ]:            710 :         if (inserted)
                                994                 :                :         {
                                995                 :            645 :             EndPos = StartPos + SizeOfXLogRecord;
                                996         [ -  + ]:            645 :             if (StartPos / XLOG_BLCKSZ != EndPos / XLOG_BLCKSZ)
                                997                 :                :             {
 3012 andres@anarazel.de        998                 :UBC           0 :                 uint64      offset = XLogSegmentOffset(EndPos, wal_segment_size);
                                999                 :                : 
                               1000         [ #  # ]:              0 :                 if (offset == EndPos % XLOG_BLCKSZ)
 4546 heikki.linnakangas@i     1001                 :              0 :                     EndPos += SizeOfXLogLongPHD;
                               1002                 :                :                 else
                               1003                 :              0 :                     EndPos += SizeOfXLogShortPHD;
                               1004                 :                :             }
                               1005                 :                :         }
                               1006                 :                :     }
                               1007                 :                : 
                               1008                 :                : #ifdef WAL_DEBUG
                               1009                 :                :     if (XLOG_DEBUG)
                               1010                 :                :     {
                               1011                 :                :         static XLogReaderState *debug_reader = NULL;
                               1012                 :                :         XLogRecord *record;
                               1013                 :                :         DecodedXLogRecord *decoded;
                               1014                 :                :         StringInfoData buf;
                               1015                 :                :         StringInfoData recordBuf;
                               1016                 :                :         char       *errormsg = NULL;
                               1017                 :                :         MemoryContext oldCxt;
                               1018                 :                : 
                               1019                 :                :         oldCxt = MemoryContextSwitchTo(walDebugCxt);
                               1020                 :                : 
                               1021                 :                :         initStringInfo(&buf);
                               1022                 :                :         appendStringInfo(&buf, "INSERT @ %X/%08X: ", LSN_FORMAT_ARGS(EndPos));
                               1023                 :                : 
                               1024                 :                :         /*
                               1025                 :                :          * We have to piece together the WAL record data from the XLogRecData
                               1026                 :                :          * entries, so that we can pass it to the rm_desc function as one
                               1027                 :                :          * contiguous chunk.
                               1028                 :                :          */
                               1029                 :                :         initStringInfo(&recordBuf);
                               1030                 :                :         for (; rdata != NULL; rdata = rdata->next)
                               1031                 :                :             appendBinaryStringInfo(&recordBuf, rdata->data, rdata->len);
                               1032                 :                : 
                               1033                 :                :         /* We also need temporary space to decode the record. */
                               1034                 :                :         record = (XLogRecord *) recordBuf.data;
                               1035                 :                :         decoded = (DecodedXLogRecord *)
                               1036                 :                :             palloc(DecodeXLogRecordRequiredSpace(record->xl_tot_len));
                               1037                 :                : 
                               1038                 :                :         if (!debug_reader)
                               1039                 :                :             debug_reader = XLogReaderAllocate(wal_segment_size, NULL,
                               1040                 :                :                                               XL_ROUTINE(.page_read = NULL,
                               1041                 :                :                                                          .segment_open = NULL,
                               1042                 :                :                                                          .segment_close = NULL),
                               1043                 :                :                                               NULL);
                               1044                 :                :         if (!debug_reader)
                               1045                 :                :         {
                               1046                 :                :             appendStringInfoString(&buf, "error decoding record: out of memory while allocating a WAL reading processor");
                               1047                 :                :         }
                               1048                 :                :         else if (!DecodeXLogRecord(debug_reader,
                               1049                 :                :                                    decoded,
                               1050                 :                :                                    record,
                               1051                 :                :                                    EndPos,
                               1052                 :                :                                    &errormsg))
                               1053                 :                :         {
                               1054                 :                :             appendStringInfo(&buf, "error decoding record: %s",
                               1055                 :                :                              errormsg ? errormsg : "no error message");
                               1056                 :                :         }
                               1057                 :                :         else
                               1058                 :                :         {
                               1059                 :                :             appendStringInfoString(&buf, " - ");
                               1060                 :                : 
                               1061                 :                :             debug_reader->record = decoded;
                               1062                 :                :             xlog_outdesc(&buf, debug_reader);
                               1063                 :                :             debug_reader->record = NULL;
                               1064                 :                :         }
                               1065                 :                :         elog(LOG, "%s", buf.data);
                               1066                 :                : 
                               1067                 :                :         pfree(decoded);
                               1068                 :                :         pfree(buf.data);
                               1069                 :                :         pfree(recordBuf.data);
                               1070                 :                :         MemoryContextSwitchTo(oldCxt);
                               1071                 :                :     }
                               1072                 :                : #endif
                               1073                 :                : 
                               1074                 :                :     /*
                               1075                 :                :      * Update our global variables
                               1076                 :                :      */
 4546 heikki.linnakangas@i     1077                 :CBC    14162027 :     ProcLastRecPtr = StartPos;
                               1078                 :       14162027 :     XactLastRecEnd = EndPos;
                               1079                 :                : 
                               1080                 :                :     /* Report WAL traffic to the instrumentation. */
 2084 akapila@postgresql.o     1081         [ +  + ]:       14162027 :     if (inserted)
                               1082                 :                :     {
                               1083                 :       14161962 :         pgWalUsage.wal_bytes += rechdr->xl_tot_len;
                               1084                 :       14161962 :         pgWalUsage.wal_records++;
 2053                          1085                 :       14161962 :         pgWalUsage.wal_fpi += num_fpi;
   50 michael@paquier.xyz      1086                 :GNC    14161962 :         pgWalUsage.wal_fpi_bytes += fpi_bytes;
                               1087                 :                : 
                               1088                 :                :         /* Required for the flush of pending stats WAL data */
  143 michael@paquier.xyz      1089                 :CBC    14161962 :         pgstat_report_fixed = true;
                               1090                 :                :     }
                               1091                 :                : 
 4546 heikki.linnakangas@i     1092                 :       14162027 :     return EndPos;
                               1093                 :                : }
                               1094                 :                : 
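The xlog-switch branch above returns a pointer to the end of the switch record itself, not to the end of the reserved segment. The arithmetic can be sketched in isolation as follows. This is a minimal illustration, not the code from xlog.c: the function name `sketch_switch_record_end` and all `SK_` constants (page size, segment size, record header size, page header sizes) are hypothetical values assumed for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout constants, for illustration only. */
#define SK_BLCKSZ     8192u                  /* WAL page size */
#define SK_SEG_SIZE   (16u * 1024u * 1024u)  /* WAL segment size */
#define SK_REC_HDR    24u                    /* size of a bare record header */
#define SK_SHORT_PHD  24u                    /* header on an ordinary page */
#define SK_LONG_PHD   40u                    /* header on first page of a segment */

/*
 * Given the start of a switch record, return the position just past the
 * record header.  If the header spills onto the next page, skip that
 * page's header too: a long one when the next page begins a segment,
 * a short one otherwise.
 */
static uint64_t
sketch_switch_record_end(uint64_t start)
{
    uint64_t end = start + SK_REC_HDR;

    assert(start % 8 == 0);     /* assumed 8-byte record alignment */

    if (start / SK_BLCKSZ != end / SK_BLCKSZ)
    {
        uint64_t seg_offset = end % SK_SEG_SIZE;

        /* Equal offsets mean 'end' lies on the first page of a segment. */
        if (seg_offset == end % SK_BLCKSZ)
            end += SK_LONG_PHD;
        else
            end += SK_SHORT_PHD;
    }
    return end;
}
```

With these assumed constants, a record starting 16 bytes before a page boundary ends 8 bytes into the next page plus that page's header. The coverage data above shows that the test suite never exercised this page-crossing path.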
                               1095                 :                : /*
                               1096                 :                :  * Reserves the right amount of space for a record of given size from the WAL.
                               1097                 :                :  * *StartPos is set to the beginning of the reserved section, *EndPos to
                               1098                 :                :  * its end+1. *PrevPtr is set to the beginning of the previous record; it is
                               1099                 :                :  * used to set the xl_prev of this record.
                               1100                 :                :  *
                               1101                 :                :  * This is the performance critical part of XLogInsert that must be serialized
                               1102                 :                :  * across backends. The rest can happen mostly in parallel. Try to keep this
                                1103                 :                :  * section as short as possible; insertpos_lck can be heavily contended on a
                               1104                 :                :  * busy system.
                               1105                 :                :  *
                               1106                 :                :  * NB: The space calculation here must match the code in CopyXLogRecordToWAL,
                               1107                 :                :  * where we actually copy the record to the reserved space.
                               1108                 :                :  *
                               1109                 :                :  * NB: Testing shows that XLogInsertRecord runs faster if this code is inlined;
                               1110                 :                :  * however, because there are two call sites, the compiler is reluctant to
                               1111                 :                :  * inline. We use pg_attribute_always_inline here to try to convince it.
                               1112                 :                :  */
                               1113                 :                : static pg_attribute_always_inline void
                               1114                 :       14161317 : ReserveXLogInsertLocation(int size, XLogRecPtr *StartPos, XLogRecPtr *EndPos,
                               1115                 :                :                           XLogRecPtr *PrevPtr)
                               1116                 :                : {
 4105 andres@anarazel.de       1117                 :       14161317 :     XLogCtlInsert *Insert = &XLogCtl->Insert;
                               1118                 :                :     uint64      startbytepos;
                               1119                 :                :     uint64      endbytepos;
                               1120                 :                :     uint64      prevbytepos;
                               1121                 :                : 
 4546 heikki.linnakangas@i     1122                 :       14161317 :     size = MAXALIGN(size);
                               1123                 :                : 
                               1124                 :                :     /* All (non xlog-switch) records should contain data. */
                               1125         [ -  + ]:       14161317 :     Assert(size > SizeOfXLogRecord);
                               1126                 :                : 
                               1127                 :                :     /*
                               1128                 :                :      * The duration the spinlock needs to be held is minimized by minimizing
                               1129                 :                :      * the calculations that have to be done while holding the lock. The
                               1130                 :                :      * current tip of reserved WAL is kept in CurrBytePos, as a byte position
                               1131                 :                :      * that only counts "usable" bytes in WAL, that is, it excludes all WAL
                               1132                 :                :      * page headers. The mapping between "usable" byte positions and physical
                               1133                 :                :      * positions (XLogRecPtrs) can be done outside the locked region, and
                               1134                 :                :      * because the usable byte position doesn't include any headers, reserving
                               1135                 :                :      * X bytes from WAL is almost as simple as "CurrBytePos += X".
                               1136                 :                :      */
                               1137         [ +  + ]:       14161317 :     SpinLockAcquire(&Insert->insertpos_lck);
                               1138                 :                : 
                               1139                 :       14161317 :     startbytepos = Insert->CurrBytePos;
                               1140                 :       14161317 :     endbytepos = startbytepos + size;
                               1141                 :       14161317 :     prevbytepos = Insert->PrevBytePos;
                               1142                 :       14161317 :     Insert->CurrBytePos = endbytepos;
                               1143                 :       14161317 :     Insert->PrevBytePos = startbytepos;
                               1144                 :                : 
                               1145                 :       14161317 :     SpinLockRelease(&Insert->insertpos_lck);
                               1146                 :                : 
                               1147                 :       14161317 :     *StartPos = XLogBytePosToRecPtr(startbytepos);
                               1148                 :       14161317 :     *EndPos = XLogBytePosToEndRecPtr(endbytepos);
                               1149                 :       14161317 :     *PrevPtr = XLogBytePosToRecPtr(prevbytepos);
                               1150                 :                : 
                               1151                 :                :     /*
                               1152                 :                :      * Check that the conversions between "usable byte positions" and
                               1153                 :                :      * XLogRecPtrs work consistently in both directions.
                               1154                 :                :      */
                               1155         [ -  + ]:       14161317 :     Assert(XLogRecPtrToBytePos(*StartPos) == startbytepos);
                               1156         [ -  + ]:       14161317 :     Assert(XLogRecPtrToBytePos(*EndPos) == endbytepos);
                               1157         [ -  + ]:       14161317 :     Assert(XLogRecPtrToBytePos(*PrevPtr) == prevbytepos);
                               1158                 :       14161317 : }
                               1159                 :                : 
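The reservation scheme described in ReserveXLogInsertLocation() can be sketched on its own: bump a counter of "usable" bytes under a short lock, then map byte positions to physical positions outside the lock. This is a simplified illustration under stated assumptions, not PostgreSQL's implementation. All `Sketch`/`SK_` names and constants are hypothetical, a C11 `atomic_flag` stands in for the spinlock, and only the start-position mapping is shown (the listing maps the end position with a separate function).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed layout constants, for illustration only. */
#define SK_BLCKSZ      8192u
#define SK_SEG_SIZE    (16u * 1024u * 1024u)
#define SK_SHORT_PHD   24u
#define SK_LONG_PHD    40u
#define SK_USABLE_PAGE (SK_BLCKSZ - SK_SHORT_PHD)
#define SK_USABLE_SEG  ((uint64_t) (SK_SEG_SIZE / SK_BLCKSZ) * SK_USABLE_PAGE \
                        - (SK_LONG_PHD - SK_SHORT_PHD))

typedef struct SketchInsert
{
    atomic_flag lock;           /* stand-in for the spinlock */
    uint64_t    curr_bytepos;   /* tip of reserved space, headers excluded */
    uint64_t    prev_bytepos;   /* start of the previously reserved record */
} SketchInsert;

/* Reserve 'size' usable bytes; only five assignments run under the lock. */
static void
sketch_reserve(SketchInsert *ins, uint64_t size,
               uint64_t *start, uint64_t *end, uint64_t *prev)
{
    assert(size > 0);
    size = (size + 7) & ~(uint64_t) 7;  /* assumed 8-byte alignment */

    while (atomic_flag_test_and_set_explicit(&ins->lock, memory_order_acquire))
        ;                               /* spin until the lock is free */
    *start = ins->curr_bytepos;
    *end = *start + size;
    *prev = ins->prev_bytepos;
    ins->curr_bytepos = *end;
    ins->prev_bytepos = *start;
    atomic_flag_clear_explicit(&ins->lock, memory_order_release);
}

/* Map a usable byte position to a physical position by adding headers back. */
static uint64_t
sketch_bytepos_to_ptr(uint64_t bytepos)
{
    uint64_t fullsegs = bytepos / SK_USABLE_SEG;
    uint64_t left = bytepos % SK_USABLE_SEG;
    uint64_t seg_offset;

    if (left < SK_BLCKSZ - SK_LONG_PHD)
        seg_offset = left + SK_LONG_PHD;        /* fits on the first page */
    else
    {
        left -= SK_BLCKSZ - SK_LONG_PHD;        /* skip the first page */
        seg_offset = SK_BLCKSZ
            + (left / SK_USABLE_PAGE) * SK_BLCKSZ
            + SK_SHORT_PHD + left % SK_USABLE_PAGE;
    }
    return fullsegs * SK_SEG_SIZE + seg_offset;
}
```

Keeping the shared counter free of page headers is what lets the locked region stay this small; the division and modulo work in the mapping happens after the lock is released.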
                               1160                 :                : /*
                               1161                 :                :  * Like ReserveXLogInsertLocation(), but for an xlog-switch record.
                               1162                 :                :  *
                                1163                 :                :  * An xlog-switch record is handled slightly differently. The rest of the
                               1164                 :                :  * segment will be reserved for this insertion, as indicated by the returned
                               1165                 :                :  * *EndPos value. However, if we are already at the beginning of the current
                               1166                 :                :  * segment, *StartPos and *EndPos are set to the current location without
                               1167                 :                :  * reserving any space, and the function returns false.
                               1168                 :                :  */
                               1169                 :                : static bool
                               1170                 :            710 : ReserveXLogSwitch(XLogRecPtr *StartPos, XLogRecPtr *EndPos, XLogRecPtr *PrevPtr)
                               1171                 :                : {
 4105 andres@anarazel.de       1172                 :            710 :     XLogCtlInsert *Insert = &XLogCtl->Insert;
                               1173                 :                :     uint64      startbytepos;
                               1174                 :                :     uint64      endbytepos;
                               1175                 :                :     uint64      prevbytepos;
 4046 heikki.linnakangas@i     1176                 :            710 :     uint32      size = MAXALIGN(SizeOfXLogRecord);
                               1177                 :                :     XLogRecPtr  ptr;
                               1178                 :                :     uint32      segleft;
                               1179                 :                : 
                               1180                 :                :     /*
                               1181                 :                :      * These calculations are a bit heavy-weight to be done while holding a
                               1182                 :                :      * spinlock, but since we're holding all the WAL insertion locks, there
                               1183                 :                :      * are no other inserters competing for it. GetXLogInsertRecPtr() does
                               1184                 :                :      * compete for it, but that's not called very frequently.
                               1185                 :                :      */
 4546                          1186         [ -  + ]:            710 :     SpinLockAcquire(&Insert->insertpos_lck);
                               1187                 :                : 
                               1188                 :            710 :     startbytepos = Insert->CurrBytePos;
                               1189                 :                : 
                               1190                 :            710 :     ptr = XLogBytePosToEndRecPtr(startbytepos);
 3012 andres@anarazel.de       1191         [ +  + ]:            710 :     if (XLogSegmentOffset(ptr, wal_segment_size) == 0)
                               1192                 :                :     {
 4546 heikki.linnakangas@i     1193                 :             65 :         SpinLockRelease(&Insert->insertpos_lck);
                               1194                 :             65 :         *EndPos = *StartPos = ptr;
                               1195                 :             65 :         return false;
                               1196                 :                :     }
                               1197                 :                : 
                               1198                 :            645 :     endbytepos = startbytepos + size;
                               1199                 :            645 :     prevbytepos = Insert->PrevBytePos;
                               1200                 :                : 
                               1201                 :            645 :     *StartPos = XLogBytePosToRecPtr(startbytepos);
                               1202                 :            645 :     *EndPos = XLogBytePosToEndRecPtr(endbytepos);
                               1203                 :                : 
 3012 andres@anarazel.de       1204                 :            645 :     segleft = wal_segment_size - XLogSegmentOffset(*EndPos, wal_segment_size);
                               1205         [ +  - ]:            645 :     if (segleft != wal_segment_size)
                               1206                 :                :     {
                               1207                 :                :         /* consume the rest of the segment */
 4546 heikki.linnakangas@i     1208                 :            645 :         *EndPos += segleft;
                               1209                 :            645 :         endbytepos = XLogRecPtrToBytePos(*EndPos);
                               1210                 :                :     }
                               1211                 :            645 :     Insert->CurrBytePos = endbytepos;
                               1212                 :            645 :     Insert->PrevBytePos = startbytepos;
                               1213                 :                : 
                               1214                 :            645 :     SpinLockRelease(&Insert->insertpos_lck);
                               1215                 :                : 
                               1216                 :            645 :     *PrevPtr = XLogBytePosToRecPtr(prevbytepos);
                               1217                 :                : 
 3012 andres@anarazel.de       1218         [ -  + ]:            645 :     Assert(XLogSegmentOffset(*EndPos, wal_segment_size) == 0);
 4546 heikki.linnakangas@i     1219         [ -  + ]:            645 :     Assert(XLogRecPtrToBytePos(*EndPos) == endbytepos);
                               1220         [ -  + ]:            645 :     Assert(XLogRecPtrToBytePos(*StartPos) == startbytepos);
                               1221         [ -  + ]:            645 :     Assert(XLogRecPtrToBytePos(*PrevPtr) == prevbytepos);
                               1222                 :                : 
                               1223                 :            645 :     return true;
                               1224                 :                : }
                               1225                 :                : 
                               1226                 :                : /*
                               1227                 :                :  * Subroutine of XLogInsertRecord.  Copies a WAL record to an already-reserved
                               1228                 :                :  * area in the WAL.
                               1229                 :                :  */
                               1230                 :                : static void
                               1231                 :       14161962 : CopyXLogRecordToWAL(int write_len, bool isLogSwitch, XLogRecData *rdata,
                               1232                 :                :                     XLogRecPtr StartPos, XLogRecPtr EndPos, TimeLineID tli)
                               1233                 :                : {
                               1234                 :                :     char       *currpos;
                               1235                 :                :     int         freespace;
                               1236                 :                :     int         written;
                               1237                 :                :     XLogRecPtr  CurrPos;
                               1238                 :                :     XLogPageHeader pagehdr;
                               1239                 :                : 
                               1240                 :                :     /*
                               1241                 :                :      * Get a pointer to the right place in the right WAL buffer to start
                               1242                 :                :      * inserting to.
                               1243                 :                :      */
                               1244                 :       14161962 :     CurrPos = StartPos;
 1504 rhaas@postgresql.org     1245                 :       14161962 :     currpos = GetXLogBuffer(CurrPos, tli);
 4546 heikki.linnakangas@i     1246         [ +  - ]:       14161962 :     freespace = INSERT_FREESPACE(CurrPos);
                               1247                 :                : 
                               1248                 :                :     /*
                               1249                 :                :      * There should be enough space for at least the first field (xl_tot_len)
                               1250                 :                :      * on this page.
                               1251                 :                :      */
                               1252         [ -  + ]:       14161962 :     Assert(freespace >= sizeof(uint32));
                               1253                 :                : 
                               1254                 :                :     /* Copy record data */
                               1255                 :       14161962 :     written = 0;
                               1256         [ +  + ]:       67645175 :     while (rdata != NULL)
                               1257                 :                :     {
  471 peter@eisentraut.org     1258                 :       53483213 :         const char *rdata_data = rdata->data;
 4546 heikki.linnakangas@i     1259                 :       53483213 :         int         rdata_len = rdata->len;
                               1260                 :                : 
                               1261         [ +  + ]:       55289202 :         while (rdata_len > freespace)
                               1262                 :                :         {
                               1263                 :                :             /*
                               1264                 :                :              * Write what fits on this page, and continue on the next page.
                               1265                 :                :              */
                               1266   [ +  +  -  + ]:        1805989 :             Assert(CurrPos % XLOG_BLCKSZ >= SizeOfXLogShortPHD || freespace == 0);
                               1267                 :        1805989 :             memcpy(currpos, rdata_data, freespace);
                               1268                 :        1805989 :             rdata_data += freespace;
                               1269                 :        1805989 :             rdata_len -= freespace;
                               1270                 :        1805989 :             written += freespace;
                               1271                 :        1805989 :             CurrPos += freespace;
                               1272                 :                : 
                               1273                 :                :             /*
                               1274                 :                :              * Get pointer to beginning of next page, and set the xlp_rem_len
                               1275                 :                :              * in the page header. Set XLP_FIRST_IS_CONTRECORD.
                               1276                 :                :              *
                               1277                 :                :              * It's safe to set the contrecord flag and xlp_rem_len without a
                               1278                 :                :              * lock on the page. All the other flags were already set when the
                               1279                 :                :              * page was initialized, in AdvanceXLInsertBuffer, and we're the
                               1280                 :                :              * only backend that needs to set the contrecord flag.
                               1281                 :                :              */
 1504 rhaas@postgresql.org     1282                 :        1805989 :             currpos = GetXLogBuffer(CurrPos, tli);
 4546 heikki.linnakangas@i     1283                 :        1805989 :             pagehdr = (XLogPageHeader) currpos;
                               1284                 :        1805989 :             pagehdr->xlp_rem_len = write_len - written;
                               1285                 :        1805989 :             pagehdr->xlp_info |= XLP_FIRST_IS_CONTRECORD;
                               1286                 :                : 
                               1287                 :                :             /* skip over the page header */
 3012 andres@anarazel.de       1288         [ +  + ]:        1805989 :             if (XLogSegmentOffset(CurrPos, wal_segment_size) == 0)
                               1289                 :                :             {
 4546 heikki.linnakangas@i     1290                 :           1200 :                 CurrPos += SizeOfXLogLongPHD;
                               1291                 :           1200 :                 currpos += SizeOfXLogLongPHD;
                               1292                 :                :             }
                               1293                 :                :             else
                               1294                 :                :             {
                               1295                 :        1804789 :                 CurrPos += SizeOfXLogShortPHD;
                               1296                 :        1804789 :                 currpos += SizeOfXLogShortPHD;
                               1297                 :                :             }
                               1298         [ +  - ]:        1805989 :             freespace = INSERT_FREESPACE(CurrPos);
                               1299                 :                :         }
                               1300                 :                : 
                               1301   [ -  +  -  - ]:       53483213 :         Assert(CurrPos % XLOG_BLCKSZ >= SizeOfXLogShortPHD || rdata_len == 0);
                               1302                 :       53483213 :         memcpy(currpos, rdata_data, rdata_len);
                               1303                 :       53483213 :         currpos += rdata_len;
                               1304                 :       53483213 :         CurrPos += rdata_len;
                               1305                 :       53483213 :         freespace -= rdata_len;
                               1306                 :       53483213 :         written += rdata_len;
                               1307                 :                : 
                               1308                 :       53483213 :         rdata = rdata->next;
                               1309                 :                :     }
                               1310         [ -  + ]:       14161962 :     Assert(written == write_len);
                               1311                 :                : 
                               1312                 :                :     /*
                               1313                 :                :      * If this was an xlog-switch, it's not enough to write the switch record,
                               1314                 :                :      * we also have to consume all the remaining space in the WAL segment.  We
                               1315                 :                :      * have already reserved that space, but we need to actually fill it.
                               1316                 :                :      */
 3012 andres@anarazel.de       1317   [ +  +  +  - ]:       14161962 :     if (isLogSwitch && XLogSegmentOffset(CurrPos, wal_segment_size) != 0)
                               1318                 :                :     {
                               1319                 :                :         /* An xlog-switch record doesn't contain any data besides the header */
 4546 heikki.linnakangas@i     1320         [ -  + ]:            645 :         Assert(write_len == SizeOfXLogRecord);
                               1321                 :                : 
                               1322                 :                :         /* Assert that we did reserve the right amount of space */
 3012 andres@anarazel.de       1323         [ -  + ]:            645 :         Assert(XLogSegmentOffset(EndPos, wal_segment_size) == 0);
                               1324                 :                : 
                               1325                 :                :         /* Use up all the remaining space on the current page */
 4546 heikki.linnakangas@i     1326                 :            645 :         CurrPos += freespace;
                               1327                 :                : 
                               1328                 :                :         /*
                               1329                 :                :          * Cause all remaining pages in the segment to be flushed, leaving the
                               1330                 :                :          * XLog position where it should be, at the start of the next segment.
                               1331                 :                :          * We do this one page at a time, to make sure we don't deadlock
                               1332                 :                :          * against ourselves if wal_buffers < wal_segment_size.
                               1333                 :                :          */
                               1334         [ +  + ]:         538036 :         while (CurrPos < EndPos)
                               1335                 :                :         {
                               1336                 :                :             /*
                               1337                 :                :              * The minimal action to flush the page would be to call
                               1338                 :                :              * WALInsertLockUpdateInsertingAt(CurrPos) followed by
                               1339                 :                :              * AdvanceXLInsertBuffer(...).  The page would be left initialized
                               1340                 :                :              * mostly to zeros, except for the page header (always the short
                               1341                 :                :              * variant, as this is never a segment's first page).
                               1342                 :                :              *
                               1343                 :                :              * The large vistas of zeros are good for compressibility, but the
                               1344                 :                :              * headers interrupting them every XLOG_BLCKSZ (with values that
                               1345                 :                :              * differ from page to page) are not.  The effect varies with
                               1346                 :                :              * compression tool, but bzip2 for instance compresses about an
                               1347                 :                :              * order of magnitude worse if those headers are left in place.
                               1348                 :                :              *
                               1349                 :                :              * Rather than complicating AdvanceXLInsertBuffer itself (which is
                               1350                 :                :              * called in heavily-loaded circumstances as well as this lightly-
                               1351                 :                :              * loaded one) with variant behavior, we just use GetXLogBuffer
                               1352                 :                :              * (which itself calls the two methods we need) to get the pointer
                               1353                 :                :              * and zero most of the page.  Then we just zero the page header.
                               1354                 :                :              */
 1504 rhaas@postgresql.org     1355                 :         537391 :             currpos = GetXLogBuffer(CurrPos, tli);
 2820 tgl@sss.pgh.pa.us        1356   [ +  -  +  -  :        2149564 :             MemSet(currpos, 0, SizeOfXLogShortPHD);
                                     +  -  +  -  +  
                                                 + ]
                               1357                 :                : 
 4546 heikki.linnakangas@i     1358                 :         537391 :             CurrPos += XLOG_BLCKSZ;
                               1359                 :                :         }
                               1360                 :                :     }
                               1361                 :                :     else
                               1362                 :                :     {
                               1363                 :                :         /* Align the end position, so that the next record starts aligned */
 4046                          1364                 :       14161317 :         CurrPos = MAXALIGN64(CurrPos);
                               1365                 :                :     }
                               1366                 :                : 
 4546                          1367         [ -  + ]:       14161962 :     if (CurrPos != EndPos)
  624 dgustafsson@postgres     1368         [ #  # ]:UBC           0 :         ereport(PANIC,
                               1369                 :                :                 errcode(ERRCODE_DATA_CORRUPTED),
                               1370                 :                :                 errmsg_internal("space reserved for WAL record does not match what was written"));
 4546 heikki.linnakangas@i     1371                 :CBC    14161962 : }
                               1372                 :                : 
                               1373                 :                : /*
                               1374                 :                :  * Acquire a WAL insertion lock, for inserting to WAL.
                               1375                 :                :  */
                               1376                 :                : static void
 4290                          1377                 :       14167518 : WALInsertLockAcquire(void)
                               1378                 :                : {
                               1379                 :                :     bool        immed;
                               1380                 :                : 
                               1381                 :                :     /*
                               1382                 :                :      * It doesn't matter which of the WAL insertion locks we acquire, so try
                               1383                 :                :      * the one we used last time.  If the system isn't particularly busy, it's
                               1384                 :                :      * a good bet that it's still available, and it's good to have some
                               1385                 :                :      * affinity to a particular lock so that you don't unnecessarily bounce
                               1386                 :                :      * cache lines between processes when there's no contention.
                               1387                 :                :      *
                               1388                 :                :      * If this is the first time through in this backend, pick a lock
                               1389                 :                :      * (semi-)randomly.  This allows the locks to be used evenly if you have a
                               1390                 :                :      * lot of very short connections.
                               1391                 :                :      */
                               1392                 :                :     static int  lockToTry = -1;
                               1393                 :                : 
                               1394         [ +  + ]:       14167518 :     if (lockToTry == -1)
  665                          1395                 :           7483 :         lockToTry = MyProcNumber % NUM_XLOGINSERT_LOCKS;
 4290                          1396                 :       14167518 :     MyLockNo = lockToTry;
                               1397                 :                : 
                               1398                 :                :     /*
                               1399                 :                :      * The insertingAt value is initially set to 0, as we don't know our
                               1400                 :                :      * insert location yet.
                               1401                 :                :      */
 3793 andres@anarazel.de       1402                 :       14167518 :     immed = LWLockAcquire(&WALInsertLocks[MyLockNo].l.lock, LW_EXCLUSIVE);
 4290 heikki.linnakangas@i     1403         [ +  + ]:       14167518 :     if (!immed)
                               1404                 :                :     {
                               1405                 :                :         /*
                               1406                 :                :          * If we couldn't get the lock immediately, try another lock next
                               1407                 :                :          * time.  On a system with more insertion locks than concurrent
                               1408                 :                :          * inserters, this causes all the inserters to eventually migrate to a
                               1409                 :                :          * lock that no-one else is using.  On a system with more inserters
                               1410                 :                :          * than locks, it still helps to distribute the inserters evenly
                               1411                 :                :          * across the locks.
                               1412                 :                :          */
 4096                          1413                 :          19253 :         lockToTry = (lockToTry + 1) % NUM_XLOGINSERT_LOCKS;
                               1414                 :                :     }
 4546                          1415                 :       14167518 : }
                               1416                 :                : 
                               1417                 :                : /*
                               1418                 :                :  * Acquire all WAL insertion locks, to prevent other backends from inserting
                               1419                 :                :  * to WAL.
                               1420                 :                :  */
                               1421                 :                : static void
 4290                          1422                 :           4356 : WALInsertLockAcquireExclusive(void)
                               1423                 :                : {
                               1424                 :                :     int         i;
                               1425                 :                : 
                               1426                 :                :     /*
                               1427                 :                :      * When holding all the locks, all but the last lock's insertingAt
                               1428                 :                :      * indicator is set to 0xFFFFFFFFFFFFFFFF, which is higher than any real
                               1429                 :                :      * XLogRecPtr value, to make sure that no-one blocks waiting on those.
                               1430                 :                :      */
 4096                          1431         [ +  + ]:          34848 :     for (i = 0; i < NUM_XLOGINSERT_LOCKS - 1; i++)
                               1432                 :                :     {
 3793 andres@anarazel.de       1433                 :          30492 :         LWLockAcquire(&WALInsertLocks[i].l.lock, LW_EXCLUSIVE);
                               1434                 :          30492 :         LWLockUpdateVar(&WALInsertLocks[i].l.lock,
                               1435                 :          30492 :                         &WALInsertLocks[i].l.insertingAt,
                               1436                 :                :                         PG_UINT64_MAX);
                               1437                 :                :     }
                               1438                 :                :     /* Variable value reset to 0 at release */
                               1439                 :           4356 :     LWLockAcquire(&WALInsertLocks[i].l.lock, LW_EXCLUSIVE);
                               1440                 :                : 
 4290 heikki.linnakangas@i     1441                 :           4356 :     holdingAllLocks = true;
 4546                          1442                 :           4356 : }
                               1443                 :                : 
                               1444                 :                : /*
                               1445                 :                :  * Release our insertion lock (or locks, if we're holding them all).
                               1446                 :                :  *
                               1447                 :                :  * NB: Reset all variables to 0, so they cause LWLockWaitForVar to block the
                               1448                 :                :  * next time the lock is acquired.
                               1449                 :                :  */
                               1450                 :                : static void
 4290                          1451                 :       14171874 : WALInsertLockRelease(void)
                               1452                 :                : {
                               1453         [ +  + ]:       14171874 :     if (holdingAllLocks)
                               1454                 :                :     {
                               1455                 :                :         int         i;
                               1456                 :                : 
 4096                          1457         [ +  + ]:          39204 :         for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
 3793 andres@anarazel.de       1458                 :          34848 :             LWLockReleaseClearVar(&WALInsertLocks[i].l.lock,
                               1459                 :          34848 :                                   &WALInsertLocks[i].l.insertingAt,
                               1460                 :                :                                   0);
                               1461                 :                : 
 4290 heikki.linnakangas@i     1462                 :           4356 :         holdingAllLocks = false;
                               1463                 :                :     }
                               1464                 :                :     else
                               1465                 :                :     {
 3793 andres@anarazel.de       1466                 :       14167518 :         LWLockReleaseClearVar(&WALInsertLocks[MyLockNo].l.lock,
                               1467                 :       14167518 :                               &WALInsertLocks[MyLockNo].l.insertingAt,
                               1468                 :                :                               0);
                               1469                 :                :     }
 4546 heikki.linnakangas@i     1470                 :       14171874 : }
                               1471                 :                : 
                               1472                 :                : /*
                               1473                 :                :  * Update our insertingAt value, to let others know that we've finished
                               1474                 :                :  * inserting up to that point.
                               1475                 :                :  */
                               1476                 :                : static void
 4290                          1477                 :        2284313 : WALInsertLockUpdateInsertingAt(XLogRecPtr insertingAt)
                               1478                 :                : {
                               1479         [ +  + ]:        2284313 :     if (holdingAllLocks)
                               1480                 :                :     {
                               1481                 :                :         /*
                               1482                 :                :          * We use the last lock to mark our actual position, see comments in
                               1483                 :                :          * WALInsertLockAcquireExclusive.
                               1484                 :                :          */
 4096                          1485                 :         536084 :         LWLockUpdateVar(&WALInsertLocks[NUM_XLOGINSERT_LOCKS - 1].l.lock,
 3102 tgl@sss.pgh.pa.us        1486                 :         536084 :                         &WALInsertLocks[NUM_XLOGINSERT_LOCKS - 1].l.insertingAt,
                               1487                 :                :                         insertingAt);
                               1488                 :                :     }
                               1489                 :                :     else
 4290 heikki.linnakangas@i     1490                 :        1748229 :         LWLockUpdateVar(&WALInsertLocks[MyLockNo].l.lock,
                               1491                 :        1748229 :                         &WALInsertLocks[MyLockNo].l.insertingAt,
                               1492                 :                :                         insertingAt);
 4546                          1493                 :        2284313 : }
                               1494                 :                : 
                               1495                 :                : /*
                               1496                 :                :  * Wait for any WAL insertions < upto to finish.
                               1497                 :                :  *
                               1498                 :                :  * Returns the location of the oldest insertion that is still in-progress.
                               1499                 :                :  * Any WAL prior to that point has been fully copied into WAL buffers, and
                               1500                 :                :  * can be flushed out to disk. Because this waits for any insertions older
                               1501                 :                :  * than 'upto' to finish, the return value is always >= 'upto'.
                               1502                 :                :  *
                               1503                 :                :  * Note: When you are about to write out WAL, you must call this function
                               1504                 :                :  * *before* acquiring WALWriteLock, to avoid deadlocks. This function might
                               1505                 :                :  * need to wait for an insertion to finish (or at least advance to next
                               1506                 :                :  * uninitialized page), and the inserter might need to evict an old WAL buffer
                               1507                 :                :  * to make room for a new one, which in turn requires WALWriteLock.
                               1508                 :                :  */
                               1509                 :                : static XLogRecPtr
                               1510                 :        2168315 : WaitXLogInsertionsToFinish(XLogRecPtr upto)
                               1511                 :                : {
                               1512                 :                :     uint64      bytepos;
                               1513                 :                :     XLogRecPtr  inserted;
                               1514                 :                :     XLogRecPtr  reservedUpto;
                               1515                 :                :     XLogRecPtr  finishedUpto;
 4105 andres@anarazel.de       1516                 :        2168315 :     XLogCtlInsert *Insert = &XLogCtl->Insert;
                               1517                 :                :     int         i;
                               1518                 :                : 
 4546 heikki.linnakangas@i     1519         [ -  + ]:        2168315 :     if (MyProc == NULL)
 4546 heikki.linnakangas@i     1520         [ #  # ]:UBC           0 :         elog(PANIC, "cannot wait without a PGPROC structure");
                               1521                 :                : 
                               1522                 :                :     /*
                               1523                 :                :      * Check if there's any work to do.  Use a barrier to ensure we get the
                               1524                 :                :      * freshest value.
                               1525                 :                :      */
  620 alvherre@alvh.no-ip.     1526                 :CBC     2168315 :     inserted = pg_atomic_read_membarrier_u64(&XLogCtl->logInsertResult);
                               1527         [ +  + ]:        2168315 :     if (upto <= inserted)
                               1528                 :        1695312 :         return inserted;
                               1529                 :                : 
                               1530                 :                :     /* Read the current insert position */
 4546 heikki.linnakangas@i     1531         [ +  + ]:         473003 :     SpinLockAcquire(&Insert->insertpos_lck);
                               1532                 :         473003 :     bytepos = Insert->CurrBytePos;
                               1533                 :         473003 :     SpinLockRelease(&Insert->insertpos_lck);
                               1534                 :         473003 :     reservedUpto = XLogBytePosToEndRecPtr(bytepos);
                               1535                 :                : 
                               1536                 :                :     /*
                               1537                 :                :      * No-one should request to flush a piece of WAL that hasn't even been
                               1538                 :                :      * reserved yet. However, it can happen if there is a block with a bogus
                               1539                 :                :      * LSN on disk, for example. XLogFlush checks for that situation and
                               1540                 :                :      * complains, but only after the flush. Here we just assume that to mean
                               1541                 :                :      * that all WAL that has been reserved needs to be finished. In this
                               1542                 :                :      * corner-case, the return value can be smaller than 'upto' argument.
                               1543                 :                :      */
                               1544         [ -  + ]:         473003 :     if (upto > reservedUpto)
                               1545                 :                :     {
 1840 peter@eisentraut.org     1546         [ #  # ]:UBC           0 :         ereport(LOG,
                               1547                 :                :                 errmsg("request to flush past end of generated WAL; request %X/%08X, current position %X/%08X",
                               1548                 :                :                        LSN_FORMAT_ARGS(upto), LSN_FORMAT_ARGS(reservedUpto)));
 4546 heikki.linnakangas@i     1549                 :              0 :         upto = reservedUpto;
                               1550                 :                :     }
                               1551                 :                : 
                               1552                 :                :     /*
                               1553                 :                :      * Loop through all the locks, sleeping on any in-progress insert older
                               1554                 :                :      * than 'upto'.
                               1555                 :                :      *
                               1556                 :                :      * finishedUpto is our return value, indicating the point up to which all
                               1557                 :                :      * the WAL insertions have been finished. Initialize it to the head of
                               1558                 :                :      * reserved WAL, and as we iterate through the insertion locks, back it
                               1559                 :                :      * out for any insertion that's still in progress.
                               1560                 :                :      */
 4546 heikki.linnakangas@i     1561                 :CBC      473003 :     finishedUpto = reservedUpto;
 4096                          1562         [ +  + ]:        4257027 :     for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
                               1563                 :                :     {
 4244 bruce@momjian.us         1564                 :        3784024 :         XLogRecPtr  insertingat = InvalidXLogRecPtr;
                               1565                 :                : 
                               1566                 :                :         do
                               1567                 :                :         {
                               1568                 :                :             /*
                               1569                 :                :              * See if this insertion is in progress.  LWLockWaitForVar will
                               1570                 :                :              * wait for the lock to be released, or for the 'value' to be set
                               1571                 :                :              * by a LWLockUpdateVar call.  When a lock is initially acquired,
                               1572                 :                :              * its value is 0 (InvalidXLogRecPtr), which means that we don't
                               1573                 :                :              * know where it's inserting yet.  We will have to wait for it. If
                               1574                 :                :              * it's a small insertion, the record will most likely fit on the
                               1575                 :                :              * same page and the inserter will release the lock without ever
                               1576                 :                :              * calling LWLockUpdateVar.  But if it has to sleep, it will
                               1577                 :                :              * advertise the insertion point with LWLockUpdateVar before
                               1578                 :                :              * sleeping.
                               1579                 :                :              *
                               1580                 :                :              * In this loop we are only waiting for insertions that started
                               1581                 :                :              * before WaitXLogInsertionsToFinish was called.  The lack of
                               1582                 :                :              * memory barriers in the loop means that we might see locks as
                               1583                 :                :              * "unused" that have since become used.  This is fine because
                               1584                 :                :              * they only can be used for later insertions that we would not
                               1585                 :                :              * want to wait on anyway.  Not taking a lock to acquire the
                               1586                 :                :              * current insertingAt value means that we might see older
                               1587                 :                :              * insertingAt values.  This is also fine, because if we read a
                               1588                 :                :              * value too old, we will add ourselves to the wait queue, which
                               1589                 :                :              * contains atomic operations.
                               1590                 :                :              */
 4290 heikki.linnakangas@i     1591         [ +  + ]:        3922653 :             if (LWLockWaitForVar(&WALInsertLocks[i].l.lock,
                               1592                 :        3922653 :                                  &WALInsertLocks[i].l.insertingAt,
                               1593                 :                :                                  insertingat, &insertingat))
                               1594                 :                :             {
                               1595                 :                :                 /* the lock was free, so no insertion in progress */
                               1596                 :        2661276 :                 insertingat = InvalidXLogRecPtr;
                               1597                 :        2661276 :                 break;
                               1598                 :                :             }
                               1599                 :                : 
                               1600                 :                :             /*
                               1601                 :                :              * This insertion is still in progress. Have to wait, unless the
                               1602                 :                :              * inserter has proceeded past 'upto'.
                               1603                 :                :              */
                               1604         [ +  + ]:        1261377 :         } while (insertingat < upto);
                               1605                 :                : 
   42 alvherre@kurilemu.de     1606   [ +  +  +  + ]:GNC     3784024 :         if (XLogRecPtrIsValid(insertingat) && insertingat < finishedUpto)
 4290 heikki.linnakangas@i     1607                 :CBC      422363 :             finishedUpto = insertingat;
                               1608                 :                :     }
                               1609                 :                : 
                               1610                 :                :     /*
                               1611                 :                :      * Advance the limit we know to have been inserted and return the freshest
                               1612                 :                :      * value we know of, which might be beyond what we requested if somebody
                               1613                 :                :      * is concurrently doing this with an 'upto' pointer ahead of us.
                               1614                 :                :      */
  620 alvherre@alvh.no-ip.     1615                 :         473003 :     finishedUpto = pg_atomic_monotonic_advance_u64(&XLogCtl->logInsertResult,
                               1616                 :                :                                                    finishedUpto);
                               1617                 :                : 
 4546 heikki.linnakangas@i     1618                 :         473003 :     return finishedUpto;
                               1619                 :                : }
                               1620                 :                : 
                               1621                 :                : /*
                               1622                 :                :  * Get a pointer to the right location in the WAL buffer containing the
                               1623                 :                :  * given XLogRecPtr.
                               1624                 :                :  *
                               1625                 :                :  * If the page is not initialized yet, it is initialized. That might require
                               1626                 :                :  * evicting an old dirty buffer from the buffer cache, which means I/O.
                               1627                 :                :  *
                               1628                 :                :  * The caller must ensure that the page containing the requested location
                               1629                 :                :  * isn't evicted yet, and won't be evicted. The way to ensure that is to
                               1630                 :                :  * hold onto a WAL insertion lock with the insertingAt position set to
                               1631                 :                :  * something <= ptr. GetXLogBuffer() will update insertingAt if it needs
                               1632                 :                :  * to evict an old page from the buffer. (This means that once you call
                               1633                 :                :  * GetXLogBuffer() with a given 'ptr', you must not access anything before
                               1634                 :                :  * that point anymore, and must not call GetXLogBuffer() with an older 'ptr'
                               1635                 :                :  * later, because older buffers might be recycled already.)
                               1636                 :                :  */
                               1637                 :                : static char *
 1504 rhaas@postgresql.org     1638                 :       16505352 : GetXLogBuffer(XLogRecPtr ptr, TimeLineID tli)
                               1639                 :                : {
                               1640                 :                :     int         idx;
                               1641                 :                :     XLogRecPtr  endptr;
                               1642                 :                :     static uint64 cachedPage = 0;
                               1643                 :                :     static char *cachedPos = NULL;
                               1644                 :                :     XLogRecPtr  expectedEndPtr;
                               1645                 :                : 
                               1646                 :                :     /*
                               1647                 :                :      * Fast path for the common case where we need to access the same page
                               1648                 :                :      * as last time.
                               1649                 :                :      */
 4546 heikki.linnakangas@i     1650         [ +  + ]:       16505352 :     if (ptr / XLOG_BLCKSZ == cachedPage)
                               1651                 :                :     {
                               1652         [ -  + ]:       13801017 :         Assert(((XLogPageHeader) cachedPos)->xlp_magic == XLOG_PAGE_MAGIC);
                               1653         [ -  + ]:       13801017 :         Assert(((XLogPageHeader) cachedPos)->xlp_pageaddr == ptr - (ptr % XLOG_BLCKSZ));
                               1654                 :       13801017 :         return cachedPos + ptr % XLOG_BLCKSZ;
                               1655                 :                :     }
                               1656                 :                : 
                               1657                 :                :     /*
                               1658                 :                :      * The XLog buffer cache is organized so that a page is always loaded to a
                               1659                 :                :      * particular buffer.  That way we can easily calculate the buffer a given
                               1660                 :                :      * page must be loaded into, from the XLogRecPtr alone.
                               1661                 :                :      */
                               1662                 :        2704335 :     idx = XLogRecPtrToBufIdx(ptr);
                               1663                 :                : 
                               1664                 :                :     /*
                               1665                 :                :      * See what page is loaded in the buffer at the moment. It could be the
                               1666                 :                :      * page we're looking for, or something older. It can't be anything newer
                               1667                 :                :      * - that would imply the page we're looking for has already been written
                               1668                 :                :      * out to disk and evicted, and the caller is responsible for making sure
                               1669                 :                :      * that doesn't happen.
                               1670                 :                :      *
                               1671                 :                :      * We don't hold a lock while we read the value. If someone is just about
                               1672                 :                :      * to initialize or has just initialized the page, it's possible that we
                               1673                 :                :      * get InvalidXLogRecPtr. That's ok, we'll grab the mapping lock (in
                               1674                 :                :      * AdvanceXLInsertBuffer) and retry if we see anything other than the page
                               1675                 :                :      * we're looking for.
                               1676                 :                :      */
                               1677                 :        2704335 :     expectedEndPtr = ptr;
                               1678                 :        2704335 :     expectedEndPtr += XLOG_BLCKSZ - ptr % XLOG_BLCKSZ;
                               1679                 :                : 
  730 jdavis@postgresql.or     1680                 :        2704335 :     endptr = pg_atomic_read_u64(&XLogCtl->xlblocks[idx]);
 4546 heikki.linnakangas@i     1681         [ +  + ]:        2704335 :     if (expectedEndPtr != endptr)
                               1682                 :                :     {
                               1683                 :                :         XLogRecPtr  initializedUpto;
                               1684                 :                : 
                               1685                 :                :         /*
                               1686                 :                :          * Before calling AdvanceXLInsertBuffer(), which can block, let others
                               1687                 :                :          * know how far we're finished with inserting the record.
                               1688                 :                :          *
                               1689                 :                :          * NB: If 'ptr' points to just after the page header, advertise a
                               1690                 :                :          * position at the beginning of the page rather than 'ptr' itself. If
                               1691                 :                :          * there are no other insertions running, someone might try to flush
                               1692                 :                :          * up to our advertised location. If we advertised a position after
                               1693                 :                :          * the page header, someone might try to flush the page header, even
                               1694                 :                :          * though the page might not actually be initialized yet. As the first
                               1695                 :                :          * inserter on the page, we are effectively responsible for making
                               1696                 :                :          * sure that it's initialized, before we let insertingAt move past
                               1697                 :                :          * the page header.
                               1698                 :                :          */
 3791                          1699         [ +  + ]:        2284313 :         if (ptr % XLOG_BLCKSZ == SizeOfXLogShortPHD &&
 3012 andres@anarazel.de       1700         [ +  - ]:           5486 :             XLogSegmentOffset(ptr, wal_segment_size) > XLOG_BLCKSZ)
 3791 heikki.linnakangas@i     1701                 :           5486 :             initializedUpto = ptr - SizeOfXLogShortPHD;
                               1702         [ +  + ]:        2278827 :         else if (ptr % XLOG_BLCKSZ == SizeOfXLogLongPHD &&
 3012 andres@anarazel.de       1703         [ +  + ]:            762 :                  XLogSegmentOffset(ptr, wal_segment_size) < XLOG_BLCKSZ)
 3791 heikki.linnakangas@i     1704                 :            535 :             initializedUpto = ptr - SizeOfXLogLongPHD;
                               1705                 :                :         else
                               1706                 :        2278292 :             initializedUpto = ptr;
                               1707                 :                : 
                               1708                 :        2284313 :         WALInsertLockUpdateInsertingAt(initializedUpto);
                               1709                 :                : 
 1504 rhaas@postgresql.org     1710                 :        2284313 :         AdvanceXLInsertBuffer(ptr, tli, false);
  730 jdavis@postgresql.or     1711                 :        2284313 :         endptr = pg_atomic_read_u64(&XLogCtl->xlblocks[idx]);
                               1712                 :                : 
 4546 heikki.linnakangas@i     1713         [ -  + ]:        2284313 :         if (expectedEndPtr != endptr)
  164 alvherre@kurilemu.de     1714         [ #  # ]:UNC           0 :             elog(PANIC, "could not find WAL buffer for %X/%08X",
                               1715                 :                :                  LSN_FORMAT_ARGS(ptr));
                               1716                 :                :     }
                               1717                 :                :     else
                               1718                 :                :     {
                               1719                 :                :         /*
                               1720                 :                :          * Make sure the initialization of the page is visible to us, and
                               1721                 :                :          * won't arrive later to overwrite the WAL data we write on the page.
                               1722                 :                :          */
 4546 heikki.linnakangas@i     1723                 :CBC      420022 :         pg_memory_barrier();
                               1724                 :                :     }
                               1725                 :                : 
                               1726                 :                :     /*
                               1727                 :                :      * Found the buffer holding this page. Return a pointer to the right
                               1728                 :                :      * offset within the page.
                               1729                 :                :      */
                               1730                 :        2704335 :     cachedPage = ptr / XLOG_BLCKSZ;
                               1731                 :        2704335 :     cachedPos = XLogCtl->pages + idx * (Size) XLOG_BLCKSZ;
                               1732                 :                : 
                               1733         [ -  + ]:        2704335 :     Assert(((XLogPageHeader) cachedPos)->xlp_magic == XLOG_PAGE_MAGIC);
                               1734         [ -  + ]:        2704335 :     Assert(((XLogPageHeader) cachedPos)->xlp_pageaddr == ptr - (ptr % XLOG_BLCKSZ));
                               1735                 :                : 
                               1736                 :        2704335 :     return cachedPos + ptr % XLOG_BLCKSZ;
                               1737                 :                : }
                               1738                 :                : 
                               1739                 :                : /*
                               1740                 :                :  * Read WAL data directly from WAL buffers, if available. Returns the number
                               1741                 :                :  * of bytes read successfully.
                               1742                 :                :  *
                               1743                 :                :  * Fewer than 'count' bytes may be read if some of the requested WAL data has
                               1744                 :                :  * already been evicted.
                               1745                 :                :  *
                               1746                 :                :  * No locks are taken.
                               1747                 :                :  *
                               1748                 :                :  * Caller should ensure that it reads no further than LogwrtResult.Write
                               1749                 :                :  * (which should have been updated by the caller when determining how far to
                               1750                 :                :  * read). The 'tli' argument is only used as a convenient safety check so that
                               1751                 :                :  * callers do not read from WAL buffers on a historical timeline.
                               1752                 :                :  */
                               1753                 :                : Size
  675 jdavis@postgresql.or     1754                 :         102593 : WALReadFromBuffers(char *dstbuf, XLogRecPtr startptr, Size count,
                               1755                 :                :                    TimeLineID tli)
                               1756                 :                : {
                               1757                 :         102593 :     char       *pdst = dstbuf;
                               1758                 :         102593 :     XLogRecPtr  recptr = startptr;
                               1759                 :                :     XLogRecPtr  inserted;
  671                          1760                 :         102593 :     Size        nbytes = count;
                               1761                 :                : 
  675                          1762   [ +  +  +  + ]:         102593 :     if (RecoveryInProgress() || tli != GetWALInsertionTimeLine())
                               1763                 :            719 :         return 0;
                               1764                 :                : 
   42 alvherre@kurilemu.de     1765         [ -  + ]:GNC      101874 :     Assert(XLogRecPtrIsValid(startptr));
                               1766                 :                : 
                               1767                 :                :     /*
                               1768                 :                :      * Caller should ensure that the requested data has been inserted into WAL
                               1769                 :                :      * buffers before we try to read it.
                               1770                 :                :      */
  620 alvherre@alvh.no-ip.     1771                 :CBC      101874 :     inserted = pg_atomic_read_u64(&XLogCtl->logInsertResult);
                               1772         [ -  + ]:         101874 :     if (startptr + count > inserted)
  620 alvherre@alvh.no-ip.     1773         [ #  # ]:UBC           0 :         ereport(ERROR,
                               1774                 :                :                 errmsg("cannot read past end of generated WAL: requested %X/%08X, current position %X/%08X",
                               1775                 :                :                        LSN_FORMAT_ARGS(startptr + count),
                               1776                 :                :                        LSN_FORMAT_ARGS(inserted)));
                               1777                 :                : 
                               1778                 :                :     /*
                               1779                 :                :      * Loop through the buffers without a lock. For each buffer, atomically
                               1780                 :                :      * read and verify the end pointer, then copy the data out, and finally
                               1781                 :                :      * re-read and re-verify the end pointer.
                               1782                 :                :      *
                               1783                 :                :      * Once a page is evicted, it never returns to the WAL buffers, so if the
                               1784                 :                :      * end pointer matches the expected end pointer before and after we copy
                               1785                 :                :      * the data, then the right page must have been present during the data
                               1786                 :                :      * copy. Read barriers are necessary to ensure that the data copy actually
                               1787                 :                :      * happens between the two verification steps.
                               1788                 :                :      *
                               1789                 :                :      * If either verification fails, we simply terminate the loop and return
                               1790                 :                :      * with the data that had been already copied out successfully.
                               1791                 :                :      */
  675 jdavis@postgresql.or     1792         [ +  + ]:CBC      127680 :     while (nbytes > 0)
                               1793                 :                :     {
                               1794                 :         120322 :         uint32      offset = recptr % XLOG_BLCKSZ;
                               1795                 :         120322 :         int         idx = XLogRecPtrToBufIdx(recptr);
                               1796                 :                :         XLogRecPtr  expectedEndPtr;
                               1797                 :                :         XLogRecPtr  endptr;
                               1798                 :                :         const char *page;
                               1799                 :                :         const char *psrc;
                               1800                 :                :         Size        npagebytes;
                               1801                 :                : 
                               1802                 :                :         /*
                               1803                 :                :          * Calculate the end pointer we expect in the xlblocks array if the
                               1804                 :                :          * correct page is present.
                               1805                 :                :          */
                               1806                 :         120322 :         expectedEndPtr = recptr + (XLOG_BLCKSZ - offset);
                               1807                 :                : 
                               1808                 :                :         /*
                               1809                 :                :          * First verification step: check that the correct page is present in
                               1810                 :                :          * the WAL buffers.
                               1811                 :                :          */
                               1812                 :         120322 :         endptr = pg_atomic_read_u64(&XLogCtl->xlblocks[idx]);
                               1813         [ +  + ]:         120322 :         if (expectedEndPtr != endptr)
                               1814                 :          94514 :             break;
                               1815                 :                : 
                               1816                 :                :         /*
                               1817                 :                :          * The correct page is present (or was at the time the endptr was
                               1818                 :                :          * read; must re-verify later). Calculate pointer to source data and
                               1819                 :                :          * determine how much data to read from this page.
                               1820                 :                :          */
                               1821                 :          25808 :         page = XLogCtl->pages + idx * (Size) XLOG_BLCKSZ;
                               1822                 :          25808 :         psrc = page + offset;
                               1823                 :          25808 :         npagebytes = Min(nbytes, XLOG_BLCKSZ - offset);
                               1824                 :                : 
                               1825                 :                :         /*
                               1826                 :                :          * Ensure that the data copy and the first verification step are not
                               1827                 :                :          * reordered.
                               1828                 :                :          */
                               1829                 :          25808 :         pg_read_barrier();
                               1830                 :                : 
                               1831                 :                :         /* data copy */
                               1832                 :          25808 :         memcpy(pdst, psrc, npagebytes);
                               1833                 :                : 
                               1834                 :                :         /*
                               1835                 :                :          * Ensure that the data copy and the second verification step are not
                               1836                 :                :          * reordered.
                               1837                 :                :          */
                               1838                 :          25808 :         pg_read_barrier();
                               1839                 :                : 
                               1840                 :                :         /*
                               1841                 :                :          * Second verification step: check that the page we read from wasn't
                               1842                 :                :          * evicted while we were copying the data.
                               1843                 :                :          */
                               1844                 :          25808 :         endptr = pg_atomic_read_u64(&XLogCtl->xlblocks[idx]);
                               1845         [ +  + ]:          25808 :         if (expectedEndPtr != endptr)
  675 jdavis@postgresql.or     1846                 :GBC           2 :             break;
                               1847                 :                : 
  675 jdavis@postgresql.or     1848                 :CBC       25806 :         pdst += npagebytes;
                               1849                 :          25806 :         recptr += npagebytes;
                               1850                 :          25806 :         nbytes -= npagebytes;
                               1851                 :                :     }
                               1852                 :                : 
                               1853         [ -  + ]:         101874 :     Assert(pdst - dstbuf <= count);
                               1854                 :                : 
                               1855                 :         101874 :     return pdst - dstbuf;
                               1856                 :                : }
                               1857                 :                : 
                               1858                 :                : /*
                               1859                 :                :  * Converts a "usable byte position" to XLogRecPtr. A usable byte position
                               1860                 :                :  * is the position starting from the beginning of WAL, excluding all WAL
                               1861                 :                :  * page headers.
                               1862                 :                :  */
                               1863                 :                : static XLogRecPtr
 4546 heikki.linnakangas@i     1864                 :       28326492 : XLogBytePosToRecPtr(uint64 bytepos)
                               1865                 :                : {
                               1866                 :                :     uint64      fullsegs;
                               1867                 :                :     uint64      fullpages;
                               1868                 :                :     uint64      bytesleft;
                               1869                 :                :     uint32      seg_offset;
                               1870                 :                :     XLogRecPtr  result;
                               1871                 :                : 
                               1872                 :       28326492 :     fullsegs = bytepos / UsableBytesInSegment;
                               1873                 :       28326492 :     bytesleft = bytepos % UsableBytesInSegment;
                               1874                 :                : 
                               1875         [ +  + ]:       28326492 :     if (bytesleft < XLOG_BLCKSZ - SizeOfXLogLongPHD)
                               1876                 :                :     {
                               1877                 :                :         /* fits on first page of segment */
                               1878                 :          51301 :         seg_offset = bytesleft + SizeOfXLogLongPHD;
                               1879                 :                :     }
                               1880                 :                :     else
                               1881                 :                :     {
                               1882                 :                :         /* account for the first page on segment with long header */
                               1883                 :       28275191 :         seg_offset = XLOG_BLCKSZ;
                               1884                 :       28275191 :         bytesleft -= XLOG_BLCKSZ - SizeOfXLogLongPHD;
                               1885                 :                : 
                               1886                 :       28275191 :         fullpages = bytesleft / UsableBytesInPage;
                               1887                 :       28275191 :         bytesleft = bytesleft % UsableBytesInPage;
                               1888                 :                : 
                               1889                 :       28275191 :         seg_offset += fullpages * XLOG_BLCKSZ + bytesleft + SizeOfXLogShortPHD;
                               1890                 :                :     }
                               1891                 :                : 
 2719 alvherre@alvh.no-ip.     1892                 :       28326492 :     XLogSegNoOffsetToRecPtr(fullsegs, seg_offset, wal_segment_size, result);
                               1893                 :                : 
 4546 heikki.linnakangas@i     1894                 :       28326492 :     return result;
                               1895                 :                : }
                               1896                 :                : 
                               1897                 :                : /*
                               1898                 :                :  * Like XLogBytePosToRecPtr, but if the position is at a page boundary,
                               1899                 :                :  * returns a pointer to the beginning of the page (i.e. before page header),
                               1900                 :                :  * not to where the first xlog record on that page would go to. This is used
                               1901                 :                :  * when converting a pointer to the end of a record.
                               1902                 :                :  */
                               1903                 :                : static XLogRecPtr
                               1904                 :       14635675 : XLogBytePosToEndRecPtr(uint64 bytepos)
                               1905                 :                : {
                               1906                 :                :     uint64      fullsegs;
                               1907                 :                :     uint64      fullpages;
                               1908                 :                :     uint64      bytesleft;
                               1909                 :                :     uint32      seg_offset;
                               1910                 :                :     XLogRecPtr  result;
                               1911                 :                : 
                               1912                 :       14635675 :     fullsegs = bytepos / UsableBytesInSegment;
                               1913                 :       14635675 :     bytesleft = bytepos % UsableBytesInSegment;
                               1914                 :                : 
                               1915         [ +  + ]:       14635675 :     if (bytesleft < XLOG_BLCKSZ - SizeOfXLogLongPHD)
                               1916                 :                :     {
                               1917                 :                :         /* fits on first page of segment */
                               1918         [ +  + ]:          82142 :         if (bytesleft == 0)
                               1919                 :          55233 :             seg_offset = 0;
                               1920                 :                :         else
                               1921                 :          26909 :             seg_offset = bytesleft + SizeOfXLogLongPHD;
                               1922                 :                :     }
                               1923                 :                :     else
                               1924                 :                :     {
                               1925                 :                :         /* account for the first page on segment with long header */
                               1926                 :       14553533 :         seg_offset = XLOG_BLCKSZ;
                               1927                 :       14553533 :         bytesleft -= XLOG_BLCKSZ - SizeOfXLogLongPHD;
                               1928                 :                : 
                               1929                 :       14553533 :         fullpages = bytesleft / UsableBytesInPage;
                               1930                 :       14553533 :         bytesleft = bytesleft % UsableBytesInPage;
                               1931                 :                : 
                               1932         [ +  + ]:       14553533 :         if (bytesleft == 0)
                               1933                 :          14111 :             seg_offset += fullpages * XLOG_BLCKSZ + bytesleft;
                               1934                 :                :         else
                               1935                 :       14539422 :             seg_offset += fullpages * XLOG_BLCKSZ + bytesleft + SizeOfXLogShortPHD;
                               1936                 :                :     }
                               1937                 :                : 
 2719 alvherre@alvh.no-ip.     1938                 :       14635675 :     XLogSegNoOffsetToRecPtr(fullsegs, seg_offset, wal_segment_size, result);
                               1939                 :                : 
 4546 heikki.linnakangas@i     1940                 :       14635675 :     return result;
                               1941                 :                : }
                               1942                 :                : 
                               1943                 :                : /*
                               1944                 :                :  * Convert an XLogRecPtr to a "usable byte position".
                               1945                 :                :  */
                               1946                 :                : static uint64
                               1947                 :       42488275 : XLogRecPtrToBytePos(XLogRecPtr ptr)
                               1948                 :                : {
                               1949                 :                :     uint64      fullsegs;
                               1950                 :                :     uint32      fullpages;
                               1951                 :                :     uint32      offset;
                               1952                 :                :     uint64      result;
                               1953                 :                : 
 3012 andres@anarazel.de       1954                 :       42488275 :     XLByteToSeg(ptr, fullsegs, wal_segment_size);
                               1955                 :                : 
                               1956                 :       42488275 :     fullpages = (XLogSegmentOffset(ptr, wal_segment_size)) / XLOG_BLCKSZ;
 4546 heikki.linnakangas@i     1957                 :       42488275 :     offset = ptr % XLOG_BLCKSZ;
                               1958                 :                : 
                               1959         [ +  + ]:       42488275 :     if (fullpages == 0)
                               1960                 :                :     {
                               1961                 :          77671 :         result = fullsegs * UsableBytesInSegment;
                               1962         [ +  + ]:          77671 :         if (offset > 0)
                               1963                 :                :         {
                               1964         [ -  + ]:          76333 :             Assert(offset >= SizeOfXLogLongPHD);
                               1965                 :          76333 :             result += offset - SizeOfXLogLongPHD;
                               1966                 :                :         }
                               1967                 :                :     }
                               1968                 :                :     else
                               1969                 :                :     {
                               1970                 :       42410604 :         result = fullsegs * UsableBytesInSegment +
 4244 bruce@momjian.us         1971                 :       42410604 :             (XLOG_BLCKSZ - SizeOfXLogLongPHD) + /* account for first page */
 3102 tgl@sss.pgh.pa.us        1972                 :       42410604 :             (fullpages - 1) * UsableBytesInPage;    /* full pages */
 4546 heikki.linnakangas@i     1973         [ +  + ]:       42410604 :         if (offset > 0)
                               1974                 :                :         {
                               1975         [ -  + ]:       42396735 :             Assert(offset >= SizeOfXLogShortPHD);
                               1976                 :       42396735 :             result += offset - SizeOfXLogShortPHD;
                               1977                 :                :         }
                               1978                 :                :     }
                               1979                 :                : 
                               1980                 :       42488275 :     return result;
                               1981                 :                : }
                               1982                 :                : 
                               1983                 :                : /*
                               1984                 :                :  * Initialize XLOG buffers, writing out old buffers if they still contain
                               1985                 :                :  * unwritten data, up to the page containing 'upto'. Or if 'opportunistic' is
                               1986                 :                :  * true, initialize as many pages as we can without having to write out
                               1987                 :                :  * unwritten data. Any new pages are initialized to zeros, with page headers
                               1988                 :                :  * initialized properly.
                               1989                 :                :  */
                               1990                 :                : static void
 1504 rhaas@postgresql.org     1991                 :        2287601 : AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli, bool opportunistic)
                               1992                 :                : {
 9046 tgl@sss.pgh.pa.us        1993                 :        2287601 :     XLogCtlInsert *Insert = &XLogCtl->Insert;
                               1994                 :                :     int         nextidx;
                               1995                 :                :     XLogRecPtr  OldPageRqstPtr;
                               1996                 :                :     XLogwrtRqst WriteRqst;
 4546 heikki.linnakangas@i     1997                 :        2287601 :     XLogRecPtr  NewPageEndPtr = InvalidXLogRecPtr;
                               1998                 :                :     XLogRecPtr  NewPageBeginPtr;
                               1999                 :                :     XLogPageHeader NewPage;
 1185 tgl@sss.pgh.pa.us        2000                 :        2287601 :     int         npages pg_attribute_unused() = 0;
                               2001                 :                : 
  118 akorotkov@postgresql     2002                 :        2287601 :     LWLockAcquire(WALBufMappingLock, LW_EXCLUSIVE);
                               2003                 :                : 
                               2004                 :                :     /*
                               2005                 :                :      * Now that we have the lock, check if someone initialized the page
                               2006                 :                :      * already.
                               2007                 :                :      */
                               2008   [ +  +  +  + ]:        6727569 :     while (upto >= XLogCtl->InitializedUpTo || opportunistic)
                               2009                 :                :     {
                               2010                 :        4443256 :         nextidx = XLogRecPtrToBufIdx(XLogCtl->InitializedUpTo);
                               2011                 :                : 
                               2012                 :                :         /*
                               2013                 :                :          * Get ending-offset of the buffer page we need to replace (this may
                               2014                 :                :          * be zero if the buffer hasn't been used yet).  Fall through if it's
                               2015                 :                :          * already written out.
                               2016                 :                :          */
                               2017                 :        4443256 :         OldPageRqstPtr = pg_atomic_read_u64(&XLogCtl->xlblocks[nextidx]);
                               2018         [ +  + ]:        4443256 :         if (LogwrtResult.Write < OldPageRqstPtr)
                               2019                 :                :         {
                               2020                 :                :             /*
                               2021                 :                :              * Nope, got work to do. If we just want to pre-initialize as much
                               2022                 :                :              * as we can without flushing, give up now.
                               2023                 :                :              */
                               2024         [ +  + ]:        2046956 :             if (opportunistic)
                               2025                 :           3288 :                 break;
                               2026                 :                : 
                               2027                 :                :             /* Advance shared memory write request position */
 4105 andres@anarazel.de       2028         [ +  + ]:        2043668 :             SpinLockAcquire(&XLogCtl->info_lck);
                               2029         [ +  + ]:        2043668 :             if (XLogCtl->LogwrtRqst.Write < OldPageRqstPtr)
                               2030                 :         509243 :                 XLogCtl->LogwrtRqst.Write = OldPageRqstPtr;
                               2031                 :        2043668 :             SpinLockRelease(&XLogCtl->info_lck);
                               2032                 :                : 
                               2033                 :                :             /*
                               2034                 :                :              * Acquire an up-to-date LogwrtResult value and see if we still
                               2035                 :                :              * need to write it or if someone else already did.
                               2036                 :                :              */
  622 alvherre@alvh.no-ip.     2037                 :        2043668 :             RefreshXLogWriteResult(LogwrtResult);
 4546 heikki.linnakangas@i     2038         [ +  + ]:        2043668 :             if (LogwrtResult.Write < OldPageRqstPtr)
                               2039                 :                :             {
                               2040                 :                :                 /*
                               2041                 :                :                  * Must acquire write lock. Release WALBufMappingLock first,
                               2042                 :                :                  * to make sure that all insertions that we need to wait for
                               2043                 :                :                  * can finish (up to this same position). Otherwise we risk
                               2044                 :                :                  * deadlock.
                               2045                 :                :                  */
  118 akorotkov@postgresql     2046                 :        2028836 :                 LWLockRelease(WALBufMappingLock);
                               2047                 :                : 
 4546 heikki.linnakangas@i     2048                 :        2028836 :                 WaitXLogInsertionsToFinish(OldPageRqstPtr);
                               2049                 :                : 
                               2050                 :        2028836 :                 LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);
                               2051                 :                : 
  624 alvherre@alvh.no-ip.     2052                 :        2028836 :                 RefreshXLogWriteResult(LogwrtResult);
 4546 heikki.linnakangas@i     2053         [ +  + ]:        2028836 :                 if (LogwrtResult.Write >= OldPageRqstPtr)
                               2054                 :                :                 {
                               2055                 :                :                     /* OK, someone wrote it already */
                               2056                 :         129388 :                     LWLockRelease(WALWriteLock);
                               2057                 :                :                 }
                               2058                 :                :                 else
                               2059                 :                :                 {
                               2060                 :                :                     /* Have to write it ourselves */
                               2061                 :                :                     TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_START();
                               2062                 :        1899448 :                     WriteRqst.Write = OldPageRqstPtr;
                               2063                 :        1899448 :                     WriteRqst.Flush = 0;
 1504 rhaas@postgresql.org     2064                 :        1899448 :                     XLogWrite(WriteRqst, tli, false);
 4546 heikki.linnakangas@i     2065                 :        1899448 :                     LWLockRelease(WALWriteLock);
  304 michael@paquier.xyz      2066                 :        1899448 :                     pgWalUsage.wal_buffers_full++;
                               2067                 :                :                     TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_DONE();
                               2068                 :                : 
                               2069                 :                :                     /*
                               2070                 :                :                      * Required for the flush of pending stats WAL data, per
                               2071                 :                :                      * update of pgWalUsage.
                               2072                 :                :                      */
  143                          2073                 :        1899448 :                     pgstat_report_fixed = true;
                               2074                 :                :                 }
                               2075                 :                :                 /* Re-acquire WALBufMappingLock and retry */
  118 akorotkov@postgresql     2076                 :        2028836 :                 LWLockAcquire(WALBufMappingLock, LW_EXCLUSIVE);
                               2077                 :        2028836 :                 continue;
                               2078                 :                :             }
                               2079                 :                :         }
                               2080                 :                : 
                               2081                 :                :         /*
                               2082                 :                :          * Now the next buffer slot is free and we can set it up to be the
                               2083                 :                :          * next output page.
                               2084                 :                :          */
                               2085                 :        2411132 :         NewPageBeginPtr = XLogCtl->InitializedUpTo;
 4546 heikki.linnakangas@i     2086                 :        2411132 :         NewPageEndPtr = NewPageBeginPtr + XLOG_BLCKSZ;
                               2087                 :                : 
  118 akorotkov@postgresql     2088         [ -  + ]:        2411132 :         Assert(XLogRecPtrToBufIdx(NewPageBeginPtr) == nextidx);
                               2089                 :                : 
 4546 heikki.linnakangas@i     2090                 :        2411132 :         NewPage = (XLogPageHeader) (XLogCtl->pages + nextidx * (Size) XLOG_BLCKSZ);
                               2091                 :                : 
                               2092                 :                :         /*
                               2093                 :                :          * Mark the xlblock with InvalidXLogRecPtr and issue a write barrier
                               2094                 :                :          * before initializing. Otherwise, the old page may be partially
                               2095                 :                :          * zeroed but look valid.
                               2096                 :                :          */
  730 jdavis@postgresql.or     2097                 :        2411132 :         pg_atomic_write_u64(&XLogCtl->xlblocks[nextidx], InvalidXLogRecPtr);
                               2098                 :        2411132 :         pg_write_barrier();
                               2099                 :                : 
                               2100                 :                :         /*
                               2101                 :                :          * Be sure to re-zero the buffer so that bytes beyond what we've
                               2102                 :                :          * written will look like zeroes and not valid XLOG records...
                               2103                 :                :          */
  309 peter@eisentraut.org     2104   [ +  -  +  -  :        2411132 :         MemSet(NewPage, 0, XLOG_BLCKSZ);
                                     +  -  -  +  -  
                                                 - ]
                               2105                 :                : 
                               2106                 :                :         /*
                               2107                 :                :          * Fill the new page's header
                               2108                 :                :          */
 3862 bruce@momjian.us         2109                 :        2411132 :         NewPage->xlp_magic = XLOG_PAGE_MAGIC;
                               2110                 :                : 
                               2111                 :                :         /* NewPage->xlp_info = 0; */ /* done by memset */
 1504 rhaas@postgresql.org     2112                 :        2411132 :         NewPage->xlp_tli = tli;
 3862 bruce@momjian.us         2113                 :        2411132 :         NewPage->xlp_pageaddr = NewPageBeginPtr;
                               2114                 :                : 
                               2115                 :                :         /* NewPage->xlp_rem_len = 0; */  /* done by memset */
                               2116                 :                : 
                               2117                 :                :         /*
                               2118                 :                :          * If online backup is not in progress, mark the header to indicate
                               2119                 :                :          * that WAL records beginning in this page have removable backup
                               2120                 :                :          * blocks.  This allows the WAL archiver to know whether it is safe to
                               2121                 :                :          * compress archived WAL data by transforming full-block records into
                               2122                 :                :          * the non-full-block format.  It is sufficient to record this at the
                               2123                 :                :          * page level because we force a page switch (in fact a segment
                               2124                 :                :          * switch) when starting a backup, so the flag will be off before any
                               2125                 :                :          * records can be written during the backup.  At the end of a backup,
                               2126                 :                :          * the last page will be marked as all unsafe when perhaps only part
                               2127                 :                :          * is unsafe, but at worst the archiver would miss the opportunity to
                               2128                 :                :          * compress a few records.
                               2129                 :                :          */
 1156 alvherre@alvh.no-ip.     2130         [ +  + ]:        2411132 :         if (Insert->runningBackups == 0)
 3862 bruce@momjian.us         2131                 :        2283405 :             NewPage->xlp_info |= XLP_BKP_REMOVABLE;
                               2132                 :                : 
                               2133                 :                :         /*
                               2134                 :                :          * If first page of an XLOG segment file, make it a long header.
                               2135                 :                :          */
 3012 andres@anarazel.de       2136         [ +  + ]:        2411132 :         if ((XLogSegmentOffset(NewPage->xlp_pageaddr, wal_segment_size)) == 0)
                               2137                 :                :         {
 4546 heikki.linnakangas@i     2138                 :           1784 :             XLogLongPageHeader NewLongPage = (XLogLongPageHeader) NewPage;
                               2139                 :                : 
                               2140                 :           1784 :             NewLongPage->xlp_sysid = ControlFile->system_identifier;
 3012 andres@anarazel.de       2141                 :           1784 :             NewLongPage->xlp_seg_size = wal_segment_size;
 4546 heikki.linnakangas@i     2142                 :           1784 :             NewLongPage->xlp_xlog_blcksz = XLOG_BLCKSZ;
 3862 bruce@momjian.us         2143                 :           1784 :             NewPage->xlp_info |= XLP_LONG_HEADER;
                               2144                 :                :         }
                               2145                 :                : 
                               2146                 :                :         /*
                               2147                 :                :          * Make sure the initialization of the page becomes visible to others
                               2148                 :                :          * before the xlblocks update. GetXLogBuffer() reads xlblocks without
                               2149                 :                :          * holding a lock.
                               2150                 :                :          */
 4546 heikki.linnakangas@i     2151                 :        2411132 :         pg_write_barrier();
                               2152                 :                : 
  730 jdavis@postgresql.or     2153                 :        2411132 :         pg_atomic_write_u64(&XLogCtl->xlblocks[nextidx], NewPageEndPtr);
  118 akorotkov@postgresql     2154                 :        2411132 :         XLogCtl->InitializedUpTo = NewPageEndPtr;
                               2155                 :                : 
 4546 heikki.linnakangas@i     2156                 :        2411132 :         npages++;
                               2157                 :                :     }
  118 akorotkov@postgresql     2158                 :        2287601 :     LWLockRelease(WALBufMappingLock);
                               2159                 :                : 
                               2160                 :                : #ifdef WAL_DEBUG
                               2161                 :                :     if (XLOG_DEBUG && npages > 0)
                               2162                 :                :     {
                               2163                 :                :         elog(DEBUG1, "initialized %d pages, up to %X/%08X",
                               2164                 :                :              npages, LSN_FORMAT_ARGS(NewPageEndPtr));
                               2165                 :                :     }
                               2166                 :                : #endif
 9579 vadim4o@yahoo.com        2167                 :        2287601 : }
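
The page setup in the loop above follows an invalidate, initialize, publish order so that a lock-free reader never sees a half-zeroed page that still looks valid. The following is a minimal standalone sketch of that ordering, not the PostgreSQL implementation: it uses C11 fences as an approximation of pg_write_barrier(), and the names WAL_PAGE_SIZE, NUM_SLOTS, PageHeader, slot_end_ptr, init_page and read_header, as well as the magic value, are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define WAL_PAGE_SIZE 8192u          /* stand-in for XLOG_BLCKSZ (assumed 8 kB) */
#define NUM_SLOTS     4u             /* hypothetical tiny ring of WAL buffers */
#define INVALID_PTR   UINT64_C(0)    /* stand-in for InvalidXLogRecPtr */
#define SKETCH_MAGIC  0xABCDu        /* hypothetical, not XLOG_PAGE_MAGIC */

/* Simplified page header; the real XLogPageHeaderData has more fields. */
typedef struct
{
    uint16_t magic;
    uint16_t info;
    uint32_t tli;
    uint64_t pageaddr;
} PageHeader;

static unsigned char pages[NUM_SLOTS][WAL_PAGE_SIZE];
static _Atomic uint64_t slot_end_ptr[NUM_SLOTS];   /* plays the role of xlblocks[] */

/* Initialize one buffer slot for the page starting at begin_ptr. */
static void
init_page(unsigned slot, uint64_t begin_ptr, uint32_t tli)
{
    PageHeader hdr = {.magic = SKETCH_MAGIC, .info = 0, .tli = tli,
                      .pageaddr = begin_ptr};

    assert(slot < NUM_SLOTS);

    /* 1. Mark the slot invalid before touching its contents. */
    atomic_store_explicit(&slot_end_ptr[slot], INVALID_PTR, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);

    /* 2. Re-zero the page, then fill in the header. */
    memset(pages[slot], 0, WAL_PAGE_SIZE);
    memcpy(pages[slot], &hdr, sizeof(hdr));

    /* 3. Make the contents visible before publishing the end pointer. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&slot_end_ptr[slot], begin_ptr + WAL_PAGE_SIZE,
                          memory_order_relaxed);
}

/* Copy the header back out of a slot (single-threaded helper for checks). */
static PageHeader
read_header(unsigned slot)
{
    PageHeader hdr;

    assert(slot < NUM_SLOTS);
    memcpy(&hdr, pages[slot], sizeof(hdr));
    return hdr;
}
```

A reader in this sketch would load slot_end_ptr[slot] with acquire semantics and use the page only if the value matches the end pointer it expects; a value of INVALID_PTR means the slot is being recycled.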
                               2168                 :                : 
                               2169                 :                : /*
                               2170                 :                :  * Calculate CheckPointSegments based on max_wal_size_mb and
                               2171                 :                :  * checkpoint_completion_target.
                               2172                 :                :  */
                               2173                 :                : static void
 3951 heikki.linnakangas@i     2174                 :           7791 : CalculateCheckpointSegments(void)
                               2175                 :                : {
                               2176                 :                :     double      target;
                               2177                 :                : 
                               2178                 :                :     /*-------
                               2179                 :                :      * Calculate the distance at which to trigger a checkpoint, to avoid
                               2180                 :                :      * exceeding max_wal_size_mb. This is based on two assumptions:
                               2181                 :                :      *
                               2182                 :                :      * a) we keep WAL for only one checkpoint cycle (prior to PG11 we kept
                               2183                 :                :      *    WAL for two checkpoint cycles to allow us to recover from the
                               2184                 :                :      *    secondary checkpoint if the first checkpoint failed, though we
                               2185                 :                :      *    only did this on the primary anyway, not on standby. Keeping just
                               2186                 :                :      *    one checkpoint simplifies processing and reduces disk space in
                               2187                 :                :      *    many smaller databases.)
                               2188                 :                :      * b) during checkpoint, we consume checkpoint_completion_target *
                               2189                 :                :      *    number of segments consumed between checkpoints.
                               2190                 :                :      *-------
                               2191                 :                :      */
 3012 andres@anarazel.de       2192                 :           7791 :     target = (double) ConvertToXSegs(max_wal_size_mb, wal_segment_size) /
 2963 simon@2ndQuadrant.co     2193                 :           7791 :         (1.0 + CheckPointCompletionTarget);
                               2194                 :                : 
                               2195                 :                :     /* round down */
 3951 heikki.linnakangas@i     2196                 :           7791 :     CheckPointSegments = (int) target;
                               2197                 :                : 
                               2198         [ +  + ]:           7791 :     if (CheckPointSegments < 1)
                               2199                 :             10 :         CheckPointSegments = 1;
                               2200                 :           7791 : }
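
The arithmetic in CalculateCheckpointSegments() can be tried in isolation. The sketch below mirrors the divide, round down, clamp-to-one sequence from the listing. It assumes that ConvertToXSegs() amounts to an integer division of the size in MB by the segment size in MB; mb_to_segments and checkpoint_segments are hypothetical names used only for illustration.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for ConvertToXSegs(): assumes integer division of
 * a size in MB by the segment size in MB.
 */
static int
mb_to_segments(int size_mb, int segment_size_bytes)
{
    assert(segment_size_bytes >= 1024 * 1024);
    return size_mb / (segment_size_bytes / (1024 * 1024));
}

/* Same steps as the listing: divide by (1 + target), round down, clamp. */
static int
checkpoint_segments(int max_wal_size_mb, int segment_size_bytes,
                    double completion_target)
{
    double      target;
    int         segs;

    target = (double) mb_to_segments(max_wal_size_mb, segment_size_bytes) /
        (1.0 + completion_target);

    /* round down */
    segs = (int) target;

    if (segs < 1)
        segs = 1;
    return segs;
}
```

Under these assumptions, a 1024 MB limit with 16 MB segments and a completion target of 0.9 gives 64 / 1.9, which rounds down to 33 segments. A 16 MB limit gives less than one segment and is clamped to 1.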
                               2201                 :                : 
                               2202                 :                : void
                               2203                 :           5705 : assign_max_wal_size(int newval, void *extra)
                               2204                 :                : {
 3180 simon@2ndQuadrant.co     2205                 :           5705 :     max_wal_size_mb = newval;
 3951 heikki.linnakangas@i     2206                 :           5705 :     CalculateCheckpointSegments();
                               2207                 :           5705 : }
                               2208                 :                : 
                               2209                 :                : void
                               2210                 :           1109 : assign_checkpoint_completion_target(double newval, void *extra)
                               2211                 :                : {
                               2212                 :           1109 :     CheckPointCompletionTarget = newval;
                               2213                 :           1109 :     CalculateCheckpointSegments();
                               2214                 :           1109 : }
                               2215                 :                : 
                               2216                 :                : bool
  843 peter@eisentraut.org     2217                 :           2138 : check_wal_segment_size(int *newval, void **extra, GucSource source)
                               2218                 :                : {
                               2219   [ +  -  +  -  :           2138 :     if (!IsValidWalSegSize(*newval))
                                        +  -  -  + ]
                               2220                 :                :     {
  843 peter@eisentraut.org     2221                 :UBC           0 :         GUC_check_errdetail("The WAL segment size must be a power of two between 1 MB and 1 GB.");
                               2222                 :              0 :         return false;
                               2223                 :                :     }
                               2224                 :                : 
  843 peter@eisentraut.org     2225                 :CBC        2138 :     return true;
                               2226                 :                : }
                               2227                 :                : 
                               2228                 :                : /*
                               2229                 :                :  * At a checkpoint, how many WAL segments to recycle as preallocated future
                               2230                 :                :  * XLOG segments? Returns the highest segment that should be preallocated.
                               2231                 :                :  */
                               2232                 :                : static XLogSegNo
 2192 michael@paquier.xyz      2233                 :           1732 : XLOGfileslop(XLogRecPtr lastredoptr)
                               2234                 :                : {
                               2235                 :                :     XLogSegNo   minSegNo;
                               2236                 :                :     XLogSegNo   maxSegNo;
                               2237                 :                :     double      distance;
                               2238                 :                :     XLogSegNo   recycleSegNo;
                               2239                 :                : 
                               2240                 :                :     /*
                               2241                 :                :      * Calculate the segment numbers that min_wal_size_mb and max_wal_size_mb
                               2242                 :                :      * correspond to. Always recycle enough segments to meet the minimum, and
                               2243                 :                :      * remove enough segments to stay below the maximum.
                               2244                 :                :      */
                               2245                 :           1732 :     minSegNo = lastredoptr / wal_segment_size +
 3012 andres@anarazel.de       2246                 :           1732 :         ConvertToXSegs(min_wal_size_mb, wal_segment_size) - 1;
 2192 michael@paquier.xyz      2247                 :           1732 :     maxSegNo = lastredoptr / wal_segment_size +
 3012 andres@anarazel.de       2248                 :           1732 :         ConvertToXSegs(max_wal_size_mb, wal_segment_size) - 1;
                               2249                 :                : 
                               2250                 :                :     /*
                               2251                 :                :      * Between those limits, recycle enough segments to get us through to the
                               2252                 :                :      * estimated end of next checkpoint.
                               2253                 :                :      *
                               2254                 :                :      * To estimate where the next checkpoint will finish, assume that the
                               2255                 :                :      * system runs steadily consuming CheckPointDistanceEstimate bytes between
                               2256                 :                :      * every checkpoint.
                               2257                 :                :      */
 2963 simon@2ndQuadrant.co     2258                 :           1732 :     distance = (1.0 + CheckPointCompletionTarget) * CheckPointDistanceEstimate;
                               2259                 :                :     /* add 10% for good measure. */
 3951 heikki.linnakangas@i     2260                 :           1732 :     distance *= 1.10;
                               2261                 :                : 
 2192 michael@paquier.xyz      2262                 :           1732 :     recycleSegNo = (XLogSegNo) ceil(((double) lastredoptr + distance) /
                               2263                 :                :                                     wal_segment_size);
                               2264                 :                : 
 3951 heikki.linnakangas@i     2265         [ +  + ]:           1732 :     if (recycleSegNo < minSegNo)
                               2266                 :           1214 :         recycleSegNo = minSegNo;
                               2267         [ +  + ]:           1732 :     if (recycleSegNo > maxSegNo)
                               2268                 :            398 :         recycleSegNo = maxSegNo;
                               2269                 :                : 
                               2270                 :           1732 :     return recycleSegNo;
                               2271                 :                : }
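
The estimate-then-clamp logic of XLOGfileslop() can be sketched as a pure function. This is an illustration under stated assumptions, not the PostgreSQL code: the min and max limits are passed in already converted to segment counts, ceil_to_u64 is a hypothetical replacement for ceil() that avoids linking the math library, and recycle_target_segno is a made-up name.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for ceil() on non-negative values, to avoid linking libm. */
static uint64_t
ceil_to_u64(double x)
{
    uint64_t    q = (uint64_t) x;

    return ((double) q < x) ? q + 1 : q;
}

/*
 * Highest segment number to keep preallocated, given the last redo pointer,
 * the segment size in bytes, the min/max WAL size expressed in segments,
 * and the estimated number of bytes consumed between checkpoints.
 */
static uint64_t
recycle_target_segno(uint64_t last_redo_ptr, uint64_t seg_size,
                     uint64_t min_segs, uint64_t max_segs,
                     double completion_target, double distance_estimate)
{
    uint64_t    min_segno;
    uint64_t    max_segno;
    uint64_t    recycle_segno;
    double      distance;

    assert(seg_size > 0 && min_segs >= 1 && max_segs >= min_segs);

    min_segno = last_redo_ptr / seg_size + min_segs - 1;
    max_segno = last_redo_ptr / seg_size + max_segs - 1;

    /* Estimated end of the next checkpoint, plus 10% for good measure. */
    distance = (1.0 + completion_target) * distance_estimate;
    distance *= 1.10;

    recycle_segno = ceil_to_u64(((double) last_redo_ptr + distance) /
                                (double) seg_size);

    /* Clamp between the minimum and maximum limits. */
    if (recycle_segno < min_segno)
        recycle_segno = min_segno;
    if (recycle_segno > max_segno)
        recycle_segno = max_segno;

    return recycle_segno;
}
```

With the redo pointer at the start of segment 10, a minimum of 5 segments and a maximum of 64, a distance estimate of zero is raised to the minimum (segment 14), while a very large estimate is capped at the maximum (segment 73).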
                               2272                 :                : 
                               2273                 :                : /*
                               2274                 :                :  * Check whether we've consumed enough xlog space that a checkpoint is needed.
                               2275                 :                :  *
                               2276                 :                :  * new_segno indicates a log file that has just been filled up (or read
                               2277                 :                :  * during recovery). We measure the distance from RedoRecPtr to new_segno
                               2278                 :                :  * and see if that exceeds CheckPointSegments.
                               2279                 :                :  *
                               2280                 :                :  * Note: it is caller's responsibility that RedoRecPtr is up-to-date.
                               2281                 :                :  */
                               2282                 :                : bool
 4925                          2283                 :           4687 : XLogCheckpointNeeded(XLogSegNo new_segno)
                               2284                 :                : {
                               2285                 :                :     XLogSegNo   old_segno;
                               2286                 :                : 
 3012 andres@anarazel.de       2287                 :           4687 :     XLByteToSeg(RedoRecPtr, old_segno, wal_segment_size);
                               2288                 :                : 
 4925 heikki.linnakangas@i     2289         [ +  + ]:           4687 :     if (new_segno >= old_segno + (uint64) (CheckPointSegments - 1))
 6642 tgl@sss.pgh.pa.us        2290                 :           2945 :         return true;
                               2291                 :           1742 :     return false;
                               2292                 :                : }
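
The trigger condition in XLogCheckpointNeeded() compares segment numbers rather than byte positions. A minimal sketch follows, assuming XLByteToSeg() is a plain division of the LSN by the segment size; checkpoint_needed is a hypothetical name, and the redo pointer and CheckPointSegments are passed as parameters instead of being read from globals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true once new_segno is at least (checkpoint_segments - 1)
 * segments past the segment that contains the redo pointer.
 */
static bool
checkpoint_needed(uint64_t redo_rec_ptr, uint64_t new_segno,
                  uint64_t seg_size, int checkpoint_segments)
{
    uint64_t    old_segno;

    assert(seg_size > 0 && checkpoint_segments >= 1);

    /* Assumed equivalent of XLByteToSeg(): integer division. */
    old_segno = redo_rec_ptr / seg_size;

    return new_segno >= old_segno + (uint64_t) (checkpoint_segments - 1);
}
```

For a redo pointer inside segment 10 and 33 checkpoint segments, filling segment 41 does not request a checkpoint, but filling segment 42 does. With a single checkpoint segment, every filled segment at or past the redo segment qualifies.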
                               2293                 :                : 
                               2294                 :                : /*
                               2295                 :                :  * Write and/or fsync the log at least as far as WriteRqst indicates.
                               2296                 :                :  *
                               2297                 :                :  * If flexible == true, we don't have to write as far as WriteRqst, but
                               2298                 :                :  * may stop at any convenient boundary (such as a cache or logfile boundary).
                               2299                 :                :  * This option allows us to avoid uselessly issuing multiple writes when a
                               2300                 :                :  * single one would do.
                               2301                 :                :  *
                               2302                 :                :  * Must be called with WALWriteLock held. WaitXLogInsertionsToFinish(WriteRqst)
                               2303                 :                :  * must be called before grabbing the lock, to make sure the data is ready to
                               2304                 :                :  * write.
                               2305                 :                :  */
                               2306                 :                : static void
 1504 rhaas@postgresql.org     2307                 :        2032723 : XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
                               2308                 :                : {
                               2309                 :                :     bool        ispartialpage;
                               2310                 :                :     bool        last_iteration;
                               2311                 :                :     bool        finishing_seg;
                               2312                 :                :     int         curridx;
                               2313                 :                :     int         npages;
                               2314                 :                :     int         startidx;
                               2315                 :                :     uint32      startoffset;
                               2316                 :                : 
                               2317                 :                :     /* We should always be inside a critical section here */
 7552 tgl@sss.pgh.pa.us        2318         [ -  + ]:        2032723 :     Assert(CritSectionCount > 0);
                               2319                 :                : 
                               2320                 :                :     /*
                               2321                 :                :      * Update local LogwrtResult (caller probably did this already, but...)
                               2322                 :                :      */
  624 alvherre@alvh.no-ip.     2323                 :        2032723 :     RefreshXLogWriteResult(LogwrtResult);
                               2324                 :                : 
                               2325                 :                :     /*
                               2326                 :                :      * Since successive pages in the xlog cache are consecutively allocated,
                               2327                 :                :      * we can usually gather multiple pages together and issue just one
                               2328                 :                :      * write() call.  npages is the number of pages we have determined can be
                               2329                 :                :      * written together; startidx is the cache block index of the first one,
                               2330                 :                :      * and startoffset is the file offset at which it should go. The latter
                               2331                 :                :      * two variables are only valid when npages > 0, but we must initialize
                               2332                 :                :      * all of them to keep the compiler quiet.
                               2333                 :                :      */
 7423 tgl@sss.pgh.pa.us        2334                 :        2032723 :     npages = 0;
                               2335                 :        2032723 :     startidx = 0;
                               2336                 :        2032723 :     startoffset = 0;
                               2337                 :                : 
                               2338                 :                :     /*
                               2339                 :                :      * Within the loop, curridx is the cache block index of the page to
                               2340                 :                :      * consider writing.  Begin at the buffer containing the next unwritten
                               2341                 :                :      * page, or last partially written page.
                               2342                 :                :      */
 4537 heikki.linnakangas@i     2343                 :        2032723 :     curridx = XLogRecPtrToBufIdx(LogwrtResult.Write);
                               2344                 :                : 
 4738 alvherre@alvh.no-ip.     2345         [ +  + ]:        4384654 :     while (LogwrtResult.Write < WriteRqst.Write)
                               2346                 :                :     {
                               2347                 :                :         /*
                               2348                 :                :          * Make sure we're not ahead of the insert process.  This could happen
                               2349                 :                :          * if we're passed a bogus WriteRqst.Write that is past the end of the
                               2350                 :                :          * last page that's been initialized by AdvanceXLInsertBuffer.
                               2351                 :                :          */
  730 jdavis@postgresql.or     2352                 :        2479551 :         XLogRecPtr  EndPtr = pg_atomic_read_u64(&XLogCtl->xlblocks[curridx]);
                               2353                 :                : 
 4546 heikki.linnakangas@i     2354         [ -  + ]:        2479551 :         if (LogwrtResult.Write >= EndPtr)
  164 alvherre@kurilemu.de     2355         [ #  # ]:UNC           0 :             elog(PANIC, "xlog write request %X/%08X is past end of log %X/%08X",
                               2356                 :                :                  LSN_FORMAT_ARGS(LogwrtResult.Write),
                               2357                 :                :                  LSN_FORMAT_ARGS(EndPtr));
                               2358                 :                : 
                               2359                 :                :         /* Advance LogwrtResult.Write to end of current buffer page */
 4546 heikki.linnakangas@i     2360                 :CBC     2479551 :         LogwrtResult.Write = EndPtr;
 4738 alvherre@alvh.no-ip.     2361                 :        2479551 :         ispartialpage = WriteRqst.Write < LogwrtResult.Write;
                               2362                 :                : 
 3012 andres@anarazel.de       2363         [ +  + ]:        2479551 :         if (!XLByteInPrevSeg(LogwrtResult.Write, openLogSegNo,
                               2364                 :                :                              wal_segment_size))
                               2365                 :                :         {
                               2366                 :                :             /*
                               2367                 :                :              * Switch to new logfile segment.  We cannot have any pending
                               2368                 :                :              * pages here (since we dump what we have at segment end).
                               2369                 :                :              */
 7423 tgl@sss.pgh.pa.us        2370         [ -  + ]:          13252 :             Assert(npages == 0);
 9046                          2371         [ +  + ]:          13252 :             if (openLogFile >= 0)
 7126 bruce@momjian.us         2372                 :           6359 :                 XLogFileClose();
 3012 andres@anarazel.de       2373                 :          13252 :             XLByteToPrevSeg(LogwrtResult.Write, openLogSegNo,
                               2374                 :                :                             wal_segment_size);
 1504 rhaas@postgresql.org     2375                 :          13252 :             openLogTLI = tli;
                               2376                 :                : 
                               2377                 :                :             /* create/use new log file */
                               2378                 :          13252 :             openLogFile = XLogFileInit(openLogSegNo, tli);
 2124 tgl@sss.pgh.pa.us        2379                 :          13252 :             ReserveExternalFD();
                               2380                 :                :         }
                               2381                 :                : 
                               2382                 :                :         /* Make sure we have the current logfile open */
 9046                          2383         [ -  + ]:        2479551 :         if (openLogFile < 0)
                               2384                 :                :         {
 3012 andres@anarazel.de       2385                 :UBC           0 :             XLByteToPrevSeg(LogwrtResult.Write, openLogSegNo,
                               2386                 :                :                             wal_segment_size);
 1504 rhaas@postgresql.org     2387                 :              0 :             openLogTLI = tli;
                               2388                 :              0 :             openLogFile = XLogFileOpen(openLogSegNo, tli);
 2124 tgl@sss.pgh.pa.us        2389                 :              0 :             ReserveExternalFD();
                               2390                 :                :         }
                               2391                 :                : 
                               2392                 :                :         /* Add current page to the set of pending pages-to-dump */
 7423 tgl@sss.pgh.pa.us        2393         [ +  + ]:CBC     2479551 :         if (npages == 0)
                               2394                 :                :         {
                               2395                 :                :             /* first of group */
                               2396                 :        2044859 :             startidx = curridx;
 3012 andres@anarazel.de       2397                 :        2044859 :             startoffset = XLogSegmentOffset(LogwrtResult.Write - XLOG_BLCKSZ,
                               2398                 :                :                                             wal_segment_size);
                               2399                 :                :         }
 7423 tgl@sss.pgh.pa.us        2400                 :        2479551 :         npages++;
                               2401                 :                : 
                               2402                 :                :         /*
                               2403                 :                :          * Dump the set if this will be the last loop iteration, or if we are
                               2404                 :                :          * at the last page of the cache area (since the next page won't be
                               2405                 :                :          * contiguous in memory), or if we are at the end of the logfile
                               2406                 :                :          * segment.
                               2407                 :                :          */
 4738 alvherre@alvh.no-ip.     2408                 :        2479551 :         last_iteration = WriteRqst.Write <= LogwrtResult.Write;
                               2409                 :                : 
 7423 tgl@sss.pgh.pa.us        2410         [ +  + ]:        4833914 :         finishing_seg = !ispartialpage &&
 3012 andres@anarazel.de       2411         [ +  + ]:        2354363 :             (startoffset + npages * XLOG_BLCKSZ) >= wal_segment_size;
                               2412                 :                : 
 7074 tgl@sss.pgh.pa.us        2413         [ +  + ]:        2479551 :         if (last_iteration ||
 7423                          2414   [ +  +  -  + ]:         449454 :             curridx == XLogCtl->XLogCacheBlck ||
                               2415                 :                :             finishing_seg)
                               2416                 :                :         {
                               2417                 :                :             char       *from;
                               2418                 :                :             Size        nbytes;
                               2419                 :                :             Size        nleft;
                               2420                 :                :             ssize_t     written;
                               2421                 :                :             instr_time  start;
                               2422                 :                : 
                               2423                 :                :             /* OK to write the page(s) */
 7199                          2424                 :        2044859 :             from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ;
                               2425                 :        2044859 :             nbytes = npages * (Size) XLOG_BLCKSZ;
 4553 heikki.linnakangas@i     2426                 :        2044859 :             nleft = nbytes;
                               2427                 :                :             do
                               2428                 :                :             {
                               2429                 :        2044859 :                 errno = 0;
                               2430                 :                : 
                               2431                 :                :                 /*
                               2432                 :                :                  * Measure I/O timing to write WAL data, for pg_stat_io.
                               2433                 :                :                  */
  295 michael@paquier.xyz      2434                 :        2044859 :                 start = pgstat_prepare_io_time(track_wal_io_timing);
                               2435                 :                : 
 3197 rhaas@postgresql.org     2436                 :        2044859 :                 pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE);
 1176 tmunro@postgresql.or     2437                 :        2044859 :                 written = pg_pwrite(openLogFile, from, nleft, startoffset);
 3197 rhaas@postgresql.org     2438                 :        2044859 :                 pgstat_report_wait_end();
                               2439                 :                : 
  317 michael@paquier.xyz      2440                 :        2044859 :                 pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL,
                               2441                 :                :                                         IOOP_WRITE, start, 1, written);
                               2442                 :                : 
 4553 heikki.linnakangas@i     2443         [ -  + ]:        2044859 :                 if (written <= 0)
                               2444                 :                :                 {
                               2445                 :                :                     char        xlogfname[MAXFNAMELEN];
                               2446                 :                :                     int         save_errno;
                               2447                 :                : 
 4553 heikki.linnakangas@i     2448         [ #  # ]:UBC           0 :                     if (errno == EINTR)
                               2449                 :              0 :                         continue;
                               2450                 :                : 
 2207 michael@paquier.xyz      2451                 :              0 :                     save_errno = errno;
 1504 rhaas@postgresql.org     2452                 :              0 :                     XLogFileName(xlogfname, tli, openLogSegNo,
                               2453                 :                :                                  wal_segment_size);
 2207 michael@paquier.xyz      2454                 :              0 :                     errno = save_errno;
 4553 heikki.linnakangas@i     2455         [ #  # ]:              0 :                     ereport(PANIC,
                               2456                 :                :                             (errcode_for_file_access(),
                               2457                 :                :                              errmsg("could not write to log file \"%s\" at offset %u, length %zu: %m",
                               2458                 :                :                                     xlogfname, startoffset, nleft)));
                               2459                 :                :                 }
 4553 heikki.linnakangas@i     2460                 :CBC     2044859 :                 nleft -= written;
                               2461                 :        2044859 :                 from += written;
 2598 tmunro@postgresql.or     2462                 :        2044859 :                 startoffset += written;
 4553 heikki.linnakangas@i     2463         [ -  + ]:        2044859 :             } while (nleft > 0);
                               2464                 :                : 
 7423 tgl@sss.pgh.pa.us        2465                 :        2044859 :             npages = 0;
                               2466                 :                : 
                               2467                 :                :             /*
                               2468                 :                :              * If we just wrote the whole last page of a logfile segment,
                               2469                 :                :              * fsync the segment immediately.  This avoids having to go back
                               2470                 :                :              * and re-open prior segments when an fsync request comes along
                               2471                 :                :              * later. Doing it here ensures that one and only one backend will
                               2472                 :                :              * perform this fsync.
                               2473                 :                :              *
                               2474                 :                :              * This is also the right place to notify the Archiver that the
                               2475                 :                :              * segment is ready to copy to archival storage, and to update the
                               2476                 :                :              * timer for archive_timeout, and to signal for a checkpoint if
                               2477                 :                :              * too many logfile segments have been used since the last
                               2478                 :                :              * checkpoint.
                               2479                 :                :              */
 4546 heikki.linnakangas@i     2480         [ +  + ]:        2044859 :             if (finishing_seg)
                               2481                 :                :             {
 1504 rhaas@postgresql.org     2482                 :           1871 :                 issue_xlog_fsync(openLogFile, openLogSegNo, tli);
                               2483                 :                : 
                               2484                 :                :                 /* signal that we need to wake up walsenders later */
 4917                          2485                 :           1871 :                 WalSndWakeupRequest();
                               2486                 :                : 
 3102 tgl@sss.pgh.pa.us        2487                 :           1871 :                 LogwrtResult.Flush = LogwrtResult.Write;    /* end of page */
                               2488                 :                : 
 7423                          2489   [ +  +  -  +  :           1871 :                 if (XLogArchivingActive())
                                              +  + ]
 1504 rhaas@postgresql.org     2490                 :            401 :                     XLogArchiveNotifySeg(openLogSegNo, tli);
                               2491                 :                : 
 4537 heikki.linnakangas@i     2492                 :           1871 :                 XLogCtl->lastSegSwitchTime = (pg_time_t) time(NULL);
 3283 andres@anarazel.de       2493                 :           1871 :                 XLogCtl->lastSegSwitchLSN = LogwrtResult.Flush;
                               2494                 :                : 
                               2495                 :                :                 /*
                               2496                 :                :                  * Request a checkpoint if we've consumed too much xlog since
                               2497                 :                :                  * the last one.  For speed, we first check using the local
                               2498                 :                :                  * copy of RedoRecPtr, which might be out of date; if it looks
                               2499                 :                :                  * like a checkpoint is needed, forcibly update RedoRecPtr and
                               2500                 :                :                  * recheck.
                               2501                 :                :                  */
 4925 heikki.linnakangas@i     2502   [ +  +  +  + ]:           1871 :                 if (IsUnderPostmaster && XLogCheckpointNeeded(openLogSegNo))
                               2503                 :                :                 {
 6642 tgl@sss.pgh.pa.us        2504                 :            247 :                     (void) GetRedoRecPtr();
 4925 heikki.linnakangas@i     2505         [ +  + ]:            247 :                     if (XLogCheckpointNeeded(openLogSegNo))
 6746 tgl@sss.pgh.pa.us        2506                 :            197 :                         RequestCheckpoint(CHECKPOINT_CAUSE_XLOG);
                               2507                 :                :                 }
                               2508                 :                :             }
                               2509                 :                :         }
                               2510                 :                : 
 9046                          2511         [ +  + ]:        2479551 :         if (ispartialpage)
                               2512                 :                :         {
                               2513                 :                :             /* Only asked to write a partial page */
                               2514                 :         125188 :             LogwrtResult.Write = WriteRqst.Write;
                               2515                 :         125188 :             break;
                               2516                 :                :         }
 7423                          2517         [ +  + ]:        2354363 :         curridx = NextBufIdx(curridx);
                               2518                 :                : 
                               2519                 :                :         /* If flexible, break out of loop as soon as we wrote something */
                               2520   [ +  +  +  + ]:        2354363 :         if (flexible && npages == 0)
                               2521                 :           2432 :             break;
                               2522                 :                :     }
                               2523                 :                : 
                               2524         [ -  + ]:        2032723 :     Assert(npages == 0);
                               2525                 :                : 
                               2526                 :                :     /*
                               2527                 :                :      * If asked to flush, do so
                               2528                 :                :      */
 4738 alvherre@alvh.no-ip.     2529         [ +  + ]:        2032723 :     if (LogwrtResult.Flush < WriteRqst.Flush &&
                               2530         [ +  + ]:         132610 :         LogwrtResult.Flush < LogwrtResult.Write)
                               2531                 :                :     {
                               2532                 :                :         /*
                               2533                 :                :          * Could get here without iterating above loop, in which case we might
                               2534                 :                :          * have no open file or the wrong one.  However, we do not need to
                               2535                 :                :          * fsync more than one file.
                               2536                 :                :          */
  797 nathan@postgresql.or     2537         [ +  - ]:         132539 :         if (wal_sync_method != WAL_SYNC_METHOD_OPEN &&
                               2538         [ +  - ]:         132539 :             wal_sync_method != WAL_SYNC_METHOD_OPEN_DSYNC)
                               2539                 :                :         {
 9043 tgl@sss.pgh.pa.us        2540         [ +  + ]:         132539 :             if (openLogFile >= 0 &&
 3012 andres@anarazel.de       2541         [ +  + ]:         132523 :                 !XLByteInPrevSeg(LogwrtResult.Write, openLogSegNo,
                               2542                 :                :                                  wal_segment_size))
 7126 bruce@momjian.us         2543                 :            158 :                 XLogFileClose();
 9043 tgl@sss.pgh.pa.us        2544         [ +  + ]:         132539 :             if (openLogFile < 0)
                               2545                 :                :             {
 3012 andres@anarazel.de       2546                 :            174 :                 XLByteToPrevSeg(LogwrtResult.Write, openLogSegNo,
                               2547                 :                :                                 wal_segment_size);
 1504 rhaas@postgresql.org     2548                 :            174 :                 openLogTLI = tli;
                               2549                 :            174 :                 openLogFile = XLogFileOpen(openLogSegNo, tli);
 2124 tgl@sss.pgh.pa.us        2550                 :            174 :                 ReserveExternalFD();
                               2551                 :                :             }
                               2552                 :                : 
 1504 rhaas@postgresql.org     2553                 :         132539 :             issue_xlog_fsync(openLogFile, openLogSegNo, tli);
                               2554                 :                :         }
                               2555                 :                : 
                               2556                 :                :         /* signal that we need to wake up walsenders later */
 4917                          2557                 :         132539 :         WalSndWakeupRequest();
                               2558                 :                : 
 9046 tgl@sss.pgh.pa.us        2559                 :         132539 :         LogwrtResult.Flush = LogwrtResult.Write;
                               2560                 :                :     }
                               2561                 :                : 
                               2562                 :                :     /*
                               2563                 :                :      * Update shared-memory status
                               2564                 :                :      *
                               2565                 :                :      * We make sure that the shared 'request' values do not fall behind the
                               2566                 :                :      * 'result' values.  This is not absolutely essential, but it saves some
                               2567                 :                :      * code in a couple of places.
                               2568                 :                :      */
  622 alvherre@alvh.no-ip.     2569         [ +  + ]:        2032723 :     SpinLockAcquire(&XLogCtl->info_lck);
                               2570         [ +  + ]:        2032723 :     if (XLogCtl->LogwrtRqst.Write < LogwrtResult.Write)
                               2571                 :         117345 :         XLogCtl->LogwrtRqst.Write = LogwrtResult.Write;
                               2572         [ +  + ]:        2032723 :     if (XLogCtl->LogwrtRqst.Flush < LogwrtResult.Flush)
                               2573                 :         134000 :         XLogCtl->LogwrtRqst.Flush = LogwrtResult.Flush;
                               2574                 :        2032723 :     SpinLockRelease(&XLogCtl->info_lck);
                               2575                 :                : 
                               2576                 :                :     /*
                               2577                 :                :      * We write Write first, then a barrier, then Flush.  When reading, the
                               2578                 :                :      * opposite must be done (with a matching barrier in between), so that we
                               2579                 :                :      * always see a Flush value that trails behind the Write value seen.
                               2580                 :                :      */
                               2581                 :        2032723 :     pg_atomic_write_u64(&XLogCtl->logWriteResult, LogwrtResult.Write);
                               2582                 :        2032723 :     pg_write_barrier();
                               2583                 :        2032723 :     pg_atomic_write_u64(&XLogCtl->logFlushResult, LogwrtResult.Flush);
                               2584                 :                : 
                               2585                 :                : #ifdef USE_ASSERT_CHECKING
                               2586                 :                :     {
                               2587                 :                :         XLogRecPtr  Flush;
                               2588                 :                :         XLogRecPtr  Write;
                               2589                 :                :         XLogRecPtr  Insert;
                               2590                 :                : 
                               2591                 :        2032723 :         Flush = pg_atomic_read_u64(&XLogCtl->logFlushResult);
                               2592                 :        2032723 :         pg_read_barrier();
                               2593                 :        2032723 :         Write = pg_atomic_read_u64(&XLogCtl->logWriteResult);
  620                          2594                 :        2032723 :         pg_read_barrier();
                               2595                 :        2032723 :         Insert = pg_atomic_read_u64(&XLogCtl->logInsertResult);
                               2596                 :                : 
                               2597                 :                :         /* WAL written to disk is never behind WAL flushed */
  622                          2598         [ -  + ]:        2032723 :         Assert(Write >= Flush);
                               2599                 :                : 
                               2600                 :                :         /* WAL inserted into buffers is never behind WAL written */
  620                          2601         [ -  + ]:        2032723 :         Assert(Insert >= Write);
                               2602                 :                :     }
                               2603                 :                : #endif
 9046 tgl@sss.pgh.pa.us        2604                 :        2032723 : }
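The error branch of the write loop above (source lines 2443-2459) is never executed in this coverage run, so its retry behaviour is easy to overlook. The following is a minimal standalone sketch of the same pattern, not code from xlog.c: the helper name `write_all_at` is hypothetical, it returns an error code where the original calls `ereport(PANIC, ...)`, and treating a zero-byte write as `ENOSPC` is an assumption made for this sketch only.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L     /* for pwrite() under strict -std= modes */
#endif

#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of xlog.c): write nbytes from buf to fd,
 * starting at the given file offset.  Retries when interrupted by a signal
 * and continues after short writes.  Returns 0 on success, or -1 with errno
 * set on failure.
 */
static int
write_all_at(int fd, const char *buf, size_t nbytes, off_t offset)
{
    size_t      nleft = nbytes;

    while (nleft > 0)
    {
        ssize_t     written;

        errno = 0;
        written = pwrite(fd, buf, nleft, offset);

        if (written <= 0)
        {
            if (errno == EINTR)
                continue;       /* nothing was written; try again */
            if (errno == 0)
                errno = ENOSPC; /* assumption: zero-byte write means no space */
            return -1;
        }

        /* Advance past whatever was written, possibly less than requested. */
        nleft -= (size_t) written;
        buf += written;
        offset += written;
    }

    return 0;
}
```

A caller would pass the start of the contiguous buffer pages, the byte count for the pending pages, and the segment offset, much as the loop above passes `from`, `nleft`, and `startoffset` to `pg_pwrite`.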
                               2605                 :                : 
                               2606                 :                : /*
                               2607                 :                :  * Record the LSN for an asynchronous transaction commit/abort
                               2608                 :                :  * and nudge the WALWriter if there is work for it to do.
                               2609                 :                :  * (This should not be called for synchronous commits.)
                               2610                 :                :  */
                               2611                 :                : void
 5621 simon@2ndQuadrant.co     2612                 :          30585 : XLogSetAsyncXactLSN(XLogRecPtr asyncXactLSN)
                               2613                 :                : {
 5149                          2614                 :          30585 :     XLogRecPtr  WriteRqstPtr = asyncXactLSN;
                               2615                 :                :     bool        sleeping;
  752 heikki.linnakangas@i     2616                 :          30585 :     bool        wakeup = false;
                               2617                 :                :     XLogRecPtr  prevAsyncXactLSN;
                               2618                 :                : 
 4105 andres@anarazel.de       2619         [ +  + ]:          30585 :     SpinLockAcquire(&XLogCtl->info_lck);
                               2620                 :          30585 :     sleeping = XLogCtl->WalWriterSleeping;
  752 heikki.linnakangas@i     2621                 :          30585 :     prevAsyncXactLSN = XLogCtl->asyncXactLSN;
 4105 andres@anarazel.de       2622         [ +  + ]:          30585 :     if (XLogCtl->asyncXactLSN < asyncXactLSN)
                               2623                 :          30111 :         XLogCtl->asyncXactLSN = asyncXactLSN;
                               2624                 :          30585 :     SpinLockRelease(&XLogCtl->info_lck);
                               2625                 :                : 
                               2626                 :                :     /*
                               2627                 :                :      * If somebody else already called this function with a more aggressive
                               2628                 :                :      * LSN, they will have done what we needed (and perhaps more).
                               2629                 :                :      */
  752 heikki.linnakangas@i     2630         [ +  + ]:          30585 :     if (asyncXactLSN <= prevAsyncXactLSN)
                               2631                 :            474 :         return;
                               2632                 :                : 
                               2633                 :                :     /*
                               2634                 :                :      * If the WALWriter is sleeping, kick it to make it come out of low-power
                               2635                 :                :      * mode, so that this async commit will reach disk within the expected
                               2636                 :                :      * amount of time.  Otherwise, determine whether it has enough WAL
                               2637                 :                :      * available to flush, the same way that XLogBackgroundFlush() does.
                               2638                 :                :      */
                               2639         [ +  + ]:          30111 :     if (sleeping)
                               2640                 :             14 :         wakeup = true;
                               2641                 :                :     else
                               2642                 :                :     {
                               2643                 :                :         int         flushblocks;
                               2644                 :                : 
  622 alvherre@alvh.no-ip.     2645                 :          30097 :         RefreshXLogWriteResult(LogwrtResult);
                               2646                 :                : 
  752 heikki.linnakangas@i     2647                 :          30097 :         flushblocks =
                               2648                 :          30097 :             WriteRqstPtr / XLOG_BLCKSZ - LogwrtResult.Flush / XLOG_BLCKSZ;
                               2649                 :                : 
                               2650   [ +  -  +  + ]:          30097 :         if (WalWriterFlushAfter == 0 || flushblocks >= WalWriterFlushAfter)
                               2651                 :           3776 :             wakeup = true;
                               2652                 :                :     }
                               2653                 :                : 
  412                          2654         [ +  + ]:          30111 :     if (wakeup)
                               2655                 :                :     {
                               2656                 :           3790 :         volatile PROC_HDR *procglobal = ProcGlobal;
                               2657                 :           3790 :         ProcNumber  walwriterProc = procglobal->walwriterProc;
                               2658                 :                : 
                               2659         [ +  + ]:           3790 :         if (walwriterProc != INVALID_PROC_NUMBER)
                               2660                 :            178 :             SetLatch(&GetPGProcByNumber(walwriterProc)->procLatch);
                               2661                 :                :     }
                               2662                 :                : }
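The wakeup decision in XLogSetAsyncXactLSN() counts whole WAL blocks between the flushed position and the new async commit LSN, not bytes. The sketch below isolates that arithmetic as a pure function so it can be checked in isolation. It is an illustration only: `should_wake_walwriter` and `SKETCH_BLCKSZ` are hypothetical names, the block size of 8192 is an assumed value for XLOG_BLCKSZ, and the sketch covers only the path reached after the early return for an LSN that is not newer than the previous one.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BLCKSZ 8192      /* assumption: stands in for XLOG_BLCKSZ */

/*
 * Hypothetical helper (not part of xlog.c): decide whether the WAL writer
 * should be woken for an async commit at async_lsn, given the last flushed
 * LSN, whether the WAL writer is sleeping, and the flush-after threshold
 * expressed in blocks (0 means "always wake").
 */
static bool
should_wake_walwriter(uint64_t async_lsn, uint64_t flushed_lsn,
                      bool sleeping, int flush_after)
{
    int64_t     flushblocks;

    /* A sleeping WAL writer is always kicked out of low-power mode. */
    if (sleeping)
        return true;

    /* Difference of block numbers, so a block boundary counts as one block. */
    flushblocks = (int64_t) (async_lsn / SKETCH_BLCKSZ) -
        (int64_t) (flushed_lsn / SKETCH_BLCKSZ);

    return flush_after == 0 || flushblocks >= flush_after;
}
```

Because block numbers are subtracted, two LSNs one byte apart can still count as one block if they straddle a block boundary, while two LSNs almost a full block apart count as zero if they fall in the same block.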
                               2663                 :                : 
                               2664                 :                : /*
                               2665                 :                :  * Record the LSN up to which we can remove WAL because it's not required by
                               2666                 :                :  * any replication slot.
                               2667                 :                :  */
                               2668                 :                : void
 4339 rhaas@postgresql.org     2669                 :          39567 : XLogSetReplicationSlotMinimumLSN(XLogRecPtr lsn)
                               2670                 :                : {
 4105 andres@anarazel.de       2671         [ +  + ]:          39567 :     SpinLockAcquire(&XLogCtl->info_lck);
                               2672                 :          39567 :     XLogCtl->replicationSlotMinLSN = lsn;
                               2673                 :          39567 :     SpinLockRelease(&XLogCtl->info_lck);
 4339 rhaas@postgresql.org     2674                 :          39567 : }
                               2675                 :                : 
                               2676                 :                : 
                               2677                 :                : /*
                               2678                 :                :  * Return the oldest LSN we must retain to satisfy the needs of some
                               2679                 :                :  * replication slot.
                               2680                 :                :  */
                               2681                 :                : static XLogRecPtr
                               2682                 :           2104 : XLogGetReplicationSlotMinimumLSN(void)
                               2683                 :                : {
                               2684                 :                :     XLogRecPtr  retval;
                               2685                 :                : 
 4105 andres@anarazel.de       2686         [ -  + ]:           2104 :     SpinLockAcquire(&XLogCtl->info_lck);
                               2687                 :           2104 :     retval = XLogCtl->replicationSlotMinLSN;
                               2688                 :           2104 :     SpinLockRelease(&XLogCtl->info_lck);
                               2689                 :                : 
 4339 rhaas@postgresql.org     2690                 :           2104 :     return retval;
                               2691                 :                : }
                               2692                 :                : 
                               2693                 :                : /*
                               2694                 :                :  * Advance minRecoveryPoint in control file.
                               2695                 :                :  *
                               2696                 :                :  * If we crash during recovery, we must reach this point again before the
                               2697                 :                :  * database is consistent.
                               2698                 :                :  *
                               2699                 :                :  * If 'force' is true, 'lsn' argument is ignored. Otherwise, minRecoveryPoint
                               2700                 :                :  * is only updated if it's not already greater than or equal to 'lsn'.
                               2701                 :                :  */
                               2702                 :                : static void
 6147 heikki.linnakangas@i     2703                 :         107477 : UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force)
                               2704                 :                : {
                               2705                 :                :     /* Quick check using our local copy of the variable */
 1401                          2706   [ +  +  +  +  :         107477 :     if (!updateMinRecoveryPoint || (!force && lsn <= LocalMinRecoveryPoint))
                                              +  + ]
 6147                          2707                 :         100383 :         return;
                               2708                 :                : 
                               2709                 :                :     /*
                               2710                 :                :      * An invalid minRecoveryPoint means that we need to recover all the WAL,
                               2711                 :                :      * i.e., we're doing crash recovery.  We never modify the control file's
                               2712                 :                :      * value in that case, so we can short-circuit future checks here too. The
                               2713                 :                :      * local values of minRecoveryPoint and minRecoveryPointTLI should not be
                               2714                 :                :      * updated until crash recovery finishes.  We only do this for the startup
                               2715                 :                :      * process as it should not update its own reference of minRecoveryPoint
                               2716                 :                :      * until it has finished crash recovery to make sure that all WAL
                               2717                 :                :      * available is replayed in this case.  This also avoids extra locking
                               2718                 :                :      * of the control file by the startup process.
                               2719                 :                :      */
   42 alvherre@kurilemu.de     2720   [ +  +  +  + ]:GNC        7094 :     if (!XLogRecPtrIsValid(LocalMinRecoveryPoint) && InRecovery)
                               2721                 :                :     {
 2723 michael@paquier.xyz      2722                 :CBC          31 :         updateMinRecoveryPoint = false;
                               2723                 :             31 :         return;
                               2724                 :                :     }
                               2725                 :                : 
 6147 heikki.linnakangas@i     2726                 :           7063 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               2727                 :                : 
                               2728                 :                :     /* update local copy */
 1401                          2729                 :           7063 :     LocalMinRecoveryPoint = ControlFile->minRecoveryPoint;
                               2730                 :           7063 :     LocalMinRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
                               2731                 :                : 
   42 alvherre@kurilemu.de     2732         [ +  + ]:GNC        7063 :     if (!XLogRecPtrIsValid(LocalMinRecoveryPoint))
 2666 michael@paquier.xyz      2733                 :CBC           3 :         updateMinRecoveryPoint = false;
 1401 heikki.linnakangas@i     2734   [ +  +  +  + ]:           7060 :     else if (force || LocalMinRecoveryPoint < lsn)
                               2735                 :                :     {
                               2736                 :                :         XLogRecPtr  newMinRecoveryPoint;
                               2737                 :                :         TimeLineID  newMinRecoveryPointTLI;
                               2738                 :                : 
                               2739                 :                :         /*
                               2740                 :                :          * To avoid having to update the control file too often, we update it
                               2741                 :                :          * all the way to the last record being replayed, even though 'lsn'
                               2742                 :                :          * would suffice for correctness.  This also allows the 'force' case
                               2743                 :                :          * to not need a valid 'lsn' value.
                               2744                 :                :          *
                               2745                 :                :          * Another important reason for doing it this way is that the passed
                               2746                 :                :          * 'lsn' value could be bogus, i.e., past the end of available WAL, if
                               2747                 :                :          * the caller got it from a corrupted heap page.  Accepting such a
                               2748                 :                :          * value as the min recovery point would prevent us from coming up at
                               2749                 :                :          * all.  Instead, we just log a warning and continue with recovery.
                               2750                 :                :          * (See also the comments about corrupt LSNs in XLogFlush.)
                               2751                 :                :          */
                               2752                 :           5632 :         newMinRecoveryPoint = GetCurrentReplayRecPtr(&newMinRecoveryPointTLI);
 4738 alvherre@alvh.no-ip.     2753   [ +  +  -  + ]:           5632 :         if (!force && newMinRecoveryPoint < lsn)
 6019 tgl@sss.pgh.pa.us        2754         [ #  # ]:UBC           0 :             elog(WARNING,
                               2755                 :                :                  "xlog min recovery request %X/%08X is past current point %X/%08X",
                               2756                 :                :                  LSN_FORMAT_ARGS(lsn), LSN_FORMAT_ARGS(newMinRecoveryPoint));
                               2757                 :                : 
                               2758                 :                :         /* update control file */
 4738 alvherre@alvh.no-ip.     2759         [ +  + ]:CBC        5632 :         if (ControlFile->minRecoveryPoint < newMinRecoveryPoint)
                               2760                 :                :         {
 6147 heikki.linnakangas@i     2761                 :           5282 :             ControlFile->minRecoveryPoint = newMinRecoveryPoint;
 4762                          2762                 :           5282 :             ControlFile->minRecoveryPointTLI = newMinRecoveryPointTLI;
 6147                          2763                 :           5282 :             UpdateControlFile();
 1401                          2764                 :           5282 :             LocalMinRecoveryPoint = newMinRecoveryPoint;
                               2765                 :           5282 :             LocalMinRecoveryPointTLI = newMinRecoveryPointTLI;
                               2766                 :                : 
 6147                          2767         [ +  + ]:           5282 :             ereport(DEBUG2,
                               2768                 :                :                     errmsg_internal("updated min recovery point to %X/%08X on timeline %u",
                               2769                 :                :                                     LSN_FORMAT_ARGS(newMinRecoveryPoint),
                               2770                 :                :                                     newMinRecoveryPointTLI));
                               2771                 :                :         }
                               2772                 :                :     }
                               2773                 :           7063 :     LWLockRelease(ControlFileLock);
                               2774                 :                : }
                               2775                 :                : 
                               2776                 :                : /*
                               2777                 :                :  * Ensure that all XLOG data through the given position is flushed to disk.
                               2778                 :                :  *
                               2779                 :                :  * NOTE: this differs from XLogWrite mainly in that the WALWriteLock is not
                               2780                 :                :  * already held, and we try to avoid acquiring it if possible.
                               2781                 :                :  */
                               2782                 :                : void
 9046 tgl@sss.pgh.pa.us        2783                 :         705372 : XLogFlush(XLogRecPtr record)
                               2784                 :                : {
                               2785                 :                :     XLogRecPtr  WriteRqstPtr;
                               2786                 :                :     XLogwrtRqst WriteRqst;
 1499 rhaas@postgresql.org     2787                 :         705372 :     TimeLineID  insertTLI = XLogCtl->InsertTimeLineID;
                               2788                 :                : 
                               2789                 :                :     /*
                               2790                 :                :      * During REDO, we are reading not writing WAL.  Therefore, instead of
                               2791                 :                :      * trying to flush the WAL, we should update minRecoveryPoint instead. We
                               2792                 :                :      * test XLogInsertAllowed(), not InRecovery, because we need checkpointer
                               2793                 :                :      * to act this way too, and because when it tries to write the
                               2794                 :                :      * end-of-recovery checkpoint, it should indeed flush.
                               2795                 :                :      */
 6019 tgl@sss.pgh.pa.us        2796         [ +  + ]:         705372 :     if (!XLogInsertAllowed())
                               2797                 :                :     {
 6147 heikki.linnakangas@i     2798                 :         107027 :         UpdateMinRecoveryPoint(record, false);
 9046 tgl@sss.pgh.pa.us        2799                 :         560727 :         return;
                               2800                 :                :     }
                               2801                 :                : 
                               2802                 :                :     /* Quick exit if already known flushed */
 4738 alvherre@alvh.no-ip.     2803         [ +  + ]:         598345 :     if (record <= LogwrtResult.Flush)
 9046 tgl@sss.pgh.pa.us        2804                 :         453700 :         return;
                               2805                 :                : 
                               2806                 :                : #ifdef WAL_DEBUG
                               2807                 :                :     if (XLOG_DEBUG)
                               2808                 :                :         elog(LOG, "xlog flush request %X/%08X; write %X/%08X; flush %X/%08X",
                               2809                 :                :              LSN_FORMAT_ARGS(record),
                               2810                 :                :              LSN_FORMAT_ARGS(LogwrtResult.Write),
                               2811                 :                :              LSN_FORMAT_ARGS(LogwrtResult.Flush));
                               2812                 :                : #endif
                               2813                 :                : 
                               2814                 :         144645 :     START_CRIT_SECTION();
                               2815                 :                : 
                               2816                 :                :     /*
                               2817                 :                :      * Since fsync is usually a horribly expensive operation, we try to
                               2818                 :                :      * piggyback as much data as we can on each fsync: if we see any more data
                               2819                 :                :      * entered into the xlog buffer, we'll write and fsync that too, so that
                               2820                 :                :      * the final value of LogwrtResult.Flush is as large as possible. This
                               2821                 :                :      * gives us some chance of avoiding another fsync immediately after.
                               2822                 :                :      */
                               2823                 :                : 
                               2824                 :                :     /* initialize to given target; may increase below */
                               2825                 :         144645 :     WriteRqstPtr = record;
                               2826                 :                : 
                               2827                 :                :     /*
                               2828                 :                :      * Now wait until we get the write lock, or someone else does the flush
                               2829                 :                :      * for us.
                               2830                 :                :      */
                               2831                 :                :     for (;;)
 8756                          2832                 :           2413 :     {
                               2833                 :                :         XLogRecPtr  insertpos;
                               2834                 :                : 
                               2835                 :                :         /* done already? */
  622 alvherre@alvh.no-ip.     2836                 :         147058 :         RefreshXLogWriteResult(LogwrtResult);
 4738                          2837         [ +  + ]:         147058 :         if (record <= LogwrtResult.Flush)
 5071 heikki.linnakangas@i     2838                 :          10867 :             break;
                               2839                 :                : 
                               2840                 :                :         /*
                               2841                 :                :          * Before actually performing the write, wait for all in-flight
                               2842                 :                :          * insertions to the pages we're about to write to finish.
                               2843                 :                :          */
  622 alvherre@alvh.no-ip.     2844         [ +  + ]:         136191 :         SpinLockAcquire(&XLogCtl->info_lck);
                               2845         [ +  + ]:         136191 :         if (WriteRqstPtr < XLogCtl->LogwrtRqst.Write)
                               2846                 :           9601 :             WriteRqstPtr = XLogCtl->LogwrtRqst.Write;
                               2847                 :         136191 :         SpinLockRelease(&XLogCtl->info_lck);
 4546 heikki.linnakangas@i     2848                 :         136191 :         insertpos = WaitXLogInsertionsToFinish(WriteRqstPtr);
                               2849                 :                : 
                               2850                 :                :         /*
                               2851                 :                :          * Try to get the write lock. If we can't get it immediately, wait
                               2852                 :                :          * until it's released, and recheck if we still need to do the flush
                               2853                 :                :          * or if the backend that held the lock did it for us already. This
                               2854                 :                :          * helps to maintain a good rate of group committing when the system
                               2855                 :                :          * is bottlenecked by the speed of fsyncing.
                               2856                 :                :          */
 5062                          2857         [ +  + ]:         136191 :         if (!LWLockAcquireOrWait(WALWriteLock, LW_EXCLUSIVE))
                               2858                 :                :         {
                               2859                 :                :             /*
                               2860                 :                :              * The lock is now free, but we didn't acquire it yet. Before we
                               2861                 :                :              * do, loop back to check if someone else flushed the record for
                               2862                 :                :              * us already.
                               2863                 :                :              */
 5071                          2864                 :           2413 :             continue;
                               2865                 :                :         }
                               2866                 :                : 
                               2867                 :                :         /* Got the lock; recheck whether request is satisfied */
  624 alvherre@alvh.no-ip.     2868                 :         133778 :         RefreshXLogWriteResult(LogwrtResult);
 4738                          2869         [ +  + ]:         133778 :         if (record <= LogwrtResult.Flush)
                               2870                 :                :         {
 4917 rhaas@postgresql.org     2871                 :           3723 :             LWLockRelease(WALWriteLock);
                               2872                 :           3723 :             break;
                               2873                 :                :         }
                               2874                 :                : 
                               2875                 :                :         /*
                               2876                 :                :          * Sleep before flush! By adding a delay here, we may give further
                               2877                 :                :          * backends the opportunity to join the backlog of group commit
                               2878                 :                :          * followers; this can significantly improve transaction throughput,
                               2879                 :                :          * at the risk of increasing transaction latency.
                               2880                 :                :          *
                               2881                 :                :          * We do not sleep if enableFsync is not turned on, nor if there are
                               2882                 :                :          * fewer than CommitSiblings other backends with active transactions.
                               2883                 :                :          */
                               2884   [ -  +  -  -  :         130055 :         if (CommitDelay > 0 && enableFsync &&
                                              -  - ]
 4917 rhaas@postgresql.org     2885                 :UBC           0 :             MinimumActiveBackends(CommitSiblings))
                               2886                 :                :         {
    9 heikki.linnakangas@i     2887                 :UNC           0 :             pgstat_report_wait_start(WAIT_EVENT_COMMIT_DELAY);
 4917 rhaas@postgresql.org     2888                 :UBC           0 :             pg_usleep(CommitDelay);
    9 heikki.linnakangas@i     2889                 :UNC           0 :             pgstat_report_wait_end();
                               2890                 :                : 
                               2891                 :                :             /*
                               2892                 :                :              * Re-check how far we can now flush the WAL. It's generally not
                               2893                 :                :              * safe to call WaitXLogInsertionsToFinish while holding
                               2894                 :                :              * WALWriteLock, because an in-progress insertion might need to
                               2895                 :                :              * also grab WALWriteLock to make progress. But we know that all
                               2896                 :                :              * the insertions up to insertpos have already finished, because
                               2897                 :                :              * that's what the earlier WaitXLogInsertionsToFinish() returned.
                               2898                 :                :              * We're only calling it again to allow insertpos to be moved
                               2899                 :                :              * further forward, not to actually wait for anyone.
                               2900                 :                :              */
 4546 heikki.linnakangas@i     2901                 :UBC           0 :             insertpos = WaitXLogInsertionsToFinish(insertpos);
                               2902                 :                :         }
                               2903                 :                : 
                               2904                 :                :         /* try to write/flush later additions to XLOG as well */
 4546 heikki.linnakangas@i     2905                 :CBC      130055 :         WriteRqst.Write = insertpos;
                               2906                 :         130055 :         WriteRqst.Flush = insertpos;
                               2907                 :                : 
 1504 rhaas@postgresql.org     2908                 :         130055 :         XLogWrite(WriteRqst, insertTLI, false);
                               2909                 :                : 
 8846 tgl@sss.pgh.pa.us        2910                 :         130055 :         LWLockRelease(WALWriteLock);
                               2911                 :                :         /* done */
 5071 heikki.linnakangas@i     2912                 :         130055 :         break;
                               2913                 :                :     }
                               2914                 :                : 
 9046 tgl@sss.pgh.pa.us        2915         [ -  + ]:         144645 :     END_CRIT_SECTION();
                               2916                 :                : 
                               2917                 :                :     /* wake up walsenders now that we've released heavily contended locks */
  985 andres@anarazel.de       2918                 :         144645 :     WalSndWakeupProcessRequests(true, !RecoveryInProgress());
                               2919                 :                : 
                               2920                 :                :     /*
                               2921                 :                :      * If we still haven't flushed to the request point then we have a
                               2922                 :                :      * problem; most likely, the requested flush point is past end of XLOG.
                               2923                 :                :      * This has been seen to occur when a disk page has a corrupted LSN.
                               2924                 :                :      *
                               2925                 :                :      * Formerly we treated this as a PANIC condition, but that hurts the
                               2926                 :                :      * system's robustness rather than helping it: we do not want to take down
                               2927                 :                :      * the whole system due to corruption on one data page.  In particular, if
                               2928                 :                :      * the bad page is encountered again during recovery then we would be
                               2929                 :                :      * unable to restart the database at all!  (This scenario actually
                               2930                 :                :      * happened in the field several times with 7.1 releases.)  As of 8.4, bad
                               2931                 :                :      * LSNs encountered during recovery are UpdateMinRecoveryPoint's problem;
                               2932                 :                :      * the only time we can reach here during recovery is while flushing the
                               2933                 :                :      * end-of-recovery checkpoint record, and we don't expect that to have a
                               2934                 :                :      * bad LSN.
                               2935                 :                :      *
                               2936                 :                :      * Note that for calls from xact.c, the ERROR will be promoted to PANIC
                               2937                 :                :      * since xact.c calls this routine inside a critical section.  However,
                               2938                 :                :      * calls from bufmgr.c are not within critical sections and so we will not
                               2939                 :                :      * force a restart for a bad LSN on a data page.
                               2940                 :                :      */
 4738 alvherre@alvh.no-ip.     2941         [ -  + ]:         144645 :     if (LogwrtResult.Flush < record)
 6019 tgl@sss.pgh.pa.us        2942         [ #  # ]:UBC           0 :         elog(ERROR,
                               2943                 :                :              "xlog flush request %X/%08X is not satisfied --- flushed only to %X/%08X",
                               2944                 :                :              LSN_FORMAT_ARGS(record),
                               2945                 :                :              LSN_FORMAT_ARGS(LogwrtResult.Flush));
                               2946                 :                : 
                               2947                 :                :     /*
                               2948                 :                :      * Cross-check XLogNeedsFlush().  Some of the checks of XLogFlush() and
                               2949                 :                :      * XLogNeedsFlush() are duplicated, and this assertion ensures that these
                               2950                 :                :      * remain consistent.
                               2951                 :                :      */
   90 michael@paquier.xyz      2952         [ -  + ]:GNC      144645 :     Assert(!XLogNeedsFlush(record));
                               2953                 :                : }
                               2954                 :                : 
                               2955                 :                : /*
                               2956                 :                :  * Write & flush xlog, but without specifying exactly where to.
                               2957                 :                :  *
                               2958                 :                :  * We normally write only completed blocks; but if there is nothing to do on
                               2959                 :                :  * that basis, we check for unwritten async commits in the current incomplete
                               2960                 :                :  * block, and write through the latest one of those.  Thus, if async commits
                               2961                 :                :  * are not being used, we will write complete blocks only.
                               2962                 :                :  *
                               2963                 :                :  * If, based on the above, there's anything to write we do so immediately. But
                               2964                 :                :  * to avoid calling fsync, fdatasync et al. at a rate that'd impact
                               2965                 :                :  * concurrent IO, we only flush WAL every wal_writer_delay ms, or if there's
                               2966                 :                :  * more than wal_writer_flush_after unflushed blocks.
                               2967                 :                :  *
                               2968                 :                :  * We can guarantee that async commits reach disk after at most three
                               2969                 :                :  * wal_writer_delay cycles. (When flushing complete blocks, we allow XLogWrite
                               2970                 :                :  * to write "flexibly", meaning it can stop at the end of the buffer ring;
                               2971                 :                :  * this makes a difference only with very high load or long wal_writer_delay,
                               2972                 :                :  * but imposes one extra cycle for the worst case for async commits.)
                               2973                 :                :  *
                               2974                 :                :  * This routine is invoked periodically by the background walwriter process.
                               2975                 :                :  *
                               2976                 :                :  * Returns true if there was any work to do, even if we skipped flushing due
                               2977                 :                :  * to wal_writer_delay/wal_writer_flush_after.
                               2978                 :                :  */
                               2979                 :                : bool
 6722 tgl@sss.pgh.pa.us        2980                 :CBC       10811 : XLogBackgroundFlush(void)
                               2981                 :                : {
                               2982                 :                :     XLogwrtRqst WriteRqst;
                               2983                 :          10811 :     bool        flexible = true;
                               2984                 :                :     static TimestampTz lastflush;
                               2985                 :                :     TimestampTz now;
                               2986                 :                :     int         flushblocks;
                               2987                 :                :     TimeLineID  insertTLI;
                               2988                 :                : 
                               2989                 :                :     /* XLOG doesn't need flushing during recovery */
 6147 heikki.linnakangas@i     2990         [ -  + ]:          10811 :     if (RecoveryInProgress())
 4972 tgl@sss.pgh.pa.us        2991                 :UBC           0 :         return false;
                               2992                 :                : 
                               2993                 :                :     /*
                               2994                 :                :      * Since we're not in recovery, InsertTimeLineID is set and can't change,
                               2995                 :                :      * so we can read it without a lock.
                               2996                 :                :      */
 1499 rhaas@postgresql.org     2997                 :CBC       10811 :     insertTLI = XLogCtl->InsertTimeLineID;
                               2998                 :                : 
                               2999                 :                :     /* read updated LogwrtRqst */
 4105 andres@anarazel.de       3000         [ -  + ]:          10811 :     SpinLockAcquire(&XLogCtl->info_lck);
 3594                          3001                 :          10811 :     WriteRqst = XLogCtl->LogwrtRqst;
 4105                          3002                 :          10811 :     SpinLockRelease(&XLogCtl->info_lck);
                               3003                 :                : 
                               3004                 :                :     /* back off to last completed page boundary */
 3594                          3005                 :          10811 :     WriteRqst.Write -= WriteRqst.Write % XLOG_BLCKSZ;
                               3006                 :                : 
                               3007                 :                :     /* if we have already flushed that far, consider async commit records */
  622 alvherre@alvh.no-ip.     3008                 :          10811 :     RefreshXLogWriteResult(LogwrtResult);
 3594 andres@anarazel.de       3009         [ +  + ]:          10811 :     if (WriteRqst.Write <= LogwrtResult.Flush)
                               3010                 :                :     {
 4105                          3011         [ -  + ]:           8209 :         SpinLockAcquire(&XLogCtl->info_lck);
 3594                          3012                 :           8209 :         WriteRqst.Write = XLogCtl->asyncXactLSN;
 4105                          3013                 :           8209 :         SpinLockRelease(&XLogCtl->info_lck);
 6722 tgl@sss.pgh.pa.us        3014                 :           8209 :         flexible = false;       /* ensure it all gets written */
                               3015                 :                :     }
                               3016                 :                : 
                               3017                 :                :     /*
                               3018                 :                :      * If already known flushed, we're done. Just need to check if we are
                               3019                 :                :      * holding an open file handle to a logfile that's no longer in use,
                               3020                 :                :      * preventing the file from being deleted.
                               3021                 :                :      */
 3594 andres@anarazel.de       3022         [ +  + ]:          10811 :     if (WriteRqst.Write <= LogwrtResult.Flush)
                               3023                 :                :     {
 5644 bruce@momjian.us         3024         [ +  + ]:           7523 :         if (openLogFile >= 0)
                               3025                 :                :         {
 3012 andres@anarazel.de       3026         [ +  + ]:           4827 :             if (!XLByteInPrevSeg(LogwrtResult.Write, openLogSegNo,
                               3027                 :                :                                  wal_segment_size))
                               3028                 :                :             {
 5671 magnus@hagander.net      3029                 :            185 :                 XLogFileClose();
                               3030                 :                :             }
                               3031                 :                :         }
 4972 tgl@sss.pgh.pa.us        3032                 :           7523 :         return false;
                               3033                 :                :     }
                               3034                 :                : 
                               3035                 :                :     /*
                               3036                 :                :      * Determine how far to flush WAL, based on the wal_writer_delay and
                               3037                 :                :      * wal_writer_flush_after GUCs.
                               3038                 :                :      *
                               3039                 :                :      * Note that XLogSetAsyncXactLSN() performs a similar calculation based on
                               3040                 :                :      * wal_writer_flush_after, to decide when to wake us up.  Make sure the
                               3041                 :                :      * logic is the same in both places if you change this.
                               3042                 :                :      */
 3594 andres@anarazel.de       3043                 :           3288 :     now = GetCurrentTimestamp();
  752 heikki.linnakangas@i     3044                 :           3288 :     flushblocks =
 3594 andres@anarazel.de       3045                 :           3288 :         WriteRqst.Write / XLOG_BLCKSZ - LogwrtResult.Flush / XLOG_BLCKSZ;
                               3046                 :                : 
                               3047   [ +  -  +  + ]:           3288 :     if (WalWriterFlushAfter == 0 || lastflush == 0)
                               3048                 :                :     {
                               3049                 :                :         /* first call, or block-based limits disabled */
                               3050                 :            269 :         WriteRqst.Flush = WriteRqst.Write;
                               3051                 :            269 :         lastflush = now;
                               3052                 :                :     }
                               3053         [ +  + ]:           3019 :     else if (TimestampDifferenceExceeds(lastflush, now, WalWriterDelay))
                               3054                 :                :     {
                               3055                 :                :         /*
                               3056                 :                :          * Flush the writes at least every WalWriterDelay ms. This is
                               3057                 :                :          * important to bound the amount of time it takes for an asynchronous
                               3058                 :                :          * commit to hit disk.
                               3059                 :                :          */
                               3060                 :           2878 :         WriteRqst.Flush = WriteRqst.Write;
                               3061                 :           2878 :         lastflush = now;
                               3062                 :                :     }
  752 heikki.linnakangas@i     3063         [ +  + ]:            141 :     else if (flushblocks >= WalWriterFlushAfter)
                               3064                 :                :     {
                               3065                 :                :         /* exceeded wal_writer_flush_after blocks, flush */
 3594 andres@anarazel.de       3066                 :            129 :         WriteRqst.Flush = WriteRqst.Write;
                               3067                 :            129 :         lastflush = now;
                               3068                 :                :     }
                               3069                 :                :     else
                               3070                 :                :     {
                               3071                 :                :         /* no flushing, this time round */
                               3072                 :             12 :         WriteRqst.Flush = 0;
                               3073                 :                :     }
                               3074                 :                : 
                               3075                 :                : #ifdef WAL_DEBUG
                               3076                 :                :     if (XLOG_DEBUG)
                               3077                 :                :         elog(LOG, "xlog bg flush request write %X/%08X; flush: %X/%08X, current is write %X/%08X; flush %X/%08X",
                               3078                 :                :              LSN_FORMAT_ARGS(WriteRqst.Write),
                               3079                 :                :              LSN_FORMAT_ARGS(WriteRqst.Flush),
                               3080                 :                :              LSN_FORMAT_ARGS(LogwrtResult.Write),
                               3081                 :                :              LSN_FORMAT_ARGS(LogwrtResult.Flush));
                               3082                 :                : #endif
                               3083                 :                : 
 6722 tgl@sss.pgh.pa.us        3084                 :           3288 :     START_CRIT_SECTION();
                               3085                 :                : 
                               3086                 :                :     /* now wait for any in-progress insertions to finish and get write lock */
 3594 andres@anarazel.de       3087                 :           3288 :     WaitXLogInsertionsToFinish(WriteRqst.Write);
 6722 tgl@sss.pgh.pa.us        3088                 :           3288 :     LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);
  624 alvherre@alvh.no-ip.     3089                 :           3288 :     RefreshXLogWriteResult(LogwrtResult);
 3594 andres@anarazel.de       3090         [ +  + ]:           3288 :     if (WriteRqst.Write > LogwrtResult.Write ||
                               3091         [ +  + ]:            181 :         WriteRqst.Flush > LogwrtResult.Flush)
                               3092                 :                :     {
 1504 rhaas@postgresql.org     3093                 :           3220 :         XLogWrite(WriteRqst, insertTLI, flexible);
                               3094                 :                :     }
 6722 tgl@sss.pgh.pa.us        3095                 :           3288 :     LWLockRelease(WALWriteLock);
                               3096                 :                : 
                               3097         [ -  + ]:           3288 :     END_CRIT_SECTION();
                               3098                 :                : 
                               3099                 :                :     /* wake up walsenders now that we've released heavily contended locks */
  985 andres@anarazel.de       3100                 :           3288 :     WalSndWakeupProcessRequests(true, !RecoveryInProgress());
                               3101                 :                : 
                               3102                 :                :     /*
                               3103                 :                :      * Great, done. To take some work off the critical path, try to initialize
                               3104                 :                :      * as many of the no-longer-needed WAL buffers for future use as we can.
                               3105                 :                :      */
 1504 rhaas@postgresql.org     3106                 :           3288 :     AdvanceXLInsertBuffer(InvalidXLogRecPtr, insertTLI, true);
                               3107                 :                : 
                               3108                 :                :     /*
                               3109                 :                :      * If we determined that we need to write data, but somebody else
                               3110                 :                :      * wrote/flushed already, it should be considered as being active, to
                               3111                 :                :      * avoid hibernating too early.
                               3112                 :                :      */
 3594 andres@anarazel.de       3113                 :           3288 :     return true;
                               3114                 :                : }
                               3115                 :                : 
                               3116                 :                : /*
                               3117                 :                :  * Test whether XLOG data has been flushed up to (at least) the given
                               3118                 :                :  * position, or whether the minimum recovery point has been updated past
                               3119                 :                :  * the given position.
                               3120                 :                :  *
                               3121                 :                :  * Returns true if a flush is still needed, or if the minimum recovery point
                               3122                 :                :  * must be updated.
                               3123                 :                :  *
                               3124                 :                :  * It is possible that someone else is already in the process of flushing
                               3125                 :                :  * that far, or has updated the minimum recovery point up to the given
                               3126                 :                :  * position.
                               3127                 :                :  */
                               3128                 :                : bool
 6777 tgl@sss.pgh.pa.us        3129                 :        8869379 : XLogNeedsFlush(XLogRecPtr record)
                               3130                 :                : {
                               3131                 :                :     /*
                               3132                 :                :      * During recovery, we don't flush WAL but update minRecoveryPoint
                               3133                 :                :      * instead. So "needs flush" is taken to mean whether minRecoveryPoint
                               3134                 :                :      * would need to be updated.
                               3135                 :                :      *
                               3136                 :                :      * Using XLogInsertAllowed() rather than RecoveryInProgress() matters for
                               3137                 :                :      * the case of an end-of-recovery checkpoint, where WAL data is flushed.
                               3138                 :                :      * This check should be consistent with the one in XLogFlush().
                               3139                 :                :      */
   90 michael@paquier.xyz      3140         [ +  + ]:GNC     8869379 :     if (!XLogInsertAllowed())
                               3141                 :                :     {
                               3142                 :                :         /* Quick exit if already known to be updated or cannot be updated */
   79                          3143   [ +  -  +  + ]:         619319 :         if (!updateMinRecoveryPoint || record <= LocalMinRecoveryPoint)
                               3144                 :         610499 :             return false;
                               3145                 :                : 
                               3146                 :                :         /*
                               3147                 :                :          * An invalid minRecoveryPoint means that we need to recover all the
                               3148                 :                :          * WAL, i.e., we're doing crash recovery.  We never modify the control
                               3149                 :                :          * file's value in that case, so we can short-circuit future checks
                               3150                 :                :          * here too.  This triggers a quick exit path for the startup process,
                               3151                 :                :          * which cannot update its local copy of minRecoveryPoint as long as
                               3152                 :                :          * it has not replayed all WAL available when doing crash recovery.
                               3153                 :                :          */
   42 alvherre@kurilemu.de     3154   [ +  +  -  + ]:           8820 :         if (!XLogRecPtrIsValid(LocalMinRecoveryPoint) && InRecovery)
                               3155                 :                :         {
 2723 michael@paquier.xyz      3156                 :UBC           0 :             updateMinRecoveryPoint = false;
 5843 simon@2ndQuadrant.co     3157                 :LBC    (609784) :             return false;
                               3158                 :                :         }
                               3159                 :                : 
                               3160                 :                :         /*
                               3161                 :                :          * Update local copy of minRecoveryPoint. But if the lock is busy,
                               3162                 :                :          * just return a conservative guess.
                               3163                 :                :          */
 5843 simon@2ndQuadrant.co     3164         [ -  + ]:CBC        8820 :         if (!LWLockConditionalAcquire(ControlFileLock, LW_SHARED))
 5843 simon@2ndQuadrant.co     3165                 :UBC           0 :             return true;
 1401 heikki.linnakangas@i     3166                 :CBC        8820 :         LocalMinRecoveryPoint = ControlFile->minRecoveryPoint;
                               3167                 :           8820 :         LocalMinRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
 5843 simon@2ndQuadrant.co     3168                 :           8820 :         LWLockRelease(ControlFileLock);
                               3169                 :                : 
                               3170                 :                :         /*
                               3171                 :                :          * Check minRecoveryPoint for any other process than the startup
                               3172                 :                :          * process doing crash recovery, which should not update the control
                               3173                 :                :          * file value if crash recovery is still running.
                               3174                 :                :          */
   42 alvherre@kurilemu.de     3175         [ -  + ]:GNC        8820 :         if (!XLogRecPtrIsValid(LocalMinRecoveryPoint))
 2666 michael@paquier.xyz      3176                 :UBC           0 :             updateMinRecoveryPoint = false;
                               3177                 :                : 
                               3178                 :                :         /* check again */
 1401 heikki.linnakangas@i     3179   [ +  +  -  + ]:CBC        8820 :         if (record <= LocalMinRecoveryPoint || !updateMinRecoveryPoint)
 2666 michael@paquier.xyz      3180                 :             73 :             return false;
                               3181                 :                :         else
                               3182                 :           8747 :             return true;
                               3183                 :                :     }
                               3184                 :                : 
                               3185                 :                :     /* Quick exit if already known flushed */
 4738 alvherre@alvh.no-ip.     3186         [ +  + ]:        8250060 :     if (record <= LogwrtResult.Flush)
 6777 tgl@sss.pgh.pa.us        3187                 :        8056587 :         return false;
                               3188                 :                : 
                               3189                 :                :     /* read LogwrtResult and update local state */
  624 alvherre@alvh.no-ip.     3190                 :         193473 :     RefreshXLogWriteResult(LogwrtResult);
                               3191                 :                : 
                               3192                 :                :     /* check again */
 4738                          3193         [ +  + ]:         193473 :     if (record <= LogwrtResult.Flush)
 6777 tgl@sss.pgh.pa.us        3194                 :           4069 :         return false;
                               3195                 :                : 
                               3196                 :         189404 :     return true;
                               3197                 :                : }
                               3198                 :                : 
                               3199                 :                : /*
                               3200                 :                :  * Try to make a given XLOG file segment exist.
                               3201                 :                :  *
                               3202                 :                :  * logsegno: identify segment.
                               3203                 :                :  *
                               3204                 :                :  * *added: on return, true if this call raised the number of extant segments.
                               3205                 :                :  *
                               3206                 :                :  * path: on return, this char[MAXPGPATH] has the path to the logsegno file.
                               3207                 :                :  *
                               3208                 :                :  * Returns -1 or FD of opened file.  A -1 here is not an error; a caller
                               3209                 :                :  * wanting an open segment should attempt to open "path", which usually will
                               3210                 :                :  * succeed.  (This is weird, but it's efficient for the callers.)
                               3211                 :                :  */
                               3212                 :                : static int
 1504 rhaas@postgresql.org     3213                 :          14382 : XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli,
                               3214                 :                :                      bool *added, char *path)
                               3215                 :                : {
                               3216                 :                :     char        tmppath[MAXPGPATH];
                               3217                 :                :     XLogSegNo   installed_segno;
                               3218                 :                :     XLogSegNo   max_segno;
                               3219                 :                :     int         fd;
                               3220                 :                :     int         save_errno;
  985 tmunro@postgresql.or     3221                 :          14382 :     int         open_flags = O_RDWR | O_CREAT | O_EXCL | PG_BINARY;
                               3222                 :                :     instr_time  io_start;
                               3223                 :                : 
 1504 rhaas@postgresql.org     3224         [ -  + ]:          14382 :     Assert(logtli != 0);
                               3225                 :                : 
                               3226                 :          14382 :     XLogFilePath(path, logtli, logsegno, wal_segment_size);
                               3227                 :                : 
                               3228                 :                :     /*
                               3229                 :                :      * Try to use an existing file (checkpoint maker may have created it already)
                               3230                 :                :      */
 1634 noah@leadboat.com        3231                 :          14382 :     *added = false;
 1021 tmunro@postgresql.or     3232                 :          14382 :     fd = BasicOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC |
  797 nathan@postgresql.or     3233                 :          14382 :                        get_sync_bit(wal_sync_method));
 1634 noah@leadboat.com        3234         [ +  + ]:          14382 :     if (fd < 0)
                               3235                 :                :     {
                               3236         [ -  + ]:           1375 :         if (errno != ENOENT)
 1634 noah@leadboat.com        3237         [ #  # ]:UBC           0 :             ereport(ERROR,
                               3238                 :                :                     (errcode_for_file_access(),
                               3239                 :                :                      errmsg("could not open file \"%s\": %m", path)));
                               3240                 :                :     }
                               3241                 :                :     else
 1634 noah@leadboat.com        3242                 :CBC       13007 :         return fd;
                               3243                 :                : 
                               3244                 :                :     /*
                               3245                 :                :      * Initialize an empty (all zeroes) segment.  NOTE: it is possible that
                               3246                 :                :      * another process is doing the same thing.  If so, we will end up
                               3247                 :                :      * pre-creating an extra log segment.  That seems OK, and better than
                               3248                 :                :      * holding the lock throughout this lengthy process.
                               3249                 :                :      */
 6746 tgl@sss.pgh.pa.us        3250         [ +  + ]:           1375 :     elog(DEBUG2, "creating and filling new WAL file");
                               3251                 :                : 
 7472                          3252                 :           1375 :     snprintf(tmppath, MAXPGPATH, XLOGDIR "/xlogtemp.%d", (int) getpid());
                               3253                 :                : 
 9042                          3254                 :           1375 :     unlink(tmppath);
                               3255                 :                : 
  985 tmunro@postgresql.or     3256         [ -  + ]:           1375 :     if (io_direct_flags & IO_DIRECT_WAL_INIT)
  985 tmunro@postgresql.or     3257                 :UBC           0 :         open_flags |= PG_O_DIRECT;
                               3258                 :                : 
                               3259                 :                :     /* do not use get_sync_bit() here --- want to fsync only at end of fill */
  985 tmunro@postgresql.or     3260                 :CBC        1375 :     fd = BasicOpenFile(tmppath, open_flags);
 9579 vadim4o@yahoo.com        3261         [ -  + ]:           1375 :     if (fd < 0)
 7552 tgl@sss.pgh.pa.us        3262         [ #  # ]:UBC           0 :         ereport(ERROR,
                               3263                 :                :                 (errcode_for_file_access(),
                               3264                 :                :                  errmsg("could not create file \"%s\": %m", tmppath)));
                               3265                 :                : 
                               3266                 :                :     /* Measure I/O timing when initializing segment */
  295 michael@paquier.xyz      3267                 :CBC        1375 :     io_start = pgstat_prepare_io_time(track_wal_io_timing);
                               3268                 :                : 
 2452 tmunro@postgresql.or     3269                 :           1375 :     pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_WRITE);
                               3270                 :           1375 :     save_errno = 0;
                               3271         [ +  - ]:           1375 :     if (wal_init_zero)
                               3272                 :                :     {
                               3273                 :                :         ssize_t     rc;
                               3274                 :                : 
                               3275                 :                :         /*
                               3276                 :                :          * Zero-fill the file.  With this setting, we do this the hard way to
                               3277                 :                :          * ensure that all the file space has really been allocated.  On
                               3278                 :                :          * platforms that allow "holes" in files, just seeking to the end
                               3279                 :                :          * doesn't allocate intermediate space.  This way, we know that we
                               3280                 :                :          * have all the space and (after the fsync below) that all the
                               3281                 :                :          * indirect blocks are down on disk.  Therefore, fdatasync(2) or
                               3282                 :                :          * O_DSYNC will be sufficient to sync future writes to the log file.
                               3283                 :                :          */
 1018 michael@paquier.xyz      3284                 :           1375 :         rc = pg_pwrite_zeros(fd, wal_segment_size, 0);
                               3285                 :                : 
 1136                          3286         [ -  + ]:           1375 :         if (rc < 0)
 1136 michael@paquier.xyz      3287                 :UBC           0 :             save_errno = errno;
                               3288                 :                :     }
                               3289                 :                :     else
                               3290                 :                :     {
                               3291                 :                :         /*
                               3292                 :                :          * Otherwise, seeking to the end and writing a solitary byte is
                               3293                 :                :          * enough.
                               3294                 :                :          */
 4488 jdavis@postgresql.or     3295                 :              0 :         errno = 0;
 1136 michael@paquier.xyz      3296         [ #  # ]:              0 :         if (pg_pwrite(fd, "\0", 1, wal_segment_size - 1) != 1)
                               3297                 :                :         {
                               3298                 :                :             /* if write didn't set errno, assume no disk space */
 2452 tmunro@postgresql.or     3299         [ #  # ]:              0 :             save_errno = errno ? errno : ENOSPC;
                               3300                 :                :         }
                               3301                 :                :     }
 2452 tmunro@postgresql.or     3302                 :CBC        1375 :     pgstat_report_wait_end();
                               3303                 :                : 
                               3304                 :                :     /*
                               3305                 :                :      * A full segment worth of data is written when using wal_init_zero. One
                               3306                 :                :      * byte is written when not using it.
                               3307                 :                :      */
  317 michael@paquier.xyz      3308                 :           1375 :     pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_INIT, IOOP_WRITE,
                               3309                 :                :                             io_start, 1,
                               3310         [ +  - ]:           1375 :                             wal_init_zero ? wal_segment_size : 1);
                               3311                 :                : 
 2452 tmunro@postgresql.or     3312         [ -  + ]:           1375 :     if (save_errno)
                               3313                 :                :     {
                               3314                 :                :         /*
                               3315                 :                :          * If we fail to make the file, delete it to release disk space
                               3316                 :                :          */
 2452 tmunro@postgresql.or     3317                 :UBC           0 :         unlink(tmppath);
                               3318                 :                : 
                               3319                 :              0 :         close(fd);
                               3320                 :                : 
                               3321                 :              0 :         errno = save_errno;
                               3322                 :                : 
                               3323         [ #  # ]:              0 :         ereport(ERROR,
                               3324                 :                :                 (errcode_for_file_access(),
                               3325                 :                :                  errmsg("could not write to file \"%s\": %m", tmppath)));
                               3326                 :                :     }
                               3327                 :                : 
                               3328                 :                :     /* Measure I/O timing when flushing segment */
  295 michael@paquier.xyz      3329                 :CBC        1375 :     io_start = pgstat_prepare_io_time(track_wal_io_timing);
                               3330                 :                : 
 3197 rhaas@postgresql.org     3331                 :           1375 :     pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_SYNC);
 9141 tgl@sss.pgh.pa.us        3332         [ -  + ]:           1375 :     if (pg_fsync(fd) != 0)
                               3333                 :                :     {
 1210 drowley@postgresql.o     3334                 :UBC           0 :         save_errno = errno;
 4769 heikki.linnakangas@i     3335                 :              0 :         close(fd);
 2733 michael@paquier.xyz      3336                 :              0 :         errno = save_errno;
 7552 tgl@sss.pgh.pa.us        3337         [ #  # ]:              0 :         ereport(ERROR,
                               3338                 :                :                 (errcode_for_file_access(),
                               3339                 :                :                  errmsg("could not fsync file \"%s\": %m", tmppath)));
                               3340                 :                :     }
 3197 rhaas@postgresql.org     3341                 :CBC        1375 :     pgstat_report_wait_end();
                               3342                 :                : 
  317 michael@paquier.xyz      3343                 :           1375 :     pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_INIT,
                               3344                 :                :                             IOOP_FSYNC, io_start, 1, 0);
                               3345                 :                : 
 2357 peter@eisentraut.org     3346         [ -  + ]:           1375 :     if (close(fd) != 0)
 7552 tgl@sss.pgh.pa.us        3347         [ #  # ]:UBC           0 :         ereport(ERROR,
                               3348                 :                :                 (errcode_for_file_access(),
                               3349                 :                :                  errmsg("could not close file \"%s\": %m", tmppath)));
                               3350                 :                : 
                               3351                 :                :     /*
                               3352                 :                :      * Now move the segment into place with its final name.  Cope with
                               3353                 :                :      * possibility that someone else has created the file while we were
                               3354                 :                :      * filling ours: if so, use ours to pre-create a future log segment.
                               3355                 :                :      */
 4925 heikki.linnakangas@i     3356                 :CBC        1375 :     installed_segno = logsegno;
                               3357                 :                : 
                               3358                 :                :     /*
                               3359                 :                :      * XXX: What should we use as max_segno? We used to use XLOGfileslop when
                               3360                 :                :      * that was a constant, but that was always a bit dubious: normally, at a
                               3361                 :                :      * checkpoint, XLOGfileslop was the offset from the checkpoint record, but
                               3362                 :                :      * here, it was the offset from the insert location. We can't do the
                               3363                 :                :      * normal XLOGfileslop calculation here because we don't have access to
                               3364                 :                :      * the prior checkpoint's redo location. So somewhat arbitrarily, just use
                               3365                 :                :      * CheckPointSegments.
                               3366                 :                :      */
 3951                          3367                 :           1375 :     max_segno = logsegno + CheckPointSegments;
 1504 rhaas@postgresql.org     3368         [ +  - ]:           1375 :     if (InstallXLogFileSegment(&installed_segno, tmppath, true, max_segno,
                               3369                 :                :                                logtli))
                               3370                 :                :     {
 1634 noah@leadboat.com        3371                 :           1375 :         *added = true;
                               3372         [ +  + ]:           1375 :         elog(DEBUG2, "done creating and filling new WAL file");
                               3373                 :                :     }
                               3374                 :                :     else
                               3375                 :                :     {
                               3376                 :                :         /*
                               3377                 :                :          * No need for any more future segments, or InstallXLogFileSegment()
                               3378                 :                :          * failed to rename the file into place. If the rename failed, a
                               3379                 :                :          * caller opening the file may fail.
                               3380                 :                :          */
 8918 tgl@sss.pgh.pa.us        3381                 :UBC           0 :         unlink(tmppath);
 1634 noah@leadboat.com        3382         [ #  # ]:              0 :         elog(DEBUG2, "abandoned new WAL file");
                               3383                 :                :     }
                               3384                 :                : 
 1634 noah@leadboat.com        3385                 :CBC        1375 :     return -1;
                               3386                 :                : }
                               3387                 :                : 
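The two initialization strategies in the function above (zero-filling the whole segment versus writing a single byte at the final offset) can be sketched outside PostgreSQL as follows. This is a minimal illustration using plain POSIX calls, not the server's code: `init_segment_file`, `fill_with_zeros` and `extend_with_last_byte` are hypothetical names, `pwrite()` stands in for `pg_pwrite_zeros()`/`pg_pwrite()`, and wait-event reporting, I/O timing and `ereport()` are left out.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* expose pwrite() and fsync() in strict modes */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write "size" zero bytes so every block is allocated. */
static int
fill_with_zeros(int fd, off_t size)
{
    char        buf[8192];
    off_t       done = 0;

    memset(buf, 0, sizeof(buf));
    while (done < size)
    {
        size_t      chunk = (size - done) < (off_t) sizeof(buf) ?
            (size_t) (size - done) : sizeof(buf);
        ssize_t     rc = pwrite(fd, buf, chunk, done);

        if (rc < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted, retry the same chunk */
            return -1;
        }
        if (rc == 0)
        {
            errno = ENOSPC;     /* no progress: assume no disk space */
            return -1;
        }
        done += rc;
    }
    return 0;
}

/* Hypothetical helper: extend the file by writing only its last byte. */
static int
extend_with_last_byte(int fd, off_t size)
{
    errno = 0;
    if (pwrite(fd, "\0", 1, size - 1) != 1)
    {
        /* if the write didn't set errno, assume no disk space */
        if (errno == 0)
            errno = ENOSPC;
        return -1;
    }
    return 0;
}

/*
 * Create "path" with length "size", fsync it and close it.  On a write or
 * fsync failure the partial file is removed and errno is preserved.
 * Returns 0 on success, -1 on failure.
 */
int
init_segment_file(const char *path, off_t size, int zero_fill)
{
    int         fd;
    int         rc;
    int         save_errno;

    assert(size > 0);

    unlink(path);               /* discard any stale leftover */
    fd = open(path, O_RDWR | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
    if (fd < 0)
        return -1;

    rc = zero_fill ? fill_with_zeros(fd, size) : extend_with_last_byte(fd, size);
    if (rc == 0)
        rc = fsync(fd);         /* sync once, at the end of the fill */
    if (rc != 0)
    {
        save_errno = errno;
        unlink(path);           /* release disk space */
        close(fd);
        errno = save_errno;
        return -1;
    }
    return close(fd);
}
```

Calling `init_segment_file(path, size, 1)` corresponds to the zero-fill branch and `init_segment_file(path, size, 0)` to the single-byte branch. Both produce a file of the requested length, but on filesystems that support holes the second form may leave the intermediate blocks unallocated.
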
                               3388                 :                : /*
                               3389                 :                :  * Create a new XLOG file segment, or open a pre-existing one.
                               3390                 :                :  *
                               3391                 :                :  * logsegno: identify segment to be created/opened.
                               3392                 :                :  *
                               3393                 :                :  * Returns FD of opened file.
                               3394                 :                :  *
                               3395                 :                :  * Note: errors here are ERROR not PANIC because we might or might not be
                               3396                 :                :  * inside a critical section (eg, during checkpoint there is no reason to
                               3397                 :                :  * take down the system on failure).  They will promote to PANIC if we are
                               3398                 :                :  * in a critical section.
                               3399                 :                :  */
                               3400                 :                : int
 1504 rhaas@postgresql.org     3401                 :          14150 : XLogFileInit(XLogSegNo logsegno, TimeLineID logtli)
                               3402                 :                : {
                               3403                 :                :     bool        ignore_added;
                               3404                 :                :     char        path[MAXPGPATH];
                               3405                 :                :     int         fd;
                               3406                 :                : 
                               3407         [ -  + ]:          14150 :     Assert(logtli != 0);
                               3408                 :                : 
                               3409                 :          14150 :     fd = XLogFileInitInternal(logsegno, logtli, &ignore_added, path);
 1634 noah@leadboat.com        3410         [ +  + ]:          14150 :     if (fd >= 0)
                               3411                 :          12845 :         return fd;
                               3412                 :                : 
                               3413                 :                :     /* Now open original target segment (might not be file I just made) */
 1021 tmunro@postgresql.or     3414                 :           1305 :     fd = BasicOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC |
  797 nathan@postgresql.or     3415                 :           1305 :                        get_sync_bit(wal_sync_method));
 8918 tgl@sss.pgh.pa.us        3416         [ -  + ]:           1305 :     if (fd < 0)
 7552 tgl@sss.pgh.pa.us        3417         [ #  # ]:UBC           0 :         ereport(ERROR,
                               3418                 :                :                 (errcode_for_file_access(),
                               3419                 :                :                  errmsg("could not open file \"%s\": %m", path)));
 7281 neilc@samurai.com        3420                 :CBC        1305 :     return fd;
                               3421                 :                : }
                               3422                 :                : 
                               3423                 :                : /*
                               3424                 :                :  * Create a new XLOG file segment by copying a pre-existing one.
                               3425                 :                :  *
                               3426                 :                :  * destsegno: identify segment to be created.
                               3427                 :                :  *
                               3428                 :                :  * srcTLI, srcsegno: identify segment to be copied (could be from
                               3429                 :                :  *      a different timeline)
                               3430                 :                :  *
                               3431                 :                :  * upto: how much of the source file to copy (the rest is filled with
                               3432                 :                :  *      zeros)
                               3433                 :                :  *
                               3434                 :                :  * Currently this is only used during recovery, and so there are no locking
                               3435                 :                :  * considerations.  But we should be just as tense as XLogFileInit to avoid
                               3436                 :                :  * emplacing a bogus file.
                               3437                 :                :  */
                               3438                 :                : static void
 1504 rhaas@postgresql.org     3439                 :             39 : XLogFileCopy(TimeLineID destTLI, XLogSegNo destsegno,
                               3440                 :                :              TimeLineID srcTLI, XLogSegNo srcsegno,
                               3441                 :                :              int upto)
                               3442                 :                : {
                               3443                 :                :     char        path[MAXPGPATH];
                               3444                 :                :     char        tmppath[MAXPGPATH];
                               3445                 :                :     PGAlignedXLogBlock buffer;
                               3446                 :                :     int         srcfd;
                               3447                 :                :     int         fd;
                               3448                 :                :     int         nbytes;
                               3449                 :                : 
                               3450                 :                :     /*
                               3451                 :                :      * Open the source file
                               3452                 :                :      */
 3012 andres@anarazel.de       3453                 :             39 :     XLogFilePath(path, srcTLI, srcsegno, wal_segment_size);
 3008 peter_e@gmx.net          3454                 :             39 :     srcfd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
 7820 tgl@sss.pgh.pa.us        3455         [ -  + ]:             39 :     if (srcfd < 0)
 7552 tgl@sss.pgh.pa.us        3456         [ #  # ]:UBC           0 :         ereport(ERROR,
                               3457                 :                :                 (errcode_for_file_access(),
                               3458                 :                :                  errmsg("could not open file \"%s\": %m", path)));
                               3459                 :                : 
                               3460                 :                :     /*
                               3461                 :                :      * Copy into a temp file name.
                               3462                 :                :      */
 7472 tgl@sss.pgh.pa.us        3463                 :CBC          39 :     snprintf(tmppath, MAXPGPATH, XLOGDIR "/xlogtemp.%d", (int) getpid());
                               3464                 :                : 
 7820                          3465                 :             39 :     unlink(tmppath);
                               3466                 :                : 
                               3467                 :                :     /* do not use get_sync_bit() here --- want to fsync only at end of fill */
 3008 peter_e@gmx.net          3468                 :             39 :     fd = OpenTransientFile(tmppath, O_RDWR | O_CREAT | O_EXCL | PG_BINARY);
 7820 tgl@sss.pgh.pa.us        3469         [ -  + ]:             39 :     if (fd < 0)
 7552 tgl@sss.pgh.pa.us        3470         [ #  # ]:UBC           0 :         ereport(ERROR,
                               3471                 :                :                 (errcode_for_file_access(),
                               3472                 :                :                  errmsg("could not create file \"%s\": %m", tmppath)));
                               3473                 :                : 
                               3474                 :                :     /*
                               3475                 :                :      * Do the data copying.
                               3476                 :                :      */
 3012 andres@anarazel.de       3477         [ +  + ]:CBC       79911 :     for (nbytes = 0; nbytes < wal_segment_size; nbytes += sizeof(buffer))
                               3478                 :                :     {
                               3479                 :                :         int         nread;
                               3480                 :                : 
 4018 heikki.linnakangas@i     3481                 :          79872 :         nread = upto - nbytes;
                               3482                 :                : 
                               3483                 :                :         /*
                               3484                 :                :          * The part that is not read from the source file is filled with
                               3485                 :                :          * zeros.
                               3486                 :                :          */
                               3487         [ +  + ]:          79872 :         if (nread < sizeof(buffer))
 2665 tgl@sss.pgh.pa.us        3488                 :             39 :             memset(buffer.data, 0, sizeof(buffer));
                               3489                 :                : 
 4018 heikki.linnakangas@i     3490         [ +  + ]:          79872 :         if (nread > 0)
                               3491                 :                :         {
                               3492                 :                :             int         r;
                               3493                 :                : 
                               3494         [ +  + ]:           2198 :             if (nread > sizeof(buffer))
                               3495                 :           2159 :                 nread = sizeof(buffer);
 3197 rhaas@postgresql.org     3496                 :           2198 :             pgstat_report_wait_start(WAIT_EVENT_WAL_COPY_READ);
 2665 tgl@sss.pgh.pa.us        3497                 :           2198 :             r = read(srcfd, buffer.data, nread);
 2710 michael@paquier.xyz      3498         [ -  + ]:           2198 :             if (r != nread)
                               3499                 :                :             {
 2710 michael@paquier.xyz      3500         [ #  # ]:UBC           0 :                 if (r < 0)
 4018 heikki.linnakangas@i     3501         [ #  # ]:              0 :                     ereport(ERROR,
                               3502                 :                :                             (errcode_for_file_access(),
                               3503                 :                :                              errmsg("could not read file \"%s\": %m",
                               3504                 :                :                                     path)));
                               3505                 :                :                 else
                               3506         [ #  # ]:              0 :                     ereport(ERROR,
                               3507                 :                :                             (errcode(ERRCODE_DATA_CORRUPTED),
                               3508                 :                :                              errmsg("could not read file \"%s\": read %d of %zu",
                               3509                 :                :                                     path, r, (Size) nread)));
                               3510                 :                :             }
 3197 rhaas@postgresql.org     3511                 :CBC        2198 :             pgstat_report_wait_end();
                               3512                 :                :         }
 7820 tgl@sss.pgh.pa.us        3513                 :          79872 :         errno = 0;
 3197 rhaas@postgresql.org     3514                 :          79872 :         pgstat_report_wait_start(WAIT_EVENT_WAL_COPY_WRITE);
 2665 tgl@sss.pgh.pa.us        3515         [ -  + ]:          79872 :         if ((int) write(fd, buffer.data, sizeof(buffer)) != (int) sizeof(buffer))
                               3516                 :                :         {
 7820 tgl@sss.pgh.pa.us        3517                 :UBC           0 :             int         save_errno = errno;
                               3518                 :                : 
                               3519                 :                :             /*
                               3520                 :                :              * If we fail to make the file, delete it to release disk space
                               3521                 :                :              */
                               3522                 :              0 :             unlink(tmppath);
                               3523                 :                :             /* if write didn't set errno, assume problem is no disk space */
                               3524         [ #  # ]:              0 :             errno = save_errno ? save_errno : ENOSPC;
                               3525                 :                : 
 7552                          3526         [ #  # ]:              0 :             ereport(ERROR,
                               3527                 :                :                     (errcode_for_file_access(),
                               3528                 :                :                      errmsg("could not write to file \"%s\": %m", tmppath)));
                               3529                 :                :         }
 3197 rhaas@postgresql.org     3530                 :CBC       79872 :         pgstat_report_wait_end();
                               3531                 :                :     }
                               3532                 :                : 
                               3533                 :             39 :     pgstat_report_wait_start(WAIT_EVENT_WAL_COPY_SYNC);
 7820 tgl@sss.pgh.pa.us        3534         [ -  + ]:             39 :     if (pg_fsync(fd) != 0)
 2586 tmunro@postgresql.or     3535         [ #  # ]:UBC           0 :         ereport(data_sync_elevel(ERROR),
                               3536                 :                :                 (errcode_for_file_access(),
                               3537                 :                :                  errmsg("could not fsync file \"%s\": %m", tmppath)));
 3197 rhaas@postgresql.org     3538                 :CBC          39 :     pgstat_report_wait_end();
                               3539                 :                : 
 2357 peter@eisentraut.org     3540         [ -  + ]:             39 :     if (CloseTransientFile(fd) != 0)
 7552 tgl@sss.pgh.pa.us        3541         [ #  # ]:UBC           0 :         ereport(ERROR,
                               3542                 :                :                 (errcode_for_file_access(),
                               3543                 :                :                  errmsg("could not close file \"%s\": %m", tmppath)));
                               3544                 :                : 
 2357 peter@eisentraut.org     3545         [ -  + ]:CBC          39 :     if (CloseTransientFile(srcfd) != 0)
 2476 michael@paquier.xyz      3546         [ #  # ]:UBC           0 :         ereport(ERROR,
                               3547                 :                :                 (errcode_for_file_access(),
                               3548                 :                :                  errmsg("could not close file \"%s\": %m", path)));
                               3549                 :                : 
                               3550                 :                :     /*
                               3551                 :                :      * Now move the segment into place with its final name.
                               3552                 :                :      */
 1504 rhaas@postgresql.org     3553         [ -  + ]:CBC          39 :     if (!InstallXLogFileSegment(&destsegno, tmppath, false, 0, destTLI))
 3823 fujii@postgresql.org     3554         [ #  # ]:UBC           0 :         elog(ERROR, "InstallXLogFileSegment should not have failed");
 7820 tgl@sss.pgh.pa.us        3555                 :CBC          39 : }
                               3556                 :                : 
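The copy loop in the function above copies the first `upto` bytes of the source and pads the rest of the segment with zeros, one buffer at a time. The same technique can be sketched as a standalone function. This is an illustration under stated assumptions rather than the server's code: `copy_prefix_zero_pad` and `COPY_BLCKSZ` are hypothetical names, the total size is assumed to be a multiple of the buffer size, a short read is reported as `EIO`, and plain `open()`/`close()` replace `OpenTransientFile()`/`CloseTransientFile()`.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* expose fsync() in strict modes */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define COPY_BLCKSZ 8192        /* hypothetical buffer size for this sketch */

/*
 * Copy the first "upto" bytes of "src" into a new file "dst" of exactly
 * "total" bytes; everything past "upto" is zero.  Returns 0 or -1.
 */
int
copy_prefix_zero_pad(const char *src, const char *dst, long upto, long total)
{
    char        buf[COPY_BLCKSZ];
    int         srcfd;
    int         dstfd;
    int         save_errno;
    long        nbytes;

    assert(upto >= 0 && upto <= total && total % COPY_BLCKSZ == 0);

    srcfd = open(src, O_RDONLY);
    if (srcfd < 0)
        return -1;

    unlink(dst);
    dstfd = open(dst, O_RDWR | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
    if (dstfd < 0)
    {
        save_errno = errno;
        close(srcfd);
        errno = save_errno;
        return -1;
    }

    for (nbytes = 0; nbytes < total; nbytes += COPY_BLCKSZ)
    {
        long        nread = upto - nbytes;  /* source bytes still to copy */

        /* Any part of the buffer not read from the source must be zero. */
        if (nread < COPY_BLCKSZ)
            memset(buf, 0, sizeof(buf));

        if (nread > 0)
        {
            ssize_t     r;

            if (nread > COPY_BLCKSZ)
                nread = COPY_BLCKSZ;
            r = read(srcfd, buf, (size_t) nread);
            if (r != nread)
            {
                if (r >= 0)
                    errno = EIO;    /* short read: source is too small */
                goto fail;
            }
        }

        errno = 0;
        if (write(dstfd, buf, sizeof(buf)) != (ssize_t) sizeof(buf))
        {
            /* if write didn't set errno, assume problem is no disk space */
            if (errno == 0)
                errno = ENOSPC;
            goto fail;
        }
    }

    if (fsync(dstfd) != 0)
        goto fail;

    close(srcfd);
    return close(dstfd);

fail:
    save_errno = errno;
    close(srcfd);
    close(dstfd);
    unlink(dst);                /* release disk space */
    errno = save_errno;
    return -1;
}
```

Because the buffer is cleared whenever fewer than `COPY_BLCKSZ` source bytes remain, the block that straddles `upto` ends up partly copied and partly zero, and every later block is written as zeros without touching the source.
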
                               3557                 :                : /*
                               3558                 :                :  * Install a new XLOG segment file as a current or future log segment.
                               3559                 :                :  *
                               3560                 :                :  * This is used both to install a newly-created segment (which has a temp
                               3561                 :                :  * filename while it's being created) and to recycle an old segment.
                               3562                 :                :  *
                               3563                 :                :  * *segno: identify segment to install as (or first possible target).
                               3564                 :                :  * When find_free is true, this is modified on return to indicate the
                               3565                 :                :  * actual installation location or last segment searched.
                               3566                 :                :  *
                               3567                 :                :  * tmppath: initial name of file to install.  It will be renamed into place.
                               3568                 :                :  *
                               3569                 :                :  * find_free: if true, install the new segment at the first empty segno
                               3570                 :                :  * number at or after the passed numbers.  If false, install the new segment
                               3571                 :                :  * exactly where specified, deleting any existing segment file there.
                               3572                 :                :  *
                               3573                 :                :  * max_segno: maximum segment number to install the new file as.  Fail if no
                               3574                 :                :  * free slot is found between *segno and max_segno. (Ignored when find_free
                               3575                 :                :  * is false.)
                               3576                 :                :  *
                               3577                 :                :  * tli: The timeline on which the new segment should be installed.
                               3578                 :                :  *
                               3579                 :                :  * Returns true if the file was installed successfully.  false indicates that
                               3580                 :                :  * the max_segno limit was exceeded, the startup process has disabled this
                               3581                 :                :  * function for now, or an error occurred while renaming the file into place.
                               3582                 :                :  */
                               3583                 :                : static bool
 4925 heikki.linnakangas@i     3584                 :           2979 : InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
                               3585                 :                :                        bool find_free, XLogSegNo max_segno, TimeLineID tli)
                               3586                 :                : {
                               3587                 :                :     char        path[MAXPGPATH];
                               3588                 :                :     struct stat stat_buf;
                               3589                 :                : 
 1504 rhaas@postgresql.org     3590         [ -  + ]:           2979 :     Assert(tli != 0);
                               3591                 :                : 
                               3592                 :           2979 :     XLogFilePath(path, tli, *segno, wal_segment_size);
                               3593                 :                : 
 1634 noah@leadboat.com        3594                 :           2979 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               3595         [ -  + ]:           2979 :     if (!XLogCtl->InstallXLogFileSegmentActive)
                               3596                 :                :     {
 1634 noah@leadboat.com        3597                 :UBC           0 :         LWLockRelease(ControlFileLock);
                               3598                 :              0 :         return false;
                               3599                 :                :     }
                               3600                 :                : 
 8918 tgl@sss.pgh.pa.us        3601         [ +  + ]:CBC        2979 :     if (!find_free)
                               3602                 :                :     {
                               3603                 :                :         /* Force installation: get rid of any pre-existing segment file */
 3188 teodor@sigaev.ru         3604                 :             39 :         durable_unlink(path, DEBUG1);
                               3605                 :                :     }
                               3606                 :                :     else
                               3607                 :                :     {
                               3608                 :                :         /* Find a free slot to put it in */
 8363 tgl@sss.pgh.pa.us        3609         [ +  + ]:           4071 :         while (stat(path, &stat_buf) == 0)
                               3610                 :                :         {
 3951 heikki.linnakangas@i     3611         [ +  + ]:           1285 :             if ((*segno) >= max_segno)
                               3612                 :                :             {
                               3613                 :                :                 /* Failed to find a free slot within specified range */
 1634 noah@leadboat.com        3614                 :            154 :                 LWLockRelease(ControlFileLock);
 8918 tgl@sss.pgh.pa.us        3615                 :            154 :                 return false;
                               3616                 :                :             }
 4925 heikki.linnakangas@i     3617                 :           1131 :             (*segno)++;
 1504 rhaas@postgresql.org     3618                 :           1131 :             XLogFilePath(path, tli, *segno, wal_segment_size);
                               3619                 :                :         }
                               3620                 :                :     }
                               3621                 :                : 
 1262 michael@paquier.xyz      3622   [ +  -  -  + ]:           2825 :     Assert(access(path, F_OK) != 0 && errno == ENOENT);
                               3623         [ -  + ]:           2825 :     if (durable_rename(tmppath, path, LOG) != 0)
                               3624                 :                :     {
 1634 noah@leadboat.com        3625                 :UBC           0 :         LWLockRelease(ControlFileLock);
                               3626                 :                :         /* durable_rename already emitted log message */
 5940 heikki.linnakangas@i     3627                 :              0 :         return false;
                               3628                 :                :     }
                               3629                 :                : 
 1634 noah@leadboat.com        3630                 :CBC        2825 :     LWLockRelease(ControlFileLock);
                               3631                 :                : 
 8918 tgl@sss.pgh.pa.us        3632                 :           2825 :     return true;
                               3633                 :                : }
                               3634                 :                : 
                               3635                 :                : /*
                               3636                 :                :  * Open a pre-existing logfile segment for writing.
                               3637                 :                :  */
                               3638                 :                : int
 1504 rhaas@postgresql.org     3639                 :            174 : XLogFileOpen(XLogSegNo segno, TimeLineID tli)
                               3640                 :                : {
                               3641                 :                :     char        path[MAXPGPATH];
                               3642                 :                :     int         fd;
                               3643                 :                : 
                               3644                 :            174 :     XLogFilePath(path, tli, segno, wal_segment_size);
                               3645                 :                : 
 1021 tmunro@postgresql.or     3646                 :            174 :     fd = BasicOpenFile(path, O_RDWR | PG_BINARY | O_CLOEXEC |
  797 nathan@postgresql.or     3647                 :            174 :                        get_sync_bit(wal_sync_method));
 9579 vadim4o@yahoo.com        3648         [ -  + ]:            174 :     if (fd < 0)
 8186 tgl@sss.pgh.pa.us        3649         [ #  # ]:UBC           0 :         ereport(PANIC,
                               3650                 :                :                 (errcode_for_file_access(),
                               3651                 :                :                  errmsg("could not open file \"%s\": %m", path)));
                               3652                 :                : 
 7820 tgl@sss.pgh.pa.us        3653                 :CBC         174 :     return fd;
                               3654                 :                : }
                               3655                 :                : 
                               3656                 :                : /*
                               3657                 :                :  * Close the current logfile segment for writing.
                               3658                 :                :  */
                               3659                 :                : static void
 7126 bruce@momjian.us         3660                 :           6702 : XLogFileClose(void)
                               3661                 :                : {
                               3662         [ -  + ]:           6702 :     Assert(openLogFile >= 0);
                               3663                 :                : 
                               3664                 :                :     /*
                               3665                 :                :      * WAL segment files will not be re-read in normal operation, so we advise
                               3666                 :                :      * the OS to release any cached pages.  But do not do so if WAL archiving
                               3667                 :                :      * or streaming is active, because the archiver and walsender processes could
                               3668                 :                :      * use the cache to read the WAL segment.
                               3669                 :                :      */
                               3670                 :                : #if defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED)
  985 tmunro@postgresql.or     3671   [ +  +  +  - ]:           6702 :     if (!XLogIsNeeded() && (io_direct_flags & IO_DIRECT_WAL) == 0)
 6185 tgl@sss.pgh.pa.us        3672                 :           1671 :         (void) posix_fadvise(openLogFile, 0, 0, POSIX_FADV_DONTNEED);
                               3673                 :                : #endif
                               3674                 :                : 
 2357 peter@eisentraut.org     3675         [ -  + ]:           6702 :     if (close(openLogFile) != 0)
                               3676                 :                :     {
                               3677                 :                :         char        xlogfname[MAXFNAMELEN];
 2207 michael@paquier.xyz      3678                 :UBC           0 :         int         save_errno = errno;
                               3679                 :                : 
 1504 rhaas@postgresql.org     3680                 :              0 :         XLogFileName(xlogfname, openLogTLI, openLogSegNo, wal_segment_size);
 2207 michael@paquier.xyz      3681                 :              0 :         errno = save_errno;
 7126 bruce@momjian.us         3682         [ #  # ]:              0 :         ereport(PANIC,
                               3683                 :                :                 (errcode_for_file_access(),
                               3684                 :                :                  errmsg("could not close file \"%s\": %m", xlogfname)));
                               3685                 :                :     }
                               3686                 :                : 
 7126 bruce@momjian.us         3687                 :CBC        6702 :     openLogFile = -1;
 2124 tgl@sss.pgh.pa.us        3688                 :           6702 :     ReleaseExternalFD();
 7126 bruce@momjian.us         3689                 :           6702 : }
                               3690                 :                : 
                               3691                 :                : /*
                               3692                 :                :  * Preallocate log files beyond the specified log endpoint.
                               3693                 :                :  *
                               3694                 :                :  * XXX this is currently extremely conservative, since it forces only one
                               3695                 :                :  * future log segment to exist, and even that only if we are 75% done with
                               3696                 :                :  * the current one.  This is only appropriate for very low-WAL-volume systems.
                               3697                 :                :  * High-volume systems will be OK once they've built up a sufficient set of
                               3698                 :                :  * recycled log segments, but the startup transient is likely to include
                               3699                 :                :  * a lot of segment creations by foreground processes, which is not so good.
                               3700                 :                :  *
                               3701                 :                :  * XLogFileInitInternal() can ereport(ERROR).  All known causes indicate big
                               3702                 :                :  * trouble; for example, a full filesystem is one cause.  The checkpoint WAL
                               3703                 :                :  * and/or ControlFile updates already completed.  If a RequestCheckpoint()
                               3704                 :                :  * initiated the present checkpoint and an ERROR ends this function, the
                               3705                 :                :  * command that called RequestCheckpoint() fails.  That's not ideal, but it's
                               3706                 :                :  * not worth contorting more functions to use caller-specified elevel values.
                               3707                 :                :  * (With or without RequestCheckpoint(), an ERROR forestalls some inessential
                               3708                 :                :  * reporting and resource reclamation.)
                               3709                 :                :  */
                               3710                 :                : static void
 1504 rhaas@postgresql.org     3711                 :           1987 : PreallocXlogFiles(XLogRecPtr endptr, TimeLineID tli)
                               3712                 :                : {
                               3713                 :                :     XLogSegNo   _logSegNo;
                               3714                 :                :     int         lf;
                               3715                 :                :     bool        added;
                               3716                 :                :     char        path[MAXPGPATH];
                               3717                 :                :     uint64      offset;
                               3718                 :                : 
 1634 noah@leadboat.com        3719         [ +  + ]:           1987 :     if (!XLogCtl->InstallXLogFileSegmentActive)
                               3720                 :             11 :         return;                 /* unlocked check says no */
                               3721                 :                : 
 3012 andres@anarazel.de       3722                 :           1976 :     XLByteToPrevSeg(endptr, _logSegNo, wal_segment_size);
                               3723                 :           1976 :     offset = XLogSegmentOffset(endptr - 1, wal_segment_size);
                               3724         [ +  + ]:           1976 :     if (offset >= (uint32) (0.75 * wal_segment_size))
                               3725                 :                :     {
 4925 heikki.linnakangas@i     3726                 :            232 :         _logSegNo++;
 1504 rhaas@postgresql.org     3727                 :            232 :         lf = XLogFileInitInternal(_logSegNo, tli, &added, path);
 1634 noah@leadboat.com        3728         [ +  + ]:            232 :         if (lf >= 0)
                               3729                 :            162 :             close(lf);
                               3730         [ +  + ]:            232 :         if (added)
 6746 tgl@sss.pgh.pa.us        3731                 :             70 :             CheckpointStats.ckpt_segs_added++;
                               3732                 :                :     }
                               3733                 :                : }
                               3734                 :                : 
                               3735                 :                : /*
                               3736                 :                :  * Throws an error if the given log segment has already been removed or
                               3737                 :                :  * recycled. The caller should only pass a segment that it knows to have
                               3738                 :                :  * existed while the server has been running, as this function always
                               3739                 :                :  * succeeds if no WAL segments have been removed since startup.
                               3740                 :                :  * 'tli' is only used in the error message.
                               3741                 :                :  *
                               3742                 :                :  * Note: this function guarantees to keep errno unchanged on return.
                               3743                 :                :  * This supports callers that use this to possibly deliver a better
                               3744                 :                :  * error message about a missing file, while still being able to throw
                               3745                 :                :  * a normal file-access error afterwards, if this does return.
                               3746                 :                :  */
                               3747                 :                : void
 4732 heikki.linnakangas@i     3748                 :         122037 : CheckXLogRemoved(XLogSegNo segno, TimeLineID tli)
                               3749                 :                : {
 2936 tgl@sss.pgh.pa.us        3750                 :         122037 :     int         save_errno = errno;
                               3751                 :                :     XLogSegNo   lastRemovedSegNo;
                               3752                 :                : 
 4105 andres@anarazel.de       3753         [ +  + ]:         122037 :     SpinLockAcquire(&XLogCtl->info_lck);
                               3754                 :         122037 :     lastRemovedSegNo = XLogCtl->lastRemovedSegNo;
                               3755                 :         122037 :     SpinLockRelease(&XLogCtl->info_lck);
                               3756                 :                : 
 4732 heikki.linnakangas@i     3757         [ -  + ]:         122037 :     if (segno <= lastRemovedSegNo)
                               3758                 :                :     {
                               3759                 :                :         char        filename[MAXFNAMELEN];
                               3760                 :                : 
 3012 andres@anarazel.de       3761                 :UBC           0 :         XLogFileName(filename, tli, segno, wal_segment_size);
 2936 tgl@sss.pgh.pa.us        3762                 :              0 :         errno = save_errno;
 4732 heikki.linnakangas@i     3763         [ #  # ]:              0 :         ereport(ERROR,
                               3764                 :                :                 (errcode_for_file_access(),
                               3765                 :                :                  errmsg("requested WAL segment %s has already been removed",
                               3766                 :                :                         filename)));
                               3767                 :                :     }
 2936 tgl@sss.pgh.pa.us        3768                 :CBC      122037 :     errno = save_errno;
 5729 heikki.linnakangas@i     3769                 :         122037 : }
                               3770                 :                : 
                               3771                 :                : /*
                               3772                 :                :  * Return the last WAL segment removed, or 0 if no segment has been removed
                               3773                 :                :  * since startup.
                               3774                 :                :  *
                               3775                 :                :  * NB: the result can be out of date arbitrarily fast; the caller has to deal
                               3776                 :                :  * with that.
                               3777                 :                :  */
                               3778                 :                : XLogSegNo
 4308 rhaas@postgresql.org     3779                 :            963 : XLogGetLastRemovedSegno(void)
                               3780                 :                : {
                               3781                 :                :     XLogSegNo   lastRemovedSegNo;
                               3782                 :                : 
 4105 andres@anarazel.de       3783         [ -  + ]:            963 :     SpinLockAcquire(&XLogCtl->info_lck);
                               3784                 :            963 :     lastRemovedSegNo = XLogCtl->lastRemovedSegNo;
                               3785                 :            963 :     SpinLockRelease(&XLogCtl->info_lck);
                               3786                 :                : 
 4308 rhaas@postgresql.org     3787                 :            963 :     return lastRemovedSegNo;
                               3788                 :                : }
                               3789                 :                : 
                               3790                 :                : /*
                               3791                 :                :  * Return the oldest WAL segment on the given TLI that still exists in
                               3792                 :                :  * XLOGDIR, or 0 if none.
                               3793                 :                :  */
                               3794                 :                : XLogSegNo
  729                          3795                 :              6 : XLogGetOldestSegno(TimeLineID tli)
                               3796                 :                : {
                               3797                 :                :     DIR        *xldir;
                               3798                 :                :     struct dirent *xlde;
                               3799                 :              6 :     XLogSegNo   oldest_segno = 0;
                               3800                 :                : 
                               3801                 :              6 :     xldir = AllocateDir(XLOGDIR);
                               3802         [ +  + ]:             40 :     while ((xlde = ReadDir(xldir, XLOGDIR)) != NULL)
                               3803                 :                :     {
                               3804                 :                :         TimeLineID  file_tli;
                               3805                 :                :         XLogSegNo   file_segno;
                               3806                 :                : 
                               3807                 :                :         /* Ignore files that are not XLOG segments. */
                               3808         [ +  + ]:             34 :         if (!IsXLogFileName(xlde->d_name))
                               3809                 :             24 :             continue;
                               3810                 :                : 
                               3811                 :                :         /* Parse filename to get TLI and segno. */
                               3812                 :             10 :         XLogFromFileName(xlde->d_name, &file_tli, &file_segno,
                               3813                 :                :                          wal_segment_size);
                               3814                 :                : 
                               3815                 :                :         /* Ignore anything that's not from the TLI of interest. */
                               3816         [ -  + ]:             10 :         if (tli != file_tli)
  729 rhaas@postgresql.org     3817                 :UBC           0 :             continue;
                               3818                 :                : 
                               3819                 :                :         /* If it's the oldest so far, update oldest_segno. */
  729 rhaas@postgresql.org     3820   [ +  +  +  + ]:CBC          10 :         if (oldest_segno == 0 || file_segno < oldest_segno)
                               3821                 :              8 :             oldest_segno = file_segno;
                               3822                 :                :     }
                               3823                 :                : 
                               3824                 :              6 :     FreeDir(xldir);
                               3825                 :              6 :     return oldest_segno;
                               3826                 :                : }
                               3827                 :                : 
                               3828                 :                : /*
                               3829                 :                :  * Update the last removed segno pointer in shared memory, to reflect that the
                               3830                 :                :  * given XLOG file has been removed.
                               3831                 :                :  */
                               3832                 :                : static void
 5729 heikki.linnakangas@i     3833                 :           2553 : UpdateLastRemovedPtr(char *filename)
                               3834                 :                : {
                               3835                 :                :     uint32      tli;
                               3836                 :                :     XLogSegNo   segno;
                               3837                 :                : 
 3012 andres@anarazel.de       3838                 :           2553 :     XLogFromFileName(filename, &tli, &segno, wal_segment_size);
                               3839                 :                : 
 4105                          3840         [ +  + ]:           2553 :     SpinLockAcquire(&XLogCtl->info_lck);
                               3841         [ +  + ]:           2553 :     if (segno > XLogCtl->lastRemovedSegNo)
                               3842                 :           1136 :         XLogCtl->lastRemovedSegNo = segno;
                               3843                 :           2553 :     SpinLockRelease(&XLogCtl->info_lck);
 5729 heikki.linnakangas@i     3844                 :           2553 : }
                               3845                 :                : 
                               3846                 :                : /*
                               3847                 :                :  * Remove all temporary log files in pg_wal
                               3848                 :                :  *
                               3849                 :                :  * This is called at the beginning of recovery after a previous crash,
                               3850                 :                :  * at a point where no other processes write fresh WAL data.
                               3851                 :                :  */
                               3852                 :                : static void
 2715 michael@paquier.xyz      3853                 :            172 : RemoveTempXlogFiles(void)
                               3854                 :                : {
                               3855                 :                :     DIR        *xldir;
                               3856                 :                :     struct dirent *xlde;
                               3857                 :                : 
                               3858         [ +  + ]:            172 :     elog(DEBUG2, "removing all temporary WAL segments");
                               3859                 :                : 
                               3860                 :            172 :     xldir = AllocateDir(XLOGDIR);
                               3861         [ +  + ]:           1111 :     while ((xlde = ReadDir(xldir, XLOGDIR)) != NULL)
                               3862                 :                :     {
                               3863                 :                :         char        path[MAXPGPATH];
                               3864                 :                : 
                               3865         [ +  - ]:            939 :         if (strncmp(xlde->d_name, "xlogtemp.", 9) != 0)
                               3866                 :            939 :             continue;
                               3867                 :                : 
 2715 michael@paquier.xyz      3868                 :UBC           0 :         snprintf(path, MAXPGPATH, XLOGDIR "/%s", xlde->d_name);
                               3869                 :              0 :         unlink(path);
                               3870         [ #  # ]:              0 :         elog(DEBUG2, "removed temporary WAL segment \"%s\"", path);
                               3871                 :                :     }
 2715 michael@paquier.xyz      3872                 :CBC         172 :     FreeDir(xldir);
                               3873                 :            172 : }
                               3874                 :                : 
                               3875                 :                : /*
                               3876                 :                :  * Recycle or remove all log files older than or equal to the passed segno.
                               3877                 :                :  *
                               3878                 :                :  * endptr is current (or recent) end of xlog, and lastredoptr is the
                               3879                 :                :  * redo pointer of the last checkpoint. These are used to determine
                               3880                 :                :  * whether we want to recycle rather than delete no-longer-wanted log files.
                               3881                 :                :  *
                               3882                 :                :  * insertTLI is the current timeline for XLOG insertion. Any recycled
                               3883                 :                :  * segments should be reused for this timeline.
                               3884                 :                :  */
                               3885                 :                : static void
 1504 rhaas@postgresql.org     3886                 :           1732 : RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr lastredoptr, XLogRecPtr endptr,
                               3887                 :                :                    TimeLineID insertTLI)
                               3888                 :                : {
                               3889                 :                :     DIR        *xldir;
                               3890                 :                :     struct dirent *xlde;
                               3891                 :                :     char        lastoff[MAXFNAMELEN];
                               3892                 :                :     XLogSegNo   endlogSegNo;
                               3893                 :                :     XLogSegNo   recycleSegNo;
                               3894                 :                : 
                               3895                 :                :     /* Initialize info about where to try to recycle to */
 1798 michael@paquier.xyz      3896                 :           1732 :     XLByteToSeg(endptr, endlogSegNo, wal_segment_size);
                               3897                 :           1732 :     recycleSegNo = XLOGfileslop(lastredoptr);
                               3898                 :                : 
                               3899                 :                :     /*
                               3900                 :                :      * Construct a filename of the last segment to be kept. The timeline ID
                               3901                 :                :      * doesn't matter; we ignore it in the comparison. (During recovery,
                               3902                 :                :      * InsertTimeLineID isn't set, so we can't use that.)
                               3903                 :                :      */
 3012 andres@anarazel.de       3904                 :           1732 :     XLogFileName(lastoff, 0, segno, wal_segment_size);
                               3905                 :                : 
 5589 simon@2ndQuadrant.co     3906         [ +  + ]:           1732 :     elog(DEBUG2, "attempting to remove WAL segments older than log file %s",
                               3907                 :                :          lastoff);
                               3908                 :                : 
 2936 tgl@sss.pgh.pa.us        3909                 :           1732 :     xldir = AllocateDir(XLOGDIR);
                               3910                 :                : 
 7472                          3911         [ +  + ]:          48667 :     while ((xlde = ReadDir(xldir, XLOGDIR)) != NULL)
                               3912                 :                :     {
                               3913                 :                :         /* Ignore files that are not XLOG segments */
 3877 heikki.linnakangas@i     3914         [ +  + ]:          46935 :         if (!IsXLogFileName(xlde->d_name) &&
                               3915         [ +  + ]:           7369 :             !IsPartialXLogFileName(xlde->d_name))
 3902                          3916                 :           7365 :             continue;
                               3917                 :                : 
                               3918                 :                :         /*
                               3919                 :                :          * We ignore the timeline part of the XLOG segment identifiers in
                               3920                 :                :          * deciding whether a segment is still needed.  This ensures that we
                               3921                 :                :          * won't prematurely remove a segment from a parent timeline. We could
                               3922                 :                :          * probably be a little more proactive about removing segments of
                               3923                 :                :          * non-parent timelines, but that would be a whole lot more
                               3924                 :                :          * complicated.
                               3925                 :                :          *
                               3926                 :                :          * We use the alphanumeric sorting property of the filenames to decide
                               3927                 :                :          * which ones are earlier than the lastoff segment.
                               3928                 :                :          */
                               3929         [ +  + ]:          39570 :         if (strcmp(xlde->d_name + 8, lastoff + 8) <= 0)
                               3930                 :                :         {
 4689                          3931         [ +  + ]:          33109 :             if (XLogArchiveCheckDone(xlde->d_name))
                               3932                 :                :             {
                               3933                 :                :                 /* Update the last removed location in shared memory first */
 5729                          3934                 :           2553 :                 UpdateLastRemovedPtr(xlde->d_name);
                               3935                 :                : 
 1203 michael@paquier.xyz      3936                 :           2553 :                 RemoveXlogFile(xlde, recycleSegNo, &endlogSegNo, insertTLI);
                               3937                 :                :             }
                               3938                 :                :         }
                               3939                 :                :     }
                               3940                 :                : 
 3902 heikki.linnakangas@i     3941                 :           1732 :     FreeDir(xldir);
                               3942                 :           1732 : }
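
The timeline-agnostic comparison used in the loop above can be tried in isolation. This is a minimal sketch, not code from xlog.c: the helper name segment_not_newer_than is hypothetical, and it assumes both arguments are well-formed 24-character segment file names whose first 8 hex digits are the timeline ID.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not part of xlog.c): report whether the WAL segment
 * file name "name" sorts at or before "lastoff" once the leading
 * 8-character timeline field is skipped.
 *
 * Assumption: both strings are 24-character segment names made of
 * fixed-width uppercase hex digits, so plain string comparison orders the
 * remaining 16 characters the same way as the segment numbers.
 */
static bool
segment_not_newer_than(const char *name, const char *lastoff)
{
    /* Both names must extend past the timeline field. */
    assert(strlen(name) > 8 && strlen(lastoff) > 8);

    return strcmp(name + 8, lastoff + 8) <= 0;
}
```

Because the first 8 characters are skipped, a segment on timeline 2 and a segment on timeline 1 with the same segment number compare as equal, which is why lastoff can be built with a timeline of 0.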
                               3943                 :                : 
                               3944                 :                : /*
                               3945                 :                :  * Recycle or remove WAL files that are not part of the given timeline's
                               3946                 :                :  * history.
                               3947                 :                :  *
                               3948                 :                :  * This is called during recovery, whenever we switch to follow a new
                               3949                 :                :  * timeline, and at the end of recovery when we create a new timeline. We
                               3950                 :                :  * wouldn't otherwise care about extra WAL files lying in pg_wal, but they
                               3951                 :                :  * might be leftover pre-allocated or recycled WAL segments on the old timeline
                               3952                 :                :  * that we haven't used yet, and contain garbage. If we just leave them in
                               3953                 :                :  * pg_wal, they will eventually be archived, and we can't let that happen.
                               3954                 :                :  * Files that belong to our timeline history are valid, because we have
                               3955                 :                :  * successfully replayed them, but from others we can't be sure.
                               3956                 :                :  *
                               3957                 :                :  * 'switchpoint' is the current point in WAL where we switch to the new timeline,
                               3958                 :                :  * and 'newTLI' is the new timeline we switch to.
                               3959                 :                :  */
                               3960                 :                : void
                               3961                 :             59 : RemoveNonParentXlogFiles(XLogRecPtr switchpoint, TimeLineID newTLI)
                               3962                 :                : {
                               3963                 :                :     DIR        *xldir;
                               3964                 :                :     struct dirent *xlde;
                               3965                 :                :     char        switchseg[MAXFNAMELEN];
                               3966                 :                :     XLogSegNo   endLogSegNo;
                               3967                 :                :     XLogSegNo   switchLogSegNo;
                               3968                 :                :     XLogSegNo   recycleSegNo;
                               3969                 :                : 
                               3970                 :                :     /*
                               3971                 :                :      * Initialize info about where to begin the work.  This will recycle,
                               3972                 :                :      * somewhat arbitrarily, 10 future segments.
                               3973                 :                :      */
 1798 michael@paquier.xyz      3974                 :             59 :     XLByteToPrevSeg(switchpoint, switchLogSegNo, wal_segment_size);
                               3975                 :             59 :     XLByteToSeg(switchpoint, endLogSegNo, wal_segment_size);
                               3976                 :             59 :     recycleSegNo = endLogSegNo + 10;
                               3977                 :                : 
                               3978                 :                :     /*
                               3979                 :                :      * Construct a filename of the last segment to be kept.
                               3980                 :                :      */
                               3981                 :             59 :     XLogFileName(switchseg, newTLI, switchLogSegNo, wal_segment_size);
                               3982                 :                : 
 3902 heikki.linnakangas@i     3983         [ +  + ]:             59 :     elog(DEBUG2, "attempting to remove WAL segments newer than log file %s",
                               3984                 :                :          switchseg);
                               3985                 :                : 
 2936 tgl@sss.pgh.pa.us        3986                 :             59 :     xldir = AllocateDir(XLOGDIR);
                               3987                 :                : 
 3902 heikki.linnakangas@i     3988         [ +  + ]:            562 :     while ((xlde = ReadDir(xldir, XLOGDIR)) != NULL)
                               3989                 :                :     {
                               3990                 :                :         /* Ignore files that are not XLOG segments */
 3877                          3991         [ +  + ]:            503 :         if (!IsXLogFileName(xlde->d_name))
 3902                          3992                 :            313 :             continue;
                               3993                 :                : 
                               3994                 :                :         /*
                               3995                 :                :          * Remove files that are on a timeline older than the new one we're
                               3996                 :                :          * switching to, but with a segment number >= the first segment on the
                               3997                 :                :          * new timeline.
                               3998                 :                :          */
                               3999         [ +  + ]:            190 :         if (strncmp(xlde->d_name, switchseg, 8) < 0 &&
                               4000         [ +  + ]:            123 :             strcmp(xlde->d_name + 8, switchseg + 8) > 0)
                               4001                 :                :         {
                               4002                 :                :             /*
                               4003                 :                :              * If the file has already been marked as .ready, however, don't
                               4004                 :                :              * remove it yet. It should be OK to remove it - files that are
                               4005                 :                :              * not part of our timeline history are not required for recovery
                               4006                 :                :              * - but it seems safer to let them be archived and removed later.
                               4007                 :                :              */
                               4008         [ +  - ]:             14 :             if (!XLogArchiveIsReady(xlde->d_name))
 1203 michael@paquier.xyz      4009                 :             14 :                 RemoveXlogFile(xlde, recycleSegNo, &endLogSegNo, newTLI);
                               4010                 :                :         }
                               4011                 :                :     }
                               4012                 :                : 
 3902 heikki.linnakangas@i     4013                 :             59 :     FreeDir(xldir);
                               4014                 :             59 : }
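
The two-part file name test in the loop above can be sketched as a standalone predicate. The function name is_stale_old_timeline_segment is hypothetical and does not exist in xlog.c; the sketch assumes 24-character segment names with the timeline ID in the first 8 hex digits.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical predicate (not part of xlog.c): true when "name" is on a
 * timeline older than the one encoded in "switchseg" AND its segment part
 * sorts after the segment part of "switchseg".
 *
 * Assumption: both strings are 24-character names of fixed-width uppercase
 * hex digits, so string order matches numeric order.
 */
static bool
is_stale_old_timeline_segment(const char *name, const char *switchseg)
{
    assert(strlen(name) > 8 && strlen(switchseg) > 8);

    return strncmp(name, switchseg, 8) < 0 &&       /* older timeline */
           strcmp(name + 8, switchseg + 8) > 0;     /* later segment */
}
```

Segments on the new timeline itself, and segments at or before the switch segment, fail one of the two conditions and are left alone.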
                               4015                 :                : 
                               4016                 :                : /*
                               4017                 :                :  * Recycle or remove a log file that's no longer needed.
                               4018                 :                :  *
                               4019                 :                :  * segment_de is the dirent structure of the segment to recycle or remove.
                               4020                 :                :  * recycleSegNo is the segment number to recycle up to.  endlogSegNo is
                               4021                 :                :  * the segment number of the current (or recent) end of WAL.
                               4022                 :                :  *
                               4023                 :                :  * endlogSegNo is incremented if the segment is recycled, so that the slot
                               4024                 :                :  * is not checked again by later calls to this function.
                               4025                 :                :  *
                               4026                 :                :  * insertTLI is the current timeline for XLOG insertion. Any recycled segments
                               4027                 :                :  * should be used for this timeline.
                               4028                 :                :  */
                               4029                 :                : static void
 1203 michael@paquier.xyz      4030                 :           2567 : RemoveXlogFile(const struct dirent *segment_de,
                               4031                 :                :                XLogSegNo recycleSegNo, XLogSegNo *endlogSegNo,
                               4032                 :                :                TimeLineID insertTLI)
                               4033                 :                : {
                               4034                 :                :     char        path[MAXPGPATH];
                               4035                 :                : #ifdef WIN32
                               4036                 :                :     char        newpath[MAXPGPATH];
                               4037                 :                : #endif
                               4038                 :           2567 :     const char *segname = segment_de->d_name;
                               4039                 :                : 
 3902 heikki.linnakangas@i     4040                 :           2567 :     snprintf(path, MAXPGPATH, XLOGDIR "/%s", segname);
                               4041                 :                : 
                               4042                 :                :     /*
                               4043                 :                :      * Before deleting the file, see if it can be recycled as a future log
                               4044                 :                :      * segment. Only recycle normal files, because we don't want to recycle
                               4045                 :                :      * symbolic links pointing to a separate archive directory.
                               4046                 :                :      */
 2452 tmunro@postgresql.or     4047         [ +  - ]:           2567 :     if (wal_recycle &&
 1798 michael@paquier.xyz      4048         [ +  + ]:           2567 :         *endlogSegNo <= recycleSegNo &&
 1634 noah@leadboat.com        4049   [ +  +  +  - ]:           3457 :         XLogCtl->InstallXLogFileSegmentActive && /* callee rechecks this */
 1203 michael@paquier.xyz      4050         [ +  + ]:           3130 :         get_dirent_type(path, segment_de, false, DEBUG2) == PGFILETYPE_REG &&
 1798                          4051                 :           1565 :         InstallXLogFileSegment(endlogSegNo, path,
                               4052                 :                :                                true, recycleSegNo, insertTLI))
                               4053                 :                :     {
 3902 heikki.linnakangas@i     4054         [ +  + ]:           1411 :         ereport(DEBUG2,
                               4055                 :                :                 (errmsg_internal("recycled write-ahead log file \"%s\"",
                               4056                 :                :                                  segname)));
                               4057                 :           1411 :         CheckpointStats.ckpt_segs_recycled++;
                               4058                 :                :         /* Needn't recheck that slot on future iterations */
 1798 michael@paquier.xyz      4059                 :           1411 :         (*endlogSegNo)++;
                               4060                 :                :     }
                               4061                 :                :     else
                               4062                 :                :     {
                               4063                 :                :         /* No need for any more future segments, or recycling failed ... */
                               4064                 :                :         int         rc;
                               4065                 :                : 
 3902 heikki.linnakangas@i     4066         [ +  + ]:           1156 :         ereport(DEBUG2,
                               4067                 :                :                 (errmsg_internal("removing write-ahead log file \"%s\"",
                               4068                 :                :                                  segname)));
                               4069                 :                : 
                               4070                 :                : #ifdef WIN32
                               4071                 :                : 
                               4072                 :                :         /*
                               4073                 :                :          * On Windows, if another process (e.g. another backend) holds the file
                               4074                 :                :          * open in FILE_SHARE_DELETE mode, unlink will succeed, but the file
                               4075                 :                :          * will still show up in directory listing until the last handle is
                               4076                 :                :          * closed. To avoid mistaking the lingering deleted file for a live
                               4077                 :                :          * WAL file that needs to be archived, rename it before deleting it.
                               4078                 :                :          *
                               4079                 :                :          * If another process holds the file open without the FILE_SHARE_DELETE
                               4080                 :                :          * flag, rename will fail. We'll try again at the next checkpoint.
                               4081                 :                :          */
                               4082                 :                :         snprintf(newpath, MAXPGPATH, "%s.deleted", path);
                               4083                 :                :         if (rename(path, newpath) != 0)
                               4084                 :                :         {
                               4085                 :                :             ereport(LOG,
                               4086                 :                :                     (errcode_for_file_access(),
                               4087                 :                :                      errmsg("could not rename file \"%s\": %m",
                               4088                 :                :                             path)));
                               4089                 :                :             return;
                               4090                 :                :         }
                               4091                 :                :         rc = durable_unlink(newpath, LOG);
                               4092                 :                : #else
 3188 teodor@sigaev.ru         4093                 :           1156 :         rc = durable_unlink(path, LOG);
                               4094                 :                : #endif
 3902 heikki.linnakangas@i     4095         [ -  + ]:           1156 :         if (rc != 0)
                               4096                 :                :         {
                               4097                 :                :             /* Message already logged by durable_unlink() */
 3902 heikki.linnakangas@i     4098                 :UBC           0 :             return;
                               4099                 :                :         }
 3902 heikki.linnakangas@i     4100                 :CBC        1156 :         CheckpointStats.ckpt_segs_removed++;
                               4101                 :                :     }
                               4102                 :                : 
                               4103                 :           2567 :     XLogArchiveCleanup(segname);
                               4104                 :                : }
                               4105                 :                : 
                               4106                 :                : /*
                               4107                 :                :  * Verify whether pg_wal, pg_wal/archive_status, and pg_wal/summaries exist.
                               4108                 :                :  * If the latter two do not exist, recreate them.
                               4109                 :                :  *
                               4110                 :                :  * It is not the goal of this function to verify the contents of these
                               4111                 :                :  * directories, but to help in cases where someone has performed a cluster
                               4112                 :                :  * copy for PITR purposes but omitted pg_wal from the copy.
                               4113                 :                :  *
                               4114                 :                :  * We could also recreate pg_wal if it doesn't exist, but a deliberate
                               4115                 :                :  * policy decision was made not to.  It is fairly common for pg_wal to be
                               4116                 :                :  * a symlink, and if that was the DBA's intent then automatically making a
                               4117                 :                :  * plain directory would result in degraded performance with no notice.
                               4118                 :                :  */
                               4119                 :                : static void
 6248 tgl@sss.pgh.pa.us        4120                 :            927 : ValidateXLOGDirectoryStructure(void)
                               4121                 :                : {
                               4122                 :                :     char        path[MAXPGPATH];
                               4123                 :                :     struct stat stat_buf;
                               4124                 :                : 
                               4125                 :                :     /* Check for pg_wal; if it doesn't exist, error out */
                               4126         [ +  - ]:            927 :     if (stat(XLOGDIR, &stat_buf) != 0 ||
                               4127         [ -  + ]:            927 :         !S_ISDIR(stat_buf.st_mode))
 6034 bruce@momjian.us         4128         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4129                 :                :                 (errcode_for_file_access(),
                               4130                 :                :                  errmsg("required WAL directory \"%s\" does not exist",
                               4131                 :                :                         XLOGDIR)));
                               4132                 :                : 
                               4133                 :                :     /* Check for archive_status */
 6248 tgl@sss.pgh.pa.us        4134                 :CBC         927 :     snprintf(path, MAXPGPATH, XLOGDIR "/archive_status");
                               4135         [ +  + ]:            927 :     if (stat(path, &stat_buf) == 0)
                               4136                 :                :     {
                               4137                 :                :         /* Check for weird cases where it exists but isn't a directory */
                               4138         [ -  + ]:            926 :         if (!S_ISDIR(stat_buf.st_mode))
 6034 bruce@momjian.us         4139         [ #  # ]:UBC           0 :             ereport(FATAL,
                               4140                 :                :                     (errcode_for_file_access(),
                               4141                 :                :                      errmsg("required WAL directory \"%s\" does not exist",
                               4142                 :                :                             path)));
                               4143                 :                :     }
                               4144                 :                :     else
                               4145                 :                :     {
 6248 tgl@sss.pgh.pa.us        4146         [ +  - ]:CBC           1 :         ereport(LOG,
                               4147                 :                :                 (errmsg("creating missing WAL directory \"%s\"", path)));
 2812 sfrost@snowman.net       4148         [ -  + ]:              1 :         if (MakePGDirectory(path) < 0)
 6034 bruce@momjian.us         4149         [ #  # ]:UBC           0 :             ereport(FATAL,
                               4150                 :                :                     (errcode_for_file_access(),
                               4151                 :                :                      errmsg("could not create missing directory \"%s\": %m",
                               4152                 :                :                             path)));
                               4153                 :                :     }
                               4154                 :                : 
                               4155                 :                :     /* Check for summaries */
  729 rhaas@postgresql.org     4156                 :CBC         927 :     snprintf(path, MAXPGPATH, XLOGDIR "/summaries");
                               4157         [ +  + ]:            927 :     if (stat(path, &stat_buf) == 0)
                               4158                 :                :     {
                               4159                 :                :         /* Check for weird cases where it exists but isn't a directory */
                               4160         [ -  + ]:            926 :         if (!S_ISDIR(stat_buf.st_mode))
  729 rhaas@postgresql.org     4161         [ #  # ]:UBC           0 :             ereport(FATAL,
                               4162                 :                :                     (errmsg("required WAL directory \"%s\" does not exist",
                               4163                 :                :                             path)));
                               4164                 :                :     }
                               4165                 :                :     else
                               4166                 :                :     {
  729 rhaas@postgresql.org     4167         [ +  - ]:CBC           1 :         ereport(LOG,
                               4168                 :                :                 (errmsg("creating missing WAL directory \"%s\"", path)));
                               4169         [ -  + ]:              1 :         if (MakePGDirectory(path) < 0)
  729 rhaas@postgresql.org     4170         [ #  # ]:UBC           0 :             ereport(FATAL,
                               4171                 :                :                     (errmsg("could not create missing directory \"%s\": %m",
                               4172                 :                :                             path)));
                               4173                 :                :     }
 6248 tgl@sss.pgh.pa.us        4174                 :CBC         927 : }
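
The check-then-create pattern applied above to archive_status and summaries can be sketched with plain POSIX calls. This is an illustration under stated assumptions, not the PostgreSQL implementation: ensure_directory is a hypothetical name, the 0700 mode stands in for whatever MakePGDirectory() uses, the errno check is an addition, and errors are returned rather than reported through ereport().

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not part of xlog.c).
 *
 * Returns 0 if "path" already exists as a directory, 1 if it was missing
 * and has been created, and -1 if it exists but is not a directory or if
 * stat()/mkdir() failed.
 */
static int
ensure_directory(const char *path)
{
    struct stat st;

    if (stat(path, &st) == 0)
    {
        /* Exists: accept it only if it is really a directory. */
        return S_ISDIR(st.st_mode) ? 0 : -1;
    }

    /* Only a missing entry is treated as "create it". */
    if (errno != ENOENT)
        return -1;

    /* Assumption: owner-only permissions, chosen for the sketch. */
    return (mkdir(path, 0700) == 0) ? 1 : -1;
}
```

A caller could treat -1 as fatal and log a message when 1 is returned, mirroring the FATAL and LOG reports in the function above.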
                               4175                 :                : 
                               4176                 :                : /*
                               4177                 :                :  * Remove previous backup history files.  This also retries creation of
                               4178                 :                :  * .ready files for any backup history files for which XLogArchiveNotify
                               4179                 :                :  * failed earlier.
                               4180                 :                :  */
                               4181                 :                : static void
 7119                          4182                 :            155 : CleanupBackupHistory(void)
                               4183                 :                : {
                               4184                 :                :     DIR        *xldir;
                               4185                 :                :     struct dirent *xlde;
                               4186                 :                :     char        path[MAXPGPATH + sizeof(XLOGDIR)];
                               4187                 :                : 
 7472                          4188                 :            155 :     xldir = AllocateDir(XLOGDIR);
                               4189                 :                : 
                               4190         [ +  + ]:           1554 :     while ((xlde = ReadDir(xldir, XLOGDIR)) != NULL)
                               4191                 :                :     {
 3877 heikki.linnakangas@i     4192         [ +  + ]:           1244 :         if (IsBackupHistoryFileName(xlde->d_name))
                               4193                 :                :         {
 6310 tgl@sss.pgh.pa.us        4194         [ +  + ]:            163 :             if (XLogArchiveCheckDone(xlde->d_name))
                               4195                 :                :             {
 3142 peter_e@gmx.net          4196         [ +  + ]:            131 :                 elog(DEBUG2, "removing WAL backup history file \"%s\"",
                               4197                 :                :                      xlde->d_name);
 3173                          4198                 :            131 :                 snprintf(path, sizeof(path), XLOGDIR "/%s", xlde->d_name);
 7491 bruce@momjian.us         4199                 :            131 :                 unlink(path);
                               4200                 :            131 :                 XLogArchiveCleanup(xlde->d_name);
                               4201                 :                :             }
                               4202                 :                :         }
                               4203                 :                :     }
                               4204                 :                : 
                               4205                 :            155 :     FreeDir(xldir);
                               4206                 :            155 : }
                               4207                 :                : 
                               4208                 :                : /*
                               4209                 :                :  * I/O routines for pg_control
                               4210                 :                :  *
                               4211                 :                :  * *ControlFile is a buffer in shared memory that holds an image of the
                               4212                 :                :  * contents of pg_control.  WriteControlFile() initializes pg_control
                               4213                 :                :  * given a preloaded buffer, ReadControlFile() loads the buffer from
                               4214                 :                :  * the pg_control file (during postmaster or standalone-backend startup),
                               4215                 :                :  * and UpdateControlFile() rewrites pg_control after we modify xlog state.
                               4216                 :                :  * InitControlFile() fills the buffer with initial values.
                               4217                 :                :  *
                               4218                 :                :  * For simplicity, WriteControlFile() initializes the fields of pg_control
                               4219                 :                :  * that are related to checking backend/database compatibility, and
                               4220                 :                :  * ReadControlFile() verifies they are correct.  We could split out the
                               4221                 :                :  * I/O and compatibility-check functions, but there seems no need currently.
                               4222                 :                :  */
                               4223                 :                : 
                               4224                 :                : static void
  513 peter@eisentraut.org     4225                 :             51 : InitControlFile(uint64 sysidentifier, uint32 data_checksum_version)
                               4226                 :                : {
                               4227                 :                :     char        mock_auth_nonce[MOCK_AUTH_NONCE_LEN];
                               4228                 :                : 
                               4229                 :                :     /*
                               4230                 :                :      * Generate a random nonce. This is used for authentication requests that
                               4231                 :                :      * will fail because the user does not exist. The nonce is used to create
                               4232                 :                :      * a genuine-looking password challenge for the non-existent user, in lieu
                               4233                 :                :      * of an actual stored password.
                               4234                 :                :      */
 1401 heikki.linnakangas@i     4235         [ -  + ]:             51 :     if (!pg_strong_random(mock_auth_nonce, MOCK_AUTH_NONCE_LEN))
 1401 heikki.linnakangas@i     4236         [ #  # ]:UBC           0 :         ereport(PANIC,
                               4237                 :                :                 (errcode(ERRCODE_INTERNAL_ERROR),
                               4238                 :                :                  errmsg("could not generate secret authorization token")));
                               4239                 :                : 
 1401 heikki.linnakangas@i     4240                 :CBC          51 :     memset(ControlFile, 0, sizeof(ControlFileData));
                               4241                 :                :     /* Initialize pg_control status fields */
                               4242                 :             51 :     ControlFile->system_identifier = sysidentifier;
                               4243                 :             51 :     memcpy(ControlFile->mock_authentication_nonce, mock_auth_nonce, MOCK_AUTH_NONCE_LEN);
                               4244                 :             51 :     ControlFile->state = DB_SHUTDOWNED;
                               4245                 :             51 :     ControlFile->unloggedLSN = FirstNormalUnloggedLSN;
                               4246                 :                : 
                               4247                 :                :     /* Set important parameter values for use when replaying WAL */
 2131 peter@eisentraut.org     4248                 :             51 :     ControlFile->MaxConnections = MaxConnections;
                               4249                 :             51 :     ControlFile->max_worker_processes = max_worker_processes;
                               4250                 :             51 :     ControlFile->max_wal_senders = max_wal_senders;
                               4251                 :             51 :     ControlFile->max_prepared_xacts = max_prepared_xacts;
                               4252                 :             51 :     ControlFile->max_locks_per_xact = max_locks_per_xact;
                               4253                 :             51 :     ControlFile->wal_level = wal_level;
                               4254                 :             51 :     ControlFile->wal_log_hints = wal_log_hints;
                               4255                 :             51 :     ControlFile->track_commit_timestamp = track_commit_timestamp;
  513                          4256                 :             51 :     ControlFile->data_checksum_version = data_checksum_version;
 2131                          4257                 :             51 : }
                               4258                 :                : 
                               4259                 :                : static void
 9154 tgl@sss.pgh.pa.us        4260                 :             51 : WriteControlFile(void)
                               4261                 :                : {
                               4262                 :                :     int         fd;
                               4263                 :                :     char        buffer[PG_CONTROL_FILE_SIZE];   /* need not be aligned */
                               4264                 :                : 
                               4265                 :                :     /*
                               4266                 :                :      * Initialize version and compatibility-check fields
                               4267                 :                :      */
 9046                          4268                 :             51 :     ControlFile->pg_control_version = PG_CONTROL_VERSION;
                               4269                 :             51 :     ControlFile->catalog_version_no = CATALOG_VERSION_NO;
                               4270                 :                : 
 7381                          4271                 :             51 :     ControlFile->maxAlign = MAXIMUM_ALIGNOF;
                               4272                 :             51 :     ControlFile->floatFormat = FLOATFORMAT_VALUE;
                               4273                 :                : 
 9154                          4274                 :             51 :     ControlFile->blcksz = BLCKSZ;
                               4275                 :             51 :     ControlFile->relseg_size = RELSEG_SIZE;
   38 heikki.linnakangas@i     4276                 :GNC          51 :     ControlFile->slru_pages_per_segment = SLRU_PAGES_PER_SEGMENT;
 7199 tgl@sss.pgh.pa.us        4277                 :CBC          51 :     ControlFile->xlog_blcksz = XLOG_BLCKSZ;
 3012 andres@anarazel.de       4278                 :             51 :     ControlFile->xlog_seg_size = wal_segment_size;
                               4279                 :                : 
 8642 lockhart@fourpalms.o     4280                 :             51 :     ControlFile->nameDataLen = NAMEDATALEN;
 7569 tgl@sss.pgh.pa.us        4281                 :             51 :     ControlFile->indexMaxKeys = INDEX_MAX_KEYS;
                               4282                 :                : 
 6834                          4283                 :             51 :     ControlFile->toast_max_chunk_size = TOAST_MAX_CHUNK_SIZE;
 4214                          4284                 :             51 :     ControlFile->loblksize = LOBLKSIZE;
                               4285                 :                : 
  127 tgl@sss.pgh.pa.us        4286                 :GNC          51 :     ControlFile->float8ByVal = true; /* vestigial */
                               4287                 :                : 
                               4288                 :                :     /*
                               4289                 :                :      * Initialize the default 'char' signedness.
                               4290                 :                :      *
                               4291                 :                :      * The signedness of the char type is implementation-defined. For instance
                               4292                 :                :      * on x86 architecture CPUs, the char data type is typically treated as
                               4293                 :                :      * signed by default, whereas on aarch architecture CPUs, it is typically
                               4294                 :                :      * treated as unsigned by default. In v17 or earlier, we accidentally let
                               4295                 :                :      * C implementation signedness affect persistent data. This led to
                               4296                 :                :      * inconsistent results when comparing char data across different
                               4297                 :                :      * platforms.
                               4298                 :                :      *
                               4299                 :                :      * This flag can be used as a hint to ensure consistent behavior for
                               4300                 :                :      * pre-v18 data files that store data sorted by the 'char' type on disk,
                               4301                 :                :      * especially in cross-platform replication scenarios.
                               4302                 :                :      *
                               4303                 :                :      * Newly created database clusters unconditionally set the default char
                               4304                 :                :      * signedness to true. pg_upgrade changes this flag for clusters that were
                               4305                 :                :      * initialized on signedness=false platforms. As a result,
                               4306                 :                :      * signedness=false setting will become rare over time. If we had known
                               4307                 :                :      * about this problem during the last development cycle that forced initdb
                               4308                 :                :      * (v8.3), we would have made all clusters signed or all clusters
                               4309                 :                :      * unsigned. Making pg_upgrade the only source of signedness=false will
                               4310                 :                :      * cause the population of database clusters to converge toward that
                               4311                 :                :      * retrospective ideal.
                               4312                 :                :      */
  300 msawada@postgresql.o     4313                 :CBC          51 :     ControlFile->default_char_signedness = true;
                               4314                 :                : 
                               4315                 :                :     /* Contents are protected with a CRC */
 4062 heikki.linnakangas@i     4316                 :             51 :     INIT_CRC32C(ControlFile->crc);
                               4317                 :             51 :     COMP_CRC32C(ControlFile->crc,
                               4318                 :                :                 ControlFile,
                               4319                 :                :                 offsetof(ControlFileData, crc));
                               4320                 :             51 :     FIN_CRC32C(ControlFile->crc);
                               4321                 :                : 
                               4322                 :                :     /*
                               4323                 :                :      * We write out PG_CONTROL_FILE_SIZE bytes into pg_control, zero-padding
                               4324                 :                :      * the excess over sizeof(ControlFileData).  This reduces the odds of
                               4325                 :                :      * premature-EOF errors when reading pg_control.  We'll still fail when we
                               4326                 :                :      * check the contents of the file, but hopefully with a more specific
                               4327                 :                :      * error than "couldn't read pg_control".
                               4328                 :                :      */
 3074 tgl@sss.pgh.pa.us        4329                 :             51 :     memset(buffer, 0, PG_CONTROL_FILE_SIZE);
 9154                          4330                 :             51 :     memcpy(buffer, ControlFile, sizeof(ControlFileData));
                               4331                 :                : 
 7472                          4332                 :             51 :     fd = BasicOpenFile(XLOG_CONTROL_FILE,
                               4333                 :                :                        O_RDWR | O_CREAT | O_EXCL | PG_BINARY);
 9154                          4334         [ -  + ]:             51 :     if (fd < 0)
 8186 tgl@sss.pgh.pa.us        4335         [ #  # ]:UBC           0 :         ereport(PANIC,
                               4336                 :                :                 (errcode_for_file_access(),
                               4337                 :                :                  errmsg("could not create file \"%s\": %m",
                               4338                 :                :                         XLOG_CONTROL_FILE)));
                               4339                 :                : 
 8961 tgl@sss.pgh.pa.us        4340                 :CBC          51 :     errno = 0;
 3197 rhaas@postgresql.org     4341                 :             51 :     pgstat_report_wait_start(WAIT_EVENT_CONTROL_FILE_WRITE);
 3074 tgl@sss.pgh.pa.us        4342         [ -  + ]:             51 :     if (write(fd, buffer, PG_CONTROL_FILE_SIZE) != PG_CONTROL_FILE_SIZE)
                               4343                 :                :     {
                               4344                 :                :         /* if write didn't set errno, assume problem is no disk space */
 8961 tgl@sss.pgh.pa.us        4345         [ #  # ]:UBC           0 :         if (errno == 0)
                               4346                 :              0 :             errno = ENOSPC;
 8186                          4347         [ #  # ]:              0 :         ereport(PANIC,
                               4348                 :                :                 (errcode_for_file_access(),
                               4349                 :                :                  errmsg("could not write to file \"%s\": %m",
                               4350                 :                :                         XLOG_CONTROL_FILE)));
                               4351                 :                :     }
 3197 rhaas@postgresql.org     4352                 :CBC          51 :     pgstat_report_wait_end();
                               4353                 :                : 
                               4354                 :             51 :     pgstat_report_wait_start(WAIT_EVENT_CONTROL_FILE_SYNC);
 9141 tgl@sss.pgh.pa.us        4355         [ -  + ]:             51 :     if (pg_fsync(fd) != 0)
 8186 tgl@sss.pgh.pa.us        4356         [ #  # ]:UBC           0 :         ereport(PANIC,
                               4357                 :                :                 (errcode_for_file_access(),
                               4358                 :                :                  errmsg("could not fsync file \"%s\": %m",
                               4359                 :                :                         XLOG_CONTROL_FILE)));
 3197 rhaas@postgresql.org     4360                 :CBC          51 :     pgstat_report_wait_end();
                               4361                 :                : 
 2357 peter@eisentraut.org     4362         [ -  + ]:             51 :     if (close(fd) != 0)
 7997 tgl@sss.pgh.pa.us        4363         [ #  # ]:UBC           0 :         ereport(PANIC,
                               4364                 :                :                 (errcode_for_file_access(),
                               4365                 :                :                  errmsg("could not close file \"%s\": %m",
                               4366                 :                :                         XLOG_CONTROL_FILE)));
 9154 tgl@sss.pgh.pa.us        4367                 :CBC          51 : }
                               4368                 :                : 
                               4369                 :                : static void
                               4370                 :            977 : ReadControlFile(void)
                               4371                 :                : {
                               4372                 :                :     pg_crc32c   crc;
                               4373                 :                :     int         fd;
                               4374                 :                :     char        wal_segsz_str[20];
                               4375                 :                :     int         r;
                               4376                 :                : 
                               4377                 :                :     /*
                               4378                 :                :      * Read data...
                               4379                 :                :      */
 7472                          4380                 :            977 :     fd = BasicOpenFile(XLOG_CONTROL_FILE,
                               4381                 :                :                        O_RDWR | PG_BINARY);
 9154                          4382         [ -  + ]:            977 :     if (fd < 0)
 8186 tgl@sss.pgh.pa.us        4383         [ #  # ]:UBC           0 :         ereport(PANIC,
                               4384                 :                :                 (errcode_for_file_access(),
                               4385                 :                :                  errmsg("could not open file \"%s\": %m",
                               4386                 :                :                         XLOG_CONTROL_FILE)));
                               4387                 :                : 
 3197 rhaas@postgresql.org     4388                 :CBC         977 :     pgstat_report_wait_start(WAIT_EVENT_CONTROL_FILE_READ);
 2771 magnus@hagander.net      4389                 :            977 :     r = read(fd, ControlFile, sizeof(ControlFileData));
                               4390         [ -  + ]:            977 :     if (r != sizeof(ControlFileData))
                               4391                 :                :     {
 2771 magnus@hagander.net      4392         [ #  # ]:UBC           0 :         if (r < 0)
                               4393         [ #  # ]:              0 :             ereport(PANIC,
                               4394                 :                :                     (errcode_for_file_access(),
                               4395                 :                :                      errmsg("could not read file \"%s\": %m",
                               4396                 :                :                             XLOG_CONTROL_FILE)));
                               4397                 :                :         else
                               4398         [ #  # ]:              0 :             ereport(PANIC,
                               4399                 :                :                     (errcode(ERRCODE_DATA_CORRUPTED),
                               4400                 :                :                      errmsg("could not read file \"%s\": read %d of %zu",
                               4401                 :                :                             XLOG_CONTROL_FILE, r, sizeof(ControlFileData))));
                               4402                 :                :     }
 3197 rhaas@postgresql.org     4403                 :CBC         977 :     pgstat_report_wait_end();
                               4404                 :                : 
 9154 tgl@sss.pgh.pa.us        4405                 :            977 :     close(fd);
                               4406                 :                : 
                               4407                 :                :     /*
                               4408                 :                :      * Check for expected pg_control format version.  If this is wrong, the
                               4409                 :                :      * CRC check will likely fail because we'll be checking the wrong number
                               4410                 :                :      * of bytes.  Complaining about wrong version will probably be more
                               4411                 :                :      * enlightening than complaining about wrong CRC.
                               4412                 :                :      */
                               4413                 :                : 
 6541 peter_e@gmx.net          4414   [ -  +  -  -  :            977 :     if (ControlFile->pg_control_version != PG_CONTROL_VERSION && ControlFile->pg_control_version % 65536 == 0 && ControlFile->pg_control_version / 65536 != 0)
                                              -  - ]
 6541 peter_e@gmx.net          4415         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4416                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4417                 :                :                  errmsg("database files are incompatible with server"),
                               4418                 :                :                  errdetail("The database cluster was initialized with PG_CONTROL_VERSION %d (0x%08x),"
                               4419                 :                :                            " but the server was compiled with PG_CONTROL_VERSION %d (0x%08x).",
                               4420                 :                :                            ControlFile->pg_control_version, ControlFile->pg_control_version,
                               4421                 :                :                            PG_CONTROL_VERSION, PG_CONTROL_VERSION),
                               4422                 :                :                  errhint("This could be a problem of mismatched byte ordering.  It looks like you need to initdb.")));
                               4423                 :                : 
 9046 tgl@sss.pgh.pa.us        4424         [ -  + ]:CBC         977 :     if (ControlFile->pg_control_version != PG_CONTROL_VERSION)
 8186 tgl@sss.pgh.pa.us        4425         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4426                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4427                 :                :                  errmsg("database files are incompatible with server"),
                               4428                 :                :                  errdetail("The database cluster was initialized with PG_CONTROL_VERSION %d,"
                               4429                 :                :                            " but the server was compiled with PG_CONTROL_VERSION %d.",
                               4430                 :                :                            ControlFile->pg_control_version, PG_CONTROL_VERSION),
                               4431                 :                :                  errhint("It looks like you need to initdb.")));
                               4432                 :                : 
                               4433                 :                :     /* Now check the CRC. */
 4062 heikki.linnakangas@i     4434                 :CBC         977 :     INIT_CRC32C(crc);
                               4435                 :            977 :     COMP_CRC32C(crc,
                               4436                 :                :                 ControlFile,
                               4437                 :                :                 offsetof(ControlFileData, crc));
                               4438                 :            977 :     FIN_CRC32C(crc);
                               4439                 :                : 
                               4440         [ -  + ]:            977 :     if (!EQ_CRC32C(crc, ControlFile->crc))
 8186 tgl@sss.pgh.pa.us        4441         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4442                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4443                 :                :                  errmsg("incorrect checksum in control file")));
                               4444                 :                : 
                               4445                 :                :     /*
                               4446                 :                :      * Do compatibility checking immediately.  If the database isn't
                               4447                 :                :      * compatible with the backend executable, we want to abort before we can
                               4448                 :                :      * possibly do any damage.
                               4449                 :                :      */
 9046 tgl@sss.pgh.pa.us        4450         [ -  + ]:CBC         977 :     if (ControlFile->catalog_version_no != CATALOG_VERSION_NO)
 8186 tgl@sss.pgh.pa.us        4451         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4452                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4453                 :                :                  errmsg("database files are incompatible with server"),
                               4454                 :                :         /* translator: %s is a variable name and %d is its value */
                               4455                 :                :                  errdetail("The database cluster was initialized with %s %d,"
                               4456                 :                :                            " but the server was compiled with %s %d.",
                               4457                 :                :                            "CATALOG_VERSION_NO", ControlFile->catalog_version_no,
                               4458                 :                :                            "CATALOG_VERSION_NO", CATALOG_VERSION_NO),
                               4459                 :                :                  errhint("It looks like you need to initdb.")));
 7381 tgl@sss.pgh.pa.us        4460         [ -  + ]:CBC         977 :     if (ControlFile->maxAlign != MAXIMUM_ALIGNOF)
 7381 tgl@sss.pgh.pa.us        4461         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4462                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4463                 :                :                  errmsg("database files are incompatible with server"),
                               4464                 :                :         /* translator: %s is a variable name and %d is its value */
                               4465                 :                :                  errdetail("The database cluster was initialized with %s %d,"
                               4466                 :                :                            " but the server was compiled with %s %d.",
                               4467                 :                :                            "MAXALIGN", ControlFile->maxAlign,
                               4468                 :                :                            "MAXALIGN", MAXIMUM_ALIGNOF),
                               4469                 :                :                  errhint("It looks like you need to initdb.")));
 7381 tgl@sss.pgh.pa.us        4470         [ -  + ]:CBC         977 :     if (ControlFile->floatFormat != FLOATFORMAT_VALUE)
 7381 tgl@sss.pgh.pa.us        4471         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4472                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4473                 :                :                  errmsg("database files are incompatible with server"),
                               4474                 :                :                  errdetail("The database cluster appears to use a different floating-point number format than the server executable."),
                               4475                 :                :                  errhint("It looks like you need to initdb.")));
 9154 tgl@sss.pgh.pa.us        4476         [ -  + ]:CBC         977 :     if (ControlFile->blcksz != BLCKSZ)
 8186 tgl@sss.pgh.pa.us        4477         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4478                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4479                 :                :                  errmsg("database files are incompatible with server"),
                               4480                 :                :         /* translator: %s is a variable name and %d is its value */
                               4481                 :                :                  errdetail("The database cluster was initialized with %s %d,"
                               4482                 :                :                            " but the server was compiled with %s %d.",
                               4483                 :                :                            "BLCKSZ", ControlFile->blcksz,
                               4484                 :                :                            "BLCKSZ", BLCKSZ),
                               4485                 :                :                  errhint("It looks like you need to recompile or initdb.")));
 9154 tgl@sss.pgh.pa.us        4486         [ -  + ]:CBC         977 :     if (ControlFile->relseg_size != RELSEG_SIZE)
 8186 tgl@sss.pgh.pa.us        4487         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4488                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4489                 :                :                  errmsg("database files are incompatible with server"),
                               4490                 :                :         /* translator: %s is a variable name and %d is its value */
                               4491                 :                :                  errdetail("The database cluster was initialized with %s %d,"
                               4492                 :                :                            " but the server was compiled with %s %d.",
                               4493                 :                :                            "RELSEG_SIZE", ControlFile->relseg_size,
                               4494                 :                :                            "RELSEG_SIZE", RELSEG_SIZE),
                               4495                 :                :                  errhint("It looks like you need to recompile or initdb.")));
   38 heikki.linnakangas@i     4496         [ -  + ]:GNC         977 :     if (ControlFile->slru_pages_per_segment != SLRU_PAGES_PER_SEGMENT)
   38 heikki.linnakangas@i     4497         [ #  # ]:UNC           0 :         ereport(FATAL,
                               4498                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4499                 :                :                  errmsg("database files are incompatible with server"),
                               4500                 :                :         /* translator: %s is a variable name and %d is its value */
                               4501                 :                :                  errdetail("The database cluster was initialized with %s %d,"
                               4502                 :                :                            " but the server was compiled with %s %d.",
                               4503                 :                :                            "SLRU_PAGES_PER_SEGMENT", ControlFile->slru_pages_per_segment,
                               4504                 :                :                            "SLRU_PAGES_PER_SEGMENT", SLRU_PAGES_PER_SEGMENT),
                               4505                 :                :                  errhint("It looks like you need to recompile or initdb.")));
 7199 tgl@sss.pgh.pa.us        4506         [ -  + ]:CBC         977 :     if (ControlFile->xlog_blcksz != XLOG_BLCKSZ)
 7199 tgl@sss.pgh.pa.us        4507         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4508                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4509                 :                :                  errmsg("database files are incompatible with server"),
                               4510                 :                :         /* translator: %s is a variable name and %d is its value */
                               4511                 :                :                  errdetail("The database cluster was initialized with %s %d,"
                               4512                 :                :                            " but the server was compiled with %s %d.",
                               4513                 :                :                            "XLOG_BLCKSZ", ControlFile->xlog_blcksz,
                               4514                 :                :                            "XLOG_BLCKSZ", XLOG_BLCKSZ),
                               4515                 :                :                  errhint("It looks like you need to recompile or initdb.")));
 8642 lockhart@fourpalms.o     4516         [ -  + ]:CBC         977 :     if (ControlFile->nameDataLen != NAMEDATALEN)
 8186 tgl@sss.pgh.pa.us        4517         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4518                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4519                 :                :                  errmsg("database files are incompatible with server"),
                               4520                 :                :         /* translator: %s is a variable name and %d is its value */
                               4521                 :                :                  errdetail("The database cluster was initialized with %s %d,"
                               4522                 :                :                            " but the server was compiled with %s %d.",
                               4523                 :                :                            "NAMEDATALEN", ControlFile->nameDataLen,
                               4524                 :                :                            "NAMEDATALEN", NAMEDATALEN),
                               4525                 :                :                  errhint("It looks like you need to recompile or initdb.")));
 7569 tgl@sss.pgh.pa.us        4526         [ -  + ]:CBC         977 :     if (ControlFile->indexMaxKeys != INDEX_MAX_KEYS)
 8186 tgl@sss.pgh.pa.us        4527         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4528                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4529                 :                :                  errmsg("database files are incompatible with server"),
                               4530                 :                :         /* translator: %s is a variable name and %d is its value */
                               4531                 :                :                  errdetail("The database cluster was initialized with %s %d,"
                               4532                 :                :                            " but the server was compiled with %s %d.",
                               4533                 :                :                            "INDEX_MAX_KEYS", ControlFile->indexMaxKeys,
                               4534                 :                :                            "INDEX_MAX_KEYS", INDEX_MAX_KEYS),
                               4535                 :                :                  errhint("It looks like you need to recompile or initdb.")));
 6834 tgl@sss.pgh.pa.us        4536         [ -  + ]:CBC         977 :     if (ControlFile->toast_max_chunk_size != TOAST_MAX_CHUNK_SIZE)
 6834 tgl@sss.pgh.pa.us        4537         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4538                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4539                 :                :                  errmsg("database files are incompatible with server"),
                               4540                 :                :         /* translator: %s is a variable name and %d is its value */
                               4541                 :                :                  errdetail("The database cluster was initialized with %s %d,"
                               4542                 :                :                            " but the server was compiled with %s %d.",
                               4543                 :                :                            "TOAST_MAX_CHUNK_SIZE", ControlFile->toast_max_chunk_size,
                               4544                 :                :                            "TOAST_MAX_CHUNK_SIZE", (int) TOAST_MAX_CHUNK_SIZE),
                               4545                 :                :                  errhint("It looks like you need to recompile or initdb.")));
 4214 tgl@sss.pgh.pa.us        4546         [ -  + ]:CBC         977 :     if (ControlFile->loblksize != LOBLKSIZE)
 4214 tgl@sss.pgh.pa.us        4547         [ #  # ]:UBC           0 :         ereport(FATAL,
                               4548                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               4549                 :                :                  errmsg("database files are incompatible with server"),
                               4550                 :                :         /* translator: %s is a variable name and %d is its value */
                               4551                 :                :                  errdetail("The database cluster was initialized with %s %d,"
                               4552                 :                :                            " but the server was compiled with %s %d.",
                               4553                 :                :                            "LOBLKSIZE", ControlFile->loblksize,
                               4554                 :                :                            "LOBLKSIZE", (int) LOBLKSIZE),
                               4555                 :                :                  errhint("It looks like you need to recompile or initdb.")));
                               4556                 :                : 
  127 tgl@sss.pgh.pa.us        4557         [ -  + ]:GNC         977 :     Assert(ControlFile->float8ByVal);    /* vestigial, not worth an error msg */
                               4558                 :                : 
 3012 andres@anarazel.de       4559                 :CBC         977 :     wal_segment_size = ControlFile->xlog_seg_size;
                               4560                 :                : 
                               4561   [ +  -  +  -  :            977 :     if (!IsValidWalSegSize(wal_segment_size))
                                        +  -  -  + ]
 3012 andres@anarazel.de       4562         [ #  # ]:UBC           0 :         ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                               4563                 :                :                         errmsg_plural("invalid WAL segment size in control file (%d byte)",
                               4564                 :                :                                       "invalid WAL segment size in control file (%d bytes)",
                               4565                 :                :                                       wal_segment_size,
                               4566                 :                :                                       wal_segment_size),
                               4567                 :                :                         errdetail("The WAL segment size must be a power of two between 1 MB and 1 GB.")));
                               4568                 :                : 
 3012 andres@anarazel.de       4569                 :CBC         977 :     snprintf(wal_segsz_str, sizeof(wal_segsz_str), "%d", wal_segment_size);
                               4570                 :            977 :     SetConfigOption("wal_segment_size", wal_segsz_str, PGC_INTERNAL,
                               4571                 :                :                     PGC_S_DYNAMIC_DEFAULT);
                               4572                 :                : 
                               4573                 :                :     /* check and update variables dependent on wal_segment_size */
                               4574         [ -  + ]:            977 :     if (ConvertToXSegs(min_wal_size_mb, wal_segment_size) < 2)
 3012 andres@anarazel.de       4575         [ #  # ]:UBC           0 :         ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                               4576                 :                :         /* translator: both %s are GUC names */
                               4577                 :                :                         errmsg("\"%s\" must be at least twice \"%s\"",
                               4578                 :                :                                "min_wal_size", "wal_segment_size")));
                               4579                 :                : 
 3012 andres@anarazel.de       4580         [ -  + ]:CBC         977 :     if (ConvertToXSegs(max_wal_size_mb, wal_segment_size) < 2)
 3012 andres@anarazel.de       4581         [ #  # ]:UBC           0 :         ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                               4582                 :                :         /* translator: both %s are GUC names */
                               4583                 :                :                         errmsg("\"%s\" must be at least twice \"%s\"",
                               4584                 :                :                                "max_wal_size", "wal_segment_size")));
                               4585                 :                : 
 3012 andres@anarazel.de       4586                 :CBC         977 :     UsableBytesInSegment =
                               4587                 :            977 :         (wal_segment_size / XLOG_BLCKSZ * UsableBytesInPage) -
                               4588                 :                :         (SizeOfXLogLongPHD - SizeOfXLogShortPHD);
                               4589                 :                : 
                               4590                 :            977 :     CalculateCheckpointSegments();
                               4591                 :                : 
                               4592                 :                :     /* Make the initdb settings visible as GUC variables, too */
 2810 magnus@hagander.net      4593         [ +  + ]:            977 :     SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
                               4594                 :                :                     PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT);
 9154 tgl@sss.pgh.pa.us        4595                 :            977 : }
                               4596                 :                : 
                               4597                 :                : /*
                               4598                 :                :  * Utility wrapper to update the control file.  Note that the control
                               4599                 :                :  * file gets flushed.
                               4600                 :                :  */
                               4601                 :                : static void
                               4602                 :           9322 : UpdateControlFile(void)
                               4603                 :                : {
 2453 peter@eisentraut.org     4604                 :           9322 :     update_controlfile(DataDir, ControlFile, true);
 9570 vadim4o@yahoo.com        4605                 :           9322 : }
                               4606                 :                : 
                               4607                 :                : /*
                               4608                 :                :  * Returns the unique system identifier from control file.
                               4609                 :                :  */
                               4610                 :                : uint64
 5816 heikki.linnakangas@i     4611                 :           1372 : GetSystemIdentifier(void)
                               4612                 :                : {
                               4613         [ -  + ]:           1372 :     Assert(ControlFile != NULL);
                               4614                 :           1372 :     return ControlFile->system_identifier;
                               4615                 :                : }
                               4616                 :                : 
                               4617                 :                : /*
                               4618                 :                :  * Returns the random nonce from control file.
                               4619                 :                :  */
                               4620                 :                : char *
 3208                          4621                 :              1 : GetMockAuthenticationNonce(void)
                               4622                 :                : {
                               4623         [ -  + ]:              1 :     Assert(ControlFile != NULL);
                               4624                 :              1 :     return ControlFile->mock_authentication_nonce;
                               4625                 :                : }
                               4626                 :                : 
                               4627                 :                : /*
                               4628                 :                :  * Are checksums enabled for data pages?
                               4629                 :                :  */
                               4630                 :                : bool
 2810 magnus@hagander.net      4631                 :        9288954 : DataChecksumsEnabled(void)
                               4632                 :                : {
 4654 simon@2ndQuadrant.co     4633         [ -  + ]:        9288954 :     Assert(ControlFile != NULL);
 4615                          4634                 :        9288954 :     return (ControlFile->data_checksum_version > 0);
                               4635                 :                : }
                               4636                 :                : 
                               4637                 :                : /*
                               4638                 :                :  * Return true if the cluster was initialized on a platform where the
                               4639                 :                :  * default signedness of char is "signed". This function exists for code
                               4640                 :                :  * that deals with pre-v18 data files that store data sorted by the 'char'
                               4641                 :                :  * type on disk (e.g., GIN and GiST indexes). See the comments in
                               4642                 :                :  * WriteControlFile() for details.
                               4643                 :                :  */
                               4644                 :                : bool
  300 msawada@postgresql.o     4645                 :              3 : GetDefaultCharSignedness(void)
                               4646                 :                : {
                               4647                 :              3 :     return ControlFile->default_char_signedness;
                               4648                 :                : }
                               4649                 :                : 
                               4650                 :                : /*
                               4651                 :                :  * Returns a fake LSN for unlogged relations.
                               4652                 :                :  *
                               4653                 :                :  * Each call generates an LSN that is greater than any previous value
                               4654                 :                :  * returned. The current counter value is saved and restored across clean
                               4655                 :                :  * shutdowns, but like unlogged relations, does not survive a crash. This can
                               4656                 :                :  * be used in lieu of real LSN values returned by XLogInsert, if you need an
                               4657                 :                :  * LSN-like increasing sequence of numbers without writing any WAL.
                               4658                 :                :  */
                               4659                 :                : XLogRecPtr
 4693 heikki.linnakangas@i     4660                 :             33 : GetFakeLSNForUnloggedRel(void)
                               4661                 :                : {
  658 nathan@postgresql.or     4662                 :             33 :     return pg_atomic_fetch_add_u64(&XLogCtl->unloggedLSN, 1);
                               4663                 :                : }
                               4664                 :                : 
                               4665                 :                : /*
                               4666                 :                :  * Auto-tune the number of XLOG buffers.
                               4667                 :                :  *
                               4668                 :                :  * The preferred setting for wal_buffers is about 3% of shared_buffers, with
                               4669                 :                :  * a maximum of one XLOG segment (there is little reason to think that more
                               4670                 :                :  * is helpful, at least so long as we force an fsync when switching log files)
                               4671                 :                :  * and a minimum of 8 blocks (which was the default value prior to PostgreSQL
                               4672                 :                :  * 9.1, when auto-tuning was added).
                               4673                 :                :  *
                               4674                 :                :  * This should not be called until NBuffers has received its final value.
                               4675                 :                :  */
                               4676                 :                : static int
 5369 tgl@sss.pgh.pa.us        4677                 :           1070 : XLOGChooseNumBuffers(void)
                               4678                 :                : {
                               4679                 :                :     int         xbuffers;
                               4680                 :                : 
                               4681                 :           1070 :     xbuffers = NBuffers / 32;
 3012 andres@anarazel.de       4682         [ +  + ]:           1070 :     if (xbuffers > (wal_segment_size / XLOG_BLCKSZ))
                               4683                 :             24 :         xbuffers = (wal_segment_size / XLOG_BLCKSZ);
 5369 tgl@sss.pgh.pa.us        4684         [ +  + ]:           1070 :     if (xbuffers < 8)
                               4685                 :            414 :         xbuffers = 8;
                               4686                 :           1070 :     return xbuffers;
                               4687                 :                : }
                               4688                 :                : 
                               4689                 :                : /*
                               4690                 :                :  * GUC check_hook for wal_buffers
                               4691                 :                :  */
                               4692                 :                : bool
                               4693                 :           2179 : check_wal_buffers(int *newval, void **extra, GucSource source)
                               4694                 :                : {
                               4695                 :                :     /*
                               4696                 :                :      * -1 indicates a request for auto-tune.
                               4697                 :                :      */
                               4698         [ +  + ]:           2179 :     if (*newval == -1)
                               4699                 :                :     {
                               4700                 :                :         /*
                               4701                 :                :          * If we haven't yet changed the boot_val default of -1, just let it
                               4702                 :                :          * be.  We'll fix it when XLOGShmemSize is called.
                               4703                 :                :          */
                               4704         [ +  - ]:           1109 :         if (XLOGbuffers == -1)
                               4705                 :           1109 :             return true;
                               4706                 :                : 
                               4707                 :                :         /* Otherwise, substitute the auto-tune value */
 5369 tgl@sss.pgh.pa.us        4708                 :UBC           0 :         *newval = XLOGChooseNumBuffers();
                               4709                 :                :     }
                               4710                 :                : 
                               4711                 :                :     /*
                               4712                 :                :      * We clamp manually-set values to at least 4 blocks.  Prior to PostgreSQL
                               4713                 :                :      * 9.1, a minimum of 4 was enforced by guc.c, but since that is no longer
                               4714                 :                :      * the case, we just silently treat such values as a request for the
                               4715                 :                :      * minimum.  (We could throw an error instead, but that doesn't seem very
                               4716                 :                :      * helpful.)
                               4717                 :                :      */
 5369 tgl@sss.pgh.pa.us        4718         [ -  + ]:CBC        1070 :     if (*newval < 4)
 5369 tgl@sss.pgh.pa.us        4719                 :UBC           0 :         *newval = 4;
                               4720                 :                : 
 5369 tgl@sss.pgh.pa.us        4721                 :CBC        1070 :     return true;
                               4722                 :                : }
                               4723                 :                : 
                               4724                 :                : /*
                               4725                 :                :  * GUC check_hook for wal_consistency_checking
                               4726                 :                :  */
                               4727                 :                : bool
 1192                          4728                 :           2061 : check_wal_consistency_checking(char **newval, void **extra, GucSource source)
                               4729                 :                : {
                               4730                 :                :     char       *rawstring;
                               4731                 :                :     List       *elemlist;
                               4732                 :                :     ListCell   *l;
                               4733                 :                :     bool        newwalconsistency[RM_MAX_ID + 1];
                               4734                 :                : 
                               4735                 :                :     /* Initialize the array */
                               4736   [ +  -  +  -  :          68013 :     MemSet(newwalconsistency, 0, (RM_MAX_ID + 1) * sizeof(bool));
                                     +  -  +  -  +  
                                                 + ]
                               4737                 :                : 
                               4738                 :                :     /* Need a modifiable copy of string */
                               4739                 :           2061 :     rawstring = pstrdup(*newval);
                               4740                 :                : 
                               4741                 :                :     /* Parse string into list of identifiers */
                               4742         [ -  + ]:           2061 :     if (!SplitIdentifierString(rawstring, ',', &elemlist))
                               4743                 :                :     {
                               4744                 :                :         /* syntax error in list */
 1192 tgl@sss.pgh.pa.us        4745                 :UBC           0 :         GUC_check_errdetail("List syntax is invalid.");
                               4746                 :              0 :         pfree(rawstring);
                               4747                 :              0 :         list_free(elemlist);
                               4748                 :              0 :         return false;
                               4749                 :                :     }
                               4750                 :                : 
 1192 tgl@sss.pgh.pa.us        4751   [ +  +  +  +  :CBC        2540 :     foreach(l, elemlist)
                                              +  + ]
                               4752                 :                :     {
                               4753                 :            479 :         char       *tok = (char *) lfirst(l);
                               4754                 :                :         int         rmid;
                               4755                 :                : 
                               4756                 :                :         /* Check for 'all'. */
                               4757         [ +  + ]:            479 :         if (pg_strcasecmp(tok, "all") == 0)
                               4758                 :                :         {
                               4759         [ +  + ]:         122589 :             for (rmid = 0; rmid <= RM_MAX_ID; rmid++)
                               4760   [ +  +  +  + ]:         122112 :                 if (RmgrIdExists(rmid) && GetRmgr(rmid).rm_mask != NULL)
                               4761                 :           4770 :                     newwalconsistency[rmid] = true;
                               4762                 :                :         }
                               4763                 :                :         else
                               4764                 :                :         {
                               4765                 :                :             /* Check if the token matches any known resource manager. */
                               4766                 :              2 :             bool        found = false;
                               4767                 :                : 
                               4768         [ +  - ]:             36 :             for (rmid = 0; rmid <= RM_MAX_ID; rmid++)
                               4769                 :                :             {
                               4770   [ +  -  +  +  :             54 :                 if (RmgrIdExists(rmid) && GetRmgr(rmid).rm_mask != NULL &&
                                              +  + ]
                               4771                 :             18 :                     pg_strcasecmp(tok, GetRmgr(rmid).rm_name) == 0)
                               4772                 :                :                 {
                               4773                 :              2 :                     newwalconsistency[rmid] = true;
                               4774                 :              2 :                     found = true;
                               4775                 :              2 :                     break;
                               4776                 :                :                 }
                               4777                 :                :             }
                               4778         [ -  + ]:              2 :             if (!found)
                               4779                 :                :             {
                               4780                 :                :                 /*
                               4781                 :                :                  * During startup, it might be a not-yet-loaded custom
                               4782                 :                :                  * resource manager.  Defer checking until
                               4783                 :                :                  * InitializeWalConsistencyChecking().
                               4784                 :                :                  */
 1192 tgl@sss.pgh.pa.us        4785         [ #  # ]:UBC           0 :                 if (!process_shared_preload_libraries_done)
                               4786                 :                :                 {
                               4787                 :              0 :                     check_wal_consistency_checking_deferred = true;
                               4788                 :                :                 }
                               4789                 :                :                 else
                               4790                 :                :                 {
                               4791                 :              0 :                     GUC_check_errdetail("Unrecognized key word: \"%s\".", tok);
                               4792                 :              0 :                     pfree(rawstring);
                               4793                 :              0 :                     list_free(elemlist);
                               4794                 :              0 :                     return false;
                               4795                 :                :                 }
                               4796                 :                :             }
                               4797                 :                :         }
                               4798                 :                :     }
                               4799                 :                : 
 1192 tgl@sss.pgh.pa.us        4800                 :CBC        2061 :     pfree(rawstring);
                               4801                 :           2061 :     list_free(elemlist);
                               4802                 :                : 
                               4803                 :                :     /* assign new value */
  266 dgustafsson@postgres     4804                 :           2061 :     *extra = guc_malloc(LOG, (RM_MAX_ID + 1) * sizeof(bool));
                               4805         [ -  + ]:           2061 :     if (!*extra)
  266 dgustafsson@postgres     4806                 :UBC           0 :         return false;
 1192 tgl@sss.pgh.pa.us        4807                 :CBC        2061 :     memcpy(*extra, newwalconsistency, (RM_MAX_ID + 1) * sizeof(bool));
                               4808                 :           2061 :     return true;
                               4809                 :                : }
                               4810                 :                : 
                               4811                 :                : /*
                               4812                 :                :  * GUC assign_hook for wal_consistency_checking
                               4813                 :                :  */
                               4814                 :                : void
                               4815                 :           2060 : assign_wal_consistency_checking(const char *newval, void *extra)
                               4816                 :                : {
                               4817                 :                :     /*
                               4818                 :                :      * If some checks were deferred, it's possible that the checks will fail
                               4819                 :                :      * later during InitializeWalConsistencyChecking(). But in that case, the
                               4820                 :                :      * postmaster will exit anyway, so it's safe to proceed with the
                               4821                 :                :      * assignment.
                               4822                 :                :      *
                               4823                 :                :      * Any built-in resource managers specified are assigned immediately,
                               4824                 :                :      * which affects WAL created before shared_preload_libraries are
                               4825                 :                :      * processed. Any custom resource managers specified won't be assigned
                               4826                 :                :      * until after shared_preload_libraries are processed, but that's OK
                               4827                 :                :      * because WAL for a custom resource manager can't be written before the
                               4828                 :                :      * module is loaded anyway.
                               4829                 :                :      */
                               4830                 :           2060 :     wal_consistency_checking = extra;
                               4831                 :           2060 : }
                               4832                 :                : 
                               4833                 :                : /*
                               4834                 :                :  * InitializeWalConsistencyChecking: run after loading custom resource managers
                               4835                 :                :  *
                               4836                 :                :  * If any unknown resource managers were specified in the
                               4837                 :                :  * wal_consistency_checking GUC, processing was deferred.  Now that
                               4838                 :                :  * shared_preload_libraries have been loaded, process wal_consistency_checking
                               4839                 :                :  * again.
                               4840                 :                :  */
                               4841                 :                : void
                               4842                 :            917 : InitializeWalConsistencyChecking(void)
                               4843                 :                : {
                               4844         [ -  + ]:            917 :     Assert(process_shared_preload_libraries_done);
                               4845                 :                : 
                               4846         [ -  + ]:            917 :     if (check_wal_consistency_checking_deferred)
                               4847                 :                :     {
                               4848                 :                :         struct config_generic *guc;
                               4849                 :                : 
 1192 tgl@sss.pgh.pa.us        4850                 :UBC           0 :         guc = find_option("wal_consistency_checking", false, false, ERROR);
                               4851                 :                : 
                               4852                 :              0 :         check_wal_consistency_checking_deferred = false;
                               4853                 :                : 
                               4854                 :              0 :         set_config_option_ext("wal_consistency_checking",
                               4855                 :                :                               wal_consistency_checking_string,
                               4856                 :                :                               guc->scontext, guc->source, guc->srole,
                               4857                 :                :                               GUC_ACTION_SET, true, ERROR, false);
                               4858                 :                : 
                               4859                 :                :         /* checking should not be deferred again */
                               4860         [ #  # ]:              0 :         Assert(!check_wal_consistency_checking_deferred);
                               4861                 :                :     }
 1192 tgl@sss.pgh.pa.us        4862                 :CBC         917 : }
                               4863                 :                : 
                               4864                 :                : /*
                               4865                 :                :  * GUC show_hook for archive_command
                               4866                 :                :  */
                               4867                 :                : const char *
                               4868                 :           1693 : show_archive_command(void)
                               4869                 :                : {
                               4870   [ +  +  -  +  :           1693 :     if (XLogArchivingActive())
                                              +  + ]
                               4871                 :              2 :         return XLogArchiveCommand;
                               4872                 :                :     else
                               4873                 :           1691 :         return "(disabled)";
                               4874                 :                : }
                               4875                 :                : 
                               4876                 :                : /*
                               4877                 :                :  * GUC show_hook for in_hot_standby
                               4878                 :                :  */
                               4879                 :                : const char *
                               4880                 :          14294 : show_in_hot_standby(void)
                               4881                 :                : {
                               4882                 :                :     /*
                               4883                 :                :      * We display the actual state based on shared memory, so that this GUC
                               4884                 :                :      * reports up-to-date state if examined intra-query.  The underlying
                               4885                 :                :      * variable (in_hot_standby_guc) changes only when we transmit a new value
                               4886                 :                :      * to the client.
                               4887                 :                :      */
                               4888         [ +  + ]:          14294 :     return RecoveryInProgress() ? "on" : "off";
                               4889                 :                : }
                               4890                 :                : 
                               4891                 :                : /*
                               4892                 :                :  * Read the control file, set respective GUCs.
                               4893                 :                :  *
                               4894                 :                :  * This is to be called during startup, including a crash recovery cycle,
                               4895                 :                :  * unless in bootstrap mode, where no control file yet exists.  As there's no
                               4896                 :                :  * usable shared memory yet (its sizing can depend on the contents of the
                               4897                 :                :  * control file!), first store the contents in local memory. XLOGShmemInit()
                               4898                 :                :  * will then copy it to shared memory later.
                               4899                 :                :  *
                               4900                 :                :  * reset just controls whether previous contents are to be expected (in the
                               4901                 :                :  * reset case, there's a dangling pointer into old shared memory), or not.
                               4902                 :                :  */
                               4903                 :                : void
 3014 andres@anarazel.de       4904                 :            926 : LocalProcessControlFile(bool reset)
                               4905                 :                : {
                               4906   [ +  +  -  + ]:            926 :     Assert(reset || ControlFile == NULL);
    8 michael@paquier.xyz      4907                 :GNC         926 :     ControlFile = palloc_object(ControlFileData);
 3018 andres@anarazel.de       4908                 :CBC         926 :     ReadControlFile();
                               4909                 :            926 : }
                               4910                 :                : 
                               4911                 :                : /*
                               4912                 :                :  * Get the wal_level from the control file. For a standby, this value should be
                               4913                 :                :  * considered as its active wal_level, because it may be different from what
                               4914                 :                :  * was originally configured on the standby.
                               4915                 :                :  */
                               4916                 :                : WalLevel
  985                          4917                 :              1 : GetActiveWalLevelOnStandby(void)
                               4918                 :                : {
                               4919                 :              1 :     return ControlFile->wal_level;
                               4920                 :                : }
                               4921                 :                : 
                               4922                 :                : /*
                               4923                 :                :  * Initialization of shared memory for XLOG
                               4924                 :                :  */
                               4925                 :                : Size
 9158 peter_e@gmx.net          4926                 :           3061 : XLOGShmemSize(void)
                               4927                 :                : {
                               4928                 :                :     Size        size;
                               4929                 :                : 
                               4930                 :                :     /*
                               4931                 :                :      * If the value of wal_buffers is -1, use the preferred auto-tune value.
                               4932                 :                :      * This isn't an amazingly clean place to do this, but we must wait till
                               4933                 :                :      * NBuffers has received its final value, and must do it before using the
                               4934                 :                :      * value of XLOGbuffers to do anything important.
                               4935                 :                :      *
                               4936                 :                :      * We prefer to report this value's source as PGC_S_DYNAMIC_DEFAULT.
                               4937                 :                :      * However, if the DBA explicitly set wal_buffers = -1 in the config file,
                               4938                 :                :      * then PGC_S_DYNAMIC_DEFAULT will fail to override that and we must force
                               4939                 :                :      * the matter with PGC_S_OVERRIDE.
                               4940                 :                :      */
 5369 tgl@sss.pgh.pa.us        4941         [ +  + ]:           3061 :     if (XLOGbuffers == -1)
                               4942                 :                :     {
                               4943                 :                :         char        buf[32];
                               4944                 :                : 
                               4945                 :           1070 :         snprintf(buf, sizeof(buf), "%d", XLOGChooseNumBuffers());
 1289                          4946                 :           1070 :         SetConfigOption("wal_buffers", buf, PGC_POSTMASTER,
                               4947                 :                :                         PGC_S_DYNAMIC_DEFAULT);
                               4948         [ -  + ]:           1070 :         if (XLOGbuffers == -1)  /* failed to apply it? */
 1289 tgl@sss.pgh.pa.us        4949                 :UBC           0 :             SetConfigOption("wal_buffers", buf, PGC_POSTMASTER,
                               4950                 :                :                             PGC_S_OVERRIDE);
                               4951                 :                :     }
 5444 tgl@sss.pgh.pa.us        4952         [ -  + ]:CBC        3061 :     Assert(XLOGbuffers > 0);
                               4953                 :                : 
                               4954                 :                :     /* XLogCtl */
 7425                          4955                 :           3061 :     size = sizeof(XLogCtlData);
                               4956                 :                : 
                               4957                 :                :     /* WAL insertion locks, plus alignment */
 4096 heikki.linnakangas@i     4958                 :           3061 :     size = add_size(size, mul_size(sizeof(WALInsertLockPadded), NUM_XLOGINSERT_LOCKS + 1));
                               4959                 :                :     /* xlblocks array */
  730 jdavis@postgresql.or     4960                 :           3061 :     size = add_size(size, mul_size(sizeof(pg_atomic_uint64), XLOGbuffers));
                               4961                 :                :     /* extra alignment padding for XLOG I/O buffers */
  985 tmunro@postgresql.or     4962                 :           3061 :     size = add_size(size, Max(XLOG_BLCKSZ, PG_IO_ALIGN_SIZE));
                               4963                 :                :     /* and the buffers themselves */
 7199 tgl@sss.pgh.pa.us        4964                 :           3061 :     size = add_size(size, mul_size(XLOG_BLCKSZ, XLOGbuffers));
                               4965                 :                : 
                               4966                 :                :     /*
                               4967                 :                :      * Note: we don't count ControlFileData, it comes out of the "slop factor"
                               4968                 :                :      * added by CreateSharedMemoryAndSemaphores.  This lets us use this
                               4969                 :                :      * routine again below to compute the actual allocation size.
                               4970                 :                :      */
                               4971                 :                : 
 7425                          4972                 :           3061 :     return size;
                               4973                 :                : }
                               4974                 :                : 
                               4975                 :                : void
 9570 vadim4o@yahoo.com        4976                 :           1071 : XLOGShmemInit(void)
                               4977                 :                : {
                               4978                 :                :     bool        foundCFile,
                               4979                 :                :                 foundXLog;
                               4980                 :                :     char       *allocptr;
                               4981                 :                :     int         i;
                               4982                 :                :     ControlFileData *localControlFile;
                               4983                 :                : 
                               4984                 :                : #ifdef WAL_DEBUG
                               4985                 :                : 
                               4986                 :                :     /*
                               4987                 :                :      * Create a memory context for WAL debugging that's exempt from the normal
                               4988                 :                :      * "no pallocs in critical section" rule. Yes, that can lead to a PANIC if
                               4989                 :                :      * an allocation fails, but wal_debug is not for production use anyway.
                               4990                 :                :      */
                               4991                 :                :     if (walDebugCxt == NULL)
                               4992                 :                :     {
                               4993                 :                :         walDebugCxt = AllocSetContextCreate(TopMemoryContext,
                               4994                 :                :                                             "WAL Debug",
                               4995                 :                :                                             ALLOCSET_DEFAULT_SIZES);
                               4996                 :                :         MemoryContextAllowInCriticalSection(walDebugCxt, true);
                               4997                 :                :     }
                               4998                 :                : #endif
                               4999                 :                : 
                               5000                 :                : 
 3014 andres@anarazel.de       5001                 :           1071 :     XLogCtl = (XLogCtlData *)
                               5002                 :           1071 :         ShmemInitStruct("XLOG Ctl", XLOGShmemSize(), &foundXLog);
                               5003                 :                : 
 3018                          5004                 :           1071 :     localControlFile = ControlFile;
 9154 tgl@sss.pgh.pa.us        5005                 :           1071 :     ControlFile = (ControlFileData *)
 8034 bruce@momjian.us         5006                 :           1071 :         ShmemInitStruct("Control File", sizeof(ControlFileData), &foundCFile);
                               5007                 :                : 
 7423 tgl@sss.pgh.pa.us        5008   [ +  -  -  + ]:           1071 :     if (foundCFile || foundXLog)
                               5009                 :                :     {
                               5010                 :                :         /* both should be present or neither */
 7423 tgl@sss.pgh.pa.us        5011   [ #  #  #  # ]:UBC           0 :         Assert(foundCFile && foundXLog);
                               5012                 :                : 
                               5013                 :                :         /* Initialize local copy of WALInsertLocks */
 4165 rhaas@postgresql.org     5014                 :              0 :         WALInsertLocks = XLogCtl->Insert.WALInsertLocks;
                               5015                 :                : 
 3014 andres@anarazel.de       5016         [ #  # ]:              0 :         if (localControlFile)
                               5017                 :              0 :             pfree(localControlFile);
 8034 bruce@momjian.us         5018                 :              0 :         return;
                               5019                 :                :     }
 9046 tgl@sss.pgh.pa.us        5020                 :CBC        1071 :     memset(XLogCtl, 0, sizeof(XLogCtlData));
                               5021                 :                : 
                               5022                 :                :     /*
                               5023                 :                :      * Already have read control file locally, unless in bootstrap mode. Move
                               5024                 :                :      * contents into shared memory.
                               5025                 :                :      */
 3014 andres@anarazel.de       5026         [ +  + ]:           1071 :     if (localControlFile)
                               5027                 :                :     {
                               5028                 :            918 :         memcpy(ControlFile, localControlFile, sizeof(ControlFileData));
                               5029                 :            918 :         pfree(localControlFile);
                               5030                 :                :     }
                               5031                 :                : 
                               5032                 :                :     /*
                               5033                 :                :      * Since XLogCtlData contains XLogRecPtr fields, its sizeof should be a
                               5034                 :                :      * multiple of the alignment for same, so no extra alignment padding is
                               5035                 :                :      * needed here.
                               5036                 :                :      */
 4546 heikki.linnakangas@i     5037                 :           1071 :     allocptr = ((char *) XLogCtl) + sizeof(XLogCtlData);
  730 jdavis@postgresql.or     5038                 :           1071 :     XLogCtl->xlblocks = (pg_atomic_uint64 *) allocptr;
                               5039                 :           1071 :     allocptr += sizeof(pg_atomic_uint64) * XLOGbuffers;
                               5040                 :                : 
                               5041         [ +  + ]:         306012 :     for (i = 0; i < XLOGbuffers; i++)
                               5042                 :                :     {
                               5043                 :         304941 :         pg_atomic_init_u64(&XLogCtl->xlblocks[i], InvalidXLogRecPtr);
                               5044                 :                :     }
                               5045                 :                : 
                               5046                 :                :     /* WAL insertion locks. Ensure they're aligned to the full padded size */
 4290 heikki.linnakangas@i     5047                 :           1071 :     allocptr += sizeof(WALInsertLockPadded) -
 3102 tgl@sss.pgh.pa.us        5048                 :           1071 :         ((uintptr_t) allocptr) % sizeof(WALInsertLockPadded);
 4290 heikki.linnakangas@i     5049                 :           1071 :     WALInsertLocks = XLogCtl->Insert.WALInsertLocks =
                               5050                 :                :         (WALInsertLockPadded *) allocptr;
 4096                          5051                 :           1071 :     allocptr += sizeof(WALInsertLockPadded) * NUM_XLOGINSERT_LOCKS;
                               5052                 :                : 
                               5053         [ +  + ]:           9639 :     for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
                               5054                 :                :     {
 3656 rhaas@postgresql.org     5055                 :           8568 :         LWLockInitialize(&WALInsertLocks[i].l.lock, LWTRANCHE_WAL_INSERT);
  877 michael@paquier.xyz      5056                 :           8568 :         pg_atomic_init_u64(&WALInsertLocks[i].l.insertingAt, InvalidXLogRecPtr);
 3283 andres@anarazel.de       5057                 :           8568 :         WALInsertLocks[i].l.lastImportantAt = InvalidXLogRecPtr;
                               5058                 :                :     }
                               5059                 :                : 
                               5060                 :                :     /*
                               5061                 :                :      * Align the start of the page buffers to a full xlog block size boundary.
                               5062                 :                :      * This simplifies some calculations in XLOG insertion. It is also
                               5063                 :                :      * required for O_DIRECT.
                               5064                 :                :      */
 4546 heikki.linnakangas@i     5065                 :           1071 :     allocptr = (char *) TYPEALIGN(XLOG_BLCKSZ, allocptr);
 7425 tgl@sss.pgh.pa.us        5066                 :           1071 :     XLogCtl->pages = allocptr;
 7199                          5067                 :           1071 :     memset(XLogCtl->pages, 0, (Size) XLOG_BLCKSZ * XLOGbuffers);
                               5068                 :                : 
                               5069                 :                :     /*
                               5070                 :                :      * Do basic initialization of XLogCtl shared data. (StartupXLOG will fill
                               5071                 :                :      * in additional info.)
                               5072                 :                :      */
 9046                          5073                 :           1071 :     XLogCtl->XLogCacheBlck = XLOGbuffers - 1;
 2064 michael@paquier.xyz      5074                 :           1071 :     XLogCtl->SharedRecoveryState = RECOVERY_STATE_CRASH;
 1634 noah@leadboat.com        5075                 :           1071 :     XLogCtl->InstallXLogFileSegmentActive = false;
 4972 tgl@sss.pgh.pa.us        5076                 :           1071 :     XLogCtl->WalWriterSleeping = false;
                               5077                 :                : 
 4546 heikki.linnakangas@i     5078                 :           1071 :     SpinLockInit(&XLogCtl->Insert.insertpos_lck);
 8846 tgl@sss.pgh.pa.us        5079                 :           1071 :     SpinLockInit(&XLogCtl->info_lck);
  620 alvherre@alvh.no-ip.     5080                 :           1071 :     pg_atomic_init_u64(&XLogCtl->logInsertResult, InvalidXLogRecPtr);
  622                          5081                 :           1071 :     pg_atomic_init_u64(&XLogCtl->logWriteResult, InvalidXLogRecPtr);
                               5082                 :           1071 :     pg_atomic_init_u64(&XLogCtl->logFlushResult, InvalidXLogRecPtr);
  658 nathan@postgresql.or     5083                 :           1071 :     pg_atomic_init_u64(&XLogCtl->unloggedLSN, InvalidXLogRecPtr);
                               5084                 :                : }
                               5085                 :                : 
                               5086                 :                : /*
                               5087                 :                :  * This function must be called ONCE on system install.  It creates pg_control
                               5088                 :                :  * and the initial XLOG segment.
                               5089                 :                :  */
                               5090                 :                : void
  513 peter@eisentraut.org     5091                 :             51 : BootStrapXLOG(uint32 data_checksum_version)
                               5092                 :                : {
                               5093                 :                :     CheckPoint  checkPoint;
                               5094                 :                :     PGAlignedXLogBlock buffer;
                               5095                 :                :     XLogPageHeader page;
                               5096                 :                :     XLogLongPageHeader longpage;
                               5097                 :                :     XLogRecord *record;
                               5098                 :                :     char       *recptr;
                               5099                 :                :     uint64      sysidentifier;
                               5100                 :                :     struct timeval tv;
                               5101                 :                :     pg_crc32c   crc;
                               5102                 :                : 
                               5103                 :                :     /* allow ordinary WAL segment creation, like StartupXLOG() would */
 1219 michael@paquier.xyz      5104                 :             51 :     SetInstallXLogFileSegmentActive();
                               5105                 :                : 
                               5106                 :                :     /*
                               5107                 :                :      * Select a hopefully-unique system identifier code for this installation.
                               5108                 :                :      * We use the result of gettimeofday(), including the fractional seconds
                               5109                 :                :      * field, as being about as unique as we can easily get.  (Don't try to
                               5110                 :                :      * use random(), since it hasn't been seeded and there's no portable way
                               5111                 :                :      * to seed it other than the system clock value...)  The upper half of the
                               5112                 :                :      * uint64 value is just the tv_sec part, while the lower half contains the
                               5113                 :                :      * tv_usec part (which must fit in 20 bits), plus 12 bits from our current
                               5114                 :                :      * PID for a little extra uniqueness.  A person knowing this encoding can
                               5115                 :                :      * determine the initialization time of the installation, which could
                               5116                 :                :      * perhaps be useful sometimes.
                               5117                 :                :      */
 7981 tgl@sss.pgh.pa.us        5118                 :             51 :     gettimeofday(&tv, NULL);
                               5119                 :             51 :     sysidentifier = ((uint64) tv.tv_sec) << 32;
 4254                          5120                 :             51 :     sysidentifier |= ((uint64) tv.tv_usec) << 12;
                               5121                 :             51 :     sysidentifier |= getpid() & 0xFFF;
                               5122                 :                : 
   10 peter@eisentraut.org     5123                 :GNC          51 :     memset(&buffer, 0, sizeof buffer);
                               5124                 :             51 :     page = (XLogPageHeader) &buffer;
                               5125                 :                : 
                               5126                 :                :     /*
                               5127                 :                :      * Set up information for the initial checkpoint record
                               5128                 :                :      *
                               5129                 :                :      * The initial checkpoint record is written to the beginning of the WAL
                               5130                 :                :      * segment with logid=0 logseg=1. The very first WAL segment, 0/0, is not
                               5131                 :                :      * used, so that we can use 0/0 to mean "before any valid WAL segment".
                               5132                 :                :      */
 3012 andres@anarazel.de       5133                 :CBC          51 :     checkPoint.redo = wal_segment_size + SizeOfXLogLongPHD;
 1504 rhaas@postgresql.org     5134                 :             51 :     checkPoint.ThisTimeLineID = BootstrapTimeLineID;
                               5135                 :             51 :     checkPoint.PrevTimeLineID = BootstrapTimeLineID;
 5076 simon@2ndQuadrant.co     5136                 :             51 :     checkPoint.fullPageWrites = fullPageWrites;
  514 rhaas@postgresql.org     5137                 :             51 :     checkPoint.wal_level = wal_level;
                               5138                 :                :     checkPoint.nextXid =
 2457 tmunro@postgresql.or     5139                 :             51 :         FullTransactionIdFromEpochAndXid(0, FirstNormalTransactionId);
 1617 tgl@sss.pgh.pa.us        5140                 :             51 :     checkPoint.nextOid = FirstGenbkiObjectId;
 7539                          5141                 :             51 :     checkPoint.nextMulti = FirstMultiXactId;
    9 heikki.linnakangas@i     5142                 :GNC          51 :     checkPoint.nextMultiOffset = 1;
 5953 tgl@sss.pgh.pa.us        5143                 :CBC          51 :     checkPoint.oldestXid = FirstNormalTransactionId;
 1337                          5144                 :             51 :     checkPoint.oldestXidDB = Template1DbOid;
 4712 alvherre@alvh.no-ip.     5145                 :             51 :     checkPoint.oldestMulti = FirstMultiXactId;
 1337 tgl@sss.pgh.pa.us        5146                 :             51 :     checkPoint.oldestMultiDB = Template1DbOid;
 3643 mail@joeconway.com       5147                 :             51 :     checkPoint.oldestCommitTsXid = InvalidTransactionId;
                               5148                 :             51 :     checkPoint.newestCommitTsXid = InvalidTransactionId;
 6514 tgl@sss.pgh.pa.us        5149                 :             51 :     checkPoint.time = (pg_time_t) time(NULL);
 5843 simon@2ndQuadrant.co     5150                 :             51 :     checkPoint.oldestActiveXid = InvalidTransactionId;
                               5151                 :                : 
  741 heikki.linnakangas@i     5152                 :             51 :     TransamVariables->nextXid = checkPoint.nextXid;
                               5153                 :             51 :     TransamVariables->nextOid = checkPoint.nextOid;
                               5154                 :             51 :     TransamVariables->oidCount = 0;
 7498 tgl@sss.pgh.pa.us        5155                 :             51 :     MultiXactSetNextMXact(checkPoint.nextMulti, checkPoint.nextMultiOffset);
 3192 rhaas@postgresql.org     5156                 :             51 :     AdvanceOldestClogXid(checkPoint.oldestXid);
 5783 tgl@sss.pgh.pa.us        5157                 :             51 :     SetTransactionIdLimit(checkPoint.oldestXid, checkPoint.oldestXidDB);
    9 heikki.linnakangas@i     5158                 :GNC          51 :     SetMultiXactIdLimit(checkPoint.oldestMulti, checkPoint.oldestMultiDB);
 4033 alvherre@alvh.no-ip.     5159                 :CBC          51 :     SetCommitTsLimit(InvalidTransactionId, InvalidTransactionId);
                               5160                 :                : 
                               5161                 :                :     /* Set up the XLOG page header */
 9570 vadim4o@yahoo.com        5162                 :             51 :     page->xlp_magic = XLOG_PAGE_MAGIC;
 7820 tgl@sss.pgh.pa.us        5163                 :             51 :     page->xlp_info = XLP_LONG_HEADER;
 1504 rhaas@postgresql.org     5164                 :             51 :     page->xlp_tli = BootstrapTimeLineID;
 3012 andres@anarazel.de       5165                 :             51 :     page->xlp_pageaddr = wal_segment_size;
 7820 tgl@sss.pgh.pa.us        5166                 :             51 :     longpage = (XLogLongPageHeader) page;
                               5167                 :             51 :     longpage->xlp_sysid = sysidentifier;
 3012 andres@anarazel.de       5168                 :             51 :     longpage->xlp_seg_size = wal_segment_size;
 7197 tgl@sss.pgh.pa.us        5169                 :             51 :     longpage->xlp_xlog_blcksz = XLOG_BLCKSZ;
                               5170                 :                : 
                               5171                 :                :     /* Insert the initial checkpoint record */
 4046 heikki.linnakangas@i     5172                 :             51 :     recptr = ((char *) page + SizeOfXLogLongPHD);
                               5173                 :             51 :     record = (XLogRecord *) recptr;
 4925                          5174                 :             51 :     record->xl_prev = 0;
 9570 vadim4o@yahoo.com        5175                 :             51 :     record->xl_xid = InvalidTransactionId;
 4046 heikki.linnakangas@i     5176                 :             51 :     record->xl_tot_len = SizeOfXLogRecord + SizeOfXLogRecordDataHeaderShort + sizeof(checkPoint);
 9046 tgl@sss.pgh.pa.us        5177                 :             51 :     record->xl_info = XLOG_CHECKPOINT_SHUTDOWN;
 9570 vadim4o@yahoo.com        5178                 :             51 :     record->xl_rmid = RM_XLOG_ID;
 4046 heikki.linnakangas@i     5179                 :             51 :     recptr += SizeOfXLogRecord;
                               5180                 :                :     /* fill the XLogRecordDataHeaderShort struct */
 3187 tgl@sss.pgh.pa.us        5181                 :             51 :     *(recptr++) = (char) XLR_BLOCK_ID_DATA_SHORT;
 4046 heikki.linnakangas@i     5182                 :             51 :     *(recptr++) = sizeof(checkPoint);
                               5183                 :             51 :     memcpy(recptr, &checkPoint, sizeof(checkPoint));
                               5184                 :             51 :     recptr += sizeof(checkPoint);
                               5185         [ -  + ]:             51 :     Assert(recptr - (char *) record == record->xl_tot_len);
                               5186                 :                : 
 4062                          5187                 :             51 :     INIT_CRC32C(crc);
 4046                          5188                 :             51 :     COMP_CRC32C(crc, ((char *) record) + SizeOfXLogRecord, record->xl_tot_len - SizeOfXLogRecord);
 4062                          5189                 :             51 :     COMP_CRC32C(crc, (char *) record, offsetof(XLogRecord, xl_crc));
                               5190                 :             51 :     FIN_CRC32C(crc);
 9121 vadim4o@yahoo.com        5191                 :             51 :     record->xl_crc = crc;
                               5192                 :                : 
                               5193                 :                :     /* Create first XLOG segment file */
 1504 rhaas@postgresql.org     5194                 :             51 :     openLogTLI = BootstrapTimeLineID;
                               5195                 :             51 :     openLogFile = XLogFileInit(1, BootstrapTimeLineID);
                               5196                 :                : 
                               5197                 :                :     /*
                               5198                 :                :      * We needn't bother with Reserve/ReleaseExternalFD here, since we'll
                               5199                 :                :      * close the file again in a moment.
                               5200                 :                :      */
                               5201                 :                : 
                               5202                 :                :     /* Write the first page with the initial record */
 8961 tgl@sss.pgh.pa.us        5203                 :             51 :     errno = 0;
 3197 rhaas@postgresql.org     5204                 :             51 :     pgstat_report_wait_start(WAIT_EVENT_WAL_BOOTSTRAP_WRITE);
   10 peter@eisentraut.org     5205         [ -  + ]:GNC          51 :     if (write(openLogFile, &buffer, XLOG_BLCKSZ) != XLOG_BLCKSZ)
                               5206                 :                :     {
                               5207                 :                :         /* if write didn't set errno, assume problem is no disk space */
 8961 tgl@sss.pgh.pa.us        5208         [ #  # ]:UBC           0 :         if (errno == 0)
                               5209                 :              0 :             errno = ENOSPC;
 8186                          5210         [ #  # ]:              0 :         ereport(PANIC,
                               5211                 :                :                 (errcode_for_file_access(),
                               5212                 :                :                  errmsg("could not write bootstrap write-ahead log file: %m")));
                               5213                 :                :     }
 3197 rhaas@postgresql.org     5214                 :CBC          51 :     pgstat_report_wait_end();
                               5215                 :                : 
                               5216                 :             51 :     pgstat_report_wait_start(WAIT_EVENT_WAL_BOOTSTRAP_SYNC);
 9046 tgl@sss.pgh.pa.us        5217         [ -  + ]:             51 :     if (pg_fsync(openLogFile) != 0)
 8186 tgl@sss.pgh.pa.us        5218         [ #  # ]:UBC           0 :         ereport(PANIC,
                               5219                 :                :                 (errcode_for_file_access(),
                               5220                 :                :                  errmsg("could not fsync bootstrap write-ahead log file: %m")));
 3197 rhaas@postgresql.org     5221                 :CBC          51 :     pgstat_report_wait_end();
                               5222                 :                : 
 2357 peter@eisentraut.org     5223         [ -  + ]:             51 :     if (close(openLogFile) != 0)
 7997 tgl@sss.pgh.pa.us        5224         [ #  # ]:UBC           0 :         ereport(PANIC,
                               5225                 :                :                 (errcode_for_file_access(),
                               5226                 :                :                  errmsg("could not close bootstrap write-ahead log file: %m")));
                               5227                 :                : 
 9046 tgl@sss.pgh.pa.us        5228                 :CBC          51 :     openLogFile = -1;
                               5229                 :                : 
                               5230                 :                :     /* Now create pg_control */
  513 peter@eisentraut.org     5231                 :             51 :     InitControlFile(sysidentifier, data_checksum_version);
 9046 tgl@sss.pgh.pa.us        5232                 :             51 :     ControlFile->time = checkPoint.time;
 9570 vadim4o@yahoo.com        5233                 :             51 :     ControlFile->checkPoint = checkPoint.redo;
 9046 tgl@sss.pgh.pa.us        5234                 :             51 :     ControlFile->checkPointCopy = checkPoint;
                               5235                 :                : 
                               5236                 :                :     /* some additional ControlFile fields are set in WriteControlFile() */
 9154                          5237                 :             51 :     WriteControlFile();
                               5238                 :                : 
                               5239                 :                :     /* Bootstrap the commit log, commit timestamps, subtrans, and multixact, too */
 8881                          5240                 :             51 :     BootStrapCLOG();
 4033 alvherre@alvh.no-ip.     5241                 :             51 :     BootStrapCommitTs();
 7840 tgl@sss.pgh.pa.us        5242                 :             51 :     BootStrapSUBTRANS();
 7539                          5243                 :             51 :     BootStrapMultiXact();
                               5244                 :                : 
                               5245                 :                :     /*
                               5246                 :                :      * Force control file to be read - in contrast to normal processing we'd
                               5247                 :                :      * otherwise never run the checks and GUC-related initializations therein.
                               5248                 :                :      */
 3018 andres@anarazel.de       5249                 :             51 :     ReadControlFile();
 9570 vadim4o@yahoo.com        5250                 :             51 : }
                               5251                 :                : 
                               5252                 :                : static char *
  138 tgl@sss.pgh.pa.us        5253                 :GNC         815 : str_time(pg_time_t tnow, char *buf, size_t bufsize)
                               5254                 :                : {
                               5255                 :            815 :     pg_strftime(buf, bufsize,
                               5256                 :                :                 "%Y-%m-%d %H:%M:%S %Z",
 6711 tgl@sss.pgh.pa.us        5257                 :CBC         815 :                 pg_localtime(&tnow, log_timezone));
                               5258                 :                : 
 9158 peter_e@gmx.net          5259                 :            815 :     return buf;
                               5260                 :                : }
                               5261                 :                : 
                               5262                 :                : /*
                               5263                 :                :  * Initialize the first WAL segment on the new timeline.
                               5264                 :                :  */
                               5265                 :                : static void
 1401 heikki.linnakangas@i     5266                 :             49 : XLogInitNewTimeline(TimeLineID endTLI, XLogRecPtr endOfLog, TimeLineID newTLI)
                               5267                 :                : {
                               5268                 :                :     char        xlogfname[MAXFNAMELEN];
                               5269                 :                :     XLogSegNo   endLogSegNo;
                               5270                 :                :     XLogSegNo   startLogSegNo;
                               5271                 :                : 
                               5272                 :                :     /* we always switch to a new timeline after archive recovery */
 1504 rhaas@postgresql.org     5273         [ -  + ]:             49 :     Assert(endTLI != newTLI);
                               5274                 :                : 
                               5275                 :                :     /*
                               5276                 :                :      * Update min recovery point one last time.
                               5277                 :                :      */
 6020 heikki.linnakangas@i     5278                 :             49 :     UpdateMinRecoveryPoint(InvalidXLogRecPtr, true);
                               5279                 :                : 
                               5280                 :                :     /*
                               5281                 :                :      * Calculate the last segment on the old timeline, and the first segment
                               5282                 :                :      * on the new timeline. If the switch happens in the middle of a segment,
                               5283                 :                :      * they are the same, but if the switch happens exactly at a segment
                               5284                 :                :      * boundary, startLogSegNo will be endLogSegNo + 1.
                               5285                 :                :      */
 3012 andres@anarazel.de       5286                 :             49 :     XLByteToPrevSeg(endOfLog, endLogSegNo, wal_segment_size);
                               5287                 :             49 :     XLByteToSeg(endOfLog, startLogSegNo, wal_segment_size);
                               5288                 :                : 
                               5289                 :                :     /*
                               5290                 :                :      * Initialize the starting WAL segment for the new timeline. If the switch
                               5291                 :                :      * happens in the middle of a segment, copy data from the last WAL segment
                               5292                 :                :      * of the old timeline up to the switch point, to the starting WAL segment
                               5293                 :                :      * on the new timeline.
                               5294                 :                :      */
 4018 heikki.linnakangas@i     5295         [ +  + ]:             49 :     if (endLogSegNo == startLogSegNo)
                               5296                 :                :     {
                               5297                 :                :         /*
                               5298                 :                :          * Make a copy of the file on the new timeline.
                               5299                 :                :          *
                               5300                 :                :          * Writing WAL isn't allowed yet, so there are no locking
                               5301                 :                :          * considerations. But we should be just as careful as XLogFileInit to
                               5302                 :                :          * avoid emplacing a bogus file.
                               5303                 :                :          */
 1504 rhaas@postgresql.org     5304                 :             39 :         XLogFileCopy(newTLI, endLogSegNo, endTLI, endLogSegNo,
 3012 andres@anarazel.de       5305                 :             39 :                      XLogSegmentOffset(endOfLog, wal_segment_size));
                               5306                 :                :     }
                               5307                 :                :     else
                               5308                 :                :     {
                               5309                 :                :         /*
                               5310                 :                :          * The switch happened at a segment boundary, so just create the next
                               5311                 :                :          * segment on the new timeline.
                               5312                 :                :          */
                               5313                 :                :         int         fd;
                               5314                 :                : 
 1504 rhaas@postgresql.org     5315                 :             10 :         fd = XLogFileInit(startLogSegNo, newTLI);
                               5316                 :                : 
 2357 peter@eisentraut.org     5317         [ -  + ]:             10 :         if (close(fd) != 0)
                               5318                 :                :         {
 2207 michael@paquier.xyz      5319                 :UBC           0 :             int         save_errno = errno;
                               5320                 :                : 
 1504 rhaas@postgresql.org     5321                 :              0 :             XLogFileName(xlogfname, newTLI, startLogSegNo, wal_segment_size);
 2207 michael@paquier.xyz      5322                 :              0 :             errno = save_errno;
 4015 heikki.linnakangas@i     5323         [ #  # ]:              0 :             ereport(ERROR,
                               5324                 :                :                     (errcode_for_file_access(),
                               5325                 :                :                      errmsg("could not close file \"%s\": %m", xlogfname)));
                               5326                 :                :         }
                               5327                 :                :     }
                               5328                 :                : 
                               5329                 :                :     /*
                               5330                 :                :      * Make sure that no .ready or .done flags are posted
                               5331                 :                :      * for the new segment.
                               5332                 :                :      */
 1504 rhaas@postgresql.org     5333                 :CBC          49 :     XLogFileName(xlogfname, newTLI, startLogSegNo, wal_segment_size);
 4074 fujii@postgresql.org     5334                 :             49 :     XLogArchiveCleanup(xlogfname);
 7822 tgl@sss.pgh.pa.us        5335                 :             49 : }
                               5336                 :                : 
                               5337                 :                : /*
                               5338                 :                :  * Perform cleanup actions at the conclusion of archive recovery.
                               5339                 :                :  */
                               5340                 :                : static void
 1504 rhaas@postgresql.org     5341                 :             49 : CleanupAfterArchiveRecovery(TimeLineID EndOfLogTLI, XLogRecPtr EndOfLog,
                               5342                 :                :                             TimeLineID newTLI)
                               5343                 :                : {
                               5344                 :                :     /*
                               5345                 :                :      * Execute the recovery_end_command, if any.
                               5346                 :                :      */
 1527                          5347   [ +  -  +  + ]:             49 :     if (recoveryEndCommand && strcmp(recoveryEndCommand, "") != 0)
 1046 michael@paquier.xyz      5348                 :              2 :         ExecuteRecoveryCommand(recoveryEndCommand,
                               5349                 :                :                                "recovery_end_command",
                               5350                 :                :                                true,
                               5351                 :                :                                WAIT_EVENT_RECOVERY_END_COMMAND);
                               5352                 :                : 
                               5353                 :                :     /*
                               5354                 :                :      * We switched to a new timeline. Clean up segments on the old timeline.
                               5355                 :                :      *
                               5356                 :                :      * If there are any higher-numbered segments on the old timeline, remove
                               5357                 :                :      * them. They might contain valid WAL, but they might also be
                               5358                 :                :      * pre-allocated files containing garbage. In any case, they are not part
                               5359                 :                :      * of the new timeline's history so we don't need them.
                               5360                 :                :      */
 1504 rhaas@postgresql.org     5361                 :             49 :     RemoveNonParentXlogFiles(EndOfLog, newTLI);
                               5362                 :                : 
                               5363                 :                :     /*
                               5364                 :                :      * If the switch happened in the middle of a segment, what to do with the
                               5365                 :                :      * last, partial segment on the old timeline? If we don't archive it, and
                               5366                 :                :      * the server that created the WAL never archives it either (e.g. because
                               5367                 :                :      * it was hit by a meteor), it will never make it to the archive. That's
                               5368                 :                :      * OK from our point of view, because the new segment that we created with
                               5369                 :                :      * the new TLI contains all the WAL from the old timeline up to the switch
                               5370                 :                :      * point. But if you later try to do PITR to the "missing" WAL on the old
                               5371                 :                :      * timeline, recovery won't find it in the archive. It's physically
                               5372                 :                :      * present in the new file with new TLI, but recovery won't look there
                               5373                 :                :      * when it's recovering to the older timeline. On the other hand, if we
                               5374                 :                :      * archive the partial segment, and the original server on that timeline
                               5375                 :                :      * is still running and archives the completed version of the same segment
                               5376                 :                :      * later, it will fail. (We used to do that in 9.4 and below, and it
                               5377                 :                :      * caused such problems).
                               5378                 :                :      *
                               5379                 :                :      * As a compromise, we rename the last segment with the .partial suffix,
                               5380                 :                :      * and archive it. Archive recovery will never try to read .partial
                               5381                 :                :      * segments, so they will normally go unused. But in the odd PITR case,
                               5382                 :                :      * the administrator can copy them manually to the pg_wal directory
                               5383                 :                :      * (removing the suffix). They can be useful in debugging, too.
                               5384                 :                :      *
                               5385                 :                :      * If a .done or .ready file already exists for the old timeline, however,
                               5386                 :                :      * we had already determined that the segment is complete, so we can let
                               5387                 :                :      * it be archived normally. (In particular, if it was restored from the
                               5388                 :                :      * archive to begin with, it's expected to have a .done file).
                               5389                 :                :      */
 1527                          5390   [ +  +  +  + ]:             88 :     if (XLogSegmentOffset(EndOfLog, wal_segment_size) != 0 &&
                               5391   [ +  +  -  + ]:             39 :         XLogArchivingActive())
                               5392                 :                :     {
                               5393                 :                :         char        origfname[MAXFNAMELEN];
                               5394                 :                :         XLogSegNo   endLogSegNo;
                               5395                 :                : 
                               5396                 :              9 :         XLByteToPrevSeg(EndOfLog, endLogSegNo, wal_segment_size);
                               5397                 :              9 :         XLogFileName(origfname, EndOfLogTLI, endLogSegNo, wal_segment_size);
                               5398                 :                : 
                               5399         [ +  + ]:              9 :         if (!XLogArchiveIsReadyOrDone(origfname))
                               5400                 :                :         {
                               5401                 :                :             char        origpath[MAXPGPATH];
                               5402                 :                :             char        partialfname[MAXFNAMELEN];
                               5403                 :                :             char        partialpath[MAXPGPATH];
                               5404                 :                : 
                               5405                 :                :             /*
                               5406                 :                :              * If we're summarizing WAL, we can't rename the partial file
                               5407                 :                :              * until the summarizer finishes with it, else it will fail.
                               5408                 :                :              */
  510                          5409         [ +  + ]:              5 :             if (summarize_wal)
                               5410                 :              1 :                 WaitForWalSummarization(EndOfLog);
                               5411                 :                : 
 1527                          5412                 :              5 :             XLogFilePath(origpath, EndOfLogTLI, endLogSegNo, wal_segment_size);
                               5413                 :              5 :             snprintf(partialfname, MAXFNAMELEN, "%s.partial", origfname);
                               5414                 :              5 :             snprintf(partialpath, MAXPGPATH, "%s.partial", origpath);
                               5415                 :                : 
                               5416                 :                :             /*
                               5417                 :                :              * Make sure there's no .done or .ready file for the .partial
                               5418                 :                :              * file.
                               5419                 :                :              */
                               5420                 :              5 :             XLogArchiveCleanup(partialfname);
                               5421                 :                : 
                               5422                 :              5 :             durable_rename(origpath, partialpath, ERROR);
                               5423                 :              5 :             XLogArchiveNotify(partialfname);
                               5424                 :                :         }
                               5425                 :                :     }
                               5426                 :             49 : }
                               5427                 :                : 
                               5428                 :                : /*
                               5429                 :                :  * Check to see if required parameters are set high enough on this server
                               5430                 :                :  * for various aspects of recovery operation.
                               5431                 :                :  *
                               5432                 :                :  * Note that all the parameters which this function tests need to be
                               5433                 :                :  * listed in the Administrator's Overview section in high-availability.sgml.
                               5434                 :                :  * If you change them, don't forget to update the list.
                               5435                 :                :  */
                               5436                 :                : static void
 1401 heikki.linnakangas@i     5437                 :            238 : CheckRequiredParameterValues(void)
                               5438                 :                : {
                               5439                 :                :     /*
                               5440                 :                :      * For archive recovery, the WAL must be generated with at least 'replica'
                               5441                 :                :      * wal_level.
                               5442                 :                :      */
                               5443   [ +  +  +  + ]:            238 :     if (ArchiveRecoveryRequested && ControlFile->wal_level == WAL_LEVEL_MINIMAL)
                               5444                 :                :     {
                               5445         [ +  - ]:              2 :         ereport(FATAL,
                               5446                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               5447                 :                :                  errmsg("WAL was generated with \"wal_level=minimal\", cannot continue recovering"),
                               5448                 :                :                  errdetail("This happens if you temporarily set \"wal_level=minimal\" on the server."),
                               5449                 :                :                  errhint("Use a backup taken after setting \"wal_level\" to higher than \"minimal\".")));
                               5450                 :                :     }
                               5451                 :                : 
                               5452                 :                :     /*
                               5453                 :                :      * For Hot Standby, the WAL must be generated with 'replica' mode, and we
                               5454                 :                :      * must have at least as many backend slots as the primary.
                               5455                 :                :      */
 4306                          5456   [ +  +  +  + ]:            236 :     if (ArchiveRecoveryRequested && EnableHotStandby)
                               5457                 :                :     {
                               5458                 :                :         /* We ignore autovacuum_worker_slots when we make this test. */
 5713                          5459                 :            118 :         RecoveryRequiresIntParameter("max_connections",
                               5460                 :                :                                      MaxConnections,
 5712 tgl@sss.pgh.pa.us        5461                 :            118 :                                      ControlFile->MaxConnections);
 4550 rhaas@postgresql.org     5462                 :            118 :         RecoveryRequiresIntParameter("max_worker_processes",
                               5463                 :                :                                      max_worker_processes,
                               5464                 :            118 :                                      ControlFile->max_worker_processes);
 2501 michael@paquier.xyz      5465                 :            118 :         RecoveryRequiresIntParameter("max_wal_senders",
                               5466                 :                :                                      max_wal_senders,
                               5467                 :            118 :                                      ControlFile->max_wal_senders);
 4852 tgl@sss.pgh.pa.us        5468                 :            118 :         RecoveryRequiresIntParameter("max_prepared_transactions",
                               5469                 :                :                                      max_prepared_xacts,
 5712                          5470                 :            118 :                                      ControlFile->max_prepared_xacts);
 4852                          5471                 :            118 :         RecoveryRequiresIntParameter("max_locks_per_transaction",
                               5472                 :                :                                      max_locks_per_xact,
 5712                          5473                 :            118 :                                      ControlFile->max_locks_per_xact);
                               5474                 :                :     }
 5843 simon@2ndQuadrant.co     5475                 :            236 : }
                               5476                 :                : 
                               5477                 :                : /*
                               5478                 :                :  * This must be called ONCE during postmaster or standalone-backend startup
                               5479                 :                :  */
                               5480                 :                : void
 9046 tgl@sss.pgh.pa.us        5481                 :            927 : StartupXLOG(void)
                               5482                 :                : {
                               5483                 :                :     XLogCtlInsert *Insert;
                               5484                 :                :     CheckPoint  checkPoint;
                               5485                 :                :     bool        wasShutdown;
                               5486                 :                :     bool        didCrash;
                               5487                 :                :     bool        haveTblspcMap;
                               5488                 :                :     bool        haveBackupLabel;
                               5489                 :                :     XLogRecPtr  EndOfLog;
                               5490                 :                :     TimeLineID  EndOfLogTLI;
                               5491                 :                :     TimeLineID  newTLI;
                               5492                 :                :     bool        performedWalRecovery;
                               5493                 :                :     EndOfWalRecoveryInfo *endOfRecoveryInfo;
                               5494                 :                :     XLogRecPtr  abortedRecPtr;
                               5495                 :                :     XLogRecPtr  missingContrecPtr;
                               5496                 :                :     TransactionId oldestActiveXID;
 1968 fujii@postgresql.org     5497                 :            927 :     bool        promoted = false;
                               5498                 :                :     char        timebuf[128];
                               5499                 :                : 
                               5500                 :                :     /*
                               5501                 :                :      * We should have an aux process resource owner to use, and we should not
                               5502                 :                :      * be in a transaction that's installed some other resowner.
                               5503                 :                :      */
 2710 tgl@sss.pgh.pa.us        5504         [ -  + ]:            927 :     Assert(AuxProcessResourceOwner != NULL);
                               5505   [ +  -  -  + ]:            927 :     Assert(CurrentResourceOwner == NULL ||
                               5506                 :                :            CurrentResourceOwner == AuxProcessResourceOwner);
                               5507                 :            927 :     CurrentResourceOwner = AuxProcessResourceOwner;
                               5508                 :                : 
                               5509                 :                :     /*
                               5510                 :                :      * Check that contents look valid.
                               5511                 :                :      */
 2232 peter@eisentraut.org     5512         [ -  + ]:            927 :     if (!XRecOffIsValid(ControlFile->checkPoint))
 8186 tgl@sss.pgh.pa.us        5513         [ #  # ]:UBC           0 :         ereport(FATAL,
                               5514                 :                :                 (errcode(ERRCODE_DATA_CORRUPTED),
                               5515                 :                :                  errmsg("control file contains invalid checkpoint location")));
                               5516                 :                : 
 2232 peter@eisentraut.org     5517   [ +  +  -  -  :CBC         927 :     switch (ControlFile->state)
                                           +  +  - ]
                               5518                 :                :     {
                               5519                 :            727 :         case DB_SHUTDOWNED:
                               5520                 :                : 
                               5521                 :                :             /*
                               5522                 :                :              * This is the expected case, so don't be chatty in standalone
                               5523                 :                :              * mode
                               5524                 :                :              */
                               5525   [ +  +  +  + ]:            727 :             ereport(IsPostmasterEnvironment ? LOG : NOTICE,
                               5526                 :                :                     (errmsg("database system was shut down at %s",
                               5527                 :                :                             str_time(ControlFile->time,
                               5528                 :                :                                      timebuf, sizeof(timebuf)))));
                               5529                 :            727 :             break;
                               5530                 :                : 
                               5531                 :             28 :         case DB_SHUTDOWNED_IN_RECOVERY:
                               5532         [ +  - ]:             28 :             ereport(LOG,
                               5533                 :                :                     (errmsg("database system was shut down in recovery at %s",
                               5534                 :                :                             str_time(ControlFile->time,
                               5535                 :                :                                      timebuf, sizeof(timebuf)))));
                               5536                 :             28 :             break;
                               5537                 :                : 
 2232 peter@eisentraut.org     5538                 :UBC           0 :         case DB_SHUTDOWNING:
                               5539         [ #  # ]:              0 :             ereport(LOG,
                               5540                 :                :                     (errmsg("database system shutdown was interrupted; last known up at %s",
                               5541                 :                :                             str_time(ControlFile->time,
                               5542                 :                :                                      timebuf, sizeof(timebuf)))));
                               5543                 :              0 :             break;
                               5544                 :                : 
                               5545                 :              0 :         case DB_IN_CRASH_RECOVERY:
                               5546         [ #  # ]:              0 :             ereport(LOG,
                               5547                 :                :                     (errmsg("database system was interrupted while in recovery at %s",
                               5548                 :                :                             str_time(ControlFile->time,
                               5549                 :                :                                      timebuf, sizeof(timebuf))),
                               5550                 :                :                      errhint("This probably means that some data is corrupted and"
                               5551                 :                :                              " you will have to use the last backup for recovery.")));
                               5552                 :              0 :             break;
                               5553                 :                : 
 2232 peter@eisentraut.org     5554                 :CBC           6 :         case DB_IN_ARCHIVE_RECOVERY:
                               5555         [ +  - ]:              6 :             ereport(LOG,
                               5556                 :                :                     (errmsg("database system was interrupted while in recovery at log time %s",
                               5557                 :                :                             str_time(ControlFile->checkPointCopy.time,
                               5558                 :                :                                      timebuf, sizeof(timebuf))),
                               5559                 :                :                      errhint("If this has occurred more than once some data might be corrupted"
                               5560                 :                :                              " and you might need to choose an earlier recovery target.")));
                               5561                 :              6 :             break;
                               5562                 :                : 
                               5563                 :            166 :         case DB_IN_PRODUCTION:
                               5564         [ +  - ]:            166 :             ereport(LOG,
                               5565                 :                :                     (errmsg("database system was interrupted; last known up at %s",
                               5566                 :                :                             str_time(ControlFile->time,
                               5567                 :                :                                      timebuf, sizeof(timebuf)))));
                               5568                 :            166 :             break;
                               5569                 :                : 
 2232 peter@eisentraut.org     5570                 :UBC           0 :         default:
                               5571         [ #  # ]:              0 :             ereport(FATAL,
                               5572                 :                :                     (errcode(ERRCODE_DATA_CORRUPTED),
                               5573                 :                :                      errmsg("control file contains invalid database cluster state")));
                               5574                 :                :     }
                               5575                 :                : 
                               5576                 :                :     /* This is just to allow attaching to startup process with a debugger */
                               5577                 :                : #ifdef XLOG_REPLAY_DELAY
                               5578                 :                :     if (ControlFile->state != DB_SHUTDOWNED)
                               5579                 :                :         pg_usleep(60000000L);
                               5580                 :                : #endif
                               5581                 :                : 
                               5582                 :                :     /*
                               5583                 :                :      * Verify that pg_wal, pg_wal/archive_status, and pg_wal/summaries exist.
                               5584                 :                :      * In cases where someone has performed a copy for PITR, these directories
                               5585                 :                :      * may have been excluded and need to be re-created.
                               5586                 :                :      */
 6248 tgl@sss.pgh.pa.us        5587                 :CBC         927 :     ValidateXLOGDirectoryStructure();
                               5588                 :                : 
                               5589                 :                :     /* Set up timeout handler needed to report startup progress. */
 1515 rhaas@postgresql.org     5590         [ +  + ]:            927 :     if (!IsBootstrapProcessingMode())
                               5591                 :            876 :         RegisterTimeout(STARTUP_PROGRESS_TIMEOUT,
                               5592                 :                :                         startup_progress_timeout_handler);
                               5593                 :                : 
                               5594                 :                :     /*----------
                               5595                 :                :      * If we previously crashed, perform a couple of actions:
                               5596                 :                :      *
                               5597                 :                :      * - The pg_wal directory may still include some temporary WAL segments
                               5598                 :                :      *   used when creating a new segment, so perform some cleanup to avoid
                               5599                 :                :      *   bloating this path.  This is done first as there is no point in
                               5600                 :                :      *   syncing this temporary data.
                               5601                 :                :      *
                               5602                 :                :      * - There might be data which we had written, intending to fsync it, but
                               5603                 :                :      *   which we had not actually fsync'd yet.  Therefore, a power failure in
                               5604                 :                :      *   the near future might cause earlier unflushed writes to be lost, even
                               5605                 :                :      *   though more recent data written to disk from here on would be
                               5606                 :                :      *   persisted.  To avoid that, fsync the entire data directory.
                               5607                 :                :      */
 1401 heikki.linnakangas@i     5608         [ +  + ]:            927 :     if (ControlFile->state != DB_SHUTDOWNED &&
                               5609         [ +  + ]:            200 :         ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
                               5610                 :                :     {
                               5611                 :            172 :         RemoveTempXlogFiles();
                               5612                 :            172 :         SyncDataDirectory();
 1352 andres@anarazel.de       5613                 :            172 :         didCrash = true;
                               5614                 :                :     }
                               5615                 :                :     else
                               5616                 :            755 :         didCrash = false;
                               5617                 :                : 
                               5618                 :                :     /*
                               5619                 :                :      * Prepare for WAL recovery if needed.
                               5620                 :                :      *
                               5621                 :                :      * InitWalRecovery analyzes the control file and the backup label file, if
                               5622                 :                :      * any.  It updates the in-memory ControlFile buffer according to the
                               5623                 :                :      * starting checkpoint, and sets InRecovery and ArchiveRecoveryRequested.
                               5624                 :                :      * It also applies the tablespace map file, if any.
                               5625                 :                :      */
 1401 heikki.linnakangas@i     5626                 :            927 :     InitWalRecovery(ControlFile, &wasShutdown,
                               5627                 :                :                     &haveBackupLabel, &haveTblspcMap);
                               5628                 :            927 :     checkPoint = ControlFile->checkPointCopy;
                               5629                 :                : 
                               5630                 :                :     /* initialize shared memory variables from the checkpoint record */
  741                          5631                 :            927 :     TransamVariables->nextXid = checkPoint.nextXid;
                               5632                 :            927 :     TransamVariables->nextOid = checkPoint.nextOid;
                               5633                 :            927 :     TransamVariables->oidCount = 0;
 7498 tgl@sss.pgh.pa.us        5634                 :            927 :     MultiXactSetNextMXact(checkPoint.nextMulti, checkPoint.nextMultiOffset);
 3192 rhaas@postgresql.org     5635                 :            927 :     AdvanceOldestClogXid(checkPoint.oldestXid);
 5783 tgl@sss.pgh.pa.us        5636                 :            927 :     SetTransactionIdLimit(checkPoint.oldestXid, checkPoint.oldestXidDB);
    9 heikki.linnakangas@i     5637                 :GNC         927 :     SetMultiXactIdLimit(checkPoint.oldestMulti, checkPoint.oldestMultiDB);
 3643 mail@joeconway.com       5638                 :CBC         927 :     SetCommitTsLimit(checkPoint.oldestCommitTsXid,
                               5639                 :                :                      checkPoint.newestCommitTsXid);
                               5640                 :                : 
                               5641                 :                :     /*
                               5642                 :                :      * Clear out any old relcache cache files.  This is *necessary* if we do
                               5643                 :                :      * any WAL replay, since that would probably result in the cache files
                               5644                 :                :      * being out of sync with database reality.  In theory we could leave them
                               5645                 :                :      * in place if the database had been cleanly shut down, but it seems
                               5646                 :                :      * safest to just remove them always and let them be rebuilt during the
                               5647                 :                :      * first backend startup.  These files need to be removed from all
                               5648                 :                :      * directories including pg_tblspc; however, the symlinks are created only
                               5649                 :                :      * after reading the tablespace_map file in case of archive recovery from
                               5650                 :                :      * backup, so we need to clear old relcache files here, after creating
                               5651                 :                :      * the symlinks.
                               5652                 :                :      */
 1401 heikki.linnakangas@i     5653                 :            927 :     RelationCacheInitFileRemove();
                               5654                 :                : 
                               5655                 :                :     /*
                               5656                 :                :      * Initialize replication slots, before there's a chance to remove
                               5657                 :                :      * required resources.
                               5658                 :                :      */
 4207 andres@anarazel.de       5659                 :            927 :     StartupReplicationSlots();
                               5660                 :                : 
                               5661                 :                :     /*
                               5662                 :                :      * Start up logical state.  This needs to be set up now so we have proper
                               5663                 :                :      * data during crash recovery.
                               5664                 :                :      */
 4308 rhaas@postgresql.org     5665                 :            927 :     StartupReorderBuffer();
                               5666                 :                : 
                               5667                 :                :     /*
                               5668                 :                :      * Startup CLOG. This must be done after TransamVariables->nextXid has
                               5669                 :                :      * been initialized and before we accept connections or begin WAL replay.
                               5670                 :                :      */
 1786                          5671                 :            927 :     StartupCLOG();
                               5672                 :                : 
                               5673                 :                :     /*
                               5674                 :                :      * Startup MultiXact. We need to do this early to be able to replay
                               5675                 :                :      * truncations.
                               5676                 :                :      */
 4402 alvherre@alvh.no-ip.     5677                 :            927 :     StartupMultiXact();
                               5678                 :                : 
                               5679                 :                :     /*
                               5680                 :                :      * Ditto for commit timestamps.  Activate the facility if the setting is
                               5681                 :                :      * enabled in the control file, as there should be no tracking of commit
                               5682                 :                :      * timestamps done when the setting was disabled.  This facility can be
                               5683                 :                :      * started or stopped when replaying a XLOG_PARAMETER_CHANGE record.
                               5684                 :                :      */
 2640 michael@paquier.xyz      5685         [ +  + ]:            927 :     if (ControlFile->track_commit_timestamp)
 3660 alvherre@alvh.no-ip.     5686                 :             13 :         StartupCommitTs();
                               5687                 :                : 
                               5688                 :                :     /*
                               5689                 :                :      * Recover knowledge about replay progress of known replication partners.
                               5690                 :                :      */
 3886 andres@anarazel.de       5691                 :            927 :     StartupReplicationOrigin();
                               5692                 :                : 
                               5693                 :                :     /*
                               5694                 :                :      * Initialize unlogged LSN. On a clean shutdown, it's restored from the
                               5695                 :                :      * control file. On recovery, all unlogged relations are blown away, so
                               5696                 :                :      * the unlogged LSN counter can be reset too.
                               5697                 :                :      */
 4693 heikki.linnakangas@i     5698         [ +  + ]:            927 :     if (ControlFile->state == DB_SHUTDOWNED)
  658 nathan@postgresql.or     5699                 :            720 :         pg_atomic_write_membarrier_u64(&XLogCtl->unloggedLSN,
                               5700                 :            720 :                                        ControlFile->unloggedLSN);
                               5701                 :                :     else
                               5702                 :            207 :         pg_atomic_write_membarrier_u64(&XLogCtl->unloggedLSN,
                               5703                 :                :                                        FirstNormalUnloggedLSN);
                               5704                 :                : 
                               5705                 :                :     /*
                               5706                 :                :      * Copy any missing timeline history files between 'now' and the recovery
                               5707                 :                :      * target timeline from archive to pg_wal. While we don't need those files
                               5708                 :                :      * ourselves - the history file of the recovery target timeline covers all
                               5709                 :                :      * the previous timelines in the history too - a cascading standby server
                               5710                 :                :      * might be interested in them. Or, if you archive the WAL from this
                               5711                 :                :      * server to a different archive than the primary, it'd be good for all
                               5712                 :                :      * the history files to get archived there after failover, so that you can
                               5713                 :                :      * use one of the old timelines as a PITR target. Timeline history files
                               5714                 :                :      * are small, so it's better to copy them unnecessarily than not copy them
                               5715                 :                :      * and regret it later.
                               5716                 :                :      */
 1401 heikki.linnakangas@i     5717                 :            927 :     restoreTimeLineHistoryFiles(checkPoint.ThisTimeLineID, recoveryTargetTLI);
                               5718                 :                : 
                               5719                 :                :     /*
                               5720                 :                :      * Before running in recovery, scan pg_twophase and fill in its status to
                               5721                 :                :      * be able to work on entries generated by redo.  Doing a scan before
                               5722                 :                :      * taking any recovery action has the merit of discarding any 2PC files
                               5723                 :                :      * that are newer than the first record to replay, avoiding any conflicts
                               5724                 :                :      * at replay.  This also avoids any subsequent scans when doing recovery
                               5725                 :                :      * of the on-disk two-phase data.
                               5726                 :                :      */
 3180 simon@2ndQuadrant.co     5727                 :            927 :     restoreTwoPhaseData();
                               5728                 :                : 
                               5729                 :                :     /*
                               5730                 :                :      * When starting with crash recovery, reset pgstat data - it might not be
                               5731                 :                :      * valid. Otherwise restore pgstat data. It's safe to do this here,
                               5732                 :                :      * because postmaster will not yet have started any other processes.
                               5733                 :                :      *
                               5734                 :                :      * NB: Restoring replication slot stats relies on slot state having
                               5735                 :                :      * already been restored from disk.
                               5736                 :                :      *
                               5737                 :                :      * TODO: With a bit of extra work we could just start with a pgstat file
                               5738                 :                :      * associated with the checkpoint redo location we're starting from.
                               5739                 :                :      */
 1352 andres@anarazel.de       5740         [ +  + ]:            927 :     if (didCrash)
                               5741                 :            172 :         pgstat_discard_stats();
                               5742                 :                :     else
  276 michael@paquier.xyz      5743                 :            755 :         pgstat_restore_stats();
                               5744                 :                : 
 5076 simon@2ndQuadrant.co     5745                 :            927 :     lastFullPageWrites = checkPoint.fullPageWrites;
                               5746                 :                : 
 4546 heikki.linnakangas@i     5747                 :            927 :     RedoRecPtr = XLogCtl->RedoRecPtr = XLogCtl->Insert.RedoRecPtr = checkPoint.redo;
 4060                          5748                 :            927 :     doPageWrites = lastFullPageWrites;
                               5749                 :                : 
                               5750                 :                :     /* REDO */
 7806 tgl@sss.pgh.pa.us        5751         [ +  + ]:            927 :     if (InRecovery)
                               5752                 :                :     {
                               5753                 :                :         /* Initialize state for RecoveryInProgress() */
 1401 heikki.linnakangas@i     5754         [ -  + ]:            207 :         SpinLockAcquire(&XLogCtl->info_lck);
                               5755         [ +  + ]:            207 :         if (InArchiveRecovery)
                               5756                 :            108 :             XLogCtl->SharedRecoveryState = RECOVERY_STATE_ARCHIVE;
                               5757                 :                :         else
                               5758                 :             99 :             XLogCtl->SharedRecoveryState = RECOVERY_STATE_CRASH;
                               5759                 :            207 :         SpinLockRelease(&XLogCtl->info_lck);
                               5760                 :                : 
                               5761                 :                :         /*
                               5762                 :                :          * Update pg_control to show that we are recovering and to show the
                               5763                 :                :          * selected checkpoint as the place we are starting from. We also mark
                               5764                 :                :          * pg_control with any minimum recovery stop point obtained from a
                               5765                 :                :          * backup history file.
                               5766                 :                :          *
                               5767                 :                :          * No need to hold ControlFileLock yet, we aren't up far enough.
                               5768                 :                :          */
                               5769                 :            207 :         UpdateControlFile();
                               5770                 :                : 
                               5771                 :                :         /*
                               5772                 :                :          * If there was a backup label file, it's done its job and the info
                               5773                 :                :          * has now been propagated into pg_control.  We must get rid of the
                               5774                 :                :          * label file so that if we crash during recovery, we'll pick up at
                               5775                 :                :          * the latest recovery restartpoint instead of going all the way back
                               5776                 :                :          * to the backup start point.  It seems prudent though to just rename
                               5777                 :                :          * the file out of the way rather than delete it completely.
                               5778                 :                :          */
                               5779         [ +  + ]:            207 :         if (haveBackupLabel)
                               5780                 :                :         {
                               5781                 :             71 :             unlink(BACKUP_LABEL_OLD);
                               5782                 :             71 :             durable_rename(BACKUP_LABEL_FILE, BACKUP_LABEL_OLD, FATAL);
                               5783                 :                :         }
                               5784                 :                : 
                               5785                 :                :         /*
                               5786                 :                :          * If there was a tablespace_map file, it's done its job and the
                               5787                 :                :          * symlinks have been created.  We must get rid of the map file so
                               5788                 :                :          * that if we crash during recovery, we don't create symlinks again.
                               5789                 :                :          * It seems prudent though to just rename the file out of the way
                               5790                 :                :          * rather than delete it completely.
                               5791                 :                :          */
                               5792         [ +  + ]:            207 :         if (haveTblspcMap)
                               5793                 :                :         {
                               5794                 :              2 :             unlink(TABLESPACE_MAP_OLD);
                               5795                 :              2 :             durable_rename(TABLESPACE_MAP, TABLESPACE_MAP_OLD, FATAL);
                               5796                 :                :         }
                               5797                 :                : 
                               5798                 :                :         /*
                               5799                 :                :          * Initialize our local copy of minRecoveryPoint.  When doing crash
                               5800                 :                :          * recovery we want to replay up to the end of WAL.  In particular, in
                               5801                 :                :          * the case of a promoted standby, the minRecoveryPoint value in the
                               5802                 :                :          * control file is only updated after the first checkpoint.  However,
                               5803                 :                :          * if the instance crashes before the first post-recovery checkpoint
                               5804                 :                :          * is completed then recovery will use a stale location causing the
                               5805                 :                :          * startup process to think that there are still invalid page
                               5806                 :                :          * references when checking for data consistency.
                               5807                 :                :          */
 2723 michael@paquier.xyz      5808         [ +  + ]:            207 :         if (InArchiveRecovery)
                               5809                 :                :         {
 1401 heikki.linnakangas@i     5810                 :            108 :             LocalMinRecoveryPoint = ControlFile->minRecoveryPoint;
                               5811                 :            108 :             LocalMinRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
                               5812                 :                :         }
                               5813                 :                :         else
                               5814                 :                :         {
                               5815                 :             99 :             LocalMinRecoveryPoint = InvalidXLogRecPtr;
                               5816                 :             99 :             LocalMinRecoveryPointTLI = 0;
                               5817                 :                :         }
                               5818                 :                : 
                               5819                 :                :         /* Check that the GUCs used to generate the WAL allow recovery */
 5713                          5820                 :            207 :         CheckRequiredParameterValues();
                               5821                 :                : 
                               5822                 :                :         /*
                               5823                 :                :          * We're in recovery, so unlogged relations may be trashed and must be
                               5824                 :                :          * reset.  This should be done BEFORE allowing Hot Standby
                               5825                 :                :          * connections, so that read-only backends don't try to read whatever
                               5826                 :                :          * garbage is left over from before.
                               5827                 :                :          */
 5468 rhaas@postgresql.org     5828                 :            207 :         ResetUnloggedRelations(UNLOGGED_RELATION_CLEANUP);
                               5829                 :                : 
                               5830                 :                :         /*
                               5831                 :                :          * Likewise, delete any saved transaction snapshot files that got left
                               5832                 :                :          * behind by crashed backends.
                               5833                 :                :          */
 5171 tgl@sss.pgh.pa.us        5834                 :            207 :         DeleteAllExportedSnapshotFiles();
                               5835                 :                : 
                               5836                 :                :         /*
                               5837                 :                :          * Initialize for Hot Standby, if enabled. We won't let backends in
                               5838                 :                :          * yet, not until we've reached the min recovery point specified in
                               5839                 :                :          * the control file and we've established a recovery snapshot from a
                               5840                 :                :          * running-xacts WAL record.
                               5841                 :                :          */
 4682 heikki.linnakangas@i     5842   [ +  +  +  + ]:            207 :         if (ArchiveRecoveryRequested && EnableHotStandby)
                               5843                 :                :         {
                               5844                 :                :             TransactionId *xids;
                               5845                 :                :             int         nxids;
                               5846                 :                : 
 5788                          5847         [ +  + ]:            102 :             ereport(DEBUG1,
                               5848                 :                :                     (errmsg_internal("initializing for hot standby")));
                               5849                 :                : 
 5843 simon@2ndQuadrant.co     5850                 :            102 :             InitRecoveryTransactionEnvironment();
                               5851                 :                : 
                               5852         [ +  + ]:            102 :             if (wasShutdown)
                               5853                 :             26 :                 oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
                               5854                 :                :             else
                               5855                 :             76 :                 oldestActiveXID = checkPoint.oldestActiveXid;
                               5856         [ -  + ]:            102 :             Assert(TransactionIdIsValid(oldestActiveXID));
                               5857                 :                : 
                               5858                 :                :             /* Tell procarray about the range of xids it has to deal with */
  741 heikki.linnakangas@i     5859                 :            102 :             ProcArrayInitRecovery(XidFromFullTransactionId(TransamVariables->nextXid));
                               5860                 :                : 
                               5861                 :                :             /*
                               5862                 :                :              * Startup subtrans only.  CLOG, MultiXact and commit timestamp
                               5863                 :                :              * have already been started up and other SLRUs are not maintained
                               5864                 :                :              * during recovery and need not be started yet.
                               5865                 :                :              */
 5843 simon@2ndQuadrant.co     5866                 :            102 :             StartupSUBTRANS(oldestActiveXID);
                               5867                 :                : 
                               5868                 :                :             /*
                               5869                 :                :              * If we're beginning at a shutdown checkpoint, we know that
                               5870                 :                :              * nothing was running on the primary at this point. So fake-up an
                               5871                 :                :              * empty running-xacts record and use that here and now. Recover
                               5872                 :                :              * additional standby state for prepared transactions.
                               5873                 :                :              */
 5728 heikki.linnakangas@i     5874         [ +  + ]:            102 :             if (wasShutdown)
                               5875                 :                :             {
                               5876                 :                :                 RunningTransactionsData running;
                               5877                 :                :                 TransactionId latestCompletedXid;
                               5878                 :                : 
                               5879                 :                :                 /* Update pg_subtrans entries for any prepared transactions */
  539                          5880                 :             26 :                 StandbyRecoverPreparedTransactions();
                               5881                 :                : 
                               5882                 :                :                 /*
                               5883                 :                :                  * Construct a RunningTransactions snapshot representing a
                               5884                 :                :                  * shut down server, with only prepared transactions still
                               5885                 :                :                  * alive. We're never overflowed at this point because all
                               5886                 :                :                  * subxids are listed with their parent prepared transactions.
                               5887                 :                :                  */
 5728                          5888                 :             26 :                 running.xcnt = nxids;
 4764 simon@2ndQuadrant.co     5889                 :             26 :                 running.subxcnt = 0;
  539 heikki.linnakangas@i     5890                 :             26 :                 running.subxid_status = SUBXIDS_IN_SUBTRANS;
 1955 andres@anarazel.de       5891                 :             26 :                 running.nextXid = XidFromFullTransactionId(checkPoint.nextXid);
 5728 heikki.linnakangas@i     5892                 :             26 :                 running.oldestRunningXid = oldestActiveXID;
 1955 andres@anarazel.de       5893                 :             26 :                 latestCompletedXid = XidFromFullTransactionId(checkPoint.nextXid);
 5698 simon@2ndQuadrant.co     5894         [ -  + ]:             26 :                 TransactionIdRetreat(latestCompletedXid);
 5697                          5895         [ -  + ]:             26 :                 Assert(TransactionIdIsNormal(latestCompletedXid));
 5698                          5896                 :             26 :                 running.latestCompletedXid = latestCompletedXid;
 5728 heikki.linnakangas@i     5897                 :             26 :                 running.xids = xids;
                               5898                 :                : 
                               5899                 :             26 :                 ProcArrayApplyRecoveryInfo(&running);
                               5900                 :                :             }
                               5901                 :                :         }
                               5902                 :                : 
                               5903                 :                :         /*
                               5904                 :                :          * We're all set for replaying the WAL now. Do it.
                               5905                 :                :          */
 1401                          5906                 :            207 :         PerformWalRecovery();
                               5907                 :            152 :         performedWalRecovery = true;
                               5908                 :                :     }
                               5909                 :                :     else
 1397                          5910                 :            720 :         performedWalRecovery = false;
                               5911                 :                : 
                               5912                 :                :     /*
                               5913                 :                :      * Finish WAL recovery.
                               5914                 :                :      */
 1401                          5915                 :            872 :     endOfRecoveryInfo = FinishWalRecovery();
                               5916                 :            872 :     EndOfLog = endOfRecoveryInfo->endOfLog;
                               5917                 :            872 :     EndOfLogTLI = endOfRecoveryInfo->endOfLogTLI;
                               5918                 :            872 :     abortedRecPtr = endOfRecoveryInfo->abortedRecPtr;
                               5919                 :            872 :     missingContrecPtr = endOfRecoveryInfo->missingContrecPtr;
                               5920                 :                : 
                               5921                 :                :     /*
                               5922                 :                :      * Reset the ps status display so that no information related to recovery
                               5923                 :                :      * shows up.
                               5924                 :                :      */
 1183 michael@paquier.xyz      5925                 :            872 :     set_ps_display("");
                               5926                 :                : 
                               5927                 :                :     /*
                               5928                 :                :      * When recovering from a backup (we are in recovery, and archive recovery
                               5929                 :                :      * was requested), complain if we did not roll forward far enough to reach
                               5930                 :                :      * the point where the database is consistent.  For regular online
                               5931                 :                :      * backup-from-primary, that means reaching the end-of-backup WAL record
                               5932                 :                :      * (at which point we reset backupStartPoint to be Invalid), for
                               5933                 :                :      * backup-from-replica (which can't inject records into the WAL stream),
                               5934                 :                :      * that point is when we reach the minRecoveryPoint in pg_control (which
                               5935                 :                :      * we purposefully copy last when backing up from a replica).  For
                               5936                 :                :      * pg_rewind (which creates a backup_label with a method of "pg_rewind")
                               5937                 :                :      * or snapshot-style backups (which don't), backupEndRequired will be set
                               5938                 :                :      * to false.
                               5939                 :                :      *
                               5940                 :                :      * Note: it is indeed okay to look at the local variable
                               5941                 :                :      * LocalMinRecoveryPoint here, even though ControlFile->minRecoveryPoint
                               5942                 :                :      * might be further ahead --- ControlFile->minRecoveryPoint cannot have
                               5943                 :                :      * been advanced beyond the WAL we processed.
                               5944                 :                :      */
 5377 heikki.linnakangas@i     5945         [ +  + ]:            872 :     if (InRecovery &&
 1401                          5946         [ +  - ]:            152 :         (EndOfLog < LocalMinRecoveryPoint ||
   42 alvherre@kurilemu.de     5947         [ -  + ]:GNC         152 :          XLogRecPtrIsValid(ControlFile->backupStartPoint)))
                               5948                 :                :     {
                               5949                 :                :         /*
                               5950                 :                :          * Ran off end of WAL before reaching end-of-backup WAL record, or
                               5951                 :                :          * minRecoveryPoint. That's a bad sign, indicating that you tried to
                               5952                 :                :          * recover from an online backup but never called pg_backup_stop(), or
                               5953                 :                :          * you didn't archive all the WAL needed.
                               5954                 :                :          */
 4682 heikki.linnakangas@i     5955   [ #  #  #  # ]:UBC           0 :         if (ArchiveRecoveryRequested || ControlFile->backupEndRequired)
                               5956                 :                :         {
   42 alvherre@kurilemu.de     5957   [ #  #  #  # ]:UNC           0 :             if (XLogRecPtrIsValid(ControlFile->backupStartPoint) || ControlFile->backupEndRequired)
 5244 heikki.linnakangas@i     5958         [ #  # ]:UBC           0 :                 ereport(FATAL,
                               5959                 :                :                         (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               5960                 :                :                          errmsg("WAL ends before end of online backup"),
                               5961                 :                :                          errhint("All WAL generated while online backup was taken must be available at recovery.")));
                               5962                 :                :             else
 5363                          5963         [ #  # ]:              0 :                 ereport(FATAL,
                               5964                 :                :                         (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               5965                 :                :                          errmsg("WAL ends before consistent recovery point")));
                               5966                 :                :         }
                               5967                 :                :     }
                               5968                 :                : 
                               5969                 :                :     /*
                               5970                 :                :      * Reset unlogged relations to the contents of their INIT fork. This is
                               5971                 :                :      * done AFTER recovery is complete so as to include any unlogged relations
                               5972                 :                :      * created during recovery, but BEFORE recovery is marked as having
                               5973                 :                :      * completed successfully. Otherwise we'd not retry if any of the post
                               5974                 :                :      * end-of-recovery steps fail.
                               5975                 :                :      */
 1401 heikki.linnakangas@i     5976         [ +  + ]:CBC         872 :     if (InRecovery)
                               5977                 :            152 :         ResetUnloggedRelations(UNLOGGED_RELATION_INIT);
                               5978                 :                : 
                               5979                 :                :     /*
                               5980                 :                :      * Pre-scan prepared transactions to find out the range of XIDs present.
                               5981                 :                :      * This information is not needed quite yet, but it is positioned here so
                               5982                 :                :      * that potential problems are detected before any on-disk change is made.
                               5983                 :                :      */
 2719 michael@paquier.xyz      5984                 :            872 :     oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
                               5985                 :                : 
                               5986                 :                :     /*
                               5987                 :                :      * Allow ordinary WAL segment creation before possibly switching to a new
                               5988                 :                :      * timeline, which creates a new segment, and after the last ReadRecord().
                               5989                 :                :      */
 1219                          5990                 :            872 :     SetInstallXLogFileSegmentActive();
                               5991                 :                : 
                               5992                 :                :     /*
                               5993                 :                :      * Consider whether we need to assign a new timeline ID.
                               5994                 :                :      *
                               5995                 :                :      * If we did archive recovery, we always assign a new ID.  This handles a
                               5996                 :                :      * couple of issues.  If we stopped short of the end of WAL during
                               5997                 :                :      * recovery, then we are clearly generating a new timeline and must assign
                               5998                 :                :      * it a unique new ID.  Even if we ran to the end, modifying the current
                               5999                 :                :      * last segment is problematic because it may result in trying to
                               6000                 :                :      * overwrite an already-archived copy of that segment, and we encourage
                               6001                 :                :      * DBAs to make their archive_commands reject that.  We can dodge the
                               6002                 :                :      * problem by making the new active segment have a new timeline ID.
                               6003                 :                :      *
                               6004                 :                :      * In a normal crash recovery, we can just extend the timeline we were in.
                               6005                 :                :      */
 1401 heikki.linnakangas@i     6006                 :            872 :     newTLI = endOfRecoveryInfo->lastRecTLI;
 4682                          6007         [ +  + ]:            872 :     if (ArchiveRecoveryRequested)
                               6008                 :                :     {
 1499 rhaas@postgresql.org     6009                 :             49 :         newTLI = findNewestTimeLine(recoveryTargetTLI) + 1;
 7820 tgl@sss.pgh.pa.us        6010         [ +  - ]:             49 :         ereport(LOG,
                               6011                 :                :                 (errmsg("selected new timeline ID: %u", newTLI)));
                               6012                 :                : 
                               6013                 :                :         /*
                               6014                 :                :          * Make a writable copy of the last WAL segment.  (Note that we also
                               6015                 :                :          * have a copy of the last block of the old WAL in
                               6016                 :                :          * endOfRecoveryInfo->lastPage; we will use that below.)
                               6017                 :                :          */
 1401 heikki.linnakangas@i     6018                 :             49 :         XLogInitNewTimeline(EndOfLogTLI, EndOfLog, newTLI);
                               6019                 :                : 
                               6020                 :                :         /*
                               6021                 :                :          * Remove the signal files out of the way, so that we don't
                               6022                 :                :          * accidentally re-enter archive recovery mode in a subsequent crash.
                               6023                 :                :          */
                               6024         [ +  + ]:             49 :         if (endOfRecoveryInfo->standby_signal_file_found)
                               6025                 :             46 :             durable_unlink(STANDBY_SIGNAL_FILE, FATAL);
                               6026                 :                : 
                               6027         [ +  + ]:             49 :         if (endOfRecoveryInfo->recovery_signal_file_found)
                               6028                 :              3 :             durable_unlink(RECOVERY_SIGNAL_FILE, FATAL);
                               6029                 :                : 
                               6030                 :                :         /*
                               6031                 :                :          * Write the timeline history file, and have it archived. After this
                               6032                 :                :          * point (or rather, as soon as the file is archived), the timeline
                               6033                 :                :          * will appear as "taken" in the WAL archive and to any standby
                               6034                 :                :          * servers.  If we crash before actually switching to the new
                               6035                 :                :          * timeline, standby servers will nevertheless think that we switched
                               6036                 :                :          * to the new timeline, and will try to connect to the new timeline.
                               6037                 :                :          * To minimize the window for that, try to do as little as possible
                               6038                 :                :          * between here and writing the end-of-recovery record.
                               6039                 :                :          */
 1499 rhaas@postgresql.org     6040                 :             49 :         writeTimeLineHistory(newTLI, recoveryTargetTLI,
                               6041                 :                :                              EndOfLog, endOfRecoveryInfo->recoveryStopReason);
                               6042                 :                : 
 1401 heikki.linnakangas@i     6043         [ +  - ]:             49 :         ereport(LOG,
                               6044                 :                :                 (errmsg("archive recovery complete")));
                               6045                 :                :     }
                               6046                 :                : 
                               6047                 :                :     /* Save the selected TimeLineID in shared memory, too */
  510 rhaas@postgresql.org     6048         [ -  + ]:            872 :     SpinLockAcquire(&XLogCtl->info_lck);
 1499                          6049                 :            872 :     XLogCtl->InsertTimeLineID = newTLI;
 1401 heikki.linnakangas@i     6050                 :            872 :     XLogCtl->PrevTimeLineID = endOfRecoveryInfo->lastRecTLI;
  510 rhaas@postgresql.org     6051                 :            872 :     SpinLockRelease(&XLogCtl->info_lck);
                               6052                 :                : 
                               6053                 :                :     /*
                               6054                 :                :      * Actually, if WAL ended in an incomplete record, skip the parts that
                               6055                 :                :      * made it through and start writing after the portion that persisted.
                               6056                 :                :      * (It's critical to first write an OVERWRITE_CONTRECORD message, which
                               6057                 :                :      * we'll do as soon as we're open for writing new WAL.)
                               6058                 :                :      */
   42 alvherre@kurilemu.de     6059         [ +  + ]:GNC         872 :     if (XLogRecPtrIsValid(missingContrecPtr))
                               6060                 :                :     {
                               6061                 :                :         /*
                               6062                 :                :          * We should only have a missingContrecPtr if we're not switching to a
                               6063                 :                :          * new timeline. When a timeline switch occurs, WAL is copied from the
                               6064                 :                :          * old timeline to the new only up to the end of the last complete
                               6065                 :                :          * record, so there can't be an incomplete WAL record that we need to
                               6066                 :                :          * disregard.
                               6067                 :                :          */
 1207 rhaas@postgresql.org     6068         [ -  + ]:CBC          10 :         Assert(newTLI == endOfRecoveryInfo->lastRecTLI);
   42 alvherre@kurilemu.de     6069         [ -  + ]:GNC          10 :         Assert(XLogRecPtrIsValid(abortedRecPtr));
 1541 alvherre@alvh.no-ip.     6070                 :CBC          10 :         EndOfLog = missingContrecPtr;
                               6071                 :                :     }
                               6072                 :                : 
                               6073                 :                :     /*
                               6074                 :                :      * Prepare to write WAL starting at EndOfLog location, and init xlog
                               6075                 :                :      * buffer cache using the block containing the last record from the
                               6076                 :                :      * previous incarnation.
                               6077                 :                :      */
 9182 vadim4o@yahoo.com        6078                 :            872 :     Insert = &XLogCtl->Insert;
 1401 heikki.linnakangas@i     6079                 :            872 :     Insert->PrevBytePos = XLogRecPtrToBytePos(endOfRecoveryInfo->lastRec);
 4537                          6080                 :            872 :     Insert->CurrBytePos = XLogRecPtrToBytePos(EndOfLog);
                               6081                 :                : 
                               6082                 :                :     /*
                               6083                 :                :      * Tricky point here: lastPage contains the *last* block that the LastRec
                               6084                 :                :      * record spans, not the one it starts in.  The last block is indeed the
                               6085                 :                :      * one we want to use.
                               6086                 :                :      */
                               6087         [ +  + ]:            872 :     if (EndOfLog % XLOG_BLCKSZ != 0)
                               6088                 :                :     {
                               6089                 :                :         char       *page;
                               6090                 :                :         int         len;
                               6091                 :                :         int         firstIdx;
                               6092                 :                : 
                               6093                 :            845 :         firstIdx = XLogRecPtrToBufIdx(EndOfLog);
 1401                          6094                 :            845 :         len = EndOfLog - endOfRecoveryInfo->lastPageBeginPtr;
                               6095         [ -  + ]:            845 :         Assert(len < XLOG_BLCKSZ);
                               6096                 :                : 
                               6097                 :                :         /* Copy the valid part of the last block, and zero the rest */
 4537                          6098                 :            845 :         page = &XLogCtl->pages[firstIdx * XLOG_BLCKSZ];
 1401                          6099                 :            845 :         memcpy(page, endOfRecoveryInfo->lastPage, len);
 4537                          6100                 :            845 :         memset(page + len, 0, XLOG_BLCKSZ - len);
                               6101                 :                : 
  730 jdavis@postgresql.or     6102                 :            845 :         pg_atomic_write_u64(&XLogCtl->xlblocks[firstIdx], endOfRecoveryInfo->lastPageBeginPtr + XLOG_BLCKSZ);
  118 akorotkov@postgresql     6103                 :            845 :         XLogCtl->InitializedUpTo = endOfRecoveryInfo->lastPageBeginPtr + XLOG_BLCKSZ;
                               6104                 :                :     }
                               6105                 :                :     else
                               6106                 :                :     {
                               6107                 :                :         /*
                               6108                 :                :          * There is no partial block to copy. Just set InitializedUpTo, and
                               6109                 :                :          * let the first attempt to insert a log record initialize the next
                               6110                 :                :          * buffer.
                               6111                 :                :          */
                               6112                 :             27 :         XLogCtl->InitializedUpTo = EndOfLog;
                               6113                 :                :     }
                               6114                 :                : 
                               6115                 :                :     /*
                               6116                 :                :      * Update local and shared status.  This is OK to do without any locks
                               6117                 :                :      * because no other process can be reading or writing WAL yet.
                               6118                 :                :      */
 4537 heikki.linnakangas@i     6119                 :            872 :     LogwrtResult.Write = LogwrtResult.Flush = EndOfLog;
  620 alvherre@alvh.no-ip.     6120                 :            872 :     pg_atomic_write_u64(&XLogCtl->logInsertResult, EndOfLog);
  622                          6121                 :            872 :     pg_atomic_write_u64(&XLogCtl->logWriteResult, EndOfLog);
                               6122                 :            872 :     pg_atomic_write_u64(&XLogCtl->logFlushResult, EndOfLog);
 4537 heikki.linnakangas@i     6123                 :            872 :     XLogCtl->LogwrtRqst.Write = EndOfLog;
                               6124                 :            872 :     XLogCtl->LogwrtRqst.Flush = EndOfLog;
                               6125                 :                : 
                               6126                 :                :     /*
                               6127                 :                :      * Preallocate additional log files, if wanted.
                               6128                 :                :      */
 1499 rhaas@postgresql.org     6129                 :            872 :     PreallocXlogFiles(EndOfLog, newTLI);
                               6130                 :                : 
                               6131                 :                :     /*
                               6132                 :                :      * Okay, we're officially UP.
                               6133                 :                :      */
 9182 vadim4o@yahoo.com        6134                 :            872 :     InRecovery = false;
                               6135                 :                : 
                               6136                 :                :     /* start the archive_timeout timer and LSN running */
 4537 heikki.linnakangas@i     6137                 :            872 :     XLogCtl->lastSegSwitchTime = (pg_time_t) time(NULL);
 3283 andres@anarazel.de       6138                 :            872 :     XLogCtl->lastSegSwitchLSN = EndOfLog;
                               6139                 :                : 
                               6140                 :                :     /* also initialize latestCompletedXid, to nextXid - 1 */
 5064 tgl@sss.pgh.pa.us        6141                 :            872 :     LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
  741 heikki.linnakangas@i     6142                 :            872 :     TransamVariables->latestCompletedXid = TransamVariables->nextXid;
                               6143                 :            872 :     FullTransactionIdRetreat(&TransamVariables->latestCompletedXid);
 5064 tgl@sss.pgh.pa.us        6144                 :            872 :     LWLockRelease(ProcArrayLock);
                               6145                 :                : 
                               6146                 :                :     /*
                               6147                 :                :      * Start up subtrans, if not already done for hot standby.  (commit
                               6148                 :                :      * timestamps are started below, if necessary.)
                               6149                 :                :      */
 5843 simon@2ndQuadrant.co     6150         [ +  + ]:            872 :     if (standbyState == STANDBY_DISABLED)
                               6151                 :            823 :         StartupSUBTRANS(oldestActiveXID);
                               6152                 :                : 
                               6153                 :                :     /*
                               6154                 :                :      * Perform end of recovery actions for any SLRUs that need it.
                               6155                 :                :      */
 5160                          6156                 :            872 :     TrimCLOG();
 4402 alvherre@alvh.no-ip.     6157                 :            872 :     TrimMultiXact();
                               6158                 :                : 
                               6159                 :                :     /*
                               6160                 :                :      * Reload shared-memory state for prepared transactions.  This needs to
                               6161                 :                :      * happen before renaming the last partial segment of the old timeline as
                               6162                 :                :      * it may be possible that we have to recover some transactions from it.
                               6163                 :                :      */
 7489 tgl@sss.pgh.pa.us        6164                 :            872 :     RecoverPreparedTransactions();
                               6165                 :                : 
                               6166                 :                :     /* Shut down xlogreader */
 1401 heikki.linnakangas@i     6167                 :            872 :     ShutdownWalRecovery();
                               6168                 :                : 
                               6169                 :                :     /* Enable WAL writes for this backend only. */
 1526 rhaas@postgresql.org     6170                 :            872 :     LocalSetXLogInsertAllowed();
                               6171                 :                : 
                               6172                 :                :     /* If necessary, write overwrite-contrecord before doing anything else */
   42 alvherre@kurilemu.de     6173         [ +  + ]:GNC         872 :     if (XLogRecPtrIsValid(abortedRecPtr))
                               6174                 :                :     {
                               6175         [ -  + ]:             10 :         Assert(XLogRecPtrIsValid(missingContrecPtr));
 1401 heikki.linnakangas@i     6176                 :CBC          10 :         CreateOverwriteContrecordRecord(abortedRecPtr, missingContrecPtr, newTLI);
                               6177                 :                :     }
                               6178                 :                : 
                               6179                 :                :     /*
                               6180                 :                :      * Update full_page_writes in shared memory and write an XLOG_FPW_CHANGE
                               6181                 :                :      * record before resource manager writes cleanup WAL records or checkpoint
                               6182                 :                :      * record is written.
                               6183                 :                :      */
 1526 rhaas@postgresql.org     6184                 :            872 :     Insert->fullPageWrites = lastFullPageWrites;
                               6185                 :            872 :     UpdateFullPageWrites();
                               6186                 :                : 
                               6187                 :                :     /*
                               6188                 :                :      * Emit checkpoint or end-of-recovery record in XLOG, if required.
                               6189                 :                :      */
 1401 heikki.linnakangas@i     6190         [ +  + ]:            872 :     if (performedWalRecovery)
 1526 rhaas@postgresql.org     6191                 :            152 :         promoted = PerformRecoveryXLogAction();
                               6192                 :                : 
                               6193                 :                :     /*
                               6194                 :                :      * If any of the critical GUCs have changed, log them before we allow
                               6195                 :                :      * backends to write WAL.
                               6196                 :                :      */
 5713 heikki.linnakangas@i     6197                 :            872 :     XLogReportParameters();
                               6198                 :                : 
                               6199                 :                :     /* If this is archive recovery, perform post-recovery cleanup actions. */
 1515 rhaas@postgresql.org     6200         [ +  + ]:            872 :     if (ArchiveRecoveryRequested)
 1499                          6201                 :             49 :         CleanupAfterArchiveRecovery(EndOfLogTLI, EndOfLog, newTLI);
                               6202                 :                : 
                               6203                 :                :     /*
                               6204                 :                :      * Local WAL inserts enabled, so it's time to finish initialization of
                               6205                 :                :      * commit timestamp.
                               6206                 :                :      */
 4033 alvherre@alvh.no-ip.     6207                 :            872 :     CompleteCommitTsInitialization();
                               6208                 :                : 
                               6209                 :                :     /* Clean up EndOfWalRecoveryInfo data to appease Valgrind leak checking */
  138 tgl@sss.pgh.pa.us        6210         [ +  + ]:GNC         872 :     if (endOfRecoveryInfo->lastPage)
                               6211                 :            855 :         pfree(endOfRecoveryInfo->lastPage);
                               6212                 :            872 :     pfree(endOfRecoveryInfo->recoveryStopReason);
                               6213                 :            872 :     pfree(endOfRecoveryInfo);
                               6214                 :                : 
                               6215                 :                :     /*
                               6216                 :                :      * All done with end-of-recovery actions.
                               6217                 :                :      *
                               6218                 :                :      * Now allow backends to write WAL and update the control file status
                               6219                 :                :      * accordingly.  SharedRecoveryState, which controls whether backends can
                               6220                 :                :      * write WAL, is updated while holding ControlFileLock to prevent other
                               6221                 :                :      * backends from seeing an inconsistent state of the control file in
                               6222                 :                :      * shared memory.  There is still a small window during which backends
                               6223                 :                :      * can write WAL while the on-disk control file still shows a state other
                               6224                 :                :      * than DB_IN_PRODUCTION.
                               6225                 :                :      *
                               6226                 :                :      * Also, we use info_lck to update SharedRecoveryState to ensure that
                               6227                 :                :      * there are no race conditions concerning visibility of other recent
                               6228                 :                :      * updates to shared memory.
                               6229                 :                :      */
 3432 peter_e@gmx.net          6230                 :CBC         872 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               6231                 :            872 :     ControlFile->state = DB_IN_PRODUCTION;
                               6232                 :                : 
 4105 andres@anarazel.de       6233         [ -  + ]:            872 :     SpinLockAcquire(&XLogCtl->info_lck);
 2064 michael@paquier.xyz      6234                 :            872 :     XLogCtl->SharedRecoveryState = RECOVERY_STATE_DONE;
 4105 andres@anarazel.de       6235                 :            872 :     SpinLockRelease(&XLogCtl->info_lck);
                               6236                 :                : 
 3432 peter_e@gmx.net          6237                 :            872 :     UpdateControlFile();
                               6238                 :            872 :     LWLockRelease(ControlFileLock);
                               6239                 :                : 
                               6240                 :                :     /*
                               6241                 :                :      * Wake up all waiters for replay LSN.  They need to report an error that
                               6242                 :                :      * recovery was ended before reaching the target LSN.
                               6243                 :                :      */
   43 akorotkov@postgresql     6244                 :GNC         872 :     WaitLSNWakeup(WAIT_LSN_TYPE_REPLAY, InvalidXLogRecPtr);
                               6245                 :                : 
                               6246                 :                :     /*
                               6247                 :                :      * Shutdown the recovery environment.  This must occur after
                               6248                 :                :      * RecoverPreparedTransactions() (see notes in lock_twophase_recover())
                               6249                 :                :      * and after switching SharedRecoveryState to RECOVERY_STATE_DONE so that
                               6250                 :                :      * any session building a snapshot will not rely on KnownAssignedXids as
                               6251                 :                :      * RecoveryInProgress() would return false at this stage.  This is
                               6252                 :                :      * particularly critical for prepared 2PC transactions, which would still
                               6253                 :                :      * need to be included in snapshots once recovery has ended.
                               6254                 :                :      */
 1536 michael@paquier.xyz      6255         [ +  + ]:CBC         872 :     if (standbyState != STANDBY_DISABLED)
                               6256                 :             49 :         ShutdownRecoveryTransactionEnvironment();
                               6257                 :                : 
                               6258                 :                :     /*
                               6259                 :                :      * If there were cascading standby servers connected to us, nudge any wal
                               6260                 :                :      * sender processes to notice that we've been promoted.
                               6261                 :                :      */
  985 andres@anarazel.de       6262                 :            872 :     WalSndWakeup(true, true);
                               6263                 :                : 
                               6264                 :                :     /*
                               6265                 :                :      * If this was a promotion, request an (online) checkpoint now. This isn't
                               6266                 :                :      * required for consistency, but the last restartpoint might be far back,
                               6267                 :                :      * and in case of a crash, recovering from it might take longer than is
                               6268                 :                :      * appropriate now that we're not in standby mode anymore.
                               6269                 :                :      */
 1968 fujii@postgresql.org     6270         [ +  + ]:            872 :     if (promoted)
 4594 simon@2ndQuadrant.co     6271                 :             42 :         RequestCheckpoint(CHECKPOINT_FORCE);
 6147 heikki.linnakangas@i     6272                 :            872 : }
                               6273                 :                : 
                               6274                 :                : /*
                               6275                 :                :  * Callback from PerformWalRecovery(), called when we switch from crash
                               6276                 :                :  * recovery to archive recovery mode.  Updates the control file accordingly.
                               6277                 :                :  */
                               6278                 :                : void
 1401                          6279                 :              2 : SwitchIntoArchiveRecovery(XLogRecPtr EndRecPtr, TimeLineID replayTLI)
                               6280                 :                : {
                               6281                 :                :     /* initialize minRecoveryPoint to this record */
                               6282                 :              2 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               6283                 :              2 :     ControlFile->state = DB_IN_ARCHIVE_RECOVERY;
                               6284         [ +  - ]:              2 :     if (ControlFile->minRecoveryPoint < EndRecPtr)
                               6285                 :                :     {
                               6286                 :              2 :         ControlFile->minRecoveryPoint = EndRecPtr;
                               6287                 :              2 :         ControlFile->minRecoveryPointTLI = replayTLI;
                               6288                 :                :     }
                               6289                 :                :     /* update local copy */
                               6290                 :              2 :     LocalMinRecoveryPoint = ControlFile->minRecoveryPoint;
                               6291                 :              2 :     LocalMinRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
                               6292                 :                : 
                               6293                 :                :     /*
                               6294                 :                :      * The startup process can update its local copy of minRecoveryPoint from
                               6295                 :                :      * this point.
                               6296                 :                :      */
                               6297                 :              2 :     updateMinRecoveryPoint = true;
                               6298                 :                : 
                               6299                 :              2 :     UpdateControlFile();
                               6300                 :                : 
                               6301                 :                :     /*
                               6302                 :                :      * We update SharedRecoveryState while holding the lock on ControlFileLock
                               6303                 :                :      * so both states are consistent in shared memory.
                               6304                 :                :      */
                               6305         [ -  + ]:              2 :     SpinLockAcquire(&XLogCtl->info_lck);
                               6306                 :              2 :     XLogCtl->SharedRecoveryState = RECOVERY_STATE_ARCHIVE;
                               6307                 :              2 :     SpinLockRelease(&XLogCtl->info_lck);
                               6308                 :                : 
                               6309                 :              2 :     LWLockRelease(ControlFileLock);
                               6310                 :              2 : }
                               6311                 :                : 
                               6312                 :                : /*
                               6313                 :                :  * Callback from PerformWalRecovery(), called when we reach the end of backup.
                               6314                 :                :  * Updates the control file accordingly.
                               6315                 :                :  */
                               6316                 :                : void
                               6317                 :             71 : ReachedEndOfBackup(XLogRecPtr EndRecPtr, TimeLineID tli)
                               6318                 :                : {
                               6319                 :                :     /*
                               6320                 :                :      * We have reached the end of base backup, as indicated by pg_control. The
                               6321                 :                :      * data on disk is now consistent (unless minRecoveryPoint is further
                               6322                 :                :      * ahead, which can happen if we crashed during previous recovery).  Reset
                               6323                 :                :      * backupStartPoint and backupEndPoint, and update minRecoveryPoint to
                               6324                 :                :      * make sure we don't allow starting up at an earlier point even if
                               6325                 :                :      * recovery is stopped and restarted soon after this.
                               6326                 :                :      */
                               6327                 :             71 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               6328                 :                : 
                               6329         [ +  + ]:             71 :     if (ControlFile->minRecoveryPoint < EndRecPtr)
                               6330                 :                :     {
                               6331                 :             67 :         ControlFile->minRecoveryPoint = EndRecPtr;
                               6332                 :             67 :         ControlFile->minRecoveryPointTLI = tli;
                               6333                 :                :     }
                               6334                 :                : 
                               6335                 :             71 :     ControlFile->backupStartPoint = InvalidXLogRecPtr;
                               6336                 :             71 :     ControlFile->backupEndPoint = InvalidXLogRecPtr;
                               6337                 :             71 :     ControlFile->backupEndRequired = false;
                               6338                 :             71 :     UpdateControlFile();
                               6339                 :                : 
                               6340                 :             71 :     LWLockRelease(ControlFileLock);
 5728                          6341                 :             71 : }
                               6342                 :                : 
                               6343                 :                : /*
                               6344                 :                :  * Perform whatever XLOG actions are necessary at end of REDO.
                               6345                 :                :  *
                               6346                 :                :  * The goal here is to make sure that we'll be able to recover properly if
                               6347                 :                :  * we crash again. If we choose to write a checkpoint, we'll write a shutdown
                               6348                 :                :  * checkpoint rather than an on-line one. This is not particularly critical,
                               6349                 :                :  * but since we may be assigning a new TLI, using a shutdown checkpoint allows
                               6350                 :                :  * us to have the rule that TLI only changes in shutdown checkpoints, which
                               6351                 :                :  * allows some extra error checking in xlog_redo.
                               6352                 :                :  */
                               6353                 :                : static bool
 1527 rhaas@postgresql.org     6354                 :            152 : PerformRecoveryXLogAction(void)
                               6355                 :                : {
                               6356                 :            152 :     bool        promoted = false;
                               6357                 :                : 
                               6358                 :                :     /*
                               6359                 :                :      * Perform a checkpoint to update all our recovery activity to disk.
                               6360                 :                :      *
                               6361                 :                :      * Note that we write a shutdown checkpoint rather than an on-line one.
                               6362                 :                :      * This is not particularly critical, but since we may be assigning a new
                               6363                 :                :      * TLI, using a shutdown checkpoint allows us to have the rule that TLI
                               6364                 :                :      * only changes in shutdown checkpoints, which allows some extra error
                               6365                 :                :      * checking in xlog_redo.
                               6366                 :                :      *
                               6367                 :                :      * In promotion, only create a lightweight end-of-recovery record instead
                               6368                 :                :      * of a full checkpoint. A checkpoint is requested later, after we're
                               6369                 :                :      * fully out of recovery mode and already accepting queries.
                               6370                 :                :      */
                               6371   [ +  +  +  -  :            201 :     if (ArchiveRecoveryRequested && IsUnderPostmaster &&
                                              +  + ]
 1401 heikki.linnakangas@i     6372                 :             49 :         PromoteIsTriggered())
                               6373                 :                :     {
 1527 rhaas@postgresql.org     6374                 :             42 :         promoted = true;
                               6375                 :                : 
                               6376                 :                :         /*
                               6377                 :                :          * Insert a special WAL record to mark the end of recovery, since we
                               6378                 :                :          * aren't doing a checkpoint. That means that the checkpointer process
                               6379                 :                :          * may well be in the middle of a time-smoothed restartpoint and
                               6380                 :                :          * could continue to be for minutes after this.  That sounds strange,
                               6381                 :                :          * but the effect is roughly the same and it would be stranger to try
                               6382                 :                :          * to come out of the restartpoint and then checkpoint. We request a
                               6383                 :                :          * checkpoint later anyway, just for safety.
                               6384                 :                :          */
                               6385                 :             42 :         CreateEndOfRecoveryRecord();
                               6386                 :                :     }
                               6387                 :                :     else
                               6388                 :                :     {
                               6389                 :            110 :         RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY |
                               6390                 :                :                           CHECKPOINT_FAST |
                               6391                 :                :                           CHECKPOINT_WAIT);
                               6392                 :                :     }
                               6393                 :                : 
                               6394                 :            152 :     return promoted;
                               6395                 :                : }
                               6396                 :                : 
                               6397                 :                : /*
                               6398                 :                :  * Is the system still in recovery?
                               6399                 :                :  *
                               6400                 :                :  * Unlike testing InRecovery, this works in any process that's connected to
                               6401                 :                :  * shared memory.
                               6402                 :                :  */
                               6403                 :                : bool
 6147 heikki.linnakangas@i     6404                 :       53346111 : RecoveryInProgress(void)
                               6405                 :                : {
                               6406                 :                :     /*
                               6407                 :                :      * We check shared state each time only until we leave recovery mode. We
                               6408                 :                :      * can't re-enter recovery, so there's no need to keep checking after the
                               6409                 :                :      * shared variable has once been seen false.
                               6410                 :                :      */
                               6411         [ +  + ]:       53346111 :     if (!LocalRecoveryInProgress)
                               6412                 :       51180432 :         return false;
                               6413                 :                :     else
                               6414                 :                :     {
                               6415                 :                :         /*
                               6416                 :                :          * Use a volatile pointer to make sure we make a fresh read of the
                               6417                 :                :          * shared variable.
                               6418                 :                :          */
                               6419                 :        2165679 :         volatile XLogCtlData *xlogctl = XLogCtl;
                               6420                 :                : 
 2064 michael@paquier.xyz      6421                 :        2165679 :         LocalRecoveryInProgress = (xlogctl->SharedRecoveryState != RECOVERY_STATE_DONE);
                               6422                 :                : 
                               6423                 :                :         /*
                               6424                 :                :          * Note: We don't need a memory barrier when we're still in recovery.
                               6425                 :                :          * We might exit recovery immediately after return, so the caller
                               6426                 :                :          * can't rely on 'true' meaning that we're still in recovery anyway.
                               6427                 :                :          */
                               6428                 :                : 
 6147 heikki.linnakangas@i     6429                 :        2165679 :         return LocalRecoveryInProgress;
                               6430                 :                :     }
                               6431                 :                : }
                               6432                 :                : 
                               6433                 :                : /*
                               6434                 :                :  * Returns current recovery state from shared memory.
                               6435                 :                :  *
                               6436                 :                :  * This returned state is kept consistent with the contents of the control
                               6437                 :                :  * file.  See details about the possible values of RecoveryState in xlog.h.
                               6438                 :                :  */
                               6439                 :                : RecoveryState
 2064 michael@paquier.xyz      6440                 :          31119 : GetRecoveryState(void)
                               6441                 :                : {
                               6442                 :                :     RecoveryState retval;
                               6443                 :                : 
                               6444         [ -  + ]:          31119 :     SpinLockAcquire(&XLogCtl->info_lck);
                               6445                 :          31119 :     retval = XLogCtl->SharedRecoveryState;
                               6446                 :          31119 :     SpinLockRelease(&XLogCtl->info_lck);
                               6447                 :                : 
                               6448                 :          31119 :     return retval;
                               6449                 :                : }
                               6450                 :                : 
                               6451                 :                : /*
                               6452                 :                :  * Is this process allowed to insert new WAL records?
                               6453                 :                :  *
                               6454                 :                :  * Ordinarily this is essentially equivalent to !RecoveryInProgress().
                               6455                 :                :  * But we also have provisions for forcing the result "true" or "false"
                               6456                 :                :  * within specific processes regardless of the global state.
                               6457                 :                :  */
                               6458                 :                : bool
 6019 tgl@sss.pgh.pa.us        6459                 :       38545451 : XLogInsertAllowed(void)
                               6460                 :                : {
                               6461                 :                :     /*
                               6462                 :                :      * If value is "unconditionally true" or "unconditionally false", just
                               6463                 :                :      * return it.  This provides the normal fast path once recovery is known
                               6464                 :                :      * done.
                               6465                 :                :      */
                               6466         [ +  + ]:       38545451 :     if (LocalXLogInsertAllowed >= 0)
                               6467                 :       37810347 :         return (bool) LocalXLogInsertAllowed;
                               6468                 :                : 
                               6469                 :                :     /*
                               6470                 :                :      * Else, must check to see if we're still in recovery.
                               6471                 :                :      */
                               6472         [ +  + ]:         735104 :     if (RecoveryInProgress())
                               6473                 :         726346 :         return false;
                               6474                 :                : 
                               6475                 :                :     /*
                               6476                 :                :      * On exit from recovery, reset to "unconditionally true", since there is
                               6477                 :                :      * no need to keep checking.
                               6478                 :                :      */
                               6479                 :           8758 :     LocalXLogInsertAllowed = 1;
                               6480                 :           8758 :     return true;
                               6481                 :                : }
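
The caching scheme used by RecoveryInProgress() and XLogInsertAllowed() above can be illustrated in isolation. The following is a minimal sketch, not the PostgreSQL implementation: all demo_* names and the plain static variable standing in for XLogCtl->SharedRecoveryState are hypothetical, and no locking or shared memory is involved. It shows the tri-state local flag (-1 = consult shared state, 0 = always false, 1 = always true) and the one-way latch out of recovery.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for "SharedRecoveryState != RECOVERY_STATE_DONE". */
static bool demo_shared_in_recovery = true;

/* Process-local caches, modeled on LocalRecoveryInProgress and
 * LocalXLogInsertAllowed. */
static bool demo_local_in_recovery = true;
static int  demo_local_insert_allowed = -1; /* -1 = check, 0 = no, 1 = yes */

static bool
demo_recovery_in_progress(void)
{
    /* Once we have seen "not in recovery", never look at shared state again. */
    if (!demo_local_in_recovery)
        return false;

    demo_local_in_recovery = demo_shared_in_recovery;
    return demo_local_in_recovery;
}

static bool
demo_insert_allowed(void)
{
    assert(demo_local_insert_allowed >= -1 && demo_local_insert_allowed <= 1);

    /* Fast path: the answer is already pinned for this process. */
    if (demo_local_insert_allowed >= 0)
        return (bool) demo_local_insert_allowed;

    if (demo_recovery_in_progress())
        return false;

    /* Recovery has ended; pin the answer to "unconditionally true". */
    demo_local_insert_allowed = 1;
    return true;
}
```

In this sketch, once demo_insert_allowed() has returned true, later changes to demo_shared_in_recovery are no longer observed, which mirrors the assumption stated in the comments above that recovery cannot be re-entered.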
                               6482                 :                : 
                               6483                 :                : /*
                               6484                 :                :  * Make XLogInsertAllowed() return true in the current process only.
                               6485                 :                :  *
                               6486                 :                :  * Note: it is allowed to switch LocalXLogInsertAllowed back to -1 later,
                               6487                 :                :  * and even call LocalSetXLogInsertAllowed() again after that.
                               6488                 :                :  *
                               6489                 :                :  * Returns the previous value of LocalXLogInsertAllowed.
                               6490                 :                :  */
                               6491                 :                : static int
                               6492                 :            900 : LocalSetXLogInsertAllowed(void)
                               6493                 :                : {
 1401 heikki.linnakangas@i     6494                 :            900 :     int         oldXLogAllowed = LocalXLogInsertAllowed;
                               6495                 :                : 
 6019 tgl@sss.pgh.pa.us        6496                 :            900 :     LocalXLogInsertAllowed = 1;
                               6497                 :                : 
 1515 rhaas@postgresql.org     6498                 :            900 :     return oldXLogAllowed;
                               6499                 :                : }
                               6500                 :                : 
                               6501                 :                : /*
                               6502                 :                :  * Return the current Redo pointer from shared memory.
                               6503                 :                :  *
                               6504                 :                :  * As a side-effect, the local RedoRecPtr copy is updated.
                               6505                 :                :  */
                               6506                 :                : XLogRecPtr
 9121 vadim4o@yahoo.com        6507                 :         213108 : GetRedoRecPtr(void)
                               6508                 :                : {
                               6509                 :                :     XLogRecPtr  ptr;
                               6510                 :                : 
                               6511                 :                :     /*
                               6512                 :                :      * The possibly not up-to-date copy in XLogCtl is enough. Even if we
                               6513                 :                :      * grabbed a WAL insertion lock to read the authoritative value in
                               6514                 :                :      * Insert->RedoRecPtr, someone might update it just after we've released
                               6515                 :                :      * the lock.
                               6516                 :                :      */
 4105 andres@anarazel.de       6517         [ +  + ]:         213108 :     SpinLockAcquire(&XLogCtl->info_lck);
                               6518                 :         213108 :     ptr = XLogCtl->RedoRecPtr;
                               6519                 :         213108 :     SpinLockRelease(&XLogCtl->info_lck);
                               6520                 :                : 
 4546 heikki.linnakangas@i     6521         [ +  + ]:         213108 :     if (RedoRecPtr < ptr)
                               6522                 :           1612 :         RedoRecPtr = ptr;
                               6523                 :                : 
 8679 tgl@sss.pgh.pa.us        6524                 :         213108 :     return RedoRecPtr;
                               6525                 :                : }
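
The side effect described for GetRedoRecPtr() above, where the local copy only ever moves forward, can be sketched on its own. This is an illustrative reduction under stated assumptions: DemoRecPtr, demo_shared_redo, demo_local_redo, and demo_get_redo() are hypothetical names, and the shared value is read directly here, whereas the real function reads it while holding the info_lck spinlock.

```c
#include <stdint.h>

typedef uint64_t DemoRecPtr;            /* hypothetical stand-in for XLogRecPtr */

static DemoRecPtr demo_shared_redo = 0; /* stand-in for XLogCtl->RedoRecPtr */
static DemoRecPtr demo_local_redo = 0;  /* stand-in for the local RedoRecPtr */

static DemoRecPtr
demo_get_redo(void)
{
    /* The real code takes this snapshot under a spinlock. */
    DemoRecPtr ptr = demo_shared_redo;

    /* Advance the local copy only; never move it backwards. */
    if (demo_local_redo < ptr)
        demo_local_redo = ptr;

    return demo_local_redo;
}
```

Because the comparison only ever raises the local value, a reader that has already seen a newer pointer keeps it even if the snapshot it takes is older.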
                               6526                 :                : 
                               6527                 :                : /*
                               6528                 :                :  * Return information needed to decide whether a modified block needs a
                               6529                 :                :  * full-page image to be included in the WAL record.
                               6530                 :                :  *
                               6531                 :                :  * The returned values are cached copies from backend-private memory, and
                               6532                 :                :  * possibly out-of-date or, indeed, uninitialized, in which case they will
                               6533                 :                :  * be InvalidXLogRecPtr and false, respectively.  XLogInsertRecord will
                               6534                 :                :  * re-check them against up-to-date values, while holding the WAL insert lock.
                               6535                 :                :  */
                               6536                 :                : void
 4060 heikki.linnakangas@i     6537                 :       14318113 : GetFullPageWriteInfo(XLogRecPtr *RedoRecPtr_p, bool *doPageWrites_p)
                               6538                 :                : {
                               6539                 :       14318113 :     *RedoRecPtr_p = RedoRecPtr;
                               6540                 :       14318113 :     *doPageWrites_p = doPageWrites;
                               6541                 :       14318113 : }
                               6542                 :                : 
                               6543                 :                : /*
                               6544                 :                :  * GetInsertRecPtr -- Returns the current insert position.
                               6545                 :                :  *
                               6546                 :                :  * NOTE: The value *actually* returned is the position of the last full
                               6547                 :                :  * xlog page. It lags behind the real insert position by at most 1 page.
                               6548                 :                :  * For that, we don't need to scan through WAL insertion locks, and an
                               6549                 :                :  * approximation is enough for the current usage of this function.
                               6550                 :                :  */
                               6551                 :                : XLogRecPtr
 6748 tgl@sss.pgh.pa.us        6552                 :           7248 : GetInsertRecPtr(void)
                               6553                 :                : {
                               6554                 :                :     XLogRecPtr  recptr;
                               6555                 :                : 
 4105 andres@anarazel.de       6556         [ +  + ]:           7248 :     SpinLockAcquire(&XLogCtl->info_lck);
                               6557                 :           7248 :     recptr = XLogCtl->LogwrtRqst.Write;
                               6558                 :           7248 :     SpinLockRelease(&XLogCtl->info_lck);
                               6559                 :                : 
 6748 tgl@sss.pgh.pa.us        6560                 :           7248 :     return recptr;
                               6561                 :                : }
                               6562                 :                : 
                               6563                 :                : /*
                               6564                 :                :  * GetFlushRecPtr -- Returns the current flush position, i.e., the last WAL
                               6565                 :                :  * position known to be fsync'd to disk. This should only be used on a
                               6566                 :                :  * system that is known not to be in recovery.
                               6567                 :                :  */
                               6568                 :                : XLogRecPtr
 1504 rhaas@postgresql.org     6569                 :         182804 : GetFlushRecPtr(TimeLineID *insertTLI)
                               6570                 :                : {
 1499                          6571         [ -  + ]:         182804 :     Assert(XLogCtl->SharedRecoveryState == RECOVERY_STATE_DONE);
                               6572                 :                : 
  624 alvherre@alvh.no-ip.     6573                 :         182804 :     RefreshXLogWriteResult(LogwrtResult);
                               6574                 :                : 
                               6575                 :                :     /*
                               6576                 :                :      * If we're writing and flushing WAL, the timeline can't be changing, so
                               6577                 :                :      * no lock is required.
                               6578                 :                :      */
 1504 rhaas@postgresql.org     6579         [ +  + ]:         182804 :     if (insertTLI)
 1499                          6580                 :          23109 :         *insertTLI = XLogCtl->InsertTimeLineID;
                               6581                 :                : 
 3628 simon@2ndQuadrant.co     6582                 :         182804 :     return LogwrtResult.Flush;
                               6583                 :                : }
                               6584                 :                : 
                               6585                 :                : /*
                               6586                 :                :  * GetWALInsertionTimeLine -- Returns the current timeline of a system that
                               6587                 :                :  * is not in recovery.
                               6588                 :                :  */
                               6589                 :                : TimeLineID
 1504 rhaas@postgresql.org     6590                 :         111328 : GetWALInsertionTimeLine(void)
                               6591                 :                : {
                               6592         [ -  + ]:         111328 :     Assert(XLogCtl->SharedRecoveryState == RECOVERY_STATE_DONE);
                               6593                 :                : 
                               6594                 :                :     /* Since the value can't be changing, no lock is required. */
 1499                          6595                 :         111328 :     return XLogCtl->InsertTimeLineID;
                               6596                 :                : }
                               6597                 :                : 
                               6598                 :                : /*
                               6599                 :                :  * GetWALInsertionTimeLineIfSet -- If the system is not in recovery, returns
                               6600                 :                :  * the WAL insertion timeline; else, returns 0. Wherever possible, use
                               6601                 :                :  * GetWALInsertionTimeLine() instead, since it's cheaper. Note that this
                               6602                 :                :  * function decides recovery has ended as soon as the insert TLI is set, which
                               6603                 :                :  * happens before we set XLogCtl->SharedRecoveryState to RECOVERY_STATE_DONE.
                               6604                 :                :  */
                               6605                 :                : TimeLineID
  510 rhaas@postgresql.org     6606                 :UBC           0 : GetWALInsertionTimeLineIfSet(void)
                               6607                 :                : {
                               6608                 :                :     TimeLineID  insertTLI;
                               6609                 :                : 
                               6610         [ #  # ]:              0 :     SpinLockAcquire(&XLogCtl->info_lck);
                               6611                 :              0 :     insertTLI = XLogCtl->InsertTimeLineID;
                               6612                 :              0 :     SpinLockRelease(&XLogCtl->info_lck);
                               6613                 :                : 
                               6614                 :              0 :     return insertTLI;
                               6615                 :                : }
                               6616                 :                : 
                               6617                 :                : /*
                               6618                 :                :  * GetLastImportantRecPtr -- Returns the LSN of the last important record
                               6619                 :                :  * inserted. All records not explicitly marked as unimportant are considered
                               6620                 :                :  * important.
                               6621                 :                :  *
                               6622                 :                :  * The LSN is determined by computing the maximum of
                               6623                 :                :  * WALInsertLocks[i].lastImportantAt.
                               6624                 :                :  */
                               6625                 :                : XLogRecPtr
 3283 andres@anarazel.de       6626                 :CBC        1578 : GetLastImportantRecPtr(void)
                               6627                 :                : {
                               6628                 :           1578 :     XLogRecPtr  res = InvalidXLogRecPtr;
                               6629                 :                :     int         i;
                               6630                 :                : 
                               6631         [ +  + ]:          14202 :     for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
                               6632                 :                :     {
                               6633                 :                :         XLogRecPtr  last_important;
                               6634                 :                : 
                               6635                 :                :         /*
                               6636                 :                :          * Need to take a lock to prevent torn reads of the LSN, which are
                               6637                 :                :          * possible on some of the supported platforms. WAL insert locks only
                               6638                 :                :          * support exclusive mode, so we have to use that.
                               6639                 :                :          */
                               6640                 :          12624 :         LWLockAcquire(&WALInsertLocks[i].l.lock, LW_EXCLUSIVE);
                               6641                 :          12624 :         last_important = WALInsertLocks[i].l.lastImportantAt;
                               6642                 :          12624 :         LWLockRelease(&WALInsertLocks[i].l.lock);
                               6643                 :                : 
                               6644         [ +  + ]:          12624 :         if (res < last_important)
                               6645                 :           2740 :             res = last_important;
                               6646                 :                :     }
                               6647                 :                : 
                               6648                 :           1578 :     return res;
                               6649                 :                : }
                               6650                 :                : 
                               6651                 :                : /*
                               6652                 :                :  * Get the time and LSN of the last xlog segment switch
                               6653                 :                :  */
                               6654                 :                : pg_time_t
 3283 andres@anarazel.de       6655                 :UBC           0 : GetLastSegSwitchData(XLogRecPtr *lastSwitchLSN)
                               6656                 :                : {
                               6657                 :                :     pg_time_t   result;
                               6658                 :                : 
                               6659                 :                :     /* Need WALWriteLock, but shared lock is sufficient */
 7063 tgl@sss.pgh.pa.us        6660                 :              0 :     LWLockAcquire(WALWriteLock, LW_SHARED);
 4537 heikki.linnakangas@i     6661                 :              0 :     result = XLogCtl->lastSegSwitchTime;
 3283 andres@anarazel.de       6662                 :              0 :     *lastSwitchLSN = XLogCtl->lastSegSwitchLSN;
 7063 tgl@sss.pgh.pa.us        6663                 :              0 :     LWLockRelease(WALWriteLock);
                               6664                 :                : 
                               6665                 :              0 :     return result;
                               6666                 :                : }
                               6667                 :                : 
                               6668                 :                : /*
                               6669                 :                :  * This must be called ONCE during postmaster or standalone-backend shutdown
                               6670                 :                :  */
                               6671                 :                : void
 8042 peter_e@gmx.net          6672                 :CBC         641 : ShutdownXLOG(int code, Datum arg)
                               6673                 :                : {
                               6674                 :                :     /*
                               6675                 :                :      * We should have an aux process resource owner to use, and we should not
                               6676                 :                :      * be in a transaction that's installed some other resowner.
                               6677                 :                :      */
 2710 tgl@sss.pgh.pa.us        6678         [ -  + ]:            641 :     Assert(AuxProcessResourceOwner != NULL);
                               6679   [ +  +  -  + ]:            641 :     Assert(CurrentResourceOwner == NULL ||
                               6680                 :                :            CurrentResourceOwner == AuxProcessResourceOwner);
                               6681                 :            641 :     CurrentResourceOwner = AuxProcessResourceOwner;
                               6682                 :                : 
                               6683                 :                :     /* Don't be chatty in standalone mode */
 4571                          6684   [ +  +  +  + ]:            641 :     ereport(IsPostmasterEnvironment ? LOG : NOTICE,
                               6685                 :                :             (errmsg("shutting down")));
                               6686                 :                : 
                               6687                 :                :     /*
                               6688                 :                :      * Signal walsenders to move to stopping state.
                               6689                 :                :      */
 3118 andres@anarazel.de       6690                 :            641 :     WalSndInitStopping();
                               6691                 :                : 
                               6692                 :                :     /*
                               6693                 :                :      * Wait for WAL senders to be in stopping state.  This prevents commands
                               6694                 :                :      * from writing new WAL.
                               6695                 :                :      */
                               6696                 :            641 :     WalSndWaitStopping();
                               6697                 :                : 
 6147 heikki.linnakangas@i     6698         [ +  + ]:            641 :     if (RecoveryInProgress())
  160 nathan@postgresql.or     6699                 :GNC          52 :         CreateRestartPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_FAST);
                               6700                 :                :     else
                               6701                 :                :     {
                               6702                 :                :         /*
                               6703                 :                :          * If archiving is enabled, rotate the last XLOG file so that all the
                               6704                 :                :          * remaining records are archived (postmaster wakes up the archiver
                               6705                 :                :          * process one more time at the end of shutdown). The checkpoint
                               6706                 :                :          * record will go to the next XLOG file and won't be archived (yet).
                               6707                 :                :          */
 1414 rhaas@postgresql.org     6708   [ +  +  -  +  :CBC         589 :         if (XLogArchivingActive())
                                              +  + ]
 3283 andres@anarazel.de       6709                 :             12 :             RequestXLogSwitch(false);
                               6710                 :                : 
  160 nathan@postgresql.or     6711                 :GNC         589 :         CreateCheckPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_FAST);
                               6712                 :                :     }
 9570 vadim4o@yahoo.com        6713                 :CBC         641 : }
                               6714                 :                : 
                               6715                 :                : /*
                               6716                 :                :  * Log start of a checkpoint.
                               6717                 :                :  */
                               6718                 :                : static void
 6147 heikki.linnakangas@i     6719                 :           1434 : LogCheckpointStart(int flags, bool restartpoint)
                               6720                 :                : {
 1840 peter@eisentraut.org     6721         [ +  + ]:           1434 :     if (restartpoint)
                               6722   [ +  -  -  +  :            200 :         ereport(LOG,
                                     -  +  +  +  +  
                                     +  +  +  +  +  
                                        -  +  +  + ]
                               6723                 :                :         /* translator: the placeholders show checkpoint options */
                               6724                 :                :                 (errmsg("restartpoint starting:%s%s%s%s%s%s%s%s",
                               6725                 :                :                         (flags & CHECKPOINT_IS_SHUTDOWN) ? " shutdown" : "",
                               6726                 :                :                         (flags & CHECKPOINT_END_OF_RECOVERY) ? " end-of-recovery" : "",
                               6727                 :                :                         (flags & CHECKPOINT_FAST) ? " fast" : "",
                               6728                 :                :                         (flags & CHECKPOINT_FORCE) ? " force" : "",
                               6729                 :                :                         (flags & CHECKPOINT_WAIT) ? " wait" : "",
                               6730                 :                :                         (flags & CHECKPOINT_CAUSE_XLOG) ? " wal" : "",
                               6731                 :                :                         (flags & CHECKPOINT_CAUSE_TIME) ? " time" : "",
                               6732                 :                :                         (flags & CHECKPOINT_FLUSH_UNLOGGED) ? " flush-unlogged" : "")));
                               6733                 :                :     else
                               6734   [ +  -  +  +  :           1234 :         ereport(LOG,
                                     -  +  +  +  +  
                                     +  +  +  +  +  
                                        +  +  +  + ]
                               6735                 :                :         /* translator: the placeholders show checkpoint options */
                               6736                 :                :                 (errmsg("checkpoint starting:%s%s%s%s%s%s%s%s",
                               6737                 :                :                         (flags & CHECKPOINT_IS_SHUTDOWN) ? " shutdown" : "",
                               6738                 :                :                         (flags & CHECKPOINT_END_OF_RECOVERY) ? " end-of-recovery" : "",
                               6739                 :                :                         (flags & CHECKPOINT_FAST) ? " fast" : "",
                               6740                 :                :                         (flags & CHECKPOINT_FORCE) ? " force" : "",
                               6741                 :                :                         (flags & CHECKPOINT_WAIT) ? " wait" : "",
                               6742                 :                :                         (flags & CHECKPOINT_CAUSE_XLOG) ? " wal" : "",
                               6743                 :                :                         (flags & CHECKPOINT_CAUSE_TIME) ? " time" : "",
                               6744                 :                :                         (flags & CHECKPOINT_FLUSH_UNLOGGED) ? " flush-unlogged" : "")));
 6746 tgl@sss.pgh.pa.us        6745                 :           1434 : }
                               6746                 :                : 
                               6747                 :                : /*
                               6748                 :                :  * Log end of a checkpoint.
                               6749                 :                :  */
                               6750                 :                : static void
 6147 heikki.linnakangas@i     6751                 :           1732 : LogCheckpointEnd(bool restartpoint)
                               6752                 :                : {
                               6753                 :                :     long        write_msecs,
                               6754                 :                :                 sync_msecs,
                               6755                 :                :                 total_msecs,
                               6756                 :                :                 longest_msecs,
                               6757                 :                :                 average_msecs;
                               6758                 :                :     uint64      average_sync_time;
                               6759                 :                : 
 6746 tgl@sss.pgh.pa.us        6760                 :           1732 :     CheckpointStats.ckpt_end_t = GetCurrentTimestamp();
                               6761                 :                : 
 1864                          6762                 :           1732 :     write_msecs = TimestampDifferenceMilliseconds(CheckpointStats.ckpt_write_t,
                               6763                 :                :                                                   CheckpointStats.ckpt_sync_t);
                               6764                 :                : 
                               6765                 :           1732 :     sync_msecs = TimestampDifferenceMilliseconds(CheckpointStats.ckpt_sync_t,
                               6766                 :                :                                                  CheckpointStats.ckpt_sync_end_t);
                               6767                 :                : 
                               6768                 :                :     /* Accumulate checkpoint timing summary data, in milliseconds. */
  780 michael@paquier.xyz      6769                 :           1732 :     PendingCheckpointerStats.write_time += write_msecs;
                               6770                 :           1732 :     PendingCheckpointerStats.sync_time += sync_msecs;
                               6771                 :                : 
                               6772                 :                :     /*
                               6773                 :                :      * All of the published timing statistics are accounted for.  Only
                               6774                 :                :      * continue if a log message is to be written.
                               6775                 :                :      */
 5005 rhaas@postgresql.org     6776         [ +  + ]:           1732 :     if (!log_checkpoints)
                               6777                 :            298 :         return;
                               6778                 :                : 
 1864 tgl@sss.pgh.pa.us        6779                 :           1434 :     total_msecs = TimestampDifferenceMilliseconds(CheckpointStats.ckpt_start_t,
                               6780                 :                :                                                   CheckpointStats.ckpt_end_t);
                               6781                 :                : 
                               6782                 :                :     /*
                               6783                 :                :      * Timing values returned from CheckpointStats are in microseconds.
                               6784                 :                :      * Convert to milliseconds for consistent printing.
                               6785                 :                :      */
                               6786                 :           1434 :     longest_msecs = (long) ((CheckpointStats.ckpt_longest_sync + 999) / 1000);
                               6787                 :                : 
 5483 rhaas@postgresql.org     6788                 :           1434 :     average_sync_time = 0;
 5366 bruce@momjian.us         6789         [ -  + ]:           1434 :     if (CheckpointStats.ckpt_sync_rels > 0)
 5483 rhaas@postgresql.org     6790                 :UBC           0 :         average_sync_time = CheckpointStats.ckpt_agg_sync_time /
                               6791                 :              0 :             CheckpointStats.ckpt_sync_rels;
 1864 tgl@sss.pgh.pa.us        6792                 :CBC        1434 :     average_msecs = (long) ((average_sync_time + 999) / 1000);
                               6793                 :                : 
                               6794                 :                :     /*
                               6795                 :                :      * ControlFileLock is not required to see ControlFile->checkPoint and
                               6796                 :                :      * ->checkPointCopy here as we are the only updater of those variables at
                               6797                 :                :      * this moment.
                               6798                 :                :      */
 1840 peter@eisentraut.org     6799         [ +  + ]:           1434 :     if (restartpoint)
                               6800         [ +  - ]:            200 :         ereport(LOG,
                               6801                 :                :                 (errmsg("restartpoint complete: wrote %d buffers (%.1f%%), "
                               6802                 :                :                         "wrote %d SLRU buffers; %d WAL file(s) added, "
                               6803                 :                :                         "%d removed, %d recycled; write=%ld.%03d s, "
                               6804                 :                :                         "sync=%ld.%03d s, total=%ld.%03d s; sync files=%d, "
                               6805                 :                :                         "longest=%ld.%03d s, average=%ld.%03d s; distance=%d kB, "
                               6806                 :                :                         "estimate=%d kB; lsn=%X/%08X, redo lsn=%X/%08X",
                               6807                 :                :                         CheckpointStats.ckpt_bufs_written,
                               6808                 :                :                         (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
                               6809                 :                :                         CheckpointStats.ckpt_slru_written,
                               6810                 :                :                         CheckpointStats.ckpt_segs_added,
                               6811                 :                :                         CheckpointStats.ckpt_segs_removed,
                               6812                 :                :                         CheckpointStats.ckpt_segs_recycled,
                               6813                 :                :                         write_msecs / 1000, (int) (write_msecs % 1000),
                               6814                 :                :                         sync_msecs / 1000, (int) (sync_msecs % 1000),
                               6815                 :                :                         total_msecs / 1000, (int) (total_msecs % 1000),
                               6816                 :                :                         CheckpointStats.ckpt_sync_rels,
                               6817                 :                :                         longest_msecs / 1000, (int) (longest_msecs % 1000),
                               6818                 :                :                         average_msecs / 1000, (int) (average_msecs % 1000),
                               6819                 :                :                         (int) (PrevCheckPointDistance / 1024.0),
                               6820                 :                :                         (int) (CheckPointDistanceEstimate / 1024.0),
                               6821                 :                :                         LSN_FORMAT_ARGS(ControlFile->checkPoint),
                               6822                 :                :                         LSN_FORMAT_ARGS(ControlFile->checkPointCopy.redo))));
                               6823                 :                :     else
                               6824         [ +  - ]:           1234 :         ereport(LOG,
                               6825                 :                :                 (errmsg("checkpoint complete: wrote %d buffers (%.1f%%), "
                               6826                 :                :                         "wrote %d SLRU buffers; %d WAL file(s) added, "
                               6827                 :                :                         "%d removed, %d recycled; write=%ld.%03d s, "
                               6828                 :                :                         "sync=%ld.%03d s, total=%ld.%03d s; sync files=%d, "
                               6829                 :                :                         "longest=%ld.%03d s, average=%ld.%03d s; distance=%d kB, "
                               6830                 :                :                         "estimate=%d kB; lsn=%X/%08X, redo lsn=%X/%08X",
                               6831                 :                :                         CheckpointStats.ckpt_bufs_written,
                               6832                 :                :                         (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
                               6833                 :                :                         CheckpointStats.ckpt_slru_written,
                               6834                 :                :                         CheckpointStats.ckpt_segs_added,
                               6835                 :                :                         CheckpointStats.ckpt_segs_removed,
                               6836                 :                :                         CheckpointStats.ckpt_segs_recycled,
                               6837                 :                :                         write_msecs / 1000, (int) (write_msecs % 1000),
                               6838                 :                :                         sync_msecs / 1000, (int) (sync_msecs % 1000),
                               6839                 :                :                         total_msecs / 1000, (int) (total_msecs % 1000),
                               6840                 :                :                         CheckpointStats.ckpt_sync_rels,
                               6841                 :                :                         longest_msecs / 1000, (int) (longest_msecs % 1000),
                               6842                 :                :                         average_msecs / 1000, (int) (average_msecs % 1000),
                               6843                 :                :                         (int) (PrevCheckPointDistance / 1024.0),
                               6844                 :                :                         (int) (CheckPointDistanceEstimate / 1024.0),
                               6845                 :                :                         LSN_FORMAT_ARGS(ControlFile->checkPoint),
                               6846                 :                :                         LSN_FORMAT_ARGS(ControlFile->checkPointCopy.redo))));
                               6847                 :                : }
                               6848                 :                : 
                               6849                 :                : /*
                               6850                 :                :  * Update the estimate of distance between checkpoints.
                               6851                 :                :  *
                               6852                 :                :  * The estimate is used to calculate the number of WAL segments to keep
                               6853                 :                :  * preallocated, see XLOGfileslop().
                               6854                 :                :  */
                               6855                 :                : static void
 3951 heikki.linnakangas@i     6856                 :           1732 : UpdateCheckPointDistanceEstimate(uint64 nbytes)
                               6857                 :                : {
                               6858                 :                :     /*
                               6859                 :                :      * To estimate the number of segments consumed between checkpoints, keep a
                               6860                 :                :      * moving average of the amount of WAL generated in previous checkpoint
                               6861                 :                :      * cycles. However, if the load is bursty, with quiet periods and busy
                               6862                 :                :      * periods, we want to cater for the peak load. So instead of a plain
                               6863                 :                :      * moving average, let the average decline slowly if the previous cycle
                               6864                 :                :      * used less WAL than estimated, but bump it up immediately if it used
                               6865                 :                :      * more.
                               6866                 :                :      *
                               6867                 :                :      * When checkpoints are triggered by max_wal_size, this should converge to
                               6868                 :                :      * CheckpointSegments * wal_segment_size.
                               6869                 :                :      *
                               6870                 :                :      * Note: This doesn't pay any attention to what caused the checkpoint.
                               6871                 :                :      * Checkpoints triggered manually with CHECKPOINT command, or by e.g.
                               6872                 :                :      * starting a base backup, are counted the same as those created
                               6873                 :                :      * automatically. The slow-decline will largely mask them out, if they are
                               6874                 :                :      * not frequent. If they are frequent, it seems reasonable to count them
                               6875                 :                :      * in as any others; if you issue a manual checkpoint every 5 minutes and
                               6876                 :                :      * never let a timed checkpoint happen, it makes sense to base the
                               6877                 :                :      * preallocation on that 5 minute interval rather than whatever
                               6878                 :                :      * checkpoint_timeout is set to.
                               6879                 :                :      */
                               6880                 :           1732 :     PrevCheckPointDistance = nbytes;
                               6881         [ +  + ]:           1732 :     if (CheckPointDistanceEstimate < nbytes)
                               6882                 :            727 :         CheckPointDistanceEstimate = nbytes;
                               6883                 :                :     else
                               6884                 :           1005 :         CheckPointDistanceEstimate =
                               6885                 :           1005 :             (0.90 * CheckPointDistanceEstimate + 0.10 * (double) nbytes);
 6746 tgl@sss.pgh.pa.us        6886                 :           1732 : }
                               6887                 :                : 
                               6888                 :                : /*
                               6889                 :                :  * Update the ps display for a process running a checkpoint.  Note that
                               6890                 :                :  * this routine should not do any allocations so that it can be called
                               6891                 :                :  * from a critical section.
                               6892                 :                :  */
                               6893                 :                : static void
 1830 michael@paquier.xyz      6894                 :           3464 : update_checkpoint_display(int flags, bool restartpoint, bool reset)
                               6895                 :                : {
                               6896                 :                :     /*
                               6897                 :                :      * The status is reported only for end-of-recovery and shutdown
                               6898                 :                :      * checkpoints or shutdown restartpoints.  Updating the ps display is
                               6899                 :                :      * useful in those situations as it may not be possible to rely on
                               6900                 :                :      * pg_stat_activity to see the status of the checkpointer or the startup
                               6901                 :                :      * process.
                               6902                 :                :      */
                               6903         [ +  + ]:           3464 :     if ((flags & (CHECKPOINT_END_OF_RECOVERY | CHECKPOINT_IS_SHUTDOWN)) == 0)
                               6904                 :           2188 :         return;
                               6905                 :                : 
                               6906         [ +  + ]:           1276 :     if (reset)
                               6907                 :            638 :         set_ps_display("");
                               6908                 :                :     else
                               6909                 :                :     {
                               6910                 :                :         char        activitymsg[128];
                               6911                 :                : 
                               6912         [ +  + ]:           1914 :         snprintf(activitymsg, sizeof(activitymsg), "performing %s%s%s",
                               6913         [ +  + ]:            638 :                  (flags & CHECKPOINT_END_OF_RECOVERY) ? "end-of-recovery " : "",
                               6914         [ +  + ]:            638 :                  (flags & CHECKPOINT_IS_SHUTDOWN) ? "shutdown " : "",
                               6915                 :                :                  restartpoint ? "restartpoint" : "checkpoint");
                               6916                 :            638 :         set_ps_display(activitymsg);
                               6917                 :                :     }
                               6918                 :                : }
                               6919                 :                : 
                               6920                 :                : 
                               6921                 :                : /*
                               6922                 :                :  * Perform a checkpoint --- either during shutdown, or on-the-fly
                               6923                 :                :  *
                               6924                 :                :  * flags is a bitwise OR of the following:
                               6925                 :                :  *  CHECKPOINT_IS_SHUTDOWN: checkpoint is for database shutdown.
                               6926                 :                :  *  CHECKPOINT_END_OF_RECOVERY: checkpoint is for end of WAL recovery.
                               6927                 :                :  *  CHECKPOINT_FAST: finish the checkpoint ASAP, ignoring
                               6928                 :                :  *      checkpoint_completion_target parameter.
                               6929                 :                :  *  CHECKPOINT_FORCE: force a checkpoint even if no XLOG activity has occurred
                               6930                 :                :  *      since the last one (implied by CHECKPOINT_IS_SHUTDOWN or
                               6931                 :                :  *      CHECKPOINT_END_OF_RECOVERY).
                               6932                 :                :  *  CHECKPOINT_FLUSH_UNLOGGED: also flush buffers of unlogged tables.
                               6933                 :                :  *
                               6934                 :                :  * Note: flags contains other bits, of interest here only for logging purposes.
                               6935                 :                :  * In particular note that this routine is synchronous and does not pay
                               6936                 :                :  * attention to CHECKPOINT_WAIT.
                               6937                 :                :  *
                               6938                 :                :  * If !shutdown then we are writing an online checkpoint. An XLOG_CHECKPOINT_REDO
                               6939                 :                :  * record is inserted into WAL at the logical location of the checkpoint, before
                               6940                 :                :  * flushing anything to disk; when the checkpoint is eventually completed, it is
                               6941                 :                :  * from this point that WAL replay will begin in the case of a recovery from
                               6942                 :                :  * this checkpoint. Once everything is written to disk, an
                               6943                 :                :  * XLOG_CHECKPOINT_ONLINE record is written to complete the checkpoint, and
                               6944                 :                :  * points back to the earlier XLOG_CHECKPOINT_REDO record. This mechanism allows
                               6945                 :                :  * other write-ahead log records to be written while the checkpoint is in
                               6946                 :                :  * progress, but we must be very careful about order of operations. This function
                               6947                 :                :  * may take many minutes to execute on a busy system.
                               6948                 :                :  *
                               6949                 :                :  * On the other hand, when shutdown is true, concurrent insertion into the
                               6950                 :                :  * write-ahead log is impossible, so there is no need for two separate records.
                               6951                 :                :  * In this case, we only insert an XLOG_CHECKPOINT_SHUTDOWN record, and it's
                               6952                 :                :  * both the record marking the completion of the checkpoint and the location
                               6953                 :                :  * from which WAL replay would begin if needed.
                               6954                 :                :  *
                               6955                 :                :  * Returns true if a new checkpoint was performed, or false if it was skipped
                               6956                 :                :  * because the system was idle.
                               6957                 :                :  */
                               6958                 :                : bool
 6748 tgl@sss.pgh.pa.us        6959                 :           1532 : CreateCheckPoint(int flags)
                               6960                 :                : {
                               6961                 :                :     bool        shutdown;
                               6962                 :                :     CheckPoint  checkPoint;
                               6963                 :                :     XLogRecPtr  recptr;
                               6964                 :                :     XLogSegNo   _logSegNo;
 9381 bruce@momjian.us         6965                 :           1532 :     XLogCtlInsert *Insert = &XLogCtl->Insert;
                               6966                 :                :     uint32      freespace;
                               6967                 :                :     XLogRecPtr  PriorRedoPtr;
                               6968                 :                :     XLogRecPtr  last_important_lsn;
                               6969                 :                :     VirtualTransactionId *vxids;
                               6970                 :                :     int         nvxids;
 1515 rhaas@postgresql.org     6971                 :           1532 :     int         oldXLogAllowed = 0;
                               6972                 :                : 
                               6973                 :                :     /*
                               6974                 :                :      * An end-of-recovery checkpoint is really a shutdown checkpoint, just
                               6975                 :                :      * issued at a different time.
                               6976                 :                :      */
 6019 tgl@sss.pgh.pa.us        6977         [ +  + ]:           1532 :     if (flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY))
 6020 heikki.linnakangas@i     6978                 :            617 :         shutdown = true;
                               6979                 :                :     else
                               6980                 :            915 :         shutdown = false;
                               6981                 :                : 
                               6982                 :                :     /* sanity check */
 6019 tgl@sss.pgh.pa.us        6983   [ +  +  -  + ]:           1532 :     if (RecoveryInProgress() && (flags & CHECKPOINT_END_OF_RECOVERY) == 0)
 6019 tgl@sss.pgh.pa.us        6984         [ #  # ]:UBC           0 :         elog(ERROR, "can't create a checkpoint during recovery");
                               6985                 :                : 
                               6986                 :                :     /*
                               6987                 :                :      * Prepare to accumulate statistics.
                               6988                 :                :      *
                               6989                 :                :      * Note: because it is possible for log_checkpoints to change while a
                               6990                 :                :      * checkpoint proceeds, we always accumulate stats, even if
                               6991                 :                :      * log_checkpoints is currently off.
                               6992                 :                :      */
 6746 tgl@sss.pgh.pa.us        6993   [ +  -  +  -  :CBC       16852 :     MemSet(&CheckpointStats, 0, sizeof(CheckpointStats));
                                     +  -  +  -  +  
                                                 + ]
                               6994                 :           1532 :     CheckpointStats.ckpt_start_t = GetCurrentTimestamp();
                               6995                 :                : 
                               6996                 :                :     /*
                               6997                 :                :      * Let smgr prepare for checkpoint; this has to happen outside the
                               6998                 :                :      * critical section and before we determine the REDO pointer.  Note that
                               6999                 :                :      * smgr must not do anything that'd have to be undone if we decide no
                               7000                 :                :      * checkpoint is needed.
                               7001                 :                :      */
 1373 tmunro@postgresql.or     7002                 :           1532 :     SyncPreCheckpoint();
                               7003                 :                : 
                               7004                 :                :     /* Run these points outside the critical section. */
                               7005                 :                :     INJECTION_POINT("create-checkpoint-initial", NULL);
                               7006                 :                :     INJECTION_POINT_LOAD("create-checkpoint-run");
                               7007                 :                : 
                               7008                 :                :     /*
                               7009                 :                :      * Use a critical section to force system panic if we have trouble.
                               7010                 :                :      */
 8846 tgl@sss.pgh.pa.us        7011                 :           1532 :     START_CRIT_SECTION();
                               7012                 :                : 
 9579 vadim4o@yahoo.com        7013         [ +  + ]:           1532 :     if (shutdown)
                               7014                 :                :     {
 6147 heikki.linnakangas@i     7015                 :            617 :         LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 9579 vadim4o@yahoo.com        7016                 :            617 :         ControlFile->state = DB_SHUTDOWNING;
                               7017                 :            617 :         UpdateControlFile();
 6147 heikki.linnakangas@i     7018                 :            617 :         LWLockRelease(ControlFileLock);
                               7019                 :                :     }
                               7020                 :                : 
                               7021                 :                :     /* Begin filling in the checkpoint WAL record */
 8246 tgl@sss.pgh.pa.us        7022   [ +  -  +  -  :          18384 :     MemSet(&checkPoint, 0, sizeof(checkPoint));
                                     +  -  +  -  +  
                                                 + ]
 6514                          7023                 :           1532 :     checkPoint.time = (pg_time_t) time(NULL);
                               7024                 :                : 
                               7025                 :                :     /*
                               7026                 :                :      * For Hot Standby, derive the oldestActiveXid before we fix the redo
                               7027                 :                :      * pointer. This allows us to begin accumulating changes to assemble our
                               7028                 :                :      * starting snapshot of locks and transactions.
                               7029                 :                :      */
 5160 simon@2ndQuadrant.co     7030   [ +  +  +  + ]:           1532 :     if (!shutdown && XLogStandbyInfoActive())
  148 akapila@postgresql.o     7031                 :GNC         864 :         checkPoint.oldestActiveXid = GetOldestActiveTransactionId(false, true);
                               7032                 :                :     else
 5160 simon@2ndQuadrant.co     7033                 :CBC         668 :         checkPoint.oldestActiveXid = InvalidTransactionId;
                               7034                 :                : 
                               7035                 :                :     /*
                               7036                 :                :      * Get location of last important record before acquiring insert locks (as
                               7037                 :                :      * GetLastImportantRecPtr() also locks WAL locks).
                               7038                 :                :      */
 3283 andres@anarazel.de       7039                 :           1532 :     last_important_lsn = GetLastImportantRecPtr();
                               7040                 :                : 
                               7041                 :                :     /*
                               7042                 :                :      * If this isn't a shutdown or forced checkpoint, and if there has been no
                               7043                 :                :      * WAL activity requiring a checkpoint, skip it.  The idea here is to
                               7044                 :                :      * avoid inserting duplicate checkpoints when the system is idle.
                               7045                 :                :      */
 6020 heikki.linnakangas@i     7046         [ +  + ]:           1532 :     if ((flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY |
                               7047                 :                :                   CHECKPOINT_FORCE)) == 0)
                               7048                 :                :     {
 3283 andres@anarazel.de       7049         [ -  + ]:            190 :         if (last_important_lsn == ControlFile->checkPoint)
                               7050                 :                :         {
 9046 tgl@sss.pgh.pa.us        7051         [ #  # ]:LBC         (1) :             END_CRIT_SECTION();
 3283 andres@anarazel.de       7052         [ #  # ]:            (1) :             ereport(DEBUG1,
                               7053                 :                :                     (errmsg_internal("checkpoint skipped because system is idle")));
  444 fujii@postgresql.org     7054                 :            (1) :             return false;
                               7055                 :                :         }
                               7056                 :                :     }
                               7057                 :                : 
                               7058                 :                :     /*
                               7059                 :                :      * An end-of-recovery checkpoint is created before anyone is allowed to
                               7060                 :                :      * write WAL. To allow us to write the checkpoint record, temporarily
                               7061                 :                :      * enable XLogInsertAllowed.
                               7062                 :                :      */
 5957 heikki.linnakangas@i     7063         [ +  + ]:CBC        1532 :     if (flags & CHECKPOINT_END_OF_RECOVERY)
 1515 rhaas@postgresql.org     7064                 :             28 :         oldXLogAllowed = LocalSetXLogInsertAllowed();
                               7065                 :                : 
 1499                          7066                 :           1532 :     checkPoint.ThisTimeLineID = XLogCtl->InsertTimeLineID;
 4693 heikki.linnakangas@i     7067         [ +  + ]:           1532 :     if (flags & CHECKPOINT_END_OF_RECOVERY)
                               7068                 :             28 :         checkPoint.PrevTimeLineID = XLogCtl->PrevTimeLineID;
                               7069                 :                :     else
 1504 rhaas@postgresql.org     7070                 :           1504 :         checkPoint.PrevTimeLineID = checkPoint.ThisTimeLineID;
                               7071                 :                : 
                               7072                 :                :     /*
                               7073                 :                :      * We must block concurrent insertions while examining insert state.
                               7074                 :                :      */
  791                          7075                 :           1532 :     WALInsertLockAcquireExclusive();
                               7076                 :                : 
                               7077                 :           1532 :     checkPoint.fullPageWrites = Insert->fullPageWrites;
  518                          7078                 :           1532 :     checkPoint.wal_level = wal_level;
                               7079                 :                : 
  791                          7080         [ +  + ]:           1532 :     if (shutdown)
                               7081                 :                :     {
                               7082                 :            617 :         XLogRecPtr  curInsert = XLogBytePosToRecPtr(Insert->CurrBytePos);
                               7083                 :                : 
                               7084                 :                :         /*
                               7085                 :                :          * Compute new REDO record ptr = location of next XLOG record.
                               7086                 :                :          *
                               7087                 :                :          * Since this is a shutdown checkpoint, there can't be any concurrent
                               7088                 :                :          * WAL insertion.
                               7089                 :                :          */
                               7090         [ +  - ]:            617 :         freespace = INSERT_FREESPACE(curInsert);
                               7091         [ -  + ]:            617 :         if (freespace == 0)
                               7092                 :                :         {
  791 rhaas@postgresql.org     7093         [ #  # ]:UBC           0 :             if (XLogSegmentOffset(curInsert, wal_segment_size) == 0)
                               7094                 :              0 :                 curInsert += SizeOfXLogLongPHD;
                               7095                 :                :             else
                               7096                 :              0 :                 curInsert += SizeOfXLogShortPHD;
                               7097                 :                :         }
  791 rhaas@postgresql.org     7098                 :CBC         617 :         checkPoint.redo = curInsert;
                               7099                 :                : 
                               7100                 :                :         /*
                               7101                 :                :          * Here we update the shared RedoRecPtr for future XLogInsert calls;
                               7102                 :                :          * this must be done while holding all the insertion locks.
                               7103                 :                :          *
                               7104                 :                :          * Note: if we fail to complete the checkpoint, RedoRecPtr will be
                               7105                 :                :          * left pointing past where it really needs to point.  This is okay;
                               7106                 :                :          * the only consequence is that XLogInsert might back up whole buffers
                               7107                 :                :          * that it didn't really need to.  We can't postpone advancing
                               7108                 :                :          * RedoRecPtr because XLogInserts that happen while we are dumping
                               7109                 :                :          * buffers must assume that their buffer changes are not included in
                               7110                 :                :          * the checkpoint.
                               7111                 :                :          */
                               7112                 :            617 :         RedoRecPtr = XLogCtl->Insert.RedoRecPtr = checkPoint.redo;
                               7113                 :                :     }
                               7114                 :                : 
                               7115                 :                :     /*
                               7116                 :                :      * Now we can release the WAL insertion locks, allowing other xacts to
                               7117                 :                :      * proceed while we are flushing disk buffers.
                               7118                 :                :      */
 4290 heikki.linnakangas@i     7119                 :           1532 :     WALInsertLockRelease();
                               7120                 :                : 
                               7121                 :                :     /*
                               7122                 :                :      * If this is an online checkpoint, we have not yet determined the redo
                               7123                 :                :      * point. We do so now by inserting the special XLOG_CHECKPOINT_REDO
                               7124                 :                :      * record; the LSN at which it starts becomes the new redo pointer. We
                               7125                 :                :      * don't do this for a shutdown checkpoint, because in that case no WAL
                               7126                 :                :      * can be written between the redo point and the insertion of the
                               7127                 :                :      * checkpoint record itself, so the checkpoint record itself serves to
                               7128                 :                :      * mark the redo point.
                               7129                 :                :      */
  791 rhaas@postgresql.org     7130         [ +  + ]:           1532 :     if (!shutdown)
                               7131                 :                :     {
                               7132                 :                :         /* Include WAL level in record for WAL summarizer's benefit. */
                               7133                 :            915 :         XLogBeginInsert();
  310 peter@eisentraut.org     7134                 :            915 :         XLogRegisterData(&wal_level, sizeof(wal_level));
  791 rhaas@postgresql.org     7135                 :            915 :         (void) XLogInsert(RM_XLOG_ID, XLOG_CHECKPOINT_REDO);
                               7136                 :                : 
                               7137                 :                :         /*
                               7138                 :                :          * XLogInsertRecord will have updated XLogCtl->Insert.RedoRecPtr in
                               7139                 :                :          * shared memory and RedoRecPtr in backend-local memory, but we need
                               7140                 :                :          * to copy that into the record that will be inserted when the
                               7141                 :                :          * checkpoint is complete.
                               7142                 :                :          */
                               7143                 :            915 :         checkPoint.redo = RedoRecPtr;
                               7144                 :                :     }
                               7145                 :                : 
                               7146                 :                :     /* Update the info_lck-protected copy of RedoRecPtr as well */
 4105 andres@anarazel.de       7147         [ -  + ]:           1532 :     SpinLockAcquire(&XLogCtl->info_lck);
                               7148                 :           1532 :     XLogCtl->RedoRecPtr = checkPoint.redo;
                               7149                 :           1532 :     SpinLockRelease(&XLogCtl->info_lck);
                               7150                 :                : 
                               7151                 :                :     /*
                               7152                 :                :      * If enabled, log checkpoint start.  We postpone this until now so as not
                               7153                 :                :      * to log anything if we decided to skip the checkpoint.
                               7154                 :                :      */
 6746 tgl@sss.pgh.pa.us        7155         [ +  + ]:           1532 :     if (log_checkpoints)
 6147 heikki.linnakangas@i     7156                 :           1234 :         LogCheckpointStart(flags, false);
                               7157                 :                : 
                               7158                 :                :     INJECTION_POINT_CACHED("create-checkpoint-run", NULL);
                               7159                 :                : 
                               7160                 :                :     /* Update the process title */
 1830 michael@paquier.xyz      7161                 :           1532 :     update_checkpoint_display(flags, false, false);
                               7162                 :                : 
                               7163                 :                :     TRACE_POSTGRESQL_CHECKPOINT_START(flags);
                               7164                 :                : 
                               7165                 :                :     /*
                               7166                 :                :      * Get the other info we need for the checkpoint record.
                               7167                 :                :      *
                               7168                 :                :      * We don't need to save oldestClogXid in the checkpoint; it only matters
                               7169                 :                :      * for the short period in which clog is being truncated, and if we crash
                               7170                 :                :      * during that we'll redo the clog truncation and fix up oldestClogXid
                               7171                 :                :      * there.
                               7172                 :                :      */
 4276 heikki.linnakangas@i     7173                 :           1532 :     LWLockAcquire(XidGenLock, LW_SHARED);
  741                          7174                 :           1532 :     checkPoint.nextXid = TransamVariables->nextXid;
                               7175                 :           1532 :     checkPoint.oldestXid = TransamVariables->oldestXid;
                               7176                 :           1532 :     checkPoint.oldestXidDB = TransamVariables->oldestXidDB;
 4276                          7177                 :           1532 :     LWLockRelease(XidGenLock);
                               7178                 :                : 
 4033 alvherre@alvh.no-ip.     7179                 :           1532 :     LWLockAcquire(CommitTsLock, LW_SHARED);
  741 heikki.linnakangas@i     7180                 :           1532 :     checkPoint.oldestCommitTsXid = TransamVariables->oldestCommitTsXid;
                               7181                 :           1532 :     checkPoint.newestCommitTsXid = TransamVariables->newestCommitTsXid;
 4033 alvherre@alvh.no-ip.     7182                 :           1532 :     LWLockRelease(CommitTsLock);
                               7183                 :                : 
 4276 heikki.linnakangas@i     7184                 :           1532 :     LWLockAcquire(OidGenLock, LW_SHARED);
  741                          7185                 :           1532 :     checkPoint.nextOid = TransamVariables->nextOid;
 4276                          7186         [ +  + ]:           1532 :     if (!shutdown)
  741                          7187                 :            915 :         checkPoint.nextOid += TransamVariables->oidCount;
 4276                          7188                 :           1532 :     LWLockRelease(OidGenLock);
                               7189                 :                : 
                               7190                 :           1532 :     MultiXactGetCheckptMulti(shutdown,
                               7191                 :                :                              &checkPoint.nextMulti,
                               7192                 :                :                              &checkPoint.nextMultiOffset,
                               7193                 :                :                              &checkPoint.oldestMulti,
                               7194                 :                :                              &checkPoint.oldestMultiDB);
                               7195                 :                : 
                               7196                 :                :     /*
                               7197                 :                :      * Having constructed the checkpoint record, ensure all shmem disk buffers
                               7198                 :                :      * and commit-log buffers are flushed to disk.
                               7199                 :                :      *
                               7200                 :                :      * This I/O could fail for various reasons.  If so, we will fail to
                               7201                 :                :      * complete the checkpoint, but there is no reason to force a system
                               7202                 :                :      * panic. Accordingly, exit critical section while doing it.
                               7203                 :                :      */
                               7204         [ -  + ]:           1532 :     END_CRIT_SECTION();
                               7205                 :                : 
                               7206                 :                :     /*
                               7207                 :                :      * In some cases there are groups of actions that must all occur on one
                               7208                 :                :      * side or the other of a checkpoint record. Before flushing the
                               7209                 :                :      * checkpoint record we must explicitly wait for any backend currently
                               7210                 :                :      * performing those groups of actions.
                               7211                 :                :      *
                               7212                 :                :      * One example is end of transaction, so we must wait for any transactions
                               7213                 :                :      * that are currently in commit critical sections.  If an xact inserted
                               7214                 :                :      * its commit record into XLOG just before the REDO point, then a crash
                               7215                 :                :      * restart from the REDO point would not replay that record, which means
                               7216                 :                :      * that our flushing had better include the xact's update of pg_xact.  So
                               7217                 :                :      * we wait until it has left its commit critical section before proceeding.
                               7218                 :                :      * See notes in RecordTransactionCommit().
                               7219                 :                :      *
                               7220                 :                :      * Because we've already released the insertion locks, this test is a bit
                               7221                 :                :      * fuzzy: it is possible that we will wait for xacts we didn't really need
                               7222                 :                :      * to wait for.  But the delay should be short and it seems better to make
                               7223                 :                :      * the checkpoint take a bit longer than to hold off insertions longer than
                               7224                 :                :      * necessary. (In fact, the whole reason we have this issue is that xact.c
                               7225                 :                :      * does commit record XLOG insertion and clog update as two separate steps
                               7226                 :                :      * protected by different locks, but again that seems best on grounds of
                               7227                 :                :      * minimizing lock contention.)
                               7228                 :                :      *
                               7229                 :                :      * A transaction that has not yet set delayChkptFlags when we look cannot
                               7230                 :                :      * be at risk, since it has not inserted its commit record yet; and one
                               7231                 :                :      * that's already cleared it is not at risk either, since it's done fixing
                               7232                 :                :      * clog and we will correctly flush the update below.  So we cannot miss
                               7233                 :                :      * any xacts we need to wait for.
                               7234                 :                :      */
 1365 rhaas@postgresql.org     7235                 :           1532 :     vxids = GetVirtualXIDsDelayingChkpt(&nvxids, DELAY_CHKPT_START);
 4763 simon@2ndQuadrant.co     7236         [ +  + ]:           1532 :     if (nvxids > 0)
                               7237                 :                :     {
                               7238                 :                :         do
                               7239                 :                :         {
                               7240                 :                :             /*
                               7241                 :                :              * Keep absorbing fsync requests while we wait. There could even
                               7242                 :                :              * be a deadlock if we don't, if the process that prevents the
                               7243                 :                :              * checkpoint is trying to add a request to the queue.
                               7244                 :                :              */
  545 heikki.linnakangas@i     7245                 :             45 :             AbsorbSyncRequests();
                               7246                 :                : 
  797 tmunro@postgresql.or     7247                 :             45 :             pgstat_report_wait_start(WAIT_EVENT_CHECKPOINT_DELAY_START);
 6608 bruce@momjian.us         7248                 :             45 :             pg_usleep(10000L);  /* wait for 10 msec */
  797 tmunro@postgresql.or     7249                 :             45 :             pgstat_report_wait_end();
 1365 rhaas@postgresql.org     7250         [ +  + ]:             45 :         } while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids,
                               7251                 :                :                                               DELAY_CHKPT_START));
                               7252                 :                :     }
 4763 simon@2ndQuadrant.co     7253                 :           1532 :     pfree(vxids);
                               7254                 :                : 
 6748 tgl@sss.pgh.pa.us        7255                 :           1532 :     CheckPointGuts(checkPoint.redo, flags);
                               7256                 :                : 
 1365 rhaas@postgresql.org     7257                 :           1532 :     vxids = GetVirtualXIDsDelayingChkpt(&nvxids, DELAY_CHKPT_COMPLETE);
                               7258         [ -  + ]:           1532 :     if (nvxids > 0)
                               7259                 :                :     {
                               7260                 :                :         do
                               7261                 :                :         {
  545 heikki.linnakangas@i     7262                 :UBC           0 :             AbsorbSyncRequests();
                               7263                 :                : 
  797 tmunro@postgresql.or     7264                 :              0 :             pgstat_report_wait_start(WAIT_EVENT_CHECKPOINT_DELAY_COMPLETE);
 1365 rhaas@postgresql.org     7265                 :              0 :             pg_usleep(10000L);  /* wait for 10 msec */
  797 tmunro@postgresql.or     7266                 :              0 :             pgstat_report_wait_end();
 1365 rhaas@postgresql.org     7267         [ #  # ]:              0 :         } while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids,
                               7268                 :                :                                               DELAY_CHKPT_COMPLETE));
                               7269                 :                :     }
 1365 rhaas@postgresql.org     7270                 :CBC        1532 :     pfree(vxids);
                               7271                 :                : 
                               7272                 :                :     /*
                               7273                 :                :      * Take a snapshot of running transactions and write this to WAL. This
                               7274                 :                :      * allows us to reconstruct the state of running transactions during
                               7275                 :                :      * archive recovery, if required. Skip this if the info is disabled.
                               7276                 :                :      *
                               7277                 :                :      * If we are shutting down, or the Startup process is completing crash
                               7278                 :                :      * recovery, we don't need to write running xact data.
                               7279                 :                :      */
 5843 simon@2ndQuadrant.co     7280   [ +  +  +  + ]:           1532 :     if (!shutdown && XLogStandbyInfoActive())
 4764 tgl@sss.pgh.pa.us        7281                 :            864 :         LogStandbySnapshot();
                               7282                 :                : 
 8258                          7283                 :           1532 :     START_CRIT_SECTION();
                               7284                 :                : 
                               7285                 :                :     /*
                               7286                 :                :      * Now insert the checkpoint record into XLOG.
                               7287                 :                :      */
 4046 heikki.linnakangas@i     7288                 :           1532 :     XLogBeginInsert();
  310 peter@eisentraut.org     7289                 :           1532 :     XLogRegisterData(&checkPoint, sizeof(checkPoint));
 9046 tgl@sss.pgh.pa.us        7290         [ +  + ]:           1532 :     recptr = XLogInsert(RM_XLOG_ID,
                               7291                 :                :                         shutdown ? XLOG_CHECKPOINT_SHUTDOWN :
                               7292                 :                :                         XLOG_CHECKPOINT_ONLINE);
                               7293                 :                : 
                               7294                 :           1532 :     XLogFlush(recptr);
                               7295                 :                : 
                               7296                 :                :     /*
                               7297                 :                :      * We mustn't write any new WAL after a shutdown checkpoint, or it will be
                               7298                 :                :      * overwritten at next startup.  No one should even try; this just allows
                               7299                 :                :      * sanity-checking.  In the case of an end-of-recovery checkpoint, we want
                               7300                 :                :      * to just temporarily disable writing until the system has exited
                               7301                 :                :      * recovery.
                               7302                 :                :      */
 6019                          7303         [ +  + ]:           1532 :     if (shutdown)
                               7304                 :                :     {
                               7305         [ +  + ]:            617 :         if (flags & CHECKPOINT_END_OF_RECOVERY)
 1515 rhaas@postgresql.org     7306                 :             28 :             LocalXLogInsertAllowed = oldXLogAllowed;
                               7307                 :                :         else
 5774 bruce@momjian.us         7308                 :            589 :             LocalXLogInsertAllowed = 0; /* never again write WAL */
                               7309                 :                :     }
                               7310                 :                : 
                               7311                 :                :     /*
                               7312                 :                :      * We now have ProcLastRecPtr = start of actual checkpoint record, recptr
                               7313                 :                :      * = end of actual checkpoint record.
                               7314                 :                :      */
 4738 alvherre@alvh.no-ip.     7315   [ +  +  -  + ]:           1532 :     if (shutdown && checkPoint.redo != ProcLastRecPtr)
 8186 tgl@sss.pgh.pa.us        7316         [ #  # ]:UBC           0 :         ereport(PANIC,
                               7317                 :                :                 (errmsg("concurrent write-ahead log activity while database system is shutting down")));
                               7318                 :                : 
                               7319                 :                :     /*
                               7320                 :                :      * Remember the prior checkpoint's redo ptr for
                               7321                 :                :      * UpdateCheckPointDistanceEstimate()
                               7322                 :                :      */
 3951 heikki.linnakangas@i     7323                 :CBC        1532 :     PriorRedoPtr = ControlFile->checkPointCopy.redo;
                               7324                 :                : 
                               7325                 :                :     /*
                               7326                 :                :      * Update the control file.
                               7327                 :                :      */
 8846 tgl@sss.pgh.pa.us        7328                 :           1532 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 9579 vadim4o@yahoo.com        7329         [ +  + ]:           1532 :     if (shutdown)
                               7330                 :            617 :         ControlFile->state = DB_SHUTDOWNED;
 9046 tgl@sss.pgh.pa.us        7331                 :           1532 :     ControlFile->checkPoint = ProcLastRecPtr;
                               7332                 :           1532 :     ControlFile->checkPointCopy = checkPoint;
                               7333                 :                :     /* crash recovery should always recover to the end of WAL */
 4739 alvherre@alvh.no-ip.     7334                 :           1532 :     ControlFile->minRecoveryPoint = InvalidXLogRecPtr;
 4762 heikki.linnakangas@i     7335                 :           1532 :     ControlFile->minRecoveryPointTLI = 0;
                               7336                 :                : 
                               7337                 :                :     /*
                               7338                 :                :      * Persist the unloggedLSN value. It's reset on crash recovery, so this
                               7339                 :                :      * goes unused on non-shutdown checkpoints, but it seems useful to always
                               7340                 :                :      * store it for debugging purposes.
                               7341                 :                :      */
  658 nathan@postgresql.or     7342                 :           1532 :     ControlFile->unloggedLSN = pg_atomic_read_membarrier_u64(&XLogCtl->unloggedLSN);
                               7343                 :                : 
 9579 vadim4o@yahoo.com        7344                 :           1532 :     UpdateControlFile();
 8846 tgl@sss.pgh.pa.us        7345                 :           1532 :     LWLockRelease(ControlFileLock);
                               7346                 :                : 
                               7347                 :                :     /*
                               7348                 :                :      * We are now done with critical updates; no need for system panic if we
                               7349                 :                :      * have trouble while fooling with old log segments.
                               7350                 :                :      */
 8258                          7351         [ -  + ]:           1532 :     END_CRIT_SECTION();
                               7352                 :                : 
                               7353                 :                :     /*
                               7354                 :                :      * WAL summaries end when the next XLOG_CHECKPOINT_REDO or
                               7355                 :                :      * XLOG_CHECKPOINT_SHUTDOWN record is reached. This is the first point
                               7356                 :                :      * where (a) we're not inside of a critical section and (b) we can be
                               7357                 :                :      * certain that the relevant record has been flushed to disk, which must
                               7358                 :                :      * happen before it can be summarized.
                               7359                 :                :      *
                               7360                 :                :      * If this is a shutdown checkpoint, then this happens reasonably
                               7361                 :                :      * promptly: we've only just inserted and flushed the
                               7362                 :                :      * XLOG_CHECKPOINT_SHUTDOWN record. If this is not a shutdown checkpoint,
                               7363                 :                :      * then this might not be very prompt at all: the XLOG_CHECKPOINT_REDO
                               7364                 :                :      * record was written before we began flushing data to disk, and that
                               7365                 :                :      * could be many minutes ago at this point. However, we don't XLogFlush()
                               7366                 :                :      * after inserting that record, so we're not guaranteed that it's on disk
                               7367                 :                :      * until after the above call that flushes the XLOG_CHECKPOINT_ONLINE
                               7368                 :                :      * record.
                               7369                 :                :      */
  412 heikki.linnakangas@i     7370                 :           1532 :     WakeupWalSummarizer();
                               7371                 :                : 
                               7372                 :                :     /*
                               7373                 :                :      * Let smgr do post-checkpoint cleanup (eg, deleting old files).
                               7374                 :                :      */
 2450 tmunro@postgresql.or     7375                 :           1532 :     SyncPostCheckpoint();
                               7376                 :                : 
                               7377                 :                :     /*
                               7378                 :                :      * Update the average distance between checkpoints if the prior checkpoint
                               7379                 :                :      * exists.
                               7380                 :                :      */
   42 alvherre@kurilemu.de     7381         [ +  - ]:GNC        1532 :     if (XLogRecPtrIsValid(PriorRedoPtr))
 3951 heikki.linnakangas@i     7382                 :CBC        1532 :         UpdateCheckPointDistanceEstimate(RedoRecPtr - PriorRedoPtr);
                               7383                 :                : 
                               7384                 :                :     INJECTION_POINT("checkpoint-before-old-wal-removal", NULL);
                               7385                 :                : 
                               7386                 :                :     /*
                               7387                 :                :      * Delete old log files that are no longer needed for the last checkpoint,
                               7388                 :                :      * to prevent the disk holding the xlog from filling up.
                               7389                 :                :      */
 2704 michael@paquier.xyz      7390                 :           1532 :     XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
                               7391                 :           1532 :     KeepLogSeg(recptr, &_logSegNo);
  302 akapila@postgresql.o     7392         [ +  + ]:           1532 :     if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT,
                               7393                 :                :                                            _logSegNo, InvalidOid,
                               7394                 :                :                                            InvalidTransactionId))
                               7395                 :                :     {
                               7396                 :                :         /*
                               7397                 :                :          * Some slots have been invalidated; recalculate the old-segment
                               7398                 :                :          * horizon, starting again from RedoRecPtr.
                               7399                 :                :          */
 1616 alvherre@alvh.no-ip.     7400                 :              3 :         XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
                               7401                 :              3 :         KeepLogSeg(recptr, &_logSegNo);
                               7402                 :                :     }
 2704 michael@paquier.xyz      7403                 :           1532 :     _logSegNo--;
 1504 rhaas@postgresql.org     7404                 :           1532 :     RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr,
                               7405                 :                :                        checkPoint.ThisTimeLineID);
                               7406                 :                : 
                               7407                 :                :     /*
                               7408                 :                :      * Make more log segments if needed.  (Do this after recycling old log
                               7409                 :                :      * segments, since that may supply some of the needed files.)
                               7410                 :                :      */
 9046 tgl@sss.pgh.pa.us        7411         [ +  + ]:           1532 :     if (!shutdown)
 1504 rhaas@postgresql.org     7412                 :            915 :         PreallocXlogFiles(recptr, checkPoint.ThisTimeLineID);
                               7413                 :                : 
                               7414                 :                :     /*
                               7415                 :                :      * Truncate pg_subtrans if possible.  We can throw away all data before
                               7416                 :                :      * the oldest XMIN of any running transaction.  No future transaction will
                               7417                 :                :      * attempt to reference any pg_subtrans entry older than that (see Asserts
                               7418                 :                :      * in subtrans.c).  During recovery, though, we mustn't do this because
                               7419                 :                :      * StartupSUBTRANS hasn't been called yet.
                               7420                 :                :      */
 6019 tgl@sss.pgh.pa.us        7421         [ +  + ]:           1532 :     if (!RecoveryInProgress())
 1954 andres@anarazel.de       7422                 :           1504 :         TruncateSUBTRANS(GetOldestTransactionIdConsideredRunning());
                               7423                 :                : 
                               7424                 :                :     /* Real work is done; log and update stats. */
 5005 rhaas@postgresql.org     7425                 :           1532 :     LogCheckpointEnd(false);
                               7426                 :                : 
                               7427                 :                :     /* Reset the process title */
 1830 michael@paquier.xyz      7428                 :           1532 :     update_checkpoint_display(flags, false, true);
                               7429                 :                : 
                               7430                 :                :     TRACE_POSTGRESQL_CHECKPOINT_DONE(CheckpointStats.ckpt_bufs_written,
                               7431                 :                :                                      NBuffers,
                               7432                 :                :                                      CheckpointStats.ckpt_segs_added,
                               7433                 :                :                                      CheckpointStats.ckpt_segs_removed,
                               7434                 :                :                                      CheckpointStats.ckpt_segs_recycled);
                               7435                 :                : 
  444 fujii@postgresql.org     7436                 :           1532 :     return true;
                               7437                 :                : }
                               7438                 :                : 
                               7439                 :                : /*
                               7440                 :                :  * Mark the end of recovery in WAL, without running a full checkpoint.
                               7441                 :                :  * We can expect that a restartpoint is likely to be in progress as we
                               7442                 :                :  * do this, though we are unwilling to wait for it to complete.
                               7443                 :                :  *
                               7444                 :                :  * CreateRestartPoint() allows for the case where recovery may end before
                               7445                 :                :  * the restartpoint completes, so concurrent behaviour is not a concern.
                               7446                 :                :  */
                               7447                 :                : static void
 4706 simon@2ndQuadrant.co     7448                 :             42 : CreateEndOfRecoveryRecord(void)
                               7449                 :                : {
                               7450                 :                :     xl_end_of_recovery xlrec;
                               7451                 :                :     XLogRecPtr  recptr;
                               7452                 :                : 
                               7453                 :                :     /* sanity check */
                               7454         [ -  + ]:             42 :     if (!RecoveryInProgress())
 4706 simon@2ndQuadrant.co     7455         [ #  # ]:UBC           0 :         elog(ERROR, "can only be used to end recovery");
                               7456                 :                : 
 4017 heikki.linnakangas@i     7457                 :CBC          42 :     xlrec.end_time = GetCurrentTimestamp();
  518 rhaas@postgresql.org     7458                 :             42 :     xlrec.wal_level = wal_level;
                               7459                 :                : 
 4290 heikki.linnakangas@i     7460                 :             42 :     WALInsertLockAcquireExclusive();
 1499 rhaas@postgresql.org     7461                 :             42 :     xlrec.ThisTimeLineID = XLogCtl->InsertTimeLineID;
 4693 heikki.linnakangas@i     7462                 :             42 :     xlrec.PrevTimeLineID = XLogCtl->PrevTimeLineID;
 4290                          7463                 :             42 :     WALInsertLockRelease();
                               7464                 :                : 
 4706 simon@2ndQuadrant.co     7465                 :             42 :     START_CRIT_SECTION();
                               7466                 :                : 
 4046 heikki.linnakangas@i     7467                 :             42 :     XLogBeginInsert();
  310 peter@eisentraut.org     7468                 :             42 :     XLogRegisterData(&xlrec, sizeof(xl_end_of_recovery));
 4046 heikki.linnakangas@i     7469                 :             42 :     recptr = XLogInsert(RM_XLOG_ID, XLOG_END_OF_RECOVERY);
                               7470                 :                : 
 4704 simon@2ndQuadrant.co     7471                 :             42 :     XLogFlush(recptr);
                               7472                 :                : 
                               7473                 :                :     /*
                               7474                 :                :      * Update the control file so that crash recovery can follow the timeline
                               7475                 :                :      * changes to this point.
                               7476                 :                :      */
                               7477                 :             42 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               7478                 :             42 :     ControlFile->minRecoveryPoint = recptr;
 1504 rhaas@postgresql.org     7479                 :             42 :     ControlFile->minRecoveryPointTLI = xlrec.ThisTimeLineID;
 4704 simon@2ndQuadrant.co     7480                 :             42 :     UpdateControlFile();
                               7481                 :             42 :     LWLockRelease(ControlFileLock);
                               7482                 :                : 
 4706                          7483         [ -  + ]:             42 :     END_CRIT_SECTION();
                               7484                 :             42 : }
                               7485                 :                : 
                               7486                 :                : /*
                               7487                 :                :  * Write an OVERWRITE_CONTRECORD message.
                               7488                 :                :  *
                               7489                 :                :  * When WAL replay expects a continuation record at the start of a page but
                               7490                 :                :  * does not find one, recovery ends and WAL writing resumes at that point.
                               7491                 :                :  * But it's wrong to resume writing new WAL back at the start of the record
                               7492                 :                :  * that was broken, because downstream consumers of that WAL (physical
                               7493                 :                :  * replicas) are not prepared to "rewind".  So the first action after
                               7494                 :                :  * finishing replay of all valid WAL must be to write a record of this type
                               7495                 :                :  * at the point where the contrecord was missing; to support xlogreader
                               7496                 :                :  * detecting the special case, XLP_FIRST_IS_OVERWRITE_CONTRECORD is also added
                               7497                 :                :  * to the page header where the record occurs.  xlogreader has an ad-hoc
                               7498                 :                :  * mechanism to report metadata about the broken record, which is what we
                               7499                 :                :  * use here.
                               7500                 :                :  *
                               7501                 :                :  * At replay time, XLP_FIRST_IS_OVERWRITE_CONTRECORD instructs xlogreader to
                               7502                 :                :  * skip the record it was reading, and pass back the LSN of the skipped
                               7503                 :                :  * record, so that its caller can verify (on "replay" of that record) that the
                               7504                 :                :  * XLOG_OVERWRITE_CONTRECORD matches what was effectively overwritten.
                               7505                 :                :  *
                               7506                 :                :  * 'aborted_lsn' is the beginning position of the record that was incomplete.
                               7507                 :                :  * It is included in the WAL record.  'pagePtr' and 'newTLI' point to the
                               7508                 :                :  * beginning of the XLOG page where the record is to be inserted.  They must
                               7509                 :                :  * match the current WAL insert position; they're passed here just so that we
                               7510                 :                :  * can verify that.
                               7511                 :                :  */
                               7512                 :                : static XLogRecPtr
 1401 heikki.linnakangas@i     7513                 :             10 : CreateOverwriteContrecordRecord(XLogRecPtr aborted_lsn, XLogRecPtr pagePtr,
                               7514                 :                :                                 TimeLineID newTLI)
                               7515                 :                : {
                               7516                 :                :     xl_overwrite_contrecord xlrec;
                               7517                 :                :     XLogRecPtr  recptr;
                               7518                 :                :     XLogPageHeader pagehdr;
                               7519                 :                :     XLogRecPtr  startPos;
                               7520                 :                : 
                               7521                 :                :     /* sanity checks */
 1541 alvherre@alvh.no-ip.     7522         [ -  + ]:             10 :     if (!RecoveryInProgress())
 1541 alvherre@alvh.no-ip.     7523         [ #  # ]:UBC           0 :         elog(ERROR, "can only be used at end of recovery");
 1401 heikki.linnakangas@i     7524         [ -  + ]:CBC          10 :     if (pagePtr % XLOG_BLCKSZ != 0)
  164 alvherre@kurilemu.de     7525         [ #  # ]:UNC           0 :         elog(ERROR, "invalid position for missing continuation record %X/%08X",
                               7526                 :                :              LSN_FORMAT_ARGS(pagePtr));
                               7527                 :                : 
                               7528                 :                :     /* The current WAL insert position should be right after the page header */
 1401 heikki.linnakangas@i     7529                 :CBC          10 :     startPos = pagePtr;
                               7530         [ +  + ]:             10 :     if (XLogSegmentOffset(startPos, wal_segment_size) == 0)
                               7531                 :              1 :         startPos += SizeOfXLogLongPHD;
                               7532                 :                :     else
                               7533                 :              9 :         startPos += SizeOfXLogShortPHD;
                               7534                 :             10 :     recptr = GetXLogInsertRecPtr();
                               7535         [ -  + ]:             10 :     if (recptr != startPos)
  164 alvherre@kurilemu.de     7536         [ #  # ]:UNC           0 :         elog(ERROR, "invalid WAL insert position %X/%08X for OVERWRITE_CONTRECORD",
                               7537                 :                :              LSN_FORMAT_ARGS(recptr));
                               7538                 :                : 
 1541 alvherre@alvh.no-ip.     7539                 :CBC          10 :     START_CRIT_SECTION();
                               7540                 :                : 
                               7541                 :                :     /*
                               7542                 :                :      * Initialize the XLOG page header (by GetXLogBuffer), and set the
                               7543                 :                :      * XLP_FIRST_IS_OVERWRITE_CONTRECORD flag.
                               7544                 :                :      *
                               7545                 :                :      * No other backend is allowed to write WAL yet, so acquiring the WAL
                               7546                 :                :      * insertion lock is just pro forma.
                               7547                 :                :      */
 1401 heikki.linnakangas@i     7548                 :             10 :     WALInsertLockAcquire();
                               7549                 :             10 :     pagehdr = (XLogPageHeader) GetXLogBuffer(pagePtr, newTLI);
                               7550                 :             10 :     pagehdr->xlp_info |= XLP_FIRST_IS_OVERWRITE_CONTRECORD;
                               7551                 :             10 :     WALInsertLockRelease();
                               7552                 :                : 
                               7553                 :                :     /*
                               7554                 :                :      * Insert the XLOG_OVERWRITE_CONTRECORD record as the first record on the
                               7555                 :                :      * page.  We know it becomes the first record, because no other backend is
                               7556                 :                :      * allowed to write WAL yet.
                               7557                 :                :      */
 1541 alvherre@alvh.no-ip.     7558                 :             10 :     XLogBeginInsert();
 1401 heikki.linnakangas@i     7559                 :             10 :     xlrec.overwritten_lsn = aborted_lsn;
                               7560                 :             10 :     xlrec.overwrite_time = GetCurrentTimestamp();
  310 peter@eisentraut.org     7561                 :             10 :     XLogRegisterData(&xlrec, sizeof(xl_overwrite_contrecord));
 1541 alvherre@alvh.no-ip.     7562                 :             10 :     recptr = XLogInsert(RM_XLOG_ID, XLOG_OVERWRITE_CONTRECORD);
                               7563                 :                : 
                               7564                 :                :     /* check that the record was inserted at the right place */
 1401 heikki.linnakangas@i     7565         [ -  + ]:             10 :     if (ProcLastRecPtr != startPos)
  164 alvherre@kurilemu.de     7566         [ #  # ]:UNC           0 :         elog(ERROR, "OVERWRITE_CONTRECORD was inserted to unexpected position %X/%08X",
                               7567                 :                :              LSN_FORMAT_ARGS(ProcLastRecPtr));
                               7568                 :                : 
 1541 alvherre@alvh.no-ip.     7569                 :CBC          10 :     XLogFlush(recptr);
                               7570                 :                : 
                               7571         [ -  + ]:             10 :     END_CRIT_SECTION();
                               7572                 :                : 
                               7573                 :             10 :     return recptr;
                               7574                 :                : }
                               7575                 :                : 
                               7576                 :                : /*
                               7577                 :                :  * Flush all data in shared memory to disk, and fsync
                               7578                 :                :  *
                               7579                 :                :  * This is the common code shared between regular checkpoints and
                               7580                 :                :  * recovery restartpoints.
                               7581                 :                :  */
                               7582                 :                : static void
 6748 tgl@sss.pgh.pa.us        7583                 :           1732 : CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
                               7584                 :                : {
 5793                          7585                 :           1732 :     CheckPointRelationMap();
  826 akapila@postgresql.o     7586                 :           1732 :     CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 4308 rhaas@postgresql.org     7587                 :           1732 :     CheckPointSnapBuild();
                               7588                 :           1732 :     CheckPointLogicalRewriteHeap();
 3886 andres@anarazel.de       7589                 :           1732 :     CheckPointReplicationOrigin();
                               7590                 :                : 
                               7591                 :                :     /* Write out all dirty data in SLRUs and the main buffer pool */
                               7592                 :                :     TRACE_POSTGRESQL_BUFFER_CHECKPOINT_START(flags);
 1910 tmunro@postgresql.or     7593                 :           1732 :     CheckpointStats.ckpt_write_t = GetCurrentTimestamp();
                               7594                 :           1732 :     CheckPointCLOG();
                               7595                 :           1732 :     CheckPointCommitTs();
                               7596                 :           1732 :     CheckPointSUBTRANS();
                               7597                 :           1732 :     CheckPointMultiXact();
                               7598                 :           1732 :     CheckPointPredicate();
                               7599                 :           1732 :     CheckPointBuffers(flags);
                               7600                 :                : 
                               7601                 :                :     /* Perform all queued up fsyncs */
                               7602                 :                :     TRACE_POSTGRESQL_BUFFER_CHECKPOINT_SYNC_START();
                               7603                 :           1732 :     CheckpointStats.ckpt_sync_t = GetCurrentTimestamp();
                               7604                 :           1732 :     ProcessSyncRequests();
                               7605                 :           1732 :     CheckpointStats.ckpt_sync_end_t = GetCurrentTimestamp();
                               7606                 :                :     TRACE_POSTGRESQL_BUFFER_CHECKPOINT_DONE();
                               7607                 :                : 
                               7608                 :                :     /* We deliberately delay 2PC checkpointing as long as possible */
 7073 tgl@sss.pgh.pa.us        7609                 :           1732 :     CheckPointTwoPhase(checkPointRedo);
                               7610                 :           1732 : }
                               7611                 :                : 
                               7612                 :                : /*
                               7613                 :                :  * Save a checkpoint for recovery restart if appropriate
                               7614                 :                :  *
                               7615                 :                :  * This function is called each time a checkpoint record is read from XLOG.
                               7616                 :                :  * It must determine whether the checkpoint represents a safe restartpoint or
                               7617                 :                :  * not.  If so, the checkpoint record is stashed in shared memory so that
                               7618                 :                :  * CreateRestartPoint can consult it.  (Note that the latter function is
                               7619                 :                :  * executed by the checkpointer, while this one will be executed by the
                               7620                 :                :  * startup process.)
                               7621                 :                :  */
                               7622                 :                : static void
 1485 rhaas@postgresql.org     7623                 :            702 : RecoveryRestartPoint(const CheckPoint *checkPoint, XLogReaderState *record)
                               7624                 :                : {
                               7625                 :                :     /*
                               7626                 :                :      * Refrain from creating a restartpoint if we have seen any
                               7627                 :                :      * references to non-existent pages. Restarting recovery from the
                               7628                 :                :      * restartpoint would not see the references, so we would lose the
                               7629                 :                :      * cross-check that the pages belonged to a relation that was dropped
                               7630                 :                :      * later.
                               7631                 :                :      */
 5130 heikki.linnakangas@i     7632         [ -  + ]:            702 :     if (XLogHaveInvalidPages())
                               7633                 :                :     {
  738 michael@paquier.xyz      7634         [ #  # ]:UBC           0 :         elog(DEBUG2,
                               7635                 :                :              "could not record restart point at %X/%08X because there are unresolved references to invalid pages",
                               7636                 :                :              LSN_FORMAT_ARGS(checkPoint->redo));
 5130 heikki.linnakangas@i     7637                 :              0 :         return;
                               7638                 :                :     }
                               7639                 :                : 
                               7640                 :                :     /*
                               7641                 :                :      * Copy the checkpoint record to shared memory, so that checkpointer can
                               7642                 :                :      * work out the next time it wants to perform a restartpoint.
                               7643                 :                :      */
 4105 andres@anarazel.de       7644         [ -  + ]:CBC         702 :     SpinLockAcquire(&XLogCtl->info_lck);
 1485 rhaas@postgresql.org     7645                 :            702 :     XLogCtl->lastCheckPointRecPtr = record->ReadRecPtr;
                               7646                 :            702 :     XLogCtl->lastCheckPointEndPtr = record->EndRecPtr;
 4105 andres@anarazel.de       7647                 :            702 :     XLogCtl->lastCheckPoint = *checkPoint;
                               7648                 :            702 :     SpinLockRelease(&XLogCtl->info_lck);
                               7649                 :                : }
                               7650                 :                : 
                               7651                 :                : /*
                               7652                 :                :  * Establish a restartpoint if possible.
                               7653                 :                :  *
                               7654                 :                :  * This is similar to CreateCheckPoint, but is used during WAL recovery
                               7655                 :                :  * to establish a point from which recovery can roll forward without
                               7656                 :                :  * replaying the entire recovery log.
                               7657                 :                :  *
                               7658                 :                :  * Returns true if a new restartpoint was established. We can only establish
                               7659                 :                :  * a restartpoint if we have replayed a safe checkpoint record since the
                               7660                 :                :  * last restartpoint.
                               7661                 :                :  */
                               7662                 :                : bool
 6147 heikki.linnakangas@i     7663                 :            601 : CreateRestartPoint(int flags)
                               7664                 :                : {
                               7665                 :                :     XLogRecPtr  lastCheckPointRecPtr;
                               7666                 :                :     XLogRecPtr  lastCheckPointEndPtr;
                               7667                 :                :     CheckPoint  lastCheckPoint;
                               7668                 :                :     XLogRecPtr  PriorRedoPtr;
                               7669                 :                :     XLogRecPtr  receivePtr;
                               7670                 :                :     XLogRecPtr  replayPtr;
                               7671                 :                :     TimeLineID  replayTLI;
                               7672                 :                :     XLogRecPtr  endptr;
                               7673                 :                :     XLogSegNo   _logSegNo;
                               7674                 :                :     TimestampTz xtime;
                               7675                 :                : 
                               7676                 :                :     /* Concurrent checkpoint/restartpoint cannot happen */
 1319 michael@paquier.xyz      7677   [ +  -  -  + ]:            601 :     Assert(!IsUnderPostmaster || MyBackendType == B_CHECKPOINTER);
                               7678                 :                : 
                               7679                 :                :     /* Get a local copy of the last safe checkpoint record. */
 4105 andres@anarazel.de       7680         [ -  + ]:            601 :     SpinLockAcquire(&XLogCtl->info_lck);
                               7681                 :            601 :     lastCheckPointRecPtr = XLogCtl->lastCheckPointRecPtr;
 3339 rhaas@postgresql.org     7682                 :            601 :     lastCheckPointEndPtr = XLogCtl->lastCheckPointEndPtr;
 4105 andres@anarazel.de       7683                 :            601 :     lastCheckPoint = XLogCtl->lastCheckPoint;
                               7684                 :            601 :     SpinLockRelease(&XLogCtl->info_lck);
                               7685                 :                : 
                               7686                 :                :     /*
                               7687                 :                :      * Check that we're still in recovery mode. It's ok if we exit recovery
                               7688                 :                :      * mode after this check; the restart point is valid anyway.
                               7689                 :                :      */
 6147 heikki.linnakangas@i     7690         [ -  + ]:            601 :     if (!RecoveryInProgress())
                               7691                 :                :     {
 6147 heikki.linnakangas@i     7692         [ #  # ]:UBC           0 :         ereport(DEBUG2,
                               7693                 :                :                 (errmsg_internal("skipping restartpoint, recovery has already ended")));
                               7694                 :              0 :         return false;
                               7695                 :                :     }
                               7696                 :                : 
                               7697                 :                :     /*
                               7698                 :                :      * If the last checkpoint record we've replayed is already our last
                               7699                 :                :      * restartpoint, we can't perform a new restart point. We still update
                               7700                 :                :      * minRecoveryPoint in that case, so that if this is a shutdown restart
                               7701                 :                :      * point, we won't start up earlier than before. That's not strictly
                               7702                 :                :      * necessary, but when hot standby is enabled, it would be rather weird if
                               7703                 :                :      * the database opened up for read-only connections at a point-in-time
                               7704                 :                :      * before the last shutdown. Such time travel is still possible in case of
                               7705                 :                :      * immediate shutdown, though.
                               7706                 :                :      *
                               7707                 :                :      * We don't explicitly advance minRecoveryPoint when we do create a
                               7708                 :                :      * restartpoint. It's assumed that flushing the buffers will do that as a
                               7709                 :                :      * side-effect.
                               7710                 :                :      */
   42 alvherre@kurilemu.de     7711         [ +  + ]:GNC         601 :     if (!XLogRecPtrIsValid(lastCheckPointRecPtr) ||
 4738 alvherre@alvh.no-ip.     7712         [ +  + ]:CBC         270 :         lastCheckPoint.redo <= ControlFile->checkPointCopy.redo)
                               7713                 :                :     {
 6147 heikki.linnakangas@i     7714         [ -  + ]:            401 :         ereport(DEBUG2,
                               7715                 :                :                 errmsg_internal("skipping restartpoint, already performed at %X/%08X",
                               7716                 :                :                                 LSN_FORMAT_ARGS(lastCheckPoint.redo)));
                               7717                 :                : 
                               7718                 :            401 :         UpdateMinRecoveryPoint(InvalidXLogRecPtr, true);
 5677 rhaas@postgresql.org     7719         [ +  + ]:            401 :         if (flags & CHECKPOINT_IS_SHUTDOWN)
                               7720                 :                :         {
                               7721                 :             31 :             LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               7722                 :             31 :             ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY;
                               7723                 :             31 :             UpdateControlFile();
                               7724                 :             31 :             LWLockRelease(ControlFileLock);
                               7725                 :                :         }
 6147 heikki.linnakangas@i     7726                 :            401 :         return false;
                               7727                 :                :     }
                               7728                 :                : 
                               7729                 :                :     /*
                               7730                 :                :      * Update the shared RedoRecPtr so that the startup process can calculate
                               7731                 :                :      * the number of segments replayed since last restartpoint, and request a
                               7732                 :                :      * restartpoint if it exceeds CheckPointSegments.
                               7733                 :                :      *
                               7734                 :                :      * Like in CreateCheckPoint(), hold off insertions to update it, although
                               7735                 :                :      * during recovery this is just pro forma, because no WAL insertions are
                               7736                 :                :      * happening.
                               7737                 :                :      */
 4290                          7738                 :            200 :     WALInsertLockAcquireExclusive();
 3951                          7739                 :            200 :     RedoRecPtr = XLogCtl->Insert.RedoRecPtr = lastCheckPoint.redo;
 4290                          7740                 :            200 :     WALInsertLockRelease();
                               7741                 :                : 
                               7742                 :                :     /* Also update the info_lck-protected copy */
 4105 andres@anarazel.de       7743         [ -  + ]:            200 :     SpinLockAcquire(&XLogCtl->info_lck);
                               7744                 :            200 :     XLogCtl->RedoRecPtr = lastCheckPoint.redo;
                               7745                 :            200 :     SpinLockRelease(&XLogCtl->info_lck);
                               7746                 :                : 
                               7747                 :                :     /*
                               7748                 :                :      * Prepare to accumulate statistics.
                               7749                 :                :      *
                               7750                 :                :      * Note: because it is possible for log_checkpoints to change while a
                               7751                 :                :      * checkpoint proceeds, we always accumulate stats, even if
                               7752                 :                :      * log_checkpoints is currently off.
                               7753                 :                :      */
 5433 rhaas@postgresql.org     7754   [ +  -  +  -  :           2200 :     MemSet(&CheckpointStats, 0, sizeof(CheckpointStats));
                                     +  -  +  -  +  
                                                 + ]
                               7755                 :            200 :     CheckpointStats.ckpt_start_t = GetCurrentTimestamp();
                               7756                 :                : 
                               7757         [ +  - ]:            200 :     if (log_checkpoints)
 6147 heikki.linnakangas@i     7758                 :            200 :         LogCheckpointStart(flags, true);
                               7759                 :                : 
                               7760                 :                :     /* Update the process title */
 1830 michael@paquier.xyz      7761                 :            200 :     update_checkpoint_display(flags, true, false);
                               7762                 :                : 
 6147 heikki.linnakangas@i     7763                 :            200 :     CheckPointGuts(lastCheckPoint.redo, flags);
                               7764                 :                : 
                               7765                 :                :     /*
                               7766                 :                :      * This location needs to be after CheckPointGuts() to ensure that some
                               7767                 :                :      * work has already happened during this checkpoint.
                               7768                 :                :      */
                               7769                 :                :     INJECTION_POINT("create-restart-point", NULL);
                               7770                 :                : 
                               7771                 :                :     /*
                               7772                 :                :      * Remember the prior checkpoint's redo ptr for
                               7773                 :                :      * UpdateCheckPointDistanceEstimate()
                               7774                 :                :      */
 3951                          7775                 :            200 :     PriorRedoPtr = ControlFile->checkPointCopy.redo;
                               7776                 :                : 
                               7777                 :                :     /*
                               7778                 :                :      * Update pg_control, using current time.  Check that it still shows an
                               7779                 :                :      * older checkpoint, else do nothing; this is a quick hack to make sure
                               7780                 :                :      * nothing really bad happens if somehow we get here after the
                               7781                 :                :      * end-of-recovery checkpoint.
                               7782                 :                :      */
 6147                          7783                 :            200 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 1319 michael@paquier.xyz      7784         [ +  - ]:            200 :     if (ControlFile->checkPointCopy.redo < lastCheckPoint.redo)
                               7785                 :                :     {
                               7786                 :                :         /*
                               7787                 :                :          * Update the checkpoint information.  We do this even if the cluster
                               7788                 :                :          * does not show DB_IN_ARCHIVE_RECOVERY to match with the set of WAL
                               7789                 :                :          * segments recycled below.
                               7790                 :                :          */
 6019 tgl@sss.pgh.pa.us        7791                 :            200 :         ControlFile->checkPoint = lastCheckPointRecPtr;
                               7792                 :            200 :         ControlFile->checkPointCopy = lastCheckPoint;
                               7793                 :                : 
                               7794                 :                :         /*
                               7795                 :                :          * Ensure minRecoveryPoint is past the checkpoint record and update it
                               7796                 :                :          * if the control file still shows DB_IN_ARCHIVE_RECOVERY.  Normally,
                               7797                 :                :          * this will have happened already while writing out dirty buffers,
                               7798                 :                :          * but not necessarily - e.g. because no buffers were dirtied.  We do
                               7799                 :                :          * this because a backup performed in recovery uses minRecoveryPoint
                               7800                 :                :          * to determine which WAL files must be included in the backup, and
                               7801                 :                :          * the file (or files) containing the checkpoint record must be
                               7802                 :                :          * included, at a minimum.  Note that for an ordinary restart of
                               7803                 :                :          * recovery there's no value in having the minimum recovery point any
                               7804                 :                :          * earlier than this anyway, because redo will begin just after the
                               7805                 :                :          * checkpoint record.
                               7806                 :                :          */
 1319 michael@paquier.xyz      7807         [ +  - ]:            200 :         if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY)
                               7808                 :                :         {
                               7809         [ +  + ]:            200 :             if (ControlFile->minRecoveryPoint < lastCheckPointEndPtr)
                               7810                 :                :             {
                               7811                 :             18 :                 ControlFile->minRecoveryPoint = lastCheckPointEndPtr;
                               7812                 :             18 :                 ControlFile->minRecoveryPointTLI = lastCheckPoint.ThisTimeLineID;
                               7813                 :                : 
                               7814                 :                :                 /* update local copy */
                               7815                 :             18 :                 LocalMinRecoveryPoint = ControlFile->minRecoveryPoint;
                               7816                 :             18 :                 LocalMinRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
                               7817                 :                :             }
                               7818         [ +  + ]:            200 :             if (flags & CHECKPOINT_IS_SHUTDOWN)
                               7819                 :             21 :                 ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY;
                               7820                 :                :         }
 6019 tgl@sss.pgh.pa.us        7821                 :            200 :         UpdateControlFile();
                               7822                 :                :     }
 6147 heikki.linnakangas@i     7823                 :            200 :     LWLockRelease(ControlFileLock);
                               7824                 :                : 
                               7825                 :                :     /*
                               7826                 :                :      * Update the average distance between checkpoints/restartpoints if the
                               7827                 :                :      * prior checkpoint exists.
                               7828                 :                :      */
   42 alvherre@kurilemu.de     7829         [ +  - ]:GNC         200 :     if (XLogRecPtrIsValid(PriorRedoPtr))
 3951 heikki.linnakangas@i     7830                 :CBC         200 :         UpdateCheckPointDistanceEstimate(RedoRecPtr - PriorRedoPtr);
                               7831                 :                : 
                               7832                 :                :     /*
                               7833                 :                :      * Delete old log files that are no longer needed for the last
                               7834                 :                :      * restartpoint, to prevent the disk holding the xlog from filling up.
                               7835                 :                :      */
 2704 michael@paquier.xyz      7836                 :            200 :     XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
                               7837                 :                : 
                               7838                 :                :     /*
                               7839                 :                :      * Retreat _logSegNo using the current end of xlog replayed or received,
                               7840                 :                :      * whichever is later.
                               7841                 :                :      */
 2080 tmunro@postgresql.or     7842                 :            200 :     receivePtr = GetWalRcvFlushRecPtr(NULL, NULL);
 2704 michael@paquier.xyz      7843                 :            200 :     replayPtr = GetXLogReplayRecPtr(&replayTLI);
                               7844                 :            200 :     endptr = (receivePtr < replayPtr) ? replayPtr : receivePtr;
                               7845                 :            200 :     KeepLogSeg(endptr, &_logSegNo);
  302 akapila@postgresql.o     7846         [ +  + ]:            200 :     if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT,
                               7847                 :                :                                            _logSegNo, InvalidOid,
                               7848                 :                :                                            InvalidTransactionId))
                               7849                 :                :     {
                               7850                 :                :         /*
                               7851                 :                :          * Some slots have been invalidated; recalculate the old-segment
                               7852                 :                :          * horizon, starting again from RedoRecPtr.
                               7853                 :                :          */
 1616 alvherre@alvh.no-ip.     7854                 :              1 :         XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
                               7855                 :              1 :         KeepLogSeg(endptr, &_logSegNo);
                               7856                 :                :     }
 2704 michael@paquier.xyz      7857                 :            200 :     _logSegNo--;
                               7858                 :                : 
                               7859                 :                :     /*
                               7860                 :                :      * Try to recycle segments on a useful timeline. If we've been promoted
                               7861                 :                :      * since the beginning of this restartpoint, use the new timeline chosen
                               7862                 :                :      * at end of recovery.  If we're still in recovery, use the timeline we're
                               7863                 :                :      * currently replaying.
                               7864                 :                :      *
                               7865                 :                :      * There is no guarantee that the WAL segments will be useful on the
                               7866                 :                :      * current timeline; if recovery proceeds to a new timeline right after
                               7867                 :                :      * this, the pre-allocated WAL segments on this timeline will not be used,
                               7868                 :                :      * and will be wasted until recycled on the next restartpoint. We'll live
                               7869                 :                :      * with that.
                               7870                 :                :      */
 1504 rhaas@postgresql.org     7871         [ -  + ]:            200 :     if (!RecoveryInProgress())
 1499 rhaas@postgresql.org     7872                 :UBC           0 :         replayTLI = XLogCtl->InsertTimeLineID;
                               7873                 :                : 
 1504 rhaas@postgresql.org     7874                 :CBC         200 :     RemoveOldXlogFiles(_logSegNo, RedoRecPtr, endptr, replayTLI);
                               7875                 :                : 
                               7876                 :                :     /*
                               7877                 :                :      * Make more log segments if needed.  (Do this after recycling old log
                               7878                 :                :      * segments, since that may supply some of the needed files.)
                               7879                 :                :      */
                               7880                 :            200 :     PreallocXlogFiles(endptr, replayTLI);
                               7881                 :                : 
                               7882                 :                :     /*
                               7883                 :                :      * Truncate pg_subtrans if possible.  We can throw away all data before
                               7884                 :                :      * the oldest XMIN of any running transaction.  No future transaction will
                               7885                 :                :      * attempt to reference any pg_subtrans entry older than that (see Asserts
                               7886                 :                :      * in subtrans.c).  When hot standby is disabled, though, we mustn't do
                               7887                 :                :      * this because StartupSUBTRANS hasn't been called yet.
                               7888                 :                :      */
 5589 simon@2ndQuadrant.co     7889         [ +  - ]:            200 :     if (EnableHotStandby)
 1954 andres@anarazel.de       7890                 :            200 :         TruncateSUBTRANS(GetOldestTransactionIdConsideredRunning());
                               7891                 :                : 
                               7892                 :                :     /* Real work is done; log and update stats. */
 5005 rhaas@postgresql.org     7893                 :            200 :     LogCheckpointEnd(true);
                               7894                 :                : 
                               7895                 :                :     /* Reset the process title */
 1830 michael@paquier.xyz      7896                 :            200 :     update_checkpoint_display(flags, true, true);
                               7897                 :                : 
 5647 tgl@sss.pgh.pa.us        7898                 :            200 :     xtime = GetLatestXTime();
 6147 heikki.linnakangas@i     7899   [ +  -  +  -  :            200 :     ereport((log_checkpoints ? LOG : DEBUG2),
                                              +  + ]
                               7900                 :                :             errmsg("recovery restart point at %X/%08X",
                               7901                 :                :                    LSN_FORMAT_ARGS(lastCheckPoint.redo)),
                               7902                 :                :             xtime ? errdetail("Last completed transaction was at log time %s.",
                               7903                 :                :                               timestamptz_to_str(xtime)) : 0);
                               7904                 :                : 
                               7905                 :                :     /*
                               7906                 :                :      * Finally, execute archive_cleanup_command, if any.
                               7907                 :                :      */
 2580 peter_e@gmx.net          7908   [ +  -  -  + ]:            200 :     if (archiveCleanupCommand && strcmp(archiveCleanupCommand, "") != 0)
 1046 michael@paquier.xyz      7909                 :UBC           0 :         ExecuteRecoveryCommand(archiveCleanupCommand,
                               7910                 :                :                                "archive_cleanup_command",
                               7911                 :                :                                false,
                               7912                 :                :                                WAIT_EVENT_ARCHIVE_CLEANUP_COMMAND);
                               7913                 :                : 
 6147 heikki.linnakangas@i     7914                 :CBC         200 :     return true;
                               7915                 :                : }
                               7916                 :                : 
                               7917                 :                : /*
                               7918                 :                :  * Report availability of WAL for the given target LSN
                               7919                 :                :  *      (typically a slot's restart_lsn)
                               7920                 :                :  *
                               7921                 :                :  * Returns one of the following enum values:
                               7922                 :                :  *
                               7923                 :                :  * * WALAVAIL_RESERVED means targetLSN is available and it is in the range of
                               7924                 :                :  *   max_wal_size.
                               7925                 :                :  *
                               7926                 :                :  * * WALAVAIL_EXTENDED means it is still available by preserving extra
                               7927                 :                :  *   segments beyond max_wal_size. If max_slot_wal_keep_size is smaller
                               7928                 :                :  *   than max_wal_size, this state is not returned.
                               7929                 :                :  *
                               7930                 :                :  * * WALAVAIL_UNRESERVED means it is being lost and the next checkpoint will
                               7931                 :                :  *   remove reserved segments. A walsender using this slot may still return
                               7932                 :                :  *   it to one of the states above.
                               7933                 :                :  *
                               7934                 :                :  * * WALAVAIL_REMOVED means it has been removed. A replication stream on
                               7935                 :                :  *   a slot with this LSN cannot continue.  (Any associated walsender
                               7936                 :                :  *   processes should have been terminated already.)
                               7937                 :                :  *
                               7938                 :                :  * * WALAVAIL_INVALID_LSN means the slot hasn't been set to reserve WAL.
                               7939                 :                :  */
                               7940                 :                : WALAvailability
 2081 alvherre@alvh.no-ip.     7941                 :            391 : GetWALAvailability(XLogRecPtr targetLSN)
                               7942                 :                : {
                               7943                 :                :     XLogRecPtr  currpos;        /* current write LSN */
                               7944                 :                :     XLogSegNo   currSeg;        /* segid of currpos */
                               7945                 :                :     XLogSegNo   targetSeg;      /* segid of targetLSN */
                               7946                 :                :     XLogSegNo   oldestSeg;      /* actual oldest segid */
                               7947                 :                :     XLogSegNo   oldestSegMaxWalSize;    /* oldest segid kept by max_wal_size */
                               7948                 :                :     XLogSegNo   oldestSlotSeg;  /* oldest segid kept by slot */
                               7949                 :                :     uint64      keepSegs;
                               7950                 :                : 
                               7951                 :                :     /*
                               7952                 :                :      * The slot does not reserve WAL; it was deactivated or has never been active.
                               7953                 :                :      */
   42 alvherre@kurilemu.de     7954         [ +  + ]:GNC         391 :     if (!XLogRecPtrIsValid(targetLSN))
 2081 alvherre@alvh.no-ip.     7955                 :CBC          23 :         return WALAVAIL_INVALID_LSN;
                               7956                 :                : 
                               7957                 :                :     /*
                               7958                 :                :      * Calculate the oldest segment currently reserved by all slots,
                               7959                 :                :      * considering wal_keep_size and max_slot_wal_keep_size.  Initialize
                               7960                 :                :      * oldestSlotSeg to the current segment.
                               7961                 :                :      */
 1984                          7962                 :            368 :     currpos = GetXLogWriteRecPtr();
                               7963                 :            368 :     XLByteToSeg(currpos, oldestSlotSeg, wal_segment_size);
 2081                          7964                 :            368 :     KeepLogSeg(currpos, &oldestSlotSeg);
                               7965                 :                : 
                               7966                 :                :     /*
                               7967                 :                :      * Find the oldest extant segment file. We get 1 until a checkpoint removes
                               7968                 :                :      * the first WAL segment file since startup, which can make the status
                               7969                 :                :      * wrong under certain abnormal conditions, but that does no actual harm.
                               7970                 :                :      */
                               7971                 :            368 :     oldestSeg = XLogGetLastRemovedSegno() + 1;
                               7972                 :                : 
                               7973                 :                :     /* calculate oldest segment by max_wal_size */
                               7974                 :            368 :     XLByteToSeg(currpos, currSeg, wal_segment_size);
 2003                          7975                 :            368 :     keepSegs = ConvertToXSegs(max_wal_size_mb, wal_segment_size) + 1;
                               7976                 :                : 
 2081                          7977         [ +  + ]:            368 :     if (currSeg > keepSegs)
                               7978                 :              8 :         oldestSegMaxWalSize = currSeg - keepSegs;
                               7979                 :                :     else
                               7980                 :            360 :         oldestSegMaxWalSize = 1;
                               7981                 :                : 
                               7982                 :                :     /* the segment we care about */
 1984                          7983                 :            368 :     XLByteToSeg(targetLSN, targetSeg, wal_segment_size);
                               7984                 :                : 
                               7985                 :                :     /*
                               7986                 :                :      * No point in returning reserved or extended status values if the
                               7987                 :                :      * targetSeg is known to be lost.
                               7988                 :                :      */
 2003                          7989         [ +  + ]:            368 :     if (targetSeg >= oldestSlotSeg)
                               7990                 :                :     {
                               7991                 :                :         /* show "reserved" when targetSeg is within max_wal_size */
                               7992         [ +  + ]:            367 :         if (targetSeg >= oldestSegMaxWalSize)
 2081                          7993                 :            365 :             return WALAVAIL_RESERVED;
                               7994                 :                : 
                               7995                 :                :         /* being retained by slots exceeding max_wal_size */
 2003                          7996                 :              2 :         return WALAVAIL_EXTENDED;
                               7997                 :                :     }
                               7998                 :                : 
                               7999                 :                :     /* WAL segments are no longer retained but haven't been removed yet */
                               8000         [ +  - ]:              1 :     if (targetSeg >= oldestSeg)
                               8001                 :              1 :         return WALAVAIL_UNRESERVED;
                               8002                 :                : 
                               8003                 :                :     /* Definitely lost */
 2081 alvherre@alvh.no-ip.     8004                 :UBC           0 :     return WALAVAIL_REMOVED;
                               8005                 :                : }
                               8006                 :                : 
                               8007                 :                : 
                               8008                 :                : /*
                               8009                 :                :  * Retreat *logSegNo to the last segment that we need to retain because of
                               8010                 :                :  * either wal_keep_size or replication slots.
                               8011                 :                :  *
                               8012                 :                :  * This is calculated by subtracting wal_keep_size from the given xlog
                               8013                 :                :  * location, recptr, and by making sure that the result is below the
                               8014                 :                :  * requirement of replication slots.  For the latter criterion we do consider
                               8015                 :                :  * the effects of max_slot_wal_keep_size: reserve at most that much space back
                               8016                 :                :  * from recptr.
                               8017                 :                :  *
                               8018                 :                :  * Note about replication slots: if this function calculates a value
                               8019                 :                :  * that's further ahead than what slots need reserved, then affected
                               8020                 :                :  * slots need to be invalidated and this function invoked again.
                               8021                 :                :  * XXX it might be a good idea to rewrite this function so that
                               8022                 :                :  * invalidation is optionally done here, instead.
                               8023                 :                :  */
                               8024                 :                : static void
 4925 heikki.linnakangas@i     8025                 :CBC        2104 : KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
                               8026                 :                : {
                               8027                 :                :     XLogSegNo   currSegNo;
                               8028                 :                :     XLogSegNo   segno;
                               8029                 :                :     XLogRecPtr  keep;
                               8030                 :                : 
 2081 alvherre@alvh.no-ip.     8031                 :           2104 :     XLByteToSeg(recptr, currSegNo, wal_segment_size);
                               8032                 :           2104 :     segno = currSegNo;
                               8033                 :                : 
                               8034                 :                :     /* Calculate how many segments are kept by slots. */
                               8035                 :           2104 :     keep = XLogGetReplicationSlotMinimumLSN();
   42 alvherre@kurilemu.de     8036   [ +  +  +  + ]:GNC        2104 :     if (XLogRecPtrIsValid(keep) && keep < recptr)
                               8037                 :                :     {
 2081 alvherre@alvh.no-ip.     8038                 :CBC         627 :         XLByteToSeg(keep, segno, wal_segment_size);
                               8039                 :                : 
                               8040                 :                :         /*
                               8041                 :                :          * Account for max_slot_wal_keep_size to avoid keeping more than
                               8042                 :                :          * configured.  However, don't do that during a binary upgrade: if
                               8043                 :                :          * slots were to be invalidated because of this, it would not be
                               8044                 :                :          * possible to preserve logical ones during the upgrade.
                               8045                 :                :          */
  160 akapila@postgresql.o     8046   [ +  +  +  - ]:            627 :         if (max_slot_wal_keep_size_mb >= 0 && !IsBinaryUpgrade)
                               8047                 :                :         {
                               8048                 :                :             uint64      slot_keep_segs;
                               8049                 :                : 
 2081 alvherre@alvh.no-ip.     8050                 :             20 :             slot_keep_segs =
                               8051                 :             20 :                 ConvertToXSegs(max_slot_wal_keep_size_mb, wal_segment_size);
                               8052                 :                : 
                               8053         [ +  + ]:             20 :             if (currSegNo - segno > slot_keep_segs)
                               8054                 :              5 :                 segno = currSegNo - slot_keep_segs;
                               8055                 :                :         }
                               8056                 :                :     }
                               8057                 :                : 
                               8058                 :                :     /*
                               8059                 :                :      * If WAL summarization is in use, don't remove WAL that has yet to be
                               8060                 :                :      * summarized.
                               8061                 :                :      */
  541 rhaas@postgresql.org     8062                 :           2104 :     keep = GetOldestUnsummarizedLSN(NULL, NULL);
   42 alvherre@kurilemu.de     8063         [ +  + ]:GNC        2104 :     if (XLogRecPtrIsValid(keep))
                               8064                 :                :     {
                               8065                 :                :         XLogSegNo   unsummarized_segno;
                               8066                 :                : 
  729 rhaas@postgresql.org     8067                 :CBC           2 :         XLByteToSeg(keep, unsummarized_segno, wal_segment_size);
                               8068         [ +  + ]:              2 :         if (unsummarized_segno < segno)
                               8069                 :              1 :             segno = unsummarized_segno;
                               8070                 :                :     }
                               8071                 :                : 
                               8072                 :                :     /* but, keep at least wal_keep_size if that's set */
 1977 fujii@postgresql.org     8073         [ +  + ]:           2104 :     if (wal_keep_size_mb > 0)
                               8074                 :                :     {
                               8075                 :                :         uint64      keep_segs;
                               8076                 :                : 
                               8077                 :             74 :         keep_segs = ConvertToXSegs(wal_keep_size_mb, wal_segment_size);
                               8078         [ +  - ]:             74 :         if (currSegNo - segno < keep_segs)
                               8079                 :                :         {
                               8080                 :                :             /* avoid underflow, don't go below 1 */
                               8081         [ +  + ]:             74 :             if (currSegNo <= keep_segs)
                               8082                 :             70 :                 segno = 1;
                               8083                 :                :             else
                               8084                 :              4 :                 segno = currSegNo - keep_segs;
                               8085                 :                :         }
                               8086                 :                :     }
                               8087                 :                : 
                               8088                 :                :     /* don't delete WAL segments newer than the calculated segment */
 1984 alvherre@alvh.no-ip.     8089         [ +  + ]:           2104 :     if (segno < *logSegNo)
 4925 heikki.linnakangas@i     8090                 :            313 :         *logSegNo = segno;
 5266 simon@2ndQuadrant.co     8091                 :           2104 : }
                               8092                 :                : 
                               8093                 :                : /*
                               8094                 :                :  * Write a NEXTOID log record
                               8095                 :                :  */
                               8096                 :                : void
 9176 vadim4o@yahoo.com        8097                 :            606 : XLogPutNextOid(Oid nextOid)
                               8098                 :                : {
 4046 heikki.linnakangas@i     8099                 :            606 :     XLogBeginInsert();
  310 peter@eisentraut.org     8100                 :            606 :     XLogRegisterData(&nextOid, sizeof(Oid));
 4046 heikki.linnakangas@i     8101                 :            606 :     (void) XLogInsert(RM_XLOG_ID, XLOG_NEXTOID);
                               8102                 :                : 
                               8103                 :                :     /*
                               8104                 :                :      * We need not flush the NEXTOID record immediately, because any of the
                               8105                 :                :      * just-allocated OIDs could only reach disk as part of a tuple insert or
                               8106                 :                :      * update that would have its own XLOG record that must follow the NEXTOID
                               8107                 :                :      * record.  Therefore, the standard buffer LSN interlock applied to those
                               8108                 :                :      * records will ensure no such OID reaches disk before the NEXTOID record
                               8109                 :                :      * does.
                               8110                 :                :      *
                               8111                 :                :      * Note, however, that the above statement only covers state "within" the
                               8112                 :                :      * database.  When we use a generated OID as a file or directory name, we
                               8113                 :                :      * are in a sense violating the basic WAL rule, because that filesystem
                               8114                 :                :      * change may reach disk before the NEXTOID WAL record does.  The impact
                               8115                 :                :      * of this is that if a database crash occurs immediately afterward, we
                               8116                 :                :      * might after restart re-generate the same OID and find that it conflicts
                               8117                 :                :      * with the leftover file or directory.  But since for safety's sake we
                               8118                 :                :      * always loop until finding a nonconflicting filename, this poses no real
                               8119                 :                :      * problem in practice. See pgsql-hackers discussion 27-Sep-2006.
                               8120                 :                :      */
 7539 tgl@sss.pgh.pa.us        8121                 :            606 : }
                               8122                 :                : 
                               8123                 :                : /*
                               8124                 :                :  * Write an XLOG SWITCH record.
                               8125                 :                :  *
                               8126                 :                :  * Here we just blindly issue an XLogInsert request for the record.
                               8127                 :                :  * All the magic happens inside XLogInsert.
                               8128                 :                :  *
                               8129                 :                :  * The return value is either the end+1 address of the switch record,
                               8130                 :                :  * or the end+1 address of the prior segment if we did not need to
                               8131                 :                :  * write a switch record because we are already at segment start.
                               8132                 :                :  */
                               8133                 :                : XLogRecPtr
 3283 andres@anarazel.de       8134                 :            710 : RequestXLogSwitch(bool mark_unimportant)
                               8135                 :                : {
                               8136                 :                :     XLogRecPtr  RecPtr;
                               8137                 :                : 
                               8138                 :                :     /* XLOG SWITCH has no data */
 4046 heikki.linnakangas@i     8139                 :            710 :     XLogBeginInsert();
                               8140                 :                : 
 3283 andres@anarazel.de       8141         [ -  + ]:            710 :     if (mark_unimportant)
 3283 andres@anarazel.de       8142                 :UBC           0 :         XLogSetRecordFlags(XLOG_MARK_UNIMPORTANT);
 4046 heikki.linnakangas@i     8143                 :CBC         710 :     RecPtr = XLogInsert(RM_XLOG_ID, XLOG_SWITCH);
                               8144                 :                : 
 7074 tgl@sss.pgh.pa.us        8145                 :            710 :     return RecPtr;
                               8146                 :                : }
                               8147                 :                : 
                               8148                 :                : /*
                               8149                 :                :  * Write a RESTORE POINT record
                               8150                 :                :  */
                               8151                 :                : XLogRecPtr
 5427 simon@2ndQuadrant.co     8152                 :              3 : XLogRestorePoint(const char *rpName)
                               8153                 :                : {
                               8154                 :                :     XLogRecPtr  RecPtr;
                               8155                 :                :     xl_restore_point xlrec;
                               8156                 :                : 
                               8157                 :              3 :     xlrec.rp_time = GetCurrentTimestamp();
 4322 tgl@sss.pgh.pa.us        8158                 :              3 :     strlcpy(xlrec.rp_name, rpName, MAXFNAMELEN);
                               8159                 :                : 
 4046 heikki.linnakangas@i     8160                 :              3 :     XLogBeginInsert();
  310 peter@eisentraut.org     8161                 :              3 :     XLogRegisterData(&xlrec, sizeof(xl_restore_point));
                               8162                 :                : 
 4046 heikki.linnakangas@i     8163                 :              3 :     RecPtr = XLogInsert(RM_XLOG_ID, XLOG_RESTORE_POINT);
                               8164                 :                : 
 5411 rhaas@postgresql.org     8165         [ +  - ]:              3 :     ereport(LOG,
                               8166                 :                :             errmsg("restore point \"%s\" created at %X/%08X",
                               8167                 :                :                    rpName, LSN_FORMAT_ARGS(RecPtr)));
                               8168                 :                : 
 5427 simon@2ndQuadrant.co     8169                 :              3 :     return RecPtr;
                               8170                 :                : }
                               8171                 :                : 
                               8172                 :                : /*
                               8173                 :                :  * Check if any of the GUC parameters that are critical for hot standby
                               8174                 :                :  * have changed, and update the value in pg_control file if necessary.
                               8175                 :                :  */
                               8176                 :                : static void
 5713 heikki.linnakangas@i     8177                 :            872 : XLogReportParameters(void)
                               8178                 :                : {
                               8179         [ +  + ]:            872 :     if (wal_level != ControlFile->wal_level ||
 4369 rhaas@postgresql.org     8180         [ +  + ]:            647 :         wal_log_hints != ControlFile->wal_log_hints ||
 5713 heikki.linnakangas@i     8181         [ +  + ]:            563 :         MaxConnections != ControlFile->MaxConnections ||
 4550 rhaas@postgresql.org     8182         [ +  + ]:            562 :         max_worker_processes != ControlFile->max_worker_processes ||
 2501 michael@paquier.xyz      8183         [ +  + ]:            561 :         max_wal_senders != ControlFile->max_wal_senders ||
 5713 heikki.linnakangas@i     8184         [ +  + ]:            538 :         max_prepared_xacts != ControlFile->max_prepared_xacts ||
 4033 alvherre@alvh.no-ip.     8185         [ +  - ]:            448 :         max_locks_per_xact != ControlFile->max_locks_per_xact ||
                               8186         [ +  + ]:            448 :         track_commit_timestamp != ControlFile->track_commit_timestamp)
                               8187                 :                :     {
                               8188                 :                :         /*
                               8189                 :                :          * The change in number of backend slots doesn't need to be WAL-logged
                               8190                 :                :          * if archiving is not enabled, as you can't start archive recovery
                               8191                 :                :          * with wal_level=minimal anyway. We don't really care about the
                               8192                 :                :          * values in pg_control either if wal_level=minimal, but seems better
                               8193                 :                :          * to keep them up-to-date to avoid confusion.
                               8194                 :                :          */
 5713 heikki.linnakangas@i     8195   [ +  +  +  + ]:            435 :         if (wal_level != ControlFile->wal_level || XLogIsNeeded())
                               8196                 :                :         {
                               8197                 :                :             xl_parameter_change xlrec;
                               8198                 :                :             XLogRecPtr  recptr;
                               8199                 :                : 
                               8200                 :            412 :             xlrec.MaxConnections = MaxConnections;
 4550 rhaas@postgresql.org     8201                 :            412 :             xlrec.max_worker_processes = max_worker_processes;
 2501 michael@paquier.xyz      8202                 :            412 :             xlrec.max_wal_senders = max_wal_senders;
 5713 heikki.linnakangas@i     8203                 :            412 :             xlrec.max_prepared_xacts = max_prepared_xacts;
                               8204                 :            412 :             xlrec.max_locks_per_xact = max_locks_per_xact;
                               8205                 :            412 :             xlrec.wal_level = wal_level;
 4369 rhaas@postgresql.org     8206                 :            412 :             xlrec.wal_log_hints = wal_log_hints;
 4033 alvherre@alvh.no-ip.     8207                 :            412 :             xlrec.track_commit_timestamp = track_commit_timestamp;
                               8208                 :                : 
 4046 heikki.linnakangas@i     8209                 :            412 :             XLogBeginInsert();
  310 peter@eisentraut.org     8210                 :            412 :             XLogRegisterData(&xlrec, sizeof(xlrec));
                               8211                 :                : 
 4046 heikki.linnakangas@i     8212                 :            412 :             recptr = XLogInsert(RM_XLOG_ID, XLOG_PARAMETER_CHANGE);
 4285 fujii@postgresql.org     8213                 :            412 :             XLogFlush(recptr);
                               8214                 :                :         }
                               8215                 :                : 
 2019 tmunro@postgresql.or     8216                 :            435 :         LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               8217                 :                : 
 5713 heikki.linnakangas@i     8218                 :            435 :         ControlFile->MaxConnections = MaxConnections;
 4550 rhaas@postgresql.org     8219                 :            435 :         ControlFile->max_worker_processes = max_worker_processes;
 2501 michael@paquier.xyz      8220                 :            435 :         ControlFile->max_wal_senders = max_wal_senders;
 5713 heikki.linnakangas@i     8221                 :            435 :         ControlFile->max_prepared_xacts = max_prepared_xacts;
                               8222                 :            435 :         ControlFile->max_locks_per_xact = max_locks_per_xact;
                               8223                 :            435 :         ControlFile->wal_level = wal_level;
 4369 rhaas@postgresql.org     8224                 :            435 :         ControlFile->wal_log_hints = wal_log_hints;
 4033 alvherre@alvh.no-ip.     8225                 :            435 :         ControlFile->track_commit_timestamp = track_commit_timestamp;
 5713 heikki.linnakangas@i     8226                 :            435 :         UpdateControlFile();
                               8227                 :                : 
 2019 tmunro@postgresql.or     8228                 :            435 :         LWLockRelease(ControlFileLock);
                               8229                 :                :     }
 5811 heikki.linnakangas@i     8230                 :            872 : }
                               8231                 :                : 
                               8232                 :                : /*
                               8233                 :                :  * Update full_page_writes in shared memory, and write an
                               8234                 :                :  * XLOG_FPW_CHANGE record if necessary.
                               8235                 :                :  *
                               8236                 :                :  * Note: this function assumes there is no other process running
                               8237                 :                :  * concurrently that could update it.
                               8238                 :                :  */
                               8239                 :                : void
 5076 simon@2ndQuadrant.co     8240                 :           1454 : UpdateFullPageWrites(void)
                               8241                 :                : {
                               8242                 :           1454 :     XLogCtlInsert *Insert = &XLogCtl->Insert;
                               8243                 :                :     bool        recoveryInProgress;
                               8244                 :                : 
                               8245                 :                :     /*
                               8246                 :                :      * Do nothing if full_page_writes has not been changed.
                               8247                 :                :      *
                               8248                 :                :      * It's safe to check the shared full_page_writes without the lock,
                               8249                 :                :      * because we assume that there is no concurrently running process which
                               8250                 :                :      * can update it.
                               8251                 :                :      */
                               8252         [ +  + ]:           1454 :     if (fullPageWrites == Insert->fullPageWrites)
                               8253                 :            998 :         return;
                               8254                 :                : 
                               8255                 :                :     /*
                               8256                 :                :      * Perform this outside critical section so that the WAL insert
                               8257                 :                :      * initialization done by RecoveryInProgress() doesn't trigger an
                               8258                 :                :      * assertion failure.
                               8259                 :                :      */
 2638 akapila@postgresql.o     8260                 :            456 :     recoveryInProgress = RecoveryInProgress();
                               8261                 :                : 
 5035 heikki.linnakangas@i     8262                 :            456 :     START_CRIT_SECTION();
                               8263                 :                : 
                               8264                 :                :     /*
                               8265                 :                :      * It's always safe to take full page images, even when not strictly
                               8266                 :                :      * required, but not the other way round. So if we're setting full_page_writes
                               8267                 :                :      * to true, first set it true and then write the WAL record. If we're
                               8268                 :                :      * setting it to false, first write the WAL record and then set the global
                               8269                 :                :      * flag.
                               8270                 :                :      */
                               8271         [ +  + ]:            456 :     if (fullPageWrites)
                               8272                 :                :     {
 4290                          8273                 :            445 :         WALInsertLockAcquireExclusive();
 5035                          8274                 :            445 :         Insert->fullPageWrites = true;
 4290                          8275                 :            445 :         WALInsertLockRelease();
                               8276                 :                :     }
                               8277                 :                : 
                               8278                 :                :     /*
                               8279                 :                :      * Write an XLOG_FPW_CHANGE record. This allows us to keep track of
                               8280                 :                :      * full_page_writes during archive recovery, if required.
                               8281                 :                :      */
 2638 akapila@postgresql.o     8282   [ +  +  -  + ]:            456 :     if (XLogStandbyInfoActive() && !recoveryInProgress)
                               8283                 :                :     {
 4046 heikki.linnakangas@i     8284                 :UBC           0 :         XLogBeginInsert();
  310 peter@eisentraut.org     8285                 :              0 :         XLogRegisterData(&fullPageWrites, sizeof(bool));
                               8286                 :                : 
 4046 heikki.linnakangas@i     8287                 :              0 :         XLogInsert(RM_XLOG_ID, XLOG_FPW_CHANGE);
                               8288                 :                :     }
                               8289                 :                : 
 5035 heikki.linnakangas@i     8290         [ +  + ]:CBC         456 :     if (!fullPageWrites)
                               8291                 :                :     {
 4290                          8292                 :             11 :         WALInsertLockAcquireExclusive();
 5035                          8293                 :             11 :         Insert->fullPageWrites = false;
 4290                          8294                 :             11 :         WALInsertLockRelease();
                               8295                 :                :     }
 5035                          8296         [ -  + ]:            456 :     END_CRIT_SECTION();
                               8297                 :                : }
                               8298                 :                : 
                               8299                 :                : /*
                               8300                 :                :  * XLOG resource manager's routines
                               8301                 :                :  *
                               8302                 :                :  * Definitions of info values are in include/catalog/pg_control.h, though
                               8303                 :                :  * not all record types are related to control file updates.
                               8304                 :                :  *
                               8305                 :                :  * NOTE: Some XLOG record types that are directly related to WAL recovery
                               8306                 :                :  * are handled in xlogrecovery_redo().
                               8307                 :                :  */
                               8308                 :                : void
 4046                          8309                 :          41488 : xlog_redo(XLogReaderState *record)
                               8310                 :                : {
                               8311                 :          41488 :     uint8       info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
                               8312                 :          41488 :     XLogRecPtr  lsn = record->EndRecPtr;
                               8313                 :                : 
                               8314                 :                :     /*
                               8315                 :                :      * In XLOG rmgr, backup blocks are only used by XLOG_FPI and
                               8316                 :                :      * XLOG_FPI_FOR_HINT records.
                               8317                 :                :      */
 4042                          8318   [ +  +  +  +  :          41488 :     Assert(info == XLOG_FPI || info == XLOG_FPI_FOR_HINT ||
                                              -  + ]
                               8319                 :                :            !XLogRecHasAnyBlockRefs(record));
                               8320                 :                : 
 9041 tgl@sss.pgh.pa.us        8321         [ +  + ]:          41488 :     if (info == XLOG_NEXTOID)
                               8322                 :                :     {
                               8323                 :                :         Oid         nextOid;
                               8324                 :                : 
                               8325                 :                :         /*
                               8326                 :                :          * We used to try to take the maximum of TransamVariables->nextOid and
                               8327                 :                :          * the recorded nextOid, but that fails if the OID counter wraps
                               8328                 :                :          * around.  Since no OID allocation should be happening during replay
                               8329                 :                :          * anyway, better to just believe the record exactly.  We still take
                               8330                 :                :          * OidGenLock while setting the variable, just in case.
                               8331                 :                :          */
 9176 vadim4o@yahoo.com        8332                 :             91 :         memcpy(&nextOid, XLogRecGetData(record), sizeof(Oid));
 5064 tgl@sss.pgh.pa.us        8333                 :             91 :         LWLockAcquire(OidGenLock, LW_EXCLUSIVE);
  741 heikki.linnakangas@i     8334                 :             91 :         TransamVariables->nextOid = nextOid;
                               8335                 :             91 :         TransamVariables->oidCount = 0;
 5064 tgl@sss.pgh.pa.us        8336                 :             91 :         LWLockRelease(OidGenLock);
                               8337                 :                :     }
 9046                          8338         [ +  + ]:          41397 :     else if (info == XLOG_CHECKPOINT_SHUTDOWN)
                               8339                 :                :     {
                               8340                 :                :         CheckPoint  checkPoint;
                               8341                 :                :         TimeLineID  replayTLI;
                               8342                 :                : 
                               8343                 :             31 :         memcpy(&checkPoint, XLogRecGetData(record), sizeof(CheckPoint));
                               8344                 :                :         /* In a SHUTDOWN checkpoint, believe the counters exactly */
 5064                          8345                 :             31 :         LWLockAcquire(XidGenLock, LW_EXCLUSIVE);
  741 heikki.linnakangas@i     8346                 :             31 :         TransamVariables->nextXid = checkPoint.nextXid;
 5064 tgl@sss.pgh.pa.us        8347                 :             31 :         LWLockRelease(XidGenLock);
                               8348                 :             31 :         LWLockAcquire(OidGenLock, LW_EXCLUSIVE);
  741 heikki.linnakangas@i     8349                 :             31 :         TransamVariables->nextOid = checkPoint.nextOid;
                               8350                 :             31 :         TransamVariables->oidCount = 0;
 5064 tgl@sss.pgh.pa.us        8351                 :             31 :         LWLockRelease(OidGenLock);
 7498                          8352                 :             31 :         MultiXactSetNextMXact(checkPoint.nextMulti,
                               8353                 :                :                               checkPoint.nextMultiOffset);
                               8354                 :                : 
 3736 andres@anarazel.de       8355                 :             31 :         MultiXactAdvanceOldest(checkPoint.oldestMulti,
                               8356                 :                :                                checkPoint.oldestMultiDB);
                               8357                 :                : 
                               8358                 :                :         /*
                               8359                 :                :          * No need to set oldestClogXid here as well; it'll be set when we
                               8360                 :                :          * redo an xl_clog_truncate if it changed since initialization.
                               8361                 :                :          */
 5783 tgl@sss.pgh.pa.us        8362                 :             31 :         SetTransactionIdLimit(checkPoint.oldestXid, checkPoint.oldestXidDB);
                               8363                 :                : 
                               8364                 :                :         /*
                               8365                 :                :          * If we see a shutdown checkpoint while waiting for an end-of-backup
                               8366                 :                :          * record, the backup was canceled and the end-of-backup record will
                               8367                 :                :          * never arrive.
                               8368                 :                :          */
 4682 heikki.linnakangas@i     8369         [ +  - ]:             31 :         if (ArchiveRecoveryRequested &&
   42 alvherre@kurilemu.de     8370         [ -  + ]:GNC          31 :             XLogRecPtrIsValid(ControlFile->backupStartPoint) &&
   42 alvherre@kurilemu.de     8371         [ #  # ]:UNC           0 :             !XLogRecPtrIsValid(ControlFile->backupEndPoint))
 5064 tgl@sss.pgh.pa.us        8372         [ #  # ]:UBC           0 :             ereport(PANIC,
                               8373                 :                :                     (errmsg("online backup was canceled, recovery cannot continue")));
                               8374                 :                : 
                               8375                 :                :         /*
                               8376                 :                :          * If we see a shutdown checkpoint, we know that nothing was running
                               8377                 :                :          * on the primary at this point. So fake-up an empty running-xacts
                               8378                 :                :          * record and use that here and now. Recover additional standby state
                               8379                 :                :          * for prepared transactions.
                               8380                 :                :          */
 5843 simon@2ndQuadrant.co     8381         [ +  + ]:CBC          31 :         if (standbyState >= STANDBY_INITIALIZED)
                               8382                 :                :         {
                               8383                 :                :             TransactionId *xids;
                               8384                 :                :             int         nxids;
                               8385                 :                :             TransactionId oldestActiveXID;
                               8386                 :                :             TransactionId latestCompletedXid;
                               8387                 :                :             RunningTransactionsData running;
                               8388                 :                : 
 5728 heikki.linnakangas@i     8389                 :             29 :             oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
                               8390                 :                : 
                               8391                 :                :             /* Update pg_subtrans entries for any prepared transactions */
  539                          8392                 :             29 :             StandbyRecoverPreparedTransactions();
                               8393                 :                : 
                               8394                 :                :             /*
                               8395                 :                :              * Construct a RunningTransactions snapshot representing a shut
                               8396                 :                :              * down server, with only prepared transactions still alive. We're
                               8397                 :                :              * never overflowed at this point because all subxids are listed
                               8398                 :                :              * with their parent prepared transactions.
                               8399                 :                :              */
 5728                          8400                 :             29 :             running.xcnt = nxids;
 4764 simon@2ndQuadrant.co     8401                 :             29 :             running.subxcnt = 0;
  539 heikki.linnakangas@i     8402                 :             29 :             running.subxid_status = SUBXIDS_IN_SUBTRANS;
 1955 andres@anarazel.de       8403                 :             29 :             running.nextXid = XidFromFullTransactionId(checkPoint.nextXid);
 5728 heikki.linnakangas@i     8404                 :             29 :             running.oldestRunningXid = oldestActiveXID;
 1955 andres@anarazel.de       8405                 :             29 :             latestCompletedXid = XidFromFullTransactionId(checkPoint.nextXid);
 5698 simon@2ndQuadrant.co     8406         [ -  + ]:             29 :             TransactionIdRetreat(latestCompletedXid);
 5697                          8407         [ -  + ]:             29 :             Assert(TransactionIdIsNormal(latestCompletedXid));
 5698                          8408                 :             29 :             running.latestCompletedXid = latestCompletedXid;
 5728 heikki.linnakangas@i     8409                 :             29 :             running.xids = xids;
                               8410                 :                : 
                               8411                 :             29 :             ProcArrayApplyRecoveryInfo(&running);
                               8412                 :                :         }
                               8413                 :                : 
                               8414                 :                :         /* ControlFile->checkPointCopy always tracks the latest ckpt XID */
 2019 tmunro@postgresql.or     8415                 :             31 :         LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 1955 andres@anarazel.de       8416                 :             31 :         ControlFile->checkPointCopy.nextXid = checkPoint.nextXid;
 2019 tmunro@postgresql.or     8417                 :             31 :         LWLockRelease(ControlFileLock);
                               8418                 :                : 
                               8419                 :                :         /*
                               8420                 :                :          * We should've already switched to the new TLI before replaying this
                               8421                 :                :          * record.
                               8422                 :                :          */
 1401 heikki.linnakangas@i     8423                 :             31 :         (void) GetCurrentReplayRecPtr(&replayTLI);
 1504 rhaas@postgresql.org     8424         [ -  + ]:             31 :         if (checkPoint.ThisTimeLineID != replayTLI)
 4759 heikki.linnakangas@i     8425         [ #  # ]:UBC           0 :             ereport(PANIC,
                               8426                 :                :                     (errmsg("unexpected timeline ID %u (should be %u) in shutdown checkpoint record",
                               8427                 :                :                             checkPoint.ThisTimeLineID, replayTLI)));
                               8428                 :                : 
 1485 rhaas@postgresql.org     8429                 :CBC          31 :         RecoveryRestartPoint(&checkPoint, record);
                               8430                 :                : 
                               8431                 :                :         /*
                               8432                 :                :          * After replaying a checkpoint record, free all smgr objects.
                               8433                 :                :          * Otherwise we would never do so for dropped relations, as the
                               8434                 :                :          * startup process does not process shared invalidation messages or call
                               8435                 :                :          * AtEOXact_SMgr().
                               8436                 :                :          */
   99 michael@paquier.xyz      8437                 :             31 :         smgrdestroyall();
                               8438                 :                :     }
 9046 tgl@sss.pgh.pa.us        8439         [ +  + ]:          41366 :     else if (info == XLOG_CHECKPOINT_ONLINE)
                               8440                 :                :     {
                               8441                 :                :         CheckPoint  checkPoint;
                               8442                 :                :         TimeLineID  replayTLI;
                               8443                 :                : 
                               8444                 :            671 :         memcpy(&checkPoint, XLogRecGetData(record), sizeof(CheckPoint));
                               8445                 :                :         /* In an ONLINE checkpoint, treat the XID counter as a minimum */
 5064                          8446                 :            671 :         LWLockAcquire(XidGenLock, LW_EXCLUSIVE);
  741 heikki.linnakangas@i     8447         [ -  + ]:            671 :         if (FullTransactionIdPrecedes(TransamVariables->nextXid,
                               8448                 :                :                                       checkPoint.nextXid))
  741 heikki.linnakangas@i     8449                 :UBC           0 :             TransamVariables->nextXid = checkPoint.nextXid;
 5064 tgl@sss.pgh.pa.us        8450                 :CBC         671 :         LWLockRelease(XidGenLock);
                               8451                 :                : 
                               8452                 :                :         /*
                               8453                 :                :          * We ignore the nextOid counter in an ONLINE checkpoint, preferring
                               8454                 :                :          * to track OID assignment through XLOG_NEXTOID records.  The nextOid
                               8455                 :                :          * counter is from the start of the checkpoint and might well be stale
                               8456                 :                :          * compared to later XLOG_NEXTOID records.  We could try to take the
                               8457                 :                :          * maximum of the nextOid counter and our latest value, but since
                               8458                 :                :          * there's no particular guarantee about the speed with which the OID
                               8459                 :                :          * counter wraps around, that's a risky thing to do.  In any case,
                               8460                 :                :          * users of the nextOid counter are required to avoid assignment of
                               8461                 :                :          * duplicates, so that a somewhat out-of-date value should be safe.
                               8462                 :                :          */
                               8463                 :                : 
                               8464                 :                :         /* Handle multixact */
 7498                          8465                 :            671 :         MultiXactAdvanceNextMXact(checkPoint.nextMulti,
                               8466                 :                :                                   checkPoint.nextMultiOffset);
                               8467                 :                : 
                               8468                 :                :         /*
                               8469                 :                :          * NB: This may perform multixact truncation when replaying WAL
                               8470                 :                :          * generated by an older primary.
                               8471                 :                :          */
 3736 andres@anarazel.de       8472                 :            671 :         MultiXactAdvanceOldest(checkPoint.oldestMulti,
                               8473                 :                :                                checkPoint.oldestMultiDB);
  741 heikki.linnakangas@i     8474         [ -  + ]:            671 :         if (TransactionIdPrecedes(TransamVariables->oldestXid,
                               8475                 :                :                                   checkPoint.oldestXid))
 5783 tgl@sss.pgh.pa.us        8476                 :UBC           0 :             SetTransactionIdLimit(checkPoint.oldestXid,
                               8477                 :                :                                   checkPoint.oldestXidDB);
                               8478                 :                :         /* ControlFile->checkPointCopy always tracks the latest ckpt XID */
 2019 tmunro@postgresql.or     8479                 :CBC         671 :         LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 1955 andres@anarazel.de       8480                 :            671 :         ControlFile->checkPointCopy.nextXid = checkPoint.nextXid;
 2019 tmunro@postgresql.or     8481                 :            671 :         LWLockRelease(ControlFileLock);
                               8482                 :                : 
                               8483                 :                :         /* TLI should not change in an on-line checkpoint */
 1401 heikki.linnakangas@i     8484                 :            671 :         (void) GetCurrentReplayRecPtr(&replayTLI);
 1504 rhaas@postgresql.org     8485         [ -  + ]:            671 :         if (checkPoint.ThisTimeLineID != replayTLI)
 7981 tgl@sss.pgh.pa.us        8486         [ #  # ]:UBC           0 :             ereport(PANIC,
                               8487                 :                :                     (errmsg("unexpected timeline ID %u (should be %u) in online checkpoint record",
                               8488                 :                :                             checkPoint.ThisTimeLineID, replayTLI)));
                               8489                 :                : 
 1485 rhaas@postgresql.org     8490                 :CBC         671 :         RecoveryRestartPoint(&checkPoint, record);
                               8491                 :                : 
                               8492                 :                :         /*
                               8493                 :                :          * After replaying a checkpoint record, free all smgr objects.
                               8494                 :                :          * Otherwise we would never do so for dropped relations, as the
                               8495                 :                :          * startup process does not process shared invalidation messages or call
                               8496                 :                :          * AtEOXact_SMgr().
                               8497                 :                :          */
   99 michael@paquier.xyz      8498                 :            671 :         smgrdestroyall();
                               8499                 :                :     }
 1541 alvherre@alvh.no-ip.     8500         [ +  + ]:          40695 :     else if (info == XLOG_OVERWRITE_CONTRECORD)
                               8501                 :                :     {
                               8502                 :                :         /* nothing to do here, handled in xlogrecovery_redo() */
                               8503                 :                :     }
 4706 simon@2ndQuadrant.co     8504         [ +  + ]:          40694 :     else if (info == XLOG_END_OF_RECOVERY)
                               8505                 :                :     {
                               8506                 :                :         xl_end_of_recovery xlrec;
                               8507                 :                :         TimeLineID  replayTLI;
                               8508                 :                : 
                               8509                 :              9 :         memcpy(&xlrec, XLogRecGetData(record), sizeof(xl_end_of_recovery));
                               8510                 :                : 
                               8511                 :                :         /*
                               8512                 :                :          * For Hot Standby, we could treat this like a Shutdown Checkpoint,
                               8513                 :                :          * but this case is rarer and harder to test, so the benefit doesn't
                               8514                 :                :          * outweigh the potential extra cost of maintenance.
                               8515                 :                :          */
                               8516                 :                : 
                               8517                 :                :         /*
                               8518                 :                :          * We should've already switched to the new TLI before replaying this
                               8519                 :                :          * record.
                               8520                 :                :          */
 1401 heikki.linnakangas@i     8521                 :              9 :         (void) GetCurrentReplayRecPtr(&replayTLI);
 1504 rhaas@postgresql.org     8522         [ -  + ]:              9 :         if (xlrec.ThisTimeLineID != replayTLI)
 4706 simon@2ndQuadrant.co     8523         [ #  # ]:UBC           0 :             ereport(PANIC,
                               8524                 :                :                     (errmsg("unexpected timeline ID %u (should be %u) in end-of-recovery record",
                               8525                 :                :                             xlrec.ThisTimeLineID, replayTLI)));
                               8526                 :                :     }
 6787 tgl@sss.pgh.pa.us        8527         [ +  - ]:CBC       40685 :     else if (info == XLOG_NOOP)
                               8528                 :                :     {
                               8529                 :                :         /* nothing to do here */
                               8530                 :                :     }
 7074                          8531         [ +  + ]:          40685 :     else if (info == XLOG_SWITCH)
                               8532                 :                :     {
                               8533                 :                :         /* nothing to do here */
                               8534                 :                :     }
 5427 simon@2ndQuadrant.co     8535         [ +  + ]:          40247 :     else if (info == XLOG_RESTORE_POINT)
                               8536                 :                :     {
                               8537                 :                :         /* nothing to do here, handled in xlogrecovery.c */
                               8538                 :                :     }
 4042 heikki.linnakangas@i     8539   [ +  +  +  + ]:          40242 :     else if (info == XLOG_FPI || info == XLOG_FPI_FOR_HINT)
                               8540                 :                :     {
                               8541                 :                :         /*
                               8542                 :                :          * XLOG_FPI records contain nothing else but one or more block
                               8543                 :                :          * references. Every block reference must include a full-page image
                               8544                 :                :          * even if full_page_writes was disabled when the record was generated
                               8545                 :                :          * - otherwise there would be no point in this record.
                               8546                 :                :          *
                               8547                 :                :          * XLOG_FPI_FOR_HINT records are generated when a page needs to be
                               8548                 :                :          * WAL-logged because of a hint bit update. They are only generated
                               8549                 :                :          * when checksums and/or wal_log_hints are enabled. They may include
                               8550                 :                :          * no full-page images if full_page_writes was disabled when they were
                               8551                 :                :          * generated. In this case there is nothing to do here.
                               8552                 :                :          *
                               8553                 :                :          * No recovery conflicts are generated by these generic records - if a
                               8554                 :                :          * resource manager needs to generate conflicts, it has to define a
                               8555                 :                :          * separate WAL record type and redo routine.
                               8556                 :                :          */
 1371 tmunro@postgresql.or     8557         [ +  + ]:          83729 :         for (uint8 block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
                               8558                 :                :         {
                               8559                 :                :             Buffer      buffer;
                               8560                 :                : 
 1611 fujii@postgresql.org     8561         [ +  + ]:          44273 :             if (!XLogRecHasBlockImage(record, block_id))
                               8562                 :                :             {
                               8563         [ -  + ]:             66 :                 if (info == XLOG_FPI)
 1611 fujii@postgresql.org     8564         [ #  # ]:UBC           0 :                     elog(ERROR, "XLOG_FPI record did not contain a full-page image");
 1611 fujii@postgresql.org     8565                 :CBC          66 :                 continue;
                               8566                 :                :             }
                               8567                 :                : 
 2451 heikki.linnakangas@i     8568         [ -  + ]:          44207 :             if (XLogReadBufferForRedo(record, block_id, &buffer) != BLK_RESTORED)
 2451 heikki.linnakangas@i     8569         [ #  # ]:UBC           0 :                 elog(ERROR, "unexpected XLogReadBufferForRedo result when restoring backup block");
 2451 heikki.linnakangas@i     8570                 :CBC       44207 :             UnlockReleaseBuffer(buffer);
                               8571                 :                :         }
                               8572                 :                :     }
 5827                          8573         [ +  + ]:            786 :     else if (info == XLOG_BACKUP_END)
                               8574                 :                :     {
                               8575                 :                :         /* nothing to do here, handled in xlogrecovery_redo() */
                               8576                 :                :     }
 5713                          8577         [ +  + ]:            703 :     else if (info == XLOG_PARAMETER_CHANGE)
                               8578                 :                :     {
                               8579                 :                :         xl_parameter_change xlrec;
                               8580                 :                : 
                               8581                 :                :         /* Update our copy of the parameters in pg_control */
                               8582                 :             31 :         memcpy(&xlrec, XLogRecGetData(record), sizeof(xl_parameter_change));
                               8583                 :                : 
                               8584                 :                :         /*
                               8585                 :                :          * Invalidate logical slots if we are in hot standby and the primary
                               8586                 :                :          * does not have a WAL level sufficient for logical decoding. No need
                               8587                 :                :          * to search for potentially conflicting logically slots if standby is
                               8588                 :                :          * running with wal_level lower than logical, because in that case, we
                               8589                 :                :          * would have either disallowed creation of logical slots or
                               8590                 :                :          * invalidated existing ones.
                               8591                 :                :          */
  986 andres@anarazel.de       8592   [ +  -  +  + ]:             31 :         if (InRecovery && InHotStandby &&
                               8593         [ +  + ]:             16 :             xlrec.wal_level < WAL_LEVEL_LOGICAL &&
                               8594         [ +  + ]:              6 :             wal_level >= WAL_LEVEL_LOGICAL)
                               8595                 :              3 :             InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_LEVEL,
                               8596                 :                :                                                0, InvalidOid,
                               8597                 :                :                                                InvalidTransactionId);
                               8598                 :                : 
 5708 heikki.linnakangas@i     8599                 :             31 :         LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
 5713                          8600                 :             31 :         ControlFile->MaxConnections = xlrec.MaxConnections;
 4550 rhaas@postgresql.org     8601                 :             31 :         ControlFile->max_worker_processes = xlrec.max_worker_processes;
 2501 michael@paquier.xyz      8602                 :             31 :         ControlFile->max_wal_senders = xlrec.max_wal_senders;
 5713 heikki.linnakangas@i     8603                 :             31 :         ControlFile->max_prepared_xacts = xlrec.max_prepared_xacts;
                               8604                 :             31 :         ControlFile->max_locks_per_xact = xlrec.max_locks_per_xact;
                               8605                 :             31 :         ControlFile->wal_level = xlrec.wal_level;
 3990                          8606                 :             31 :         ControlFile->wal_log_hints = xlrec.wal_log_hints;
                               8607                 :                : 
                               8608                 :                :         /*
                               8609                 :                :          * Update minRecoveryPoint to ensure that if recovery is aborted, we
                               8610                 :                :          * recover back up to this point before allowing hot standby again.
                               8611                 :                :          * This is important if the max_* settings are decreased, to ensure
                               8612                 :                :          * you don't run queries against the WAL preceding the change. The
                               8613                 :                :          * local copies cannot be updated as long as crash recovery is
                               8614                 :                :          * happening and we expect all the WAL to be replayed.
                               8615                 :                :          */
 2723 michael@paquier.xyz      8616         [ +  + ]:             31 :         if (InArchiveRecovery)
                               8617                 :                :         {
 1401 heikki.linnakangas@i     8618                 :             17 :             LocalMinRecoveryPoint = ControlFile->minRecoveryPoint;
                               8619                 :             17 :             LocalMinRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
                               8620                 :                :         }
   42 alvherre@kurilemu.de     8621   [ +  +  +  + ]:GNC          31 :         if (XLogRecPtrIsValid(LocalMinRecoveryPoint) && LocalMinRecoveryPoint < lsn)
                               8622                 :                :         {
                               8623                 :                :             TimeLineID  replayTLI;
                               8624                 :                : 
 1401 heikki.linnakangas@i     8625                 :CBC           6 :             (void) GetCurrentReplayRecPtr(&replayTLI);
 5708                          8626                 :              6 :             ControlFile->minRecoveryPoint = lsn;
 1504 rhaas@postgresql.org     8627                 :              6 :             ControlFile->minRecoveryPointTLI = replayTLI;
                               8628                 :                :         }
                               8629                 :                : 
 3731 alvherre@alvh.no-ip.     8630                 :             31 :         CommitTsParameterChange(xlrec.track_commit_timestamp,
                               8631                 :             31 :                                 ControlFile->track_commit_timestamp);
                               8632                 :             31 :         ControlFile->track_commit_timestamp = xlrec.track_commit_timestamp;
                               8633                 :                : 
 5713 heikki.linnakangas@i     8634                 :             31 :         UpdateControlFile();
 5708                          8635                 :             31 :         LWLockRelease(ControlFileLock);
                               8636                 :                : 
                               8637                 :                :         /* Check whether any parameter change causes a problem for recovery */
 5713                          8638                 :             31 :         CheckRequiredParameterValues();
                               8639                 :                :     }
 5076 simon@2ndQuadrant.co     8640         [ -  + ]:            672 :     else if (info == XLOG_FPW_CHANGE)
                               8641                 :                :     {
                               8642                 :                :         bool        fpw;
                               8643                 :                : 
 5076 simon@2ndQuadrant.co     8644                 :UBC           0 :         memcpy(&fpw, XLogRecGetData(record), sizeof(bool));
                               8645                 :                : 
                               8646                 :                :         /*
                               8647                 :                :          * Update the LSN of the last replayed XLOG_FPW_CHANGE record so that
                               8648                 :                :          * do_pg_backup_start() and do_pg_backup_stop() can check whether
                               8649                 :                :          * full_page_writes has been disabled during online backup.
                               8650                 :                :          */
                               8651         [ #  # ]:              0 :         if (!fpw)
                               8652                 :                :         {
 4105 andres@anarazel.de       8653         [ #  # ]:              0 :             SpinLockAcquire(&XLogCtl->info_lck);
 1485 rhaas@postgresql.org     8654         [ #  # ]:              0 :             if (XLogCtl->lastFpwDisableRecPtr < record->ReadRecPtr)
                               8655                 :              0 :                 XLogCtl->lastFpwDisableRecPtr = record->ReadRecPtr;
 4105 andres@anarazel.de       8656                 :              0 :             SpinLockRelease(&XLogCtl->info_lck);
                               8657                 :                :         }
                               8658                 :                : 
                               8659                 :                :         /* Keep track of full_page_writes */
 5076 simon@2ndQuadrant.co     8660                 :              0 :         lastFullPageWrites = fpw;
                               8661                 :                :     }
                               8662                 :                :     else if (info == XLOG_CHECKPOINT_REDO)
                               8663                 :                :     {
                               8664                 :                :         /* nothing to do here, just for informational purposes */
                               8665                 :                :     }
 9189 vadim4o@yahoo.com        8666                 :CBC       41486 : }
                               8667                 :                : 
                               8668                 :                : /*
                               8669                 :                :  * Return the extra open flags used for opening a file, depending on the
                               8670                 :                :  * value of the GUCs wal_sync_method, fsync and debug_io_direct.
                               8671                 :                :  */
                               8672                 :                : static int
 6427 magnus@hagander.net      8673                 :          15861 : get_sync_bit(int method)
                               8674                 :                : {
 5774 bruce@momjian.us         8675                 :          15861 :     int         o_direct_flag = 0;
                               8676                 :                : 
                               8677                 :                :     /*
                               8678                 :                :      * Use O_DIRECT if requested, except in walreceiver process.  The WAL
                               8679                 :                :      * written by walreceiver is normally read by the startup process soon
                               8680                 :                :      * after it's written.  Also, walreceiver performs unaligned writes, which
                               8681                 :                :      * don't work with O_DIRECT, so it is required for correctness too.
                               8682                 :                :      */
  985 tmunro@postgresql.or     8683   [ +  +  +  - ]:          15861 :     if ((io_direct_flags & IO_DIRECT_WAL) && !AmWalReceiverProcess())
 5781 heikki.linnakangas@i     8684                 :              8 :         o_direct_flag = PG_O_DIRECT;
                               8685                 :                : 
                               8686                 :                :     /* If fsync is disabled, never open in sync mode */
  985 tmunro@postgresql.or     8687         [ +  - ]:          15861 :     if (!enableFsync)
                               8688                 :          15861 :         return o_direct_flag;
                               8689                 :                : 
 6427 magnus@hagander.net      8690   [ #  #  #  # ]:UBC           0 :     switch (method)
                               8691                 :                :     {
                               8692                 :                :             /*
                               8693                 :                :              * enum values for all sync options are defined even if they are
                               8694                 :                :              * not supported on the current platform.  But if not, they are
                               8695                 :                :              * not included in the enum option array, and therefore will never
                               8696                 :                :              * be seen here.
                               8697                 :                :              */
  797 nathan@postgresql.or     8698                 :              0 :         case WAL_SYNC_METHOD_FSYNC:
                               8699                 :                :         case WAL_SYNC_METHOD_FSYNC_WRITETHROUGH:
                               8700                 :                :         case WAL_SYNC_METHOD_FDATASYNC:
  985 tmunro@postgresql.or     8701                 :              0 :             return o_direct_flag;
                               8702                 :                : #ifdef O_SYNC
  797 nathan@postgresql.or     8703                 :              0 :         case WAL_SYNC_METHOD_OPEN:
 1245 tmunro@postgresql.or     8704                 :              0 :             return O_SYNC | o_direct_flag;
                               8705                 :                : #endif
                               8706                 :                : #ifdef O_DSYNC
  797 nathan@postgresql.or     8707                 :              0 :         case WAL_SYNC_METHOD_OPEN_DSYNC:
 1245 tmunro@postgresql.or     8708                 :              0 :             return O_DSYNC | o_direct_flag;
                               8709                 :                : #endif
 6429 magnus@hagander.net      8710                 :              0 :         default:
                               8711                 :                :             /* can't happen (unless we are out of sync with option array) */
  580 peter@eisentraut.org     8712         [ #  # ]:              0 :             elog(ERROR, "unrecognized \"wal_sync_method\": %d", method);
                               8713                 :                :             return 0;           /* silence warning */
                               8714                 :                :     }
                               8715                 :                : }
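
The flag selection in get_sync_bit() above can be sketched as a standalone function. This is only an illustration under stated assumptions: the SketchSyncMethod enum, the sketch_sync_flags() name, and the explicit fsync_enabled and direct_flag parameters are hypothetical stand-ins for the WAL_SYNC_METHOD_* values, the enableFsync GUC, and the O_DIRECT decision in the listing, and an unrecognized method returns -1 here instead of raising an error.

```c
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the WAL_SYNC_METHOD_* values in the listing. */
typedef enum
{
    SKETCH_SYNC_FSYNC,
    SKETCH_SYNC_FDATASYNC,
    SKETCH_SYNC_OPEN,
    SKETCH_SYNC_OPEN_DSYNC
} SketchSyncMethod;

/*
 * Return the extra open(2) flags for a sync method.  'direct_flag' is the
 * caller-chosen direct-I/O flag (0 if not requested) and is always kept.
 * Returns -1 for a method that is unknown or unsupported on this platform.
 */
static int
sketch_sync_flags(SketchSyncMethod method, bool fsync_enabled, int direct_flag)
{
    /* With fsync disabled, never open the file in a synchronous mode. */
    if (!fsync_enabled)
        return direct_flag;

    switch (method)
    {
        case SKETCH_SYNC_FSYNC:
        case SKETCH_SYNC_FDATASYNC:
            /* Syncing happens through a separate call after write(). */
            return direct_flag;
#ifdef O_SYNC
        case SKETCH_SYNC_OPEN:
            return O_SYNC | direct_flag;
#endif
#ifdef O_DSYNC
        case SKETCH_SYNC_OPEN_DSYNC:
            return O_DSYNC | direct_flag;
#endif
        default:
            return -1;
    }
}
```

A caller would OR a non-negative result into its normal open flags, for example open(path, O_RDWR | flags). Comparing the result for the old and new method is enough to decide whether an already open file has to be reopened, which is the check assign_wal_sync_method() performs.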
                               8716                 :                : 
                               8717                 :                : /*
                               8718                 :                :  * GUC support
                               8719                 :                :  */
                               8720                 :                : void
  797 nathan@postgresql.or     8721                 :CBC        1109 : assign_wal_sync_method(int new_wal_sync_method, void *extra)
                               8722                 :                : {
                               8723         [ -  + ]:           1109 :     if (wal_sync_method != new_wal_sync_method)
                               8724                 :                :     {
                               8725                 :                :         /*
                               8726                 :                :          * To ensure that no blocks escape unsynced, force an fsync on the
                               8727                 :                :          * currently open log segment (if any).  Also, if the open flag is
                               8728                 :                :          * changing, close the log file so it will be reopened (with new flag
                               8729                 :                :          * bit) at next use.
                               8730                 :                :          */
 9043 tgl@sss.pgh.pa.us        8731         [ #  # ]:UBC           0 :         if (openLogFile >= 0)
                               8732                 :                :         {
 3197 rhaas@postgresql.org     8733                 :              0 :             pgstat_report_wait_start(WAIT_EVENT_WAL_SYNC_METHOD_ASSIGN);
 9043 tgl@sss.pgh.pa.us        8734         [ #  # ]:              0 :             if (pg_fsync(openLogFile) != 0)
                               8735                 :                :             {
                               8736                 :                :                 char        xlogfname[MAXFNAMELEN];
                               8737                 :                :                 int         save_errno;
                               8738                 :                : 
 2207 michael@paquier.xyz      8739                 :              0 :                 save_errno = errno;
 1504 rhaas@postgresql.org     8740                 :              0 :                 XLogFileName(xlogfname, openLogTLI, openLogSegNo,
                               8741                 :                :                              wal_segment_size);
 2207 michael@paquier.xyz      8742                 :              0 :                 errno = save_errno;
 8186 tgl@sss.pgh.pa.us        8743         [ #  # ]:              0 :                 ereport(PANIC,
                               8744                 :                :                         (errcode_for_file_access(),
                               8745                 :                :                          errmsg("could not fsync file \"%s\": %m", xlogfname)));
                               8746                 :                :             }
                               8747                 :                : 
 3197 rhaas@postgresql.org     8748                 :              0 :             pgstat_report_wait_end();
  797 nathan@postgresql.or     8749         [ #  # ]:              0 :             if (get_sync_bit(wal_sync_method) != get_sync_bit(new_wal_sync_method))
 7126 bruce@momjian.us         8750                 :              0 :                 XLogFileClose();
                               8751                 :                :         }
                               8752                 :                :     }
 9043 tgl@sss.pgh.pa.us        8753                 :CBC        1109 : }
                               8754                 :                : 
                               8755                 :                : 
                               8756                 :                : /*
                               8757                 :                :  * Issue appropriate kind of fsync (if any) for an XLOG output file.
                               8758                 :                :  *
                               8759                 :                :  * 'fd' is a file descriptor for the XLOG file to be fsync'd.
                               8760                 :                :  * 'segno' is for error reporting purposes.
                               8761                 :                :  */
                               8762                 :                : void
 1504 rhaas@postgresql.org     8763                 :         173110 : issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
                               8764                 :                : {
 2207 michael@paquier.xyz      8765                 :         173110 :     char       *msg = NULL;
                               8766                 :                :     instr_time  start;
                               8767                 :                : 
 1504 rhaas@postgresql.org     8768         [ -  + ]:         173110 :     Assert(tli != 0);
                               8769                 :                : 
                               8770                 :                :     /*
                               8771                 :                :      * Quick exit if fsync is disabled or write() has already synced the WAL
                               8772                 :                :      * file.
                               8773                 :                :      */
 1745 fujii@postgresql.org     8774         [ -  + ]:         173110 :     if (!enableFsync ||
  797 nathan@postgresql.or     8775         [ #  # ]:UBC           0 :         wal_sync_method == WAL_SYNC_METHOD_OPEN ||
                               8776         [ #  # ]:              0 :         wal_sync_method == WAL_SYNC_METHOD_OPEN_DSYNC)
 1745 fujii@postgresql.org     8777                 :CBC      173110 :         return;
                               8778                 :                : 
                               8779                 :                :     /*
                               8780                 :                :      * Measure I/O timing to sync the WAL file for pg_stat_io.
                               8781                 :                :      */
  295 michael@paquier.xyz      8782                 :UBC           0 :     start = pgstat_prepare_io_time(track_wal_io_timing);
                               8783                 :                : 
 2726                          8784                 :              0 :     pgstat_report_wait_start(WAIT_EVENT_WAL_SYNC);
  797 nathan@postgresql.or     8785   [ #  #  #  # ]:              0 :     switch (wal_sync_method)
                               8786                 :                :     {
                               8787                 :              0 :         case WAL_SYNC_METHOD_FSYNC:
 5816 heikki.linnakangas@i     8788         [ #  # ]:              0 :             if (pg_fsync_no_writethrough(fd) != 0)
 2207 michael@paquier.xyz      8789                 :              0 :                 msg = _("could not fsync file \"%s\": %m");
 9043 tgl@sss.pgh.pa.us        8790                 :              0 :             break;
                               8791                 :                : #ifdef HAVE_FSYNC_WRITETHROUGH
                               8792                 :                :         case WAL_SYNC_METHOD_FSYNC_WRITETHROUGH:
                               8793                 :                :             if (pg_fsync_writethrough(fd) != 0)
                               8794                 :                :                 msg = _("could not fsync write-through file \"%s\": %m");
                               8795                 :                :             break;
                               8796                 :                : #endif
  797 nathan@postgresql.or     8797                 :              0 :         case WAL_SYNC_METHOD_FDATASYNC:
 5816 heikki.linnakangas@i     8798         [ #  # ]:              0 :             if (pg_fdatasync(fd) != 0)
 2207 michael@paquier.xyz      8799                 :              0 :                 msg = _("could not fdatasync file \"%s\": %m");
 9043 tgl@sss.pgh.pa.us        8800                 :              0 :             break;
  797 nathan@postgresql.or     8801                 :              0 :         case WAL_SYNC_METHOD_OPEN:
                               8802                 :                :         case WAL_SYNC_METHOD_OPEN_DSYNC:
                               8803                 :                :             /* not reachable */
 1745 fujii@postgresql.org     8804                 :              0 :             Assert(false);
                               8805                 :                :             break;
 9043 tgl@sss.pgh.pa.us        8806                 :              0 :         default:
  624 dgustafsson@postgres     8807         [ #  # ]:              0 :             ereport(PANIC,
                               8808                 :                :                     errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                               8809                 :                :                     errmsg_internal("unrecognized \"wal_sync_method\": %d", wal_sync_method));
                               8810                 :                :             break;
                               8811                 :                :     }
                               8812                 :                : 
                               8813                 :                :     /* PANIC if failed to fsync */
 2207 michael@paquier.xyz      8814         [ #  # ]:              0 :     if (msg)
                               8815                 :                :     {
                               8816                 :                :         char        xlogfname[MAXFNAMELEN];
                               8817                 :              0 :         int         save_errno = errno;
                               8818                 :                : 
 1504 rhaas@postgresql.org     8819                 :              0 :         XLogFileName(xlogfname, tli, segno, wal_segment_size);
 2207 michael@paquier.xyz      8820                 :              0 :         errno = save_errno;
                               8821         [ #  # ]:              0 :         ereport(PANIC,
                               8822                 :                :                 (errcode_for_file_access(),
                               8823                 :                :                  errmsg(msg, xlogfname)));
                               8824                 :                :     }
                               8825                 :                : 
                               8826                 :              0 :     pgstat_report_wait_end();
                               8827                 :                : 
  317                          8828                 :              0 :     pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL, IOOP_FSYNC,
                               8829                 :                :                             start, 1, 0);
                               8830                 :                : }
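
Both fsync failure paths above save errno before building the file name and restore it before reporting, so that "%m" describes the failed sync and not a side effect of the name formatting. A minimal sketch of that pattern in plain C follows. The names sketch_build_name() and sketch_report_failure() are hypothetical, the "segment-%08X" name format is made up for the example, and strerror() stands in for the "%m" expansion done by ereport().

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper that may overwrite errno while building a name. */
static void
sketch_build_name(char *buf, size_t buflen, unsigned int segno)
{
    snprintf(buf, buflen, "segment-%08X", segno);
    errno = 0;                  /* simulate a callee clobbering errno */
}

/*
 * Format an error message for the failure currently recorded in errno.
 * errno is saved before the helper call and restored afterwards, so the
 * message text reflects the original failure.  Returns the saved value.
 */
static int
sketch_report_failure(char *out, size_t outlen, unsigned int segno)
{
    char        name[64];
    int         save_errno = errno;

    sketch_build_name(name, sizeof(name), segno);
    errno = save_errno;

    snprintf(out, outlen, "could not fsync file \"%s\": %s",
             name, strerror(errno));
    return save_errno;
}
```

Without the restore step, the message in this sketch would describe error 0 instead of the original failure, because the helper resets errno.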
                               8831                 :                : 
                               8832                 :                : /*
                               8833                 :                :  * do_pg_backup_start is the workhorse of the user-visible pg_backup_start()
                               8834                 :                :  * function. It creates the necessary starting checkpoint and constructs the
                               8835                 :                :  * backup state and tablespace map.
                               8836                 :                :  *
                               8837                 :                :  * Input parameters are "state" (the backup state), "fast" (if true, we do
                               8838                 :                :  * the checkpoint in fast mode), and "tablespaces" (if non-NULL, indicates a
                               8839                 :                :  * list of tablespaceinfo structs describing the cluster's tablespaces).
                               8840                 :                :  *
                               8841                 :                :  * The tablespace map contents are appended to passed-in parameter
                               8842                 :                :  * tablespace_map and the caller is responsible for including it in the backup
                               8843                 :                :  * archive as 'tablespace_map'. The tablespace_map file is required mainly for
                               8844                 :                :  * tar format in Windows as native Windows utilities are not able to create
                               8845                 :                :  * symlinks while extracting files from tar. However, for consistency and
                               8846                 :                :  * platform-independence, we do it the same way everywhere.
                               8847                 :                :  *
                               8848                 :                :  * It fills in "state" with the information required for the backup, such
                               8849                 :                :  * as the minimum WAL location that must be present to restore from this
                               8850                 :                :  * backup (startpoint) and the corresponding timeline ID (starttli).
                               8851                 :                :  *
                               8852                 :                :  * Every successfully started backup must be stopped by calling
                               8853                 :                :  * do_pg_backup_stop() or do_pg_abort_backup(). There can be many
                               8854                 :                :  * backups active at the same time.
                               8855                 :                :  *
                               8856                 :                :  * It is the responsibility of the caller of this function to verify the
                               8857                 :                :  * permissions of the calling user!
                               8858                 :                :  */
                               8859                 :                : void
 1179 michael@paquier.xyz      8860                 :CBC         168 : do_pg_backup_start(const char *backupidstr, bool fast, List **tablespaces,
                               8861                 :                :                    BackupState *state, StringInfo tblspcmapfile)
                               8862                 :                : {
                               8863                 :                :     bool        backup_started_in_recovery;
                               8864                 :                : 
                               8865         [ -  + ]:            168 :     Assert(state != NULL);
 5076 simon@2ndQuadrant.co     8866                 :            168 :     backup_started_in_recovery = RecoveryInProgress();
                               8867                 :                : 
                               8868                 :                :     /*
                               8869                 :                :      * During recovery, we don't need to check WAL level, because if WAL
                               8870                 :                :      * level is not sufficient, it's impossible to get here during recovery.
                               8871                 :                :      */
                               8872   [ +  +  -  + ]:            168 :     if (!backup_started_in_recovery && !XLogIsNeeded())
 6658 tgl@sss.pgh.pa.us        8873         [ #  # ]:UBC           0 :         ereport(ERROR,
                               8874                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               8875                 :                :                  errmsg("WAL level not sufficient for making an online backup"),
                               8876                 :                :                  errhint("\"wal_level\" must be set to \"replica\" or \"logical\" at server start.")));
                               8877                 :                : 
 5435 heikki.linnakangas@i     8878         [ +  + ]:CBC         168 :     if (strlen(backupidstr) > MAXPGPATH)
                               8879         [ +  - ]:              1 :         ereport(ERROR,
                               8880                 :                :                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                               8881                 :                :                  errmsg("backup label too long (max %d bytes)",
                               8882                 :                :                         MAXPGPATH)));
                               8883                 :                : 
  534 dgustafsson@postgres     8884                 :            167 :     strlcpy(state->name, backupidstr, sizeof(state->name));
                               8885                 :                : 
                               8886                 :                :     /*
                               8887                 :                :      * Mark backup active in shared memory.  We must do full-page WAL writes
                               8888                 :                :      * during an on-line backup even if not doing so at other times, because
                               8889                 :                :      * it's quite possible for the backup dump to obtain a "torn" (partially
                               8890                 :                :      * written) copy of a database page if it reads the page concurrently with
                               8891                 :                :      * our write to the same page.  This can be fixed as long as the first
                               8892                 :                :      * write to the page in the WAL sequence is a full-page write. Hence, we
                               8893                 :                :      * increment runningBackups then force a CHECKPOINT, to ensure there are
                               8894                 :                :      * no dirty pages in shared memory that might get dumped while the backup
                               8895                 :                :      * is in progress without having a corresponding WAL record.  (Once the
                               8896                 :                :      * backup is complete, we need not force full-page writes anymore, since
                               8897                 :                :      * we expect that any pages not modified during the backup interval must
                               8898                 :                :      * have been correctly captured by the backup.)
                               8899                 :                :      *
                               8900                 :                :      * Note that forcing full-page writes has no effect during an online
                               8901                 :                :      * backup from the standby.
                               8902                 :                :      *
                               8903                 :                :      * We must hold all the insertion locks to change the value of
                               8904                 :                :      * runningBackups, to ensure adequate interlocking against
                               8905                 :                :      * XLogInsertRecord().
                               8906                 :                :      */
 4290 heikki.linnakangas@i     8907                 :            167 :     WALInsertLockAcquireExclusive();
 1352 sfrost@snowman.net       8908                 :            167 :     XLogCtl->Insert.runningBackups++;
 4290 heikki.linnakangas@i     8909                 :            167 :     WALInsertLockRelease();
                               8910                 :                : 
                               8911                 :                :     /*
                               8912                 :                :      * Ensure we decrement runningBackups if we fail below. NB -- for this to
                               8913                 :                :      * work correctly, it is critical that sessionBackupState is only updated
                               8914                 :                :      * after this block is over.
                               8915                 :                :      */
  135 peter@eisentraut.org     8916         [ +  - ]:GNC         167 :     PG_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, BoolGetDatum(true));
                               8917                 :                :     {
 5366 bruce@momjian.us         8918                 :CBC         167 :         bool        gotUniqueStartpoint = false;
                               8919                 :                :         DIR        *tblspcdir;
                               8920                 :                :         struct dirent *de;
                               8921                 :                :         tablespaceinfo *ti;
                               8922                 :                :         int         datadirpathlen;
                               8923                 :                : 
                               8924                 :                :         /*
                               8925                 :                :          * Force an XLOG file switch before the checkpoint, to ensure that the
                               8926                 :                :          * WAL segment the checkpoint is written to doesn't contain pages with
                               8927                 :                :          * old timeline IDs.  That would otherwise happen if you called
                               8928                 :                :          * pg_backup_start() right after restoring from a PITR archive: the
                               8929                 :                :          * first WAL segment containing the startup checkpoint has pages in
                               8930                 :                :          * the beginning with the old timeline ID.  That can cause trouble at
                               8931                 :                :          * recovery: we won't have a history file covering the old timeline if
                               8932                 :                :          * the pg_wal directory was not included in the base backup and the WAL
                               8933                 :                :          * archive was cleared too before starting the backup.
                               8934                 :                :          *
                               8935                 :                :          * This also ensures that we have emitted a WAL page header that has
                               8936                 :                :          * XLP_BKP_REMOVABLE off before we emit the checkpoint record.
                               8937                 :                :          * Therefore, if a WAL archiver (such as pglesslog) is trying to
                               8938                 :                :          * compress out removable backup blocks, it won't remove any that
                               8939                 :                :          * occur after this point.
                               8940                 :                :          *
                               8941                 :                :          * During recovery, we skip forcing XLOG file switch, which means that
                               8942                 :                :          * the backup taken during recovery is not available for the special
                               8943                 :                :          * recovery case described above.
                               8944                 :                :          */
 5076 simon@2ndQuadrant.co     8945         [ +  + ]:            167 :         if (!backup_started_in_recovery)
 3283 andres@anarazel.de       8946                 :            161 :             RequestXLogSwitch(false);
                               8947                 :                : 
                               8948                 :                :         do
                               8949                 :                :         {
                               8950                 :                :             bool        checkpointfpw;
                               8951                 :                : 
                               8952                 :                :             /*
                               8953                 :                :              * Force a CHECKPOINT.  Aside from being necessary to prevent torn
                               8954                 :                :              * page problems, this guarantees that two successive backup runs
                               8955                 :                :              * will have different checkpoint positions and hence different
                               8956                 :                :              * history file names, even if nothing happened in between.
                               8957                 :                :              *
                               8958                 :                :              * During recovery, establish a restartpoint if possible. We use
                               8959                 :                :              * the last restartpoint as the backup starting checkpoint. This
                               8960                 :                :              * means that two successive backup runs can have the same checkpoint
                               8961                 :                :              * positions.
                               8962                 :                :              *
                               8963                 :                :              * Since the fact that we are executing do_pg_backup_start()
                               8964                 :                :              * during recovery means that checkpointer is running, we can use
                               8965                 :                :              * RequestCheckpoint() to establish a restartpoint.
                               8966                 :                :              *
                               8967                 :                :              * We use CHECKPOINT_FAST only if requested by user (via passing
                               8968                 :                :              * fast = true).  Otherwise this can take a while.
                               8969                 :                :              */
 5386 heikki.linnakangas@i     8970         [ +  + ]:            167 :             RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT |
                               8971                 :                :                               (fast ? CHECKPOINT_FAST : 0));
                               8972                 :                : 
                               8973                 :                :             /*
                               8974                 :                :              * Now we need to fetch the checkpoint record location, and also
                               8975                 :                :              * its REDO pointer.  The oldest point in WAL that would be needed
                               8976                 :                :              * to restore starting from the checkpoint is precisely the REDO
                               8977                 :                :              * pointer.
                               8978                 :                :              */
                               8979                 :            167 :             LWLockAcquire(ControlFileLock, LW_SHARED);
 1179 michael@paquier.xyz      8980                 :            167 :             state->checkpointloc = ControlFile->checkPoint;
                               8981                 :            167 :             state->startpoint = ControlFile->checkPointCopy.redo;
                               8982                 :            167 :             state->starttli = ControlFile->checkPointCopy.ThisTimeLineID;
 5076 simon@2ndQuadrant.co     8983                 :            167 :             checkpointfpw = ControlFile->checkPointCopy.fullPageWrites;
 5386 heikki.linnakangas@i     8984                 :            167 :             LWLockRelease(ControlFileLock);
                               8985                 :                : 
 5076 simon@2ndQuadrant.co     8986         [ +  + ]:            167 :             if (backup_started_in_recovery)
                               8987                 :                :             {
                               8988                 :                :                 XLogRecPtr  recptr;
                               8989                 :                : 
                               8990                 :                :                 /*
                               8991                 :                :                  * Check to see if all WAL replayed during online backup
                               8992                 :                :                  * (i.e., since last restartpoint used as backup starting
                               8993                 :                :                  * checkpoint) contain full-page writes.
                               8994                 :                :                  */
 4105 andres@anarazel.de       8995         [ -  + ]:              6 :                 SpinLockAcquire(&XLogCtl->info_lck);
                               8996                 :              6 :                 recptr = XLogCtl->lastFpwDisableRecPtr;
                               8997                 :              6 :                 SpinLockRelease(&XLogCtl->info_lck);
                               8998                 :                : 
 1179 michael@paquier.xyz      8999   [ +  -  -  + ]:              6 :                 if (!checkpointfpw || state->startpoint <= recptr)
 5076 simon@2ndQuadrant.co     9000         [ #  # ]:UBC           0 :                     ereport(ERROR,
                               9001                 :                :                             (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               9002                 :                :                              errmsg("WAL generated with \"full_page_writes=off\" was replayed "
                               9003                 :                :                                     "since last restartpoint"),
                               9004                 :                :                              errhint("This means that the backup being taken on the standby "
                               9005                 :                :                                      "is corrupt and should not be used. "
                               9006                 :                :                                      "Enable \"full_page_writes\" and run CHECKPOINT on the primary, "
                               9007                 :                :                                      "and then try an online backup again.")));
                               9008                 :                : 
                               9009                 :                :                 /*
                               9010                 :                :                  * During recovery, since we don't use the end-of-backup WAL
                               9011                 :                :                  * record and don't write the backup history file, the
                               9012                 :                :                  * starting WAL location doesn't need to be unique. This means
                               9013                 :                :                  * that two base backups started at the same time might use
                               9014                 :                :                  * the same checkpoint as starting locations.
                               9015                 :                :                  */
 5076 simon@2ndQuadrant.co     9016                 :CBC           6 :                 gotUniqueStartpoint = true;
                               9017                 :                :             }
                               9018                 :                : 
                               9019                 :                :             /*
                               9020                 :                :              * If two base backups are started at the same time (in WAL sender
                               9021                 :                :              * processes), we need to make sure that they use different
                               9022                 :                :              * checkpoints as starting locations, because we use the starting
                               9023                 :                :              * WAL location as a unique identifier for the base backup in the
                               9024                 :                :              * end-of-backup WAL record and when we write the backup history
                               9025                 :                :              * file. Perhaps it would be better to generate a separate unique ID
                               9026                 :                :              * for each backup instead of forcing another checkpoint, but
                               9027                 :                :              * taking a checkpoint right after another is not that expensive
                               9028                 :                :              * either, because only a few buffers have been dirtied yet.
                               9029                 :                :              */
 4290 heikki.linnakangas@i     9030                 :            167 :             WALInsertLockAcquireExclusive();
 1179 michael@paquier.xyz      9031         [ +  - ]:            167 :             if (XLogCtl->Insert.lastBackupStart < state->startpoint)
                               9032                 :                :             {
                               9033                 :            167 :                 XLogCtl->Insert.lastBackupStart = state->startpoint;
 5386 heikki.linnakangas@i     9034                 :            167 :                 gotUniqueStartpoint = true;
                               9035                 :                :             }
 4290                          9036                 :            167 :             WALInsertLockRelease();
 5366 bruce@momjian.us         9037         [ -  + ]:            167 :         } while (!gotUniqueStartpoint);
                               9038                 :                : 
                               9039                 :                :         /*
                               9040                 :                :          * Construct tablespace_map file.
                               9041                 :                :          */
 3873 andrew@dunslane.net      9042                 :            167 :         datadirpathlen = strlen(DataDir);
                               9043                 :                : 
                               9044                 :                :         /* Collect information about all tablespaces */
  471 michael@paquier.xyz      9045                 :            167 :         tblspcdir = AllocateDir(PG_TBLSPC_DIR);
                               9046         [ +  + ]:            540 :         while ((de = ReadDir(tblspcdir, PG_TBLSPC_DIR)) != NULL)
                               9047                 :                :         {
                               9048                 :                :             char        fullpath[MAXPGPATH + sizeof(PG_TBLSPC_DIR)];
                               9049                 :                :             char        linkpath[MAXPGPATH];
 3873 andrew@dunslane.net      9050                 :            373 :             char       *relpath = NULL;
                               9051                 :                :             char       *s;
                               9052                 :                :             PGFileType  de_type;
                               9053                 :                :             char       *badp;
                               9054                 :                :             Oid         tsoid;
                               9055                 :                : 
                               9056                 :                :             /*
                               9057                 :                :              * Try to parse the directory name as an unsigned integer.
                               9058                 :                :              *
                               9059                 :                :              * Tablespace directories should be positive integers that can be
                               9060                 :                :              * represented in 32 bits, with no leading zeroes or trailing
                               9061                 :                :              * garbage. If we come across a name that doesn't meet those
                               9062                 :                :              * criteria, skip it.
                               9063                 :                :              */
  787 rhaas@postgresql.org     9064   [ +  +  -  + ]:            373 :             if (de->d_name[0] < '1' || de->d_name[1] > '9')
                               9065                 :            334 :                 continue;
                               9066                 :             39 :             errno = 0;
                               9067                 :             39 :             tsoid = strtoul(de->d_name, &badp, 10);
                               9068   [ +  -  +  -  :             39 :             if (*badp != '\0' || errno == EINVAL || errno == ERANGE)
                                              -  + ]
 3873 andrew@dunslane.net      9069                 :UBC           0 :                 continue;
                               9070                 :                : 
  471 michael@paquier.xyz      9071                 :CBC          39 :             snprintf(fullpath, sizeof(fullpath), "%s/%s", PG_TBLSPC_DIR, de->d_name);
                               9072                 :                : 
  975 rhaas@postgresql.org     9073                 :             39 :             de_type = get_dirent_type(fullpath, de, false, ERROR);
                               9074                 :                : 
                               9075         [ +  + ]:             39 :             if (de_type == PGFILETYPE_LNK)
                               9076                 :                :             {
                               9077                 :                :                 StringInfoData escapedpath;
                               9078                 :                :                 int         rllen;
                               9079                 :                : 
                               9080                 :             25 :                 rllen = readlink(fullpath, linkpath, sizeof(linkpath));
                               9081         [ -  + ]:             25 :                 if (rllen < 0)
                               9082                 :                :                 {
  975 rhaas@postgresql.org     9083         [ #  # ]:UBC           0 :                     ereport(WARNING,
                               9084                 :                :                             (errmsg("could not read symbolic link \"%s\": %m",
                               9085                 :                :                                     fullpath)));
                               9086                 :              0 :                     continue;
                               9087                 :                :                 }
  975 rhaas@postgresql.org     9088         [ -  + ]:CBC          25 :                 else if (rllen >= sizeof(linkpath))
                               9089                 :                :                 {
  975 rhaas@postgresql.org     9090         [ #  # ]:UBC           0 :                     ereport(WARNING,
                               9091                 :                :                             (errmsg("symbolic link \"%s\" target is too long",
                               9092                 :                :                                     fullpath)));
                               9093                 :              0 :                     continue;
                               9094                 :                :                 }
  975 rhaas@postgresql.org     9095                 :CBC          25 :                 linkpath[rllen] = '\0';
                               9096                 :                : 
                               9097                 :                :                 /*
                               9098                 :                :                  * Relpath holds the relative path of the tablespace directory
                               9099                 :                :                  * when it's located within PGDATA, or NULL if it's located
                               9100                 :                :                  * elsewhere.
                               9101                 :                :                  */
                               9102         [ -  + ]:             25 :                 if (rllen > datadirpathlen &&
  975 rhaas@postgresql.org     9103         [ #  # ]:UBC           0 :                     strncmp(linkpath, DataDir, datadirpathlen) == 0 &&
  944 tgl@sss.pgh.pa.us        9104         [ #  # ]:              0 :                     IS_DIR_SEP(linkpath[datadirpathlen]))
  975 rhaas@postgresql.org     9105                 :              0 :                     relpath = pstrdup(linkpath + datadirpathlen + 1);
                               9106                 :                : 
                               9107                 :                :                 /*
                               9108                 :                :                  * Add a backslash-escaped version of the link path to the
                               9109                 :                :                  * tablespace map file.
                               9110                 :                :                  */
  975 rhaas@postgresql.org     9111                 :CBC          25 :                 initStringInfo(&escapedpath);
                               9112         [ +  + ]:            594 :                 for (s = linkpath; *s; s++)
                               9113                 :                :                 {
                               9114   [ +  -  +  -  :            569 :                     if (*s == '\n' || *s == '\r' || *s == '\\')
                                              -  + ]
  975 rhaas@postgresql.org     9115                 :UBC           0 :                         appendStringInfoChar(&escapedpath, '\\');
  975 rhaas@postgresql.org     9116                 :CBC         569 :                     appendStringInfoChar(&escapedpath, *s);
                               9117                 :                :                 }
                               9118                 :             25 :                 appendStringInfo(tblspcmapfile, "%s %s\n",
                               9119                 :             25 :                                  de->d_name, escapedpath.data);
                               9120                 :             25 :                 pfree(escapedpath.data);
                               9121                 :                :             }
                               9122         [ +  - ]:             14 :             else if (de_type == PGFILETYPE_DIR)
                               9123                 :                :             {
                               9124                 :                :                 /*
                               9125                 :                :                  * It's possible to use allow_in_place_tablespaces to create
                               9126                 :                :                  * directories directly under pg_tblspc, for testing purposes
                               9127                 :                :                  * only.
                               9128                 :                :                  *
                               9129                 :                :                  * In this case, we store a relative path rather than an
                               9130                 :                :                  * absolute path into the tablespaceinfo.
                               9131                 :                :                  */
  471 michael@paquier.xyz      9132                 :             14 :                 snprintf(linkpath, sizeof(linkpath), "%s/%s",
                               9133                 :             14 :                          PG_TBLSPC_DIR, de->d_name);
  975 rhaas@postgresql.org     9134                 :             14 :                 relpath = pstrdup(linkpath);
                               9135                 :                :             }
                               9136                 :                :             else
                               9137                 :                :             {
                               9138                 :                :                 /* Skip any other file type that appears here. */
  975 rhaas@postgresql.org     9139                 :UBC           0 :                 continue;
                               9140                 :                :             }
                               9141                 :                : 
    8 michael@paquier.xyz      9142                 :GNC          39 :             ti = palloc_object(tablespaceinfo);
  787 rhaas@postgresql.org     9143                 :CBC          39 :             ti->oid = tsoid;
 1737 tgl@sss.pgh.pa.us        9144                 :             39 :             ti->path = pstrdup(linkpath);
  975 rhaas@postgresql.org     9145                 :             39 :             ti->rpath = relpath;
 2010                          9146                 :             39 :             ti->size = -1;
                               9147                 :                : 
 3862 bruce@momjian.us         9148         [ +  - ]:             39 :             if (tablespaces)
                               9149                 :             39 :                 *tablespaces = lappend(*tablespaces, ti);
                               9150                 :                :         }
 2936 tgl@sss.pgh.pa.us        9151                 :            167 :         FreeDir(tblspcdir);
                               9152                 :                : 
 1179 michael@paquier.xyz      9153                 :            167 :         state->starttime = (pg_time_t) time(NULL);
                               9154                 :                :     }
  135 peter@eisentraut.org     9155         [ -  + ]:GNC         167 :     PG_END_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, BoolGetDatum(true));
                               9156                 :                : 
 1179 michael@paquier.xyz      9157                 :CBC         167 :     state->started_in_recovery = backup_started_in_recovery;
                               9158                 :                : 
                               9159                 :                :     /*
                               9160                 :                :      * Mark that the start phase has correctly finished for the backup.
                               9161                 :                :      */
 1352 sfrost@snowman.net       9162                 :            167 :     sessionBackupState = SESSION_BACKUP_RUNNING;
 7807 tgl@sss.pgh.pa.us        9163                 :            167 : }
                               9164                 :                : 
                               9165                 :                : /*
                               9166                 :                :  * Utility routine to fetch the session-level status of a running backup.
                               9167                 :                :  */
                               9168                 :                : SessionBackupState
 3191 teodor@sigaev.ru         9169                 :            188 : get_backup_status(void)
                               9170                 :                : {
                               9171                 :            188 :     return sessionBackupState;
                               9172                 :                : }
                               9173                 :                : 
                               9174                 :                : /*
                               9175                 :                :  * do_pg_backup_stop
                               9176                 :                :  *
                               9177                 :                :  * Utility function called at the end of an online backup.  It creates a
                               9178                 :                :  * history file (if required), resets sessionBackupState and so on.  It can
                               9179                 :                :  * optionally wait for WAL segments to be archived.
                               9180                 :                :  *
                               9181                 :                :  * "state" is filled with the information necessary to restore from this
                               9182                 :                :  * backup with its stop LSN (stoppoint), its timeline ID (stoptli), etc.
                               9183                 :                :  *
                               9184                 :                :  * It is the responsibility of the caller of this function to verify the
                               9185                 :                :  * permissions of the calling user!
                               9186                 :                :  */
                               9187                 :                : void
 1179 michael@paquier.xyz      9188                 :            161 : do_pg_backup_stop(BackupState *state, bool waitforarchive)
                               9189                 :                : {
                               9190                 :            161 :     bool        backup_stopped_in_recovery = false;
                               9191                 :                :     char        histfilepath[MAXPGPATH];
                               9192                 :                :     char        lastxlogfilename[MAXFNAMELEN];
                               9193                 :                :     char        histfilename[MAXFNAMELEN];
                               9194                 :                :     XLogSegNo   _logSegNo;
                               9195                 :                :     FILE       *fp;
                               9196                 :                :     int         seconds_before_warning;
 6466 bruce@momjian.us         9197                 :            161 :     int         waits = 0;
 5723 simon@2ndQuadrant.co     9198                 :            161 :     bool        reported_waiting = false;
                               9199                 :                : 
 1179 michael@paquier.xyz      9200         [ -  + ]:            161 :     Assert(state != NULL);
                               9201                 :                : 
                               9202                 :            161 :     backup_stopped_in_recovery = RecoveryInProgress();
                               9203                 :                : 
                               9204                 :                :     /*
                               9205                 :                :      * During recovery, we don't need to check the WAL level, because if it
                               9206                 :                :      * is not sufficient, it's impossible to get here during recovery.
                               9207                 :                :      */
                               9208   [ +  +  -  + ]:            161 :     if (!backup_stopped_in_recovery && !XLogIsNeeded())
 6310 tgl@sss.pgh.pa.us        9209         [ #  # ]:UBC           0 :         ereport(ERROR,
                               9210                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               9211                 :                :                  errmsg("WAL level not sufficient for making an online backup"),
                               9212                 :                :                  errhint("\"wal_level\" must be set to \"replica\" or \"logical\" at server start.")));
                               9213                 :                : 
                               9214                 :                :     /*
                               9215                 :                :      * OK to update backup counter and session-level lock.
                               9216                 :                :      *
                               9217                 :                :      * Note that CHECK_FOR_INTERRUPTS() must not occur while updating them,
                               9218                 :                :      * otherwise they can be updated inconsistently, which might cause
                               9219                 :                :      * do_pg_abort_backup() to fail.
                               9220                 :                :      */
 3257 fujii@postgresql.org     9221                 :CBC         161 :     WALInsertLockAcquireExclusive();
                               9222                 :                : 
                               9223                 :                :     /*
                               9224                 :                :      * It is expected that each do_pg_backup_start() call is matched by
                               9225                 :                :      * exactly one do_pg_backup_stop() call.
                               9226                 :                :      */
 1352 sfrost@snowman.net       9227         [ -  + ]:            161 :     Assert(XLogCtl->Insert.runningBackups > 0);
                               9228                 :            161 :     XLogCtl->Insert.runningBackups--;
                               9229                 :                : 
                               9230                 :                :     /*
                               9231                 :                :      * Clean up session-level lock.
                               9232                 :                :      *
                               9233                 :                :      * You might think that WALInsertLockRelease() can be called before
                               9234                 :                :      * cleaning up session-level lock because session-level lock doesn't need
                               9235                 :                :      * to be protected with WAL insertion lock. But since
                               9236                 :                :      * CHECK_FOR_INTERRUPTS() can occur in it, session-level lock must be
                               9237                 :                :      * cleaned up before it.
                               9238                 :                :      */
 3191 teodor@sigaev.ru         9239                 :            161 :     sessionBackupState = SESSION_BACKUP_NONE;
                               9240                 :                : 
 2921 fujii@postgresql.org     9241                 :            161 :     WALInsertLockRelease();
                               9242                 :                : 
                               9243                 :                :     /*
                               9244                 :                :      * If we are taking an online backup from the standby, we confirm that the
                               9245                 :                :      * standby has not been promoted during the backup.
                               9246                 :                :      */
 1179 michael@paquier.xyz      9247   [ +  +  -  + ]:            161 :     if (state->started_in_recovery && !backup_stopped_in_recovery)
 5076 simon@2ndQuadrant.co     9248         [ #  # ]:UBC           0 :         ereport(ERROR,
                               9249                 :                :                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               9250                 :                :                  errmsg("the standby was promoted during online backup"),
                               9251                 :                :                  errhint("This means that the backup being taken is corrupt "
                               9252                 :                :                          "and should not be used. "
                               9253                 :                :                          "Try taking another online backup.")));
                               9254                 :                : 
                               9255                 :                :     /*
                               9256                 :                :      * During recovery, we don't write an end-of-backup record. We assume that
                               9257                 :                :      * pg_control was backed up last and its minimum recovery point can be
                               9258                 :                :      * available as the backup end location. Since we don't have an
                               9259                 :                :      * end-of-backup record, we use the pg_control value to check whether
                               9260                 :                :      * we've reached the end of backup when starting recovery from this
                               9261                 :                :      * backup. However, we have no way of checking whether pg_control was
                               9262                 :                :      * actually backed up last.
                               9263                 :                :      *
                               9264                 :                :      * We don't force a switch to new WAL file but it is still possible to
                               9265                 :                :      * wait for all the required files to be archived if waitforarchive is
                               9266                 :                :      * true. This is okay if we use the backup to start a standby and fetch
                               9267                 :                :      * the missing WAL using streaming replication. But in the case of an
                               9268                 :                :      * archive recovery, a user should set waitforarchive to true and wait for
                               9269                 :                :      * them to be archived to ensure that all the required files are
                               9270                 :                :      * available.
                               9271                 :                :      *
                               9272                 :                :      * We return the current minimum recovery point as the backup end
                               9273                 :                :      * location. Note that it can be greater than the exact backup end
                               9274                 :                :      * location if the minimum recovery point is updated after the backup of
                               9275                 :                :      * pg_control. This is harmless for current uses.
                               9276                 :                :      *
                               9277                 :                :      * XXX currently a backup history file is for informational and debug
                               9278                 :                :      * purposes only. It's not essential for an online backup. Furthermore,
                               9279                 :                :      * even if it's created, it will not be archived during recovery because
                               9280                 :                :      * an archiver is not invoked. So it doesn't seem worthwhile to write a
                               9281                 :                :      * backup history file during recovery.
                               9282                 :                :      */
 1179 michael@paquier.xyz      9283         [ +  + ]:CBC         161 :     if (backup_stopped_in_recovery)
                               9284                 :                :     {
                               9285                 :                :         XLogRecPtr  recptr;
                               9286                 :                : 
                               9287                 :                :         /*
                               9288                 :                :          * Check to see if all WAL replayed during online backup contain
                               9289                 :                :          * full-page writes.
                               9290                 :                :          */
 4105 andres@anarazel.de       9291         [ -  + ]:              6 :         SpinLockAcquire(&XLogCtl->info_lck);
                               9292                 :              6 :         recptr = XLogCtl->lastFpwDisableRecPtr;
                               9293                 :              6 :         SpinLockRelease(&XLogCtl->info_lck);
                               9294                 :                : 
 1179 michael@paquier.xyz      9295         [ -  + ]:              6 :         if (state->startpoint <= recptr)
 5076 simon@2ndQuadrant.co     9296         [ #  # ]:UBC           0 :             ereport(ERROR,
                               9297                 :                :                     (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                               9298                 :                :                      errmsg("WAL generated with \"full_page_writes=off\" was replayed "
                               9299                 :                :                             "during online backup"),
                               9300                 :                :                      errhint("This means that the backup being taken on the standby "
                               9301                 :                :                              "is corrupt and should not be used. "
                               9302                 :                :                              "Enable \"full_page_writes\" and run CHECKPOINT on the primary, "
                               9303                 :                :                              "and then try an online backup again.")));
                               9304                 :                : 
                               9305                 :                : 
 5076 simon@2ndQuadrant.co     9306                 :CBC           6 :         LWLockAcquire(ControlFileLock, LW_SHARED);
 1179 michael@paquier.xyz      9307                 :              6 :         state->stoppoint = ControlFile->minRecoveryPoint;
                               9308                 :              6 :         state->stoptli = ControlFile->minRecoveryPointTLI;
 5076 simon@2ndQuadrant.co     9309                 :              6 :         LWLockRelease(ControlFileLock);
                               9310                 :                :     }
                               9311                 :                :     else
                               9312                 :                :     {
                               9313                 :                :         char       *history_file;
                               9314                 :                : 
                               9315                 :                :         /*
                               9316                 :                :          * Write the backup-end xlog record
                               9317                 :                :          */
 3057 rhaas@postgresql.org     9318                 :            155 :         XLogBeginInsert();
  310 peter@eisentraut.org     9319                 :            155 :         XLogRegisterData(&state->startpoint,
                               9320                 :                :                          sizeof(state->startpoint));
 1179 michael@paquier.xyz      9321                 :            155 :         state->stoppoint = XLogInsert(RM_XLOG_ID, XLOG_BACKUP_END);
                               9322                 :                : 
                               9323                 :                :         /*
                               9324                 :                :          * Given that we're not in recovery, InsertTimeLineID is set and can't
                               9325                 :                :          * change, so we can read it without a lock.
                               9326                 :                :          */
                               9327                 :            155 :         state->stoptli = XLogCtl->InsertTimeLineID;
                               9328                 :                : 
                               9329                 :                :         /*
                               9330                 :                :          * Force a switch to a new xlog segment file, so that the backup is
                               9331                 :                :          * valid as soon as archiver moves out the current segment file.
                               9332                 :                :          */
 3057 rhaas@postgresql.org     9333                 :            155 :         RequestXLogSwitch(false);
                               9334                 :                : 
 1179 michael@paquier.xyz      9335                 :            155 :         state->stoptime = (pg_time_t) time(NULL);
                               9336                 :                : 
                               9337                 :                :         /*
                               9338                 :                :          * Write the backup history file
                               9339                 :                :          */
                               9340                 :            155 :         XLByteToSeg(state->startpoint, _logSegNo, wal_segment_size);
                               9341                 :            155 :         BackupHistoryFilePath(histfilepath, state->stoptli, _logSegNo,
                               9342                 :                :                               state->startpoint, wal_segment_size);
 3057 rhaas@postgresql.org     9343                 :            155 :         fp = AllocateFile(histfilepath, "w");
                               9344         [ -  + ]:            155 :         if (!fp)
 3057 rhaas@postgresql.org     9345         [ #  # ]:UBC           0 :             ereport(ERROR,
                               9346                 :                :                     (errcode_for_file_access(),
                               9347                 :                :                      errmsg("could not create file \"%s\": %m",
                               9348                 :                :                             histfilepath)));
                               9349                 :                : 
                               9350                 :                :         /* Build and save the contents of the backup history file */
 1179 michael@paquier.xyz      9351                 :CBC         155 :         history_file = build_backup_content(state, true);
 1178                          9352                 :            155 :         fprintf(fp, "%s", history_file);
 1179                          9353                 :            155 :         pfree(history_file);
                               9354                 :                : 
 3057 rhaas@postgresql.org     9355   [ +  -  +  -  :            155 :         if (fflush(fp) || ferror(fp) || FreeFile(fp))
                                              -  + ]
 3057 rhaas@postgresql.org     9356         [ #  # ]:UBC           0 :             ereport(ERROR,
                               9357                 :                :                     (errcode_for_file_access(),
                               9358                 :                :                      errmsg("could not write file \"%s\": %m",
                               9359                 :                :                             histfilepath)));
                               9360                 :                : 
                               9361                 :                :         /*
                               9362                 :                :          * Clean out any no-longer-needed history files.  As a side effect,
                               9363                 :                :          * this will post a .ready file for the newly created history file,
                               9364                 :                :          * notifying the archiver that the history file may be archived
                               9365                 :                :          * immediately.
                               9366                 :                :          */
 3057 rhaas@postgresql.org     9367                 :CBC         155 :         CleanupBackupHistory();
                               9368                 :                :     }
                               9369                 :                : 
                               9370                 :                :     /*
                               9371                 :                :      * If archiving is enabled, wait for all the required WAL files to be
                               9372                 :                :      * archived before returning. If archiving isn't enabled, the required WAL
                               9373                 :                :      * needs to be transported via streaming replication (hopefully with
                               9374                 :                :      * wal_keep_size set high enough), or some more exotic mechanism like
                               9375                 :                :      * polling and copying files from pg_wal with a script. We have no knowledge
                               9376                 :                :      * of those mechanisms, so it's up to the user to ensure that they get all
                               9377                 :                :      * the required WAL.
                               9378                 :                :      *
                               9379                 :                :      * We wait until both the last WAL file filled during backup and the
                               9380                 :                :      * history file have been archived, and assume that the alphabetic sorting
                               9381                 :                :      * property of the WAL files ensures any earlier WAL files are safely
                               9382                 :                :      * archived as well.
                               9383                 :                :      *
                               9384                 :                :      * We wait forever, since archive_command is supposed to work and we
                               9385                 :                :      * assume the admin wanted their backup to work completely. If you don't
                               9386                 :                :      * wish to wait, then either waitforarchive should be passed in as false,
                               9387                 :                :      * or you can set statement_timeout.  Also, some notices are issued to
                               9388                 :                :      * clue in anyone who might be doing this interactively.
                               9389                 :                :      */
                               9390                 :                : 
                               9391         [ +  + ]:            161 :     if (waitforarchive &&
 1179 michael@paquier.xyz      9392   [ +  +  +  +  :             10 :         ((!backup_stopped_in_recovery && XLogArchivingActive()) ||
                                     -  +  +  +  +  
                                                 + ]
                               9393   [ +  -  -  +  :              1 :          (backup_stopped_in_recovery && XLogArchivingAlways())))
                                              -  + ]
                               9394                 :                :     {
                               9395                 :              4 :         XLByteToPrevSeg(state->stoppoint, _logSegNo, wal_segment_size);
                               9396                 :              4 :         XLogFileName(lastxlogfilename, state->stoptli, _logSegNo,
                               9397                 :                :                      wal_segment_size);
                               9398                 :                : 
                               9399                 :              4 :         XLByteToSeg(state->startpoint, _logSegNo, wal_segment_size);
                               9400                 :              4 :         BackupHistoryFileName(histfilename, state->stoptli, _logSegNo,
                               9401                 :                :                               state->startpoint, wal_segment_size);
                               9402                 :                : 
 5644 bruce@momjian.us         9403                 :              4 :         seconds_before_warning = 60;
                               9404                 :              4 :         waits = 0;
                               9405                 :                : 
                               9406   [ +  +  -  + ]:             12 :         while (XLogArchiveIsBusy(lastxlogfilename) ||
                               9407                 :              4 :                XLogArchiveIsBusy(histfilename))
                               9408                 :                :         {
                               9409         [ -  + ]:              4 :             CHECK_FOR_INTERRUPTS();
                               9410                 :                : 
                               9411   [ +  -  -  + ]:              4 :             if (!reported_waiting && waits > 5)
                               9412                 :                :             {
 5644 bruce@momjian.us         9413         [ #  # ]:UBC           0 :                 ereport(NOTICE,
                               9414                 :                :                         (errmsg("base backup done, waiting for required WAL segments to be archived")));
                               9415                 :              0 :                 reported_waiting = true;
                               9416                 :                :             }
                               9417                 :                : 
 1626 michael@paquier.xyz      9418                 :CBC           4 :             (void) WaitLatch(MyLatch,
                               9419                 :                :                              WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                               9420                 :                :                              1000L,
                               9421                 :                :                              WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE);
                               9422                 :              4 :             ResetLatch(MyLatch);
                               9423                 :                : 
 5644 bruce@momjian.us         9424         [ -  + ]:              4 :             if (++waits >= seconds_before_warning)
                               9425                 :                :             {
 5644 bruce@momjian.us         9426                 :UBC           0 :                 seconds_before_warning *= 2;    /* This wraps in >10 years... */
                               9427         [ #  # ]:              0 :                 ereport(WARNING,
                               9428                 :                :                         (errmsg("still waiting for all required WAL segments to be archived (%d seconds elapsed)",
                               9429                 :                :                                 waits),
                               9430                 :                :                          errhint("Check that your \"archive_command\" is executing properly.  "
                               9431                 :                :                                  "You can safely cancel this backup, "
                               9432                 :                :                                  "but the database backup will not be usable without all the WAL segments.")));
                               9433                 :                :             }
                               9434                 :                :         }
                               9435                 :                : 
 5644 bruce@momjian.us         9436         [ +  + ]:CBC           4 :         ereport(NOTICE,
                               9437                 :                :                 (errmsg("all required WAL segments have been archived")));
                               9438                 :                :     }
 5426 magnus@hagander.net      9439         [ +  + ]:            157 :     else if (waitforarchive)
 5712 tgl@sss.pgh.pa.us        9440         [ +  - ]:              6 :         ereport(NOTICE,
                               9441                 :                :                 (errmsg("WAL archiving is not enabled; you must ensure that all required WAL segments are copied through other means to complete the backup")));
 5457 magnus@hagander.net      9442                 :            161 : }
                               9443                 :                : 
                               9444                 :                : 
                               9445                 :                : /*
                               9446                 :                :  * do_pg_abort_backup: abort a running backup
                               9447                 :                :  *
                               9448                 :                :  * This does just the most basic steps of do_pg_backup_stop(), by taking the
                               9449                 :                :  * system out of backup mode, thus making it a lot safer to call from
                               9450                 :                :  * an error handler.
                               9451                 :                :  *
                               9452                 :                :  * 'arg' is a boolean Datum.  When it's true, we are being called during
                               9453                 :                :  * backup setup, so sessionBackupState has not been modified yet, but
                               9454                 :                :  * runningBackups has already been incremented.  When it's false, we are
                               9455                 :                :  * invoked as a before_shmem_exit handler, and therefore we must not change
                               9456                 :                :  * state unless sessionBackupState indicates that a backup is actually running.
                               9457                 :                :  *
                               9458                 :                :  * NB: This gets used as a PG_ENSURE_ERROR_CLEANUP callback and
                               9459                 :                :  * before_shmem_exit handler, hence the odd-looking signature.
                               9460                 :                :  */
                               9461                 :                : void
 2191 rhaas@postgresql.org     9462                 :              8 : do_pg_abort_backup(int code, Datum arg)
                               9463                 :                : {
 1156 alvherre@alvh.no-ip.     9464                 :              8 :     bool        during_backup_start = DatumGetBool(arg);
                               9465                 :                : 
                               9466                 :                :     /* If called during backup start, there shouldn't be one already running */
 1151                          9467   [ -  +  -  - ]:              8 :     Assert(!during_backup_start || sessionBackupState == SESSION_BACKUP_NONE);
                               9468                 :                : 
 1156                          9469   [ +  -  +  + ]:              8 :     if (during_backup_start || sessionBackupState != SESSION_BACKUP_NONE)
                               9470                 :                :     {
                               9471                 :              6 :         WALInsertLockAcquireExclusive();
                               9472         [ -  + ]:              6 :         Assert(XLogCtl->Insert.runningBackups > 0);
                               9473                 :              6 :         XLogCtl->Insert.runningBackups--;
                               9474                 :                : 
                               9475                 :              6 :         sessionBackupState = SESSION_BACKUP_NONE;
                               9476                 :              6 :         WALInsertLockRelease();
                               9477                 :                : 
                               9478         [ +  - ]:              6 :         if (!during_backup_start)
                               9479         [ +  - ]:              6 :             ereport(WARNING,
                               9480                 :                :                     errmsg("aborting backup due to backend exiting before pg_backup_stop was called"));
                               9481                 :                :     }
 2191 rhaas@postgresql.org     9482                 :              8 : }
                               9483                 :                : 
                               9484                 :                : /*
                               9485                 :                :  * Register a handler that will warn about unterminated backups at end of
                               9486                 :                :  * session, unless this has already been done.
                               9487                 :                :  */
                               9488                 :                : void
                               9489                 :              4 : register_persistent_abort_backup_handler(void)
                               9490                 :                : {
                               9491                 :                :     static bool already_done = false;
                               9492                 :                : 
                               9493         [ +  + ]:              4 :     if (already_done)
                               9494                 :              1 :         return;
  135 peter@eisentraut.org     9495                 :GNC           3 :     before_shmem_exit(do_pg_abort_backup, BoolGetDatum(false));
 2191 rhaas@postgresql.org     9496                 :CBC           3 :     already_done = true;
                               9497                 :                : }
                               9498                 :                : 
                               9499                 :                : /*
                               9500                 :                :  * Get latest WAL insert pointer
                               9501                 :                :  */
                               9502                 :                : XLogRecPtr
 5090 heikki.linnakangas@i     9503                 :           1951 : GetXLogInsertRecPtr(void)
                               9504                 :                : {
 4105 andres@anarazel.de       9505                 :           1951 :     XLogCtlInsert *Insert = &XLogCtl->Insert;
                               9506                 :                :     uint64      current_bytepos;
                               9507                 :                : 
 4546 heikki.linnakangas@i     9508         [ -  + ]:           1951 :     SpinLockAcquire(&Insert->insertpos_lck);
                               9509                 :           1951 :     current_bytepos = Insert->CurrBytePos;
                               9510                 :           1951 :     SpinLockRelease(&Insert->insertpos_lck);
                               9511                 :                : 
                               9512                 :           1951 :     return XLogBytePosToRecPtr(current_bytepos);
                               9513                 :                : }
                               9514                 :                : 
                               9515                 :                : /*
                               9516                 :                :  * Get latest WAL write pointer
                               9517                 :                :  */
                               9518                 :                : XLogRecPtr
 1401                          9519                 :           1426 : GetXLogWriteRecPtr(void)
                               9520                 :                : {
  624 alvherre@alvh.no-ip.     9521                 :           1426 :     RefreshXLogWriteResult(LogwrtResult);
                               9522                 :                : 
 1401 heikki.linnakangas@i     9523                 :           1426 :     return LogwrtResult.Write;
                               9524                 :                : }
                               9525                 :                : 
                               9526                 :                : /*
                               9527                 :                :  * Returns the redo pointer of the last checkpoint or restartpoint. This is
                               9528                 :                :  * the oldest point in WAL that we still need, if we have to restart recovery.
                               9529                 :                :  */
                               9530                 :                : void
                               9531                 :            379 : GetOldestRestartPoint(XLogRecPtr *oldrecptr, TimeLineID *oldtli)
                               9532                 :                : {
                               9533                 :            379 :     LWLockAcquire(ControlFileLock, LW_SHARED);
                               9534                 :            379 :     *oldrecptr = ControlFile->checkPointCopy.redo;
                               9535                 :            379 :     *oldtli = ControlFile->checkPointCopy.ThisTimeLineID;
                               9536                 :            379 :     LWLockRelease(ControlFileLock);
 7209 tgl@sss.pgh.pa.us        9537                 :            379 : }
                               9538                 :                : 
                               9539                 :                : /* Shut down the WAL receiver, then disable WAL file recycling and preallocation. */
                               9540                 :                : void
 1634 noah@leadboat.com        9541                 :            923 : XLogShutdownWalRcv(void)
                               9542                 :                : {
   43 michael@paquier.xyz      9543   [ +  +  -  + ]:GNC         923 :     Assert(AmStartupProcess() || !IsUnderPostmaster);
                               9544                 :                : 
 1634 noah@leadboat.com        9545                 :CBC         923 :     ShutdownWalRcv();
   44 michael@paquier.xyz      9546                 :            923 :     ResetInstallXLogFileSegmentActive();
 1634 noah@leadboat.com        9547                 :            923 : }
                               9548                 :                : 
                               9549                 :                : /* Enable WAL file recycling and preallocation. */
                               9550                 :                : void
 1401 heikki.linnakangas@i     9551                 :           1076 : SetInstallXLogFileSegmentActive(void)
                               9552                 :                : {
                               9553                 :           1076 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               9554                 :           1076 :     XLogCtl->InstallXLogFileSegmentActive = true;
                               9555                 :           1076 :     LWLockRelease(ControlFileLock);
 3753 fujii@postgresql.org     9556                 :           1076 : }
                               9557                 :                : 
                               9558                 :                : /* Disable WAL file recycling and preallocation. */
                               9559                 :                : void
   44 michael@paquier.xyz      9560                 :           1027 : ResetInstallXLogFileSegmentActive(void)
                               9561                 :                : {
                               9562                 :           1027 :     LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
                               9563                 :           1027 :     XLogCtl->InstallXLogFileSegmentActive = false;
                               9564                 :           1027 :     LWLockRelease(ControlFileLock);
                               9565                 :           1027 : }
                               9566                 :                : 
                               9567                 :                : bool
 1401 heikki.linnakangas@i     9568                 :            357 : IsInstallXLogFileSegmentActive(void)
                               9569                 :                : {
                               9570                 :                :     bool        result;
                               9571                 :                : 
                               9572                 :            357 :     LWLockAcquire(ControlFileLock, LW_SHARED);
                               9573                 :            357 :     result = XLogCtl->InstallXLogFileSegmentActive;
                               9574                 :            357 :     LWLockRelease(ControlFileLock);
                               9575                 :                : 
                               9576                 :            357 :     return result;
                               9577                 :                : }
                               9578                 :                : 
                               9579                 :                : /*
                               9580                 :                :  * Update the WalWriterSleeping flag.
                               9581                 :                :  */
                               9582                 :                : void
 4972 tgl@sss.pgh.pa.us        9583                 :            482 : SetWalWriterSleeping(bool sleeping)
                               9584                 :                : {
 4105 andres@anarazel.de       9585         [ -  + ]:            482 :     SpinLockAcquire(&XLogCtl->info_lck);
                               9586                 :            482 :     XLogCtl->WalWriterSleeping = sleeping;
                               9587                 :            482 :     SpinLockRelease(&XLogCtl->info_lck);
 4972 tgl@sss.pgh.pa.us        9588                 :            482 : }

Generated by: LCOV version 2.4-beta