LCOV - differential code coverage report

Current view:   top level - src/backend/storage/smgr - md.c (source / functions)
Current:        c70b6db34ffeab48beef1fb4ce61bcad3772b8dd vs 06473f5a344df8c9594ead90a609b86f6724cff8
Current Date:   2025-09-06 07:49:51 +0900
Baseline:       lcov-20250906-005545-baseline
Baseline Date:  2025-09-05 08:21:35 +0100
Legend:         Lines:    hit, not hit
                Branches: + taken, - not taken, # not executed

                   Coverage   Total    Hit    LBC    UBC    CBC
Lines:               71.8 %     529    380           149    380
Functions:           92.3 %      39     36             3     36
Branches:            48.2 %     392    189      1    202    189

Line coverage date bins:
  (30,360] days:     65.1 %      83     54            29     54
  (360..) days:      73.1 %     446    326           120    326

Function coverage date bins:
  (30,360] days:     83.3 %       6      5             1      5
  (360..) days:      93.9 %      33     31             2     31

Branch coverage date bins:
  (30,360] days:     35.2 %      54     19            35     19
  (360..) days:      50.3 %     338    170      1    167    170

 Age         Owner                    Branch data    TLA  Line data    Source code
                                  1                 :                : /*-------------------------------------------------------------------------
                                  2                 :                :  *
                                  3                 :                :  * md.c
                                  4                 :                :  *    This code manages relations that reside on magnetic disk.
                                  5                 :                :  *
                                  6                 :                :  * Or at least, that was what the Berkeley folk had in mind when they named
                                  7                 :                :  * this file.  In reality, what this code provides is an interface from
                                  8                 :                :  * the smgr API to Unix-like filesystem APIs, so it will work with any type
                                  9                 :                :  * of device for which the operating system provides filesystem support.
                                 10                 :                :  * It doesn't matter whether the bits are on spinning rust or some other
                                 11                 :                :  * storage technology.
                                 12                 :                :  *
                                 13                 :                :  * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
                                 14                 :                :  * Portions Copyright (c) 1994, Regents of the University of California
                                 15                 :                :  *
                                 16                 :                :  *
                                 17                 :                :  * IDENTIFICATION
                                 18                 :                :  *    src/backend/storage/smgr/md.c
                                 19                 :                :  *
                                 20                 :                :  *-------------------------------------------------------------------------
                                 21                 :                :  */
                                 22                 :                : #include "postgres.h"
                                 23                 :                : 
                                 24                 :                : #include <unistd.h>
                                 25                 :                : #include <fcntl.h>
                                 26                 :                : #include <sys/file.h>
                                 27                 :                : 
                                 28                 :                : #include "access/xlogutils.h"
                                 29                 :                : #include "commands/tablespace.h"
                                 30                 :                : #include "common/file_utils.h"
                                 31                 :                : #include "miscadmin.h"
                                 32                 :                : #include "pg_trace.h"
                                 33                 :                : #include "pgstat.h"
                                 34                 :                : #include "storage/aio.h"
                                 35                 :                : #include "storage/bufmgr.h"
                                 36                 :                : #include "storage/fd.h"
                                 37                 :                : #include "storage/md.h"
                                 38                 :                : #include "storage/relfilelocator.h"
                                 39                 :                : #include "storage/smgr.h"
                                 40                 :                : #include "storage/sync.h"
                                 41                 :                : #include "utils/memutils.h"
                                 42                 :                : 
                                 43                 :                : /*
                                 44                 :                :  * The magnetic disk storage manager keeps track of open file
                                 45                 :                :  * descriptors in its own descriptor pool.  This is done to make it
                                 46                 :                :  * easier to support relations that are larger than the operating
                                 47                 :                :  * system's file size limit (often 2GBytes).  In order to do that,
                                 48                 :                :  * we break relations up into "segment" files that are each shorter than
                                 49                 :                :  * the OS file size limit.  The segment size is set by the RELSEG_SIZE
                                 50                 :                :  * configuration constant in pg_config.h.
                                 51                 :                :  *
                                 52                 :                :  * On disk, a relation must consist of consecutively numbered segment
                                 53                 :                :  * files in the pattern
                                 54                 :                :  *  -- Zero or more full segments of exactly RELSEG_SIZE blocks each
                                 55                 :                :  *  -- Exactly one partial segment of size 0 <= size < RELSEG_SIZE blocks
                                 56                 :                :  *  -- Optionally, any number of inactive segments of size 0 blocks.
                                 57                 :                :  * The full and partial segments are collectively the "active" segments.
                                 58                 :                :  * Inactive segments are those that once contained data but are currently
                                 59                 :                :  * not needed because of an mdtruncate() operation.  The reason for leaving
                                 60                 :                :  * them present at size zero, rather than unlinking them, is that other
                                 61                 :                :  * backends and/or the checkpointer might be holding open file references to
                                 62                 :                :  * such segments.  If the relation expands again after mdtruncate(), such
                                 63                 :                :  * that a deactivated segment becomes active again, it is important that
                                 64                 :                :  * such file references still be valid --- else data might get written
                                 65                 :                :  * out to an unlinked old copy of a segment file that will eventually
                                 66                 :                :  * disappear.
                                 67                 :                :  *
                                 68                 :                :  * File descriptors are stored in the per-fork md_seg_fds arrays inside
                                 69                 :                :  * SMgrRelation. The length of these arrays is stored in md_num_open_segs.
                                 70                 :                :  * Note that a fork's md_num_open_segs having a specific value does not
                                 71                 :                :  * necessarily mean the relation doesn't have additional segments; we may
                                 72                 :                :  * just not have opened the next segment yet.  (We could not have "all
                                 73                 :                :  * segments are in the array" as an invariant anyway, since another backend
                                 74                 :                :  * could extend the relation while we aren't looking.)  We do not have
                                 75                 :                :  * entries for inactive segments, however; as soon as we find a partial
                                 76                 :                :  * segment, we assume that any subsequent segments are inactive.
                                 77                 :                :  *
                                 78                 :                :  * The entire MdfdVec array is palloc'd in the MdCxt memory context.
                                 79                 :                :  */
                                 80                 :                : 
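The segment layout described in the comment above can be illustrated with a small sketch. It is not code from md.c: the helper names are hypothetical, and the constants assume the usual build defaults (8 kB blocks, 1 GB segments) rather than whatever pg_config.h defines in a given build. It maps a block number to its segment file and byte offset, and counts active blocks by stopping at the first partial segment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed build defaults; the real values come from pg_config.h. */
#define BLCKSZ       8192u      /* bytes per block (assumption) */
#define RELSEG_SIZE  131072u    /* blocks per segment (assumption: 1 GB / 8 kB) */

typedef uint32_t BlockNumber;

/* Number of the segment file that holds a given block (hypothetical helper). */
static BlockNumber
block_to_segno(BlockNumber blkno)
{
    return blkno / RELSEG_SIZE;
}

/* Byte offset of a block inside its segment file (hypothetical helper). */
static uint64_t
block_to_seg_offset(BlockNumber blkno)
{
    return (uint64_t) (blkno % RELSEG_SIZE) * BLCKSZ;
}

/*
 * Sum the blocks in the active segments, given each segment's size in
 * blocks.  Full segments are followed by exactly one partial segment;
 * everything after the partial segment is treated as inactive.
 */
static uint64_t
count_active_blocks(const BlockNumber *seg_blocks, size_t nsegs)
{
    uint64_t    total = 0;

    for (size_t i = 0; i < nsegs; i++)
    {
        assert(seg_blocks[i] <= RELSEG_SIZE);
        total += seg_blocks[i];
        if (seg_blocks[i] < RELSEG_SIZE)
            break;              /* partial segment found: stop counting */
    }
    return total;
}
```

With these assumed constants, block 131072 is the first block of segment 1, and a relation whose segments hold 131072, 131072, 100 and 0 blocks has 262244 active blocks; the trailing zero-length segment is ignored.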
                                 81                 :                : typedef struct _MdfdVec
                                 82                 :                : {
                                 83                 :                :     File        mdfd_vfd;       /* fd number in fd.c's pool */
                                 84                 :                :     BlockNumber mdfd_segno;     /* segment number, from 0 */
                                 85                 :                : } MdfdVec;
                                 86                 :                : 
                                 87                 :                : static MemoryContext MdCxt;     /* context for all MdfdVec objects */
                                 88                 :                : 
                                 89                 :                : 
                                 90                 :                : /* Populate a file tag describing an md.c segment file. */
                                 91                 :                : #define INIT_MD_FILETAG(a,xx_rlocator,xx_forknum,xx_segno) \
                                 92                 :                : ( \
                                 93                 :                :     memset(&(a), 0, sizeof(FileTag)), \
                                 94                 :                :     (a).handler = SYNC_HANDLER_MD, \
                                 95                 :                :     (a).rlocator = (xx_rlocator), \
                                 96                 :                :     (a).forknum = (xx_forknum), \
                                 97                 :                :     (a).segno = (xx_segno) \
                                 98                 :                : )
                                 99                 :                : 
                                100                 :                : 
                                101                 :                : /*** behavior for mdopen & _mdfd_getseg ***/
                                102                 :                : /* ereport if segment not present */
                                103                 :                : #define EXTENSION_FAIL              (1 << 0)
                                104                 :                : /* return NULL if segment not present */
                                105                 :                : #define EXTENSION_RETURN_NULL       (1 << 1)
                                106                 :                : /* create new segments as needed */
                                107                 :                : #define EXTENSION_CREATE            (1 << 2)
                                108                 :                : /* create new segments if needed during recovery */
                                109                 :                : #define EXTENSION_CREATE_RECOVERY   (1 << 3)
                                110                 :                : /* don't try to open a segment, if not already open */
                                111                 :                : #define EXTENSION_DONT_OPEN         (1 << 5)
                                112                 :                : 
                                113                 :                : 
                                114                 :                : /*
                                115                 :                :  * Fixed-length string to represent paths to files that need to be built by
                                116                 :                :  * md.c.
                                117                 :                :  *
                                118                 :                :  * The maximum number of segments is MaxBlockNumber / RELSEG_SIZE, where
                                119                 :                :  * RELSEG_SIZE can be set to 1 (for testing only).
                                120                 :                :  */
                                121                 :                : #define SEGMENT_CHARS   OIDCHARS
                                122                 :                : #define MD_PATH_STR_MAXLEN \
                                123                 :                :     (\
                                124                 :                :         REL_PATH_STR_MAXLEN \
                                125                 :                :         + sizeof((char)'.') \
                                126                 :                :         + SEGMENT_CHARS \
                                127                 :                :     )
                                128                 :                : typedef struct MdPathStr
                                129                 :                : {
                                130                 :                :     char        str[MD_PATH_STR_MAXLEN + 1];
                                131                 :                : } MdPathStr;
                                132                 :                : 
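As a rough illustration of why the buffer is sized as "relation path + '.' + segment digits", the following sketch builds a segment path into a fixed-length struct. All names ending in _DEMO or _demo are hypothetical, and the 64-byte relation path limit is an arbitrary stand-in for REL_PATH_STR_MAXLEN. It assumes the usual convention that segment 0 carries no suffix and later segments append ".N".

```c
#include <stdio.h>

#define REL_PATH_MAXLEN_DEMO  64    /* hypothetical stand-in for REL_PATH_STR_MAXLEN */
#define SEGMENT_CHARS_DEMO    10    /* decimal digits in a 32-bit unsigned value */
#define SEG_PATH_MAXLEN_DEMO \
    (REL_PATH_MAXLEN_DEMO + 1 /* the '.' */ + SEGMENT_CHARS_DEMO)

/* Fixed-length path, returned by value so no allocation is needed. */
typedef struct SegPathDemo
{
    char        str[SEG_PATH_MAXLEN_DEMO + 1];  /* +1 for the trailing NUL */
} SegPathDemo;

/* Build the path of segment "segno" for a relation file at "relpath". */
static SegPathDemo
seg_path_demo(const char *relpath, unsigned int segno)
{
    SegPathDemo p;

    if (segno > 0)
        snprintf(p.str, sizeof(p.str), "%s.%u", relpath, segno);
    else
        snprintf(p.str, sizeof(p.str), "%s", relpath);
    return p;
}
```

Returning the struct by value keeps the caller free of palloc/pfree bookkeeping, at the cost of copying a small buffer.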
                                133                 :                : 
                                134                 :                : /* local routines */
                                135                 :                : static void mdunlinkfork(RelFileLocatorBackend rlocator, ForkNumber forknum,
                                136                 :                :                          bool isRedo);
                                137                 :                : static MdfdVec *mdopenfork(SMgrRelation reln, ForkNumber forknum, int behavior);
                                138                 :                : static void register_dirty_segment(SMgrRelation reln, ForkNumber forknum,
                                139                 :                :                                    MdfdVec *seg);
                                140                 :                : static void register_unlink_segment(RelFileLocatorBackend rlocator, ForkNumber forknum,
                                141                 :                :                                     BlockNumber segno);
                                142                 :                : static void register_forget_request(RelFileLocatorBackend rlocator, ForkNumber forknum,
                                143                 :                :                                     BlockNumber segno);
                                144                 :                : static void _fdvec_resize(SMgrRelation reln,
                                145                 :                :                           ForkNumber forknum,
                                146                 :                :                           int nseg);
                                147                 :                : static MdPathStr _mdfd_segpath(SMgrRelation reln, ForkNumber forknum,
                                148                 :                :                                BlockNumber segno);
                                149                 :                : static MdfdVec *_mdfd_openseg(SMgrRelation reln, ForkNumber forknum,
                                150                 :                :                               BlockNumber segno, int oflags);
                                151                 :                : static MdfdVec *_mdfd_getseg(SMgrRelation reln, ForkNumber forknum,
                                152                 :                :                              BlockNumber blkno, bool skipFsync, int behavior);
                                153                 :                : static BlockNumber _mdnblocks(SMgrRelation reln, ForkNumber forknum,
                                154                 :                :                               MdfdVec *seg);
                                155                 :                : 
                                156                 :                : static PgAioResult md_readv_complete(PgAioHandle *ioh, PgAioResult prior_result, uint8 cb_data);
                                157                 :                : static void md_readv_report(PgAioResult result, const PgAioTargetData *td, int elevel);
                                158                 :                : 
                                159                 :                : const PgAioHandleCallbacks aio_md_readv_cb = {
                                160                 :                :     .complete_shared = md_readv_complete,
                                161                 :                :     .report = md_readv_report,
                                162                 :                : };
                                163                 :                : 
                                164                 :                : 
                                165                 :                : static inline int
  882 tmunro@postgresql.or      166                 :CBC     1248815 : _mdfd_open_flags(void)
                                167                 :                : {
                                168                 :        1248815 :     int         flags = O_RDWR | PG_BINARY;
                                169                 :                : 
                                170         [ +  + ]:        1248815 :     if (io_direct_flags & IO_DIRECT_DATA)
                                171                 :            309 :         flags |= PG_O_DIRECT;
                                172                 :                : 
                                173                 :        1248815 :     return flags;
                                174                 :                : }
                                175                 :                : 
                                176                 :                : /*
                                177                 :                :  * mdinit() -- Initialize private state for magnetic disk storage manager.
                                178                 :                :  */
                                179                 :                : void
 8837 tgl@sss.pgh.pa.us         180                 :          18768 : mdinit(void)
                                181                 :                : {
 9201                           182                 :          18768 :     MdCxt = AllocSetContextCreate(TopMemoryContext,
                                183                 :                :                                   "MdSmgr",
                                184                 :                :                                   ALLOCSET_DEFAULT_SIZES);
 5917 heikki.linnakangas@i      185                 :          18768 : }
                                186                 :                : 
                                187                 :                : /*
                                188                 :                :  * mdexists() -- Does the physical file exist?
                                189                 :                :  *
                                190                 :                :  * Note: this will return true for lingering files, with pending deletions
                                191                 :                :  */
                                192                 :                : bool
 1083 pg@bowt.ie                193                 :         499257 : mdexists(SMgrRelation reln, ForkNumber forknum)
                                194                 :                : {
                                195                 :                :     /*
                                196                 :                :      * Close it first, to ensure that we notice if the fork has been unlinked
                                197                 :                :      * since we opened it.  As an optimization, we can skip that in recovery,
                                198                 :                :      * which already closes relations when dropping them.
                                199                 :                :      */
 1248 tmunro@postgresql.or      200         [ +  + ]:         499257 :     if (!InRecovery)
 1083 pg@bowt.ie                201                 :         479071 :         mdclose(reln, forknum);
                                202                 :                : 
                                203                 :         499257 :     return (mdopenfork(reln, forknum, EXTENSION_RETURN_NULL) != NULL);
                                204                 :                : }
                                205                 :                : 
                                206                 :                : /*
                                207                 :                :  * mdcreate() -- Create a new relation on magnetic disk.
                                208                 :                :  *
                                209                 :                :  * If isRedo is true, it's okay for the relation to exist already.
                                210                 :                :  */
                                211                 :                : void
                                212                 :        5641912 : mdcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo)
                                213                 :                : {
                                214                 :                :     MdfdVec    *mdfd;
                                215                 :                :     RelPathStr  path;
                                216                 :                :     File        fd;
                                217                 :                : 
                                218   [ +  +  +  + ]:        5641912 :     if (isRedo && reln->md_num_open_segs[forknum] > 0)
 6821 tgl@sss.pgh.pa.us         219                 :        5486047 :         return;                 /* created and opened already... */
                                220                 :                : 
 1083 pg@bowt.ie                221         [ -  + ]:         155865 :     Assert(reln->md_num_open_segs[forknum] == 0);
                                222                 :                : 
                                223                 :                :     /*
                                224                 :                :      * We may be using the target table space for the first time in this
                                225                 :                :      * database, so create a per-database subdirectory if needed.
                                226                 :                :      *
                                227                 :                :      * XXX this is a fairly ugly violation of module layering, but this seems
                                228                 :                :      * to be the best place to put the check.  Maybe TablespaceCreateDbspace
                                229                 :                :      * should be here and not in commands/tablespace.c?  But that would imply
                                230                 :                :      * importing a lot of stuff that smgr.c oughtn't know, either.
                                231                 :                :      */
 1158 rhaas@postgresql.org      232                 :         155865 :     TablespaceCreateDbspace(reln->smgr_rlocator.locator.spcOid,
                                233                 :                :                             reln->smgr_rlocator.locator.dbOid,
                                234                 :                :                             isRedo);
                                235                 :                : 
 1083 pg@bowt.ie                236                 :         155865 :     path = relpath(reln->smgr_rlocator, forknum);
                                237                 :                : 
  193 andres@anarazel.de        238                 :         155865 :     fd = PathNameOpenFile(path.str, _mdfd_open_flags() | O_CREAT | O_EXCL);
                                239                 :                : 
10226 bruce@momjian.us          240         [ +  + ]:         155865 :     if (fd < 0)
                                241                 :                :     {
 8934                           242                 :           3983 :         int         save_errno = errno;
                                243                 :                : 
 2413 akapila@postgresql.o      244         [ +  - ]:           3983 :         if (isRedo)
  193 andres@anarazel.de        245                 :           3983 :             fd = PathNameOpenFile(path.str, _mdfd_open_flags());
10226 bruce@momjian.us          246         [ -  + ]:           3983 :         if (fd < 0)
                                247                 :                :         {
                                248                 :                :             /* be sure to report the error reported by create, not open */
 9210 tgl@sss.pgh.pa.us         249                 :UBC           0 :             errno = save_errno;
 6821                           250         [ #  # ]:              0 :             ereport(ERROR,
                                251                 :                :                     (errcode_for_file_access(),
                                252                 :                :                      errmsg("could not create file \"%s\": %m", path.str)));
                                253                 :                :         }
                                254                 :                :     }
                                255                 :                : 
 1083 pg@bowt.ie                256                 :CBC      155865 :     _fdvec_resize(reln, forknum, 1);
                                257                 :         155865 :     mdfd = &reln->md_seg_fds[forknum][0];
 3285 andres@anarazel.de        258                 :         155865 :     mdfd->mdfd_vfd = fd;
                                259                 :         155865 :     mdfd->mdfd_segno = 0;
                                260                 :                : 
  795 heikki.linnakangas@i      261         [ +  + ]:         155865 :     if (!SmgrIsTemp(reln))
                                262                 :         152527 :         register_dirty_segment(reln, forknum, mdfd);
                                263                 :                : }
                                264                 :                : 
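The create-then-fall-back pattern used by mdcreate() can be sketched with plain POSIX calls. This is a simplified illustration, not the PostgreSQL implementation: create_or_open_demo is a hypothetical name, open() stands in for PathNameOpenFile(), and the error is returned through errno instead of ereport().

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Try to create "path" exclusively.  If that fails and the caller allows
 * an existing file (as redo does), fall back to opening it.  On failure,
 * return -1 with errno describing the create attempt, not the open.
 */
static int
create_or_open_demo(const char *path, bool allow_existing)
{
    int         fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);

    if (fd < 0)
    {
        int         save_errno = errno;

        if (allow_existing)
            fd = open(path, O_RDWR);
        if (fd < 0)
            errno = save_errno; /* report the error from create */
    }
    return fd;
}
```

Saving errno before the second open() matters because the fallback overwrites it on failure; without the restore, the caller would see the less useful error from the plain open.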
                                265                 :                : /*
                                266                 :                :  * mdunlink() -- Unlink a relation.
                                267                 :                :  *
                                268                 :                :  * Note that we're passed a RelFileLocatorBackend --- by the time this is called,
                                269                 :                :  * there won't be an SMgrRelation hashtable entry anymore.
                                270                 :                :  *
                                271                 :                :  * forknum can be a fork number to delete a specific fork, or InvalidForkNumber
                                272                 :                :  * to delete all forks.
                                273                 :                :  *
                                274                 :                :  * For regular relations, we don't unlink the first segment file of the rel,
                                275                 :                :  * but just truncate it to zero length, and record a request to unlink it after
                                276                 :                :  * the next checkpoint.  Additional segments can be unlinked immediately,
                                277                 :                :  * however.  Leaving the empty file in place prevents that relfilenumber
                                278                 :                :  * from being reused.  The scenario this protects us from is:
                                279                 :                :  * 1. We delete a relation (and commit, and actually remove its file).
                                280                 :                :  * 2. We create a new relation, which by chance gets the same relfilenumber as
                                281                 :                :  *    the just-deleted one (OIDs must've wrapped around for that to happen).
                                282                 :                :  * 3. We crash before another checkpoint occurs.
                                283                 :                :  * During replay, we would delete the file and then recreate it, which is fine
                                284                 :                :  * if the contents of the file were repopulated by subsequent WAL entries.
                                285                 :                :  * But if we didn't WAL-log insertions, but instead relied on fsyncing the
                                286                 :                :  * file after populating it (as we do at wal_level=minimal), the contents of
                                287                 :                :  * the file would be lost forever.  By leaving the empty file until after the
                                288                 :                :  * next checkpoint, we prevent reassignment of the relfilenumber until it's
                                289                 :                :  * safe, because relfilenumber assignment skips over any existing file.
                                290                 :                :  *
                                291                 :                :  * Additional segments, if any, are truncated and then unlinked.  The reason
                                292                 :                :  * for truncating is that other backends may still hold open FDs for these at
                                293                 :                :  * the smgr level, so that the kernel can't remove the file yet.  We want to
                                294                 :                :  * reclaim the disk space right away despite that.
                                295                 :                :  *
                                296                 :                :  * We do not need to go through this dance for temp relations, though, because
                                297                 :                :  * we never make WAL entries for temp rels, and so a temp rel poses no threat
                                298                 :                :  * to the health of a regular rel that has taken over its relfilenumber.
                                299                 :                :  * The fact that temp rels and regular rels have different file naming
                                300                 :                :  * patterns provides additional safety.  Other backends shouldn't have open
                                301                 :                :  * FDs for them, either.
                                302                 :                :  *
                                303                 :                :  * We also don't do it while performing a binary upgrade.  There is no reuse
                                304                 :                :  * hazard in that case, since after a crash or even a simple ERROR, the
                                305                 :                :  * upgrade fails and the whole cluster must be recreated from scratch.
                                306                 :                :  * Furthermore, it is important to remove the files from disk immediately,
                                307                 :                :  * because we may be about to reuse the same relfilenumber.
                                308                 :                :  *
                                309                 :                :  * All the above applies only to the relation's main fork; other forks can
                                310                 :                :  * just be removed immediately, since they are not needed to prevent the
                                311                 :                :  * relfilenumber from being recycled.  Also, we do not carefully
                                312                 :                :  * track whether other forks have been created or not, but just attempt to
                                313                 :                :  * unlink them unconditionally; so we should never complain about ENOENT.
                                314                 :                :  *
                                315                 :                :  * If isRedo is true, it's unsurprising for the relation to be already gone.
                                316                 :                :  * Also, we should remove the file immediately instead of queuing a request
                                317                 :                :  * for later, since during redo there's no possibility of creating a
                                318                 :                :  * conflicting relation.
                                319                 :                :  *
                                320                 :                :  * Note: we currently just never warn about ENOENT at all.  We could warn in
                                321                 :                :  * the main-fork, non-isRedo case, but it doesn't seem worth the trouble.
                                322                 :                :  *
                                323                 :                :  * Note: any failure should be reported as WARNING, not ERROR, because
                                324                 :                :  * we are usually not in a transaction anymore when this is called.
                                325                 :                :  */
                                326                 :                : void
 1083 pg@bowt.ie                327                 :         180784 : mdunlink(RelFileLocatorBackend rlocator, ForkNumber forknum, bool isRedo)
                                328                 :                : {
                                329                 :                :     /* Do the per-fork work */
                                330         [ -  + ]:         180784 :     if (forknum == InvalidForkNumber)
                                331                 :                :     {
 1083 pg@bowt.ie                332         [ #  # ]:UBC           0 :         for (forknum = 0; forknum <= MAX_FORKNUM; forknum++)
                                333                 :              0 :             mdunlinkfork(rlocator, forknum, isRedo);
                                334                 :                :     }
                                335                 :                :     else
 1083 pg@bowt.ie                336                 :CBC      180784 :         mdunlinkfork(rlocator, forknum, isRedo);
 4797 tgl@sss.pgh.pa.us         337                 :         180784 : }
                                338                 :                : 
                                339                 :                : /*
                                340                 :                :  * Truncate a file to release disk space.
                                341                 :                :  */
                                342                 :                : static int
 1740 tmunro@postgresql.or      343                 :         212028 : do_truncate(const char *path)
                                344                 :                : {
                                345                 :                :     int         save_errno;
                                346                 :                :     int         ret;
                                347                 :                : 
                                348                 :         212028 :     ret = pg_truncate(path, 0);
                                349                 :                : 
                                350                 :                :     /* Log a warning here to avoid repetition in callers. */
                                351   [ +  +  -  + ]:         212028 :     if (ret < 0 && errno != ENOENT)
                                352                 :                :     {
 1740 tmunro@postgresql.or      353                 :UBC           0 :         save_errno = errno;
                                354         [ #  # ]:              0 :         ereport(WARNING,
                                355                 :                :                 (errcode_for_file_access(),
                                356                 :                :                  errmsg("could not truncate file \"%s\": %m", path)));
                                357                 :              0 :         errno = save_errno;
                                358                 :                :     }
                                359                 :                : 
 1740 tmunro@postgresql.or      360                 :CBC      212028 :     return ret;
                                361                 :                : }
                                362                 :                : 
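The errno handling in do_truncate() above can be illustrated outside PostgreSQL. What follows is a minimal sketch, not the server's implementation: it assumes a POSIX system, substitutes plain truncate() for pg_truncate() and fprintf() for ereport(), and the name truncate_quietly is hypothetical. It shows the one point the listing relies on, namely that the caller still sees the original errno after the warning has been emitted.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for do_truncate(): truncate a file to zero length,
 * warn about any failure other than ENOENT, and hand errno back unchanged
 * so the caller can still test for ENOENT itself.
 */
static int
truncate_quietly(const char *path)
{
    int         ret = truncate(path, 0);

    if (ret < 0 && errno != ENOENT)
    {
        int         save_errno = errno; /* fprintf() may clobber errno */

        fprintf(stderr, "WARNING: could not truncate file \"%s\": %s\n",
                path, strerror(save_errno));
        errno = save_errno;
    }

    return ret;
}
```

A caller can then branch on `ret < 0 && errno == ENOENT` to tell "file already gone" apart from a real failure, without logging a second time.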
                                363                 :                : static void
 1083 pg@bowt.ie                364                 :         180784 : mdunlinkfork(RelFileLocatorBackend rlocator, ForkNumber forknum, bool isRedo)
                                365                 :                : {
                                366                 :                :     RelPathStr  path;
                                367                 :                :     int         ret;
                                368                 :                :     int         save_errno;
                                369                 :                : 
                                370                 :         180784 :     path = relpath(rlocator, forknum);
                                371                 :                : 
                                372                 :                :     /*
                                373                 :                :      * Truncate and then unlink the first segment, or just register a request
                                374                 :                :      * to unlink it later, as described in the comments for mdunlink().
                                375                 :                :      */
 1032 tgl@sss.pgh.pa.us         376   [ +  +  +  +  :         180784 :     if (isRedo || IsBinaryUpgrade || forknum != MAIN_FORKNUM ||
                                              +  + ]
                                377         [ +  + ]:          37625 :         RelFileLocatorBackendIsTemp(rlocator))
                                378                 :                :     {
 1158 rhaas@postgresql.org      379         [ +  + ]:         146321 :         if (!RelFileLocatorBackendIsTemp(rlocator))
                                380                 :                :         {
                                381                 :                :             /* Prevent other backends' fds from holding on to the disk space */
  193 andres@anarazel.de        382                 :         133673 :             ret = do_truncate(path.str);
                                383                 :                : 
                                384                 :                :             /* Forget any pending sync requests for the first segment */
 1034 tgl@sss.pgh.pa.us         385                 :         133673 :             save_errno = errno;
 1083 pg@bowt.ie                386                 :         133673 :             register_forget_request(rlocator, forknum, 0 /* first seg */ );
 1034 tgl@sss.pgh.pa.us         387                 :         133673 :             errno = save_errno;
                                388                 :                :         }
                                389                 :                :         else
 1740 tmunro@postgresql.or      390                 :          12648 :             ret = 0;
                                391                 :                : 
                                392                 :                :         /* Next unlink the file, unless it was already found to be missing */
 1032 tgl@sss.pgh.pa.us         393   [ +  +  -  + ]:         146321 :         if (ret >= 0 || errno != ENOENT)
                                394                 :                :         {
  193 andres@anarazel.de        395                 :          22081 :             ret = unlink(path.str);
 1740 tmunro@postgresql.or      396   [ +  +  -  + ]:          22081 :             if (ret < 0 && errno != ENOENT)
                                397                 :                :             {
 1032 tgl@sss.pgh.pa.us         398                 :UBC           0 :                 save_errno = errno;
 1740 tmunro@postgresql.or      399         [ #  # ]:              0 :                 ereport(WARNING,
                                400                 :                :                         (errcode_for_file_access(),
                                401                 :                :                          errmsg("could not remove file \"%s\": %m", path.str)));
 1032 tgl@sss.pgh.pa.us         402                 :              0 :                 errno = save_errno;
                                403                 :                :             }
                                404                 :                :         }
                                405                 :                :     }
                                406                 :                :     else
                                407                 :                :     {
                                408                 :                :         /* Prevent other backends' fds from holding on to the disk space */
  193 andres@anarazel.de        409                 :CBC       34463 :         ret = do_truncate(path.str);
                                410                 :                : 
                                411                 :                :         /* Register request to unlink first segment later */
 1032 tgl@sss.pgh.pa.us         412                 :          34463 :         save_errno = errno;
                                413                 :          34463 :         register_unlink_segment(rlocator, forknum, 0 /* first seg */ );
                                414                 :          34463 :         errno = save_errno;
                                415                 :                :     }
                                416                 :                : 
                                417                 :                :     /*
                                418                 :                :      * Delete any additional segments.
                                419                 :                :      *
                                420                 :                :      * Note that because we loop until getting ENOENT, we will correctly
                                421                 :                :      * remove all inactive segments as well as active ones.  Ideally we'd
                                422                 :                :      * continue the loop until getting exactly that errno, but that risks an
                                423                 :                :      * infinite loop if the problem is directory-wide (for instance, if we
                                424                 :                :      * suddenly can't read the data directory itself).  We compromise by
                                425                 :                :      * continuing after a non-ENOENT truncate error, but stopping after any
                                426                 :                :      * unlink error.  If there is indeed a directory-wide problem, additional
                                427                 :                :      * unlink attempts wouldn't work anyway.
                                428                 :                :      */
                                429   [ +  +  -  + ]:         180784 :     if (ret >= 0 || errno != ENOENT)
                                430                 :                :     {
                                431                 :                :         MdPathStr   segpath;
                                432                 :                :         BlockNumber segno;
                                433                 :                : 
                                434                 :          47230 :         for (segno = 1;; segno++)
                                435                 :                :         {
  193 andres@anarazel.de        436                 :          47230 :             sprintf(segpath.str, "%s.%u", path.str, segno);
                                437                 :                : 
 1158 rhaas@postgresql.org      438         [ +  + ]:          47230 :             if (!RelFileLocatorBackendIsTemp(rlocator))
                                439                 :                :             {
                                440                 :                :                 /*
                                441                 :                :                  * Prevent other backends' fds from holding on to the disk
                                442                 :                :                  * space.  We're done if we see ENOENT, though.
                                443                 :                :                  */
  193 andres@anarazel.de        444   [ +  -  +  - ]:          43892 :                 if (do_truncate(segpath.str) < 0 && errno == ENOENT)
 1740 tmunro@postgresql.or      445                 :          43892 :                     break;
                                446                 :                : 
                                447                 :                :                 /*
                                448                 :                :                  * Forget any pending sync requests for this segment before we
                                449                 :                :                  * try to unlink.
                                450                 :                :                  */
 1083 pg@bowt.ie                451                 :UBC           0 :                 register_forget_request(rlocator, forknum, segno);
                                452                 :                :             }
                                453                 :                : 
  193 andres@anarazel.de        454         [ +  - ]:CBC        3338 :             if (unlink(segpath.str) < 0)
                                455                 :                :             {
                                456                 :                :                 /* ENOENT is expected after the last segment... */
 9068 tgl@sss.pgh.pa.us         457         [ -  + ]:           3338 :                 if (errno != ENOENT)
 6821 tgl@sss.pgh.pa.us         458         [ #  # ]:UBC           0 :                     ereport(WARNING,
                                459                 :                :                             (errcode_for_file_access(),
                                460                 :                :                              errmsg("could not remove file \"%s\": %m", segpath.str)));
 9068 tgl@sss.pgh.pa.us         461                 :CBC        3338 :                 break;
                                462                 :                :             }
                                463                 :                :         }
                                464                 :                :     }
10651 scrappy@hub.org           465                 :         180784 : }
                                466                 :                : 
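The condition that opens mdunlinkfork() decides whether the first segment is removed right away or only truncated and queued for unlinking after the next checkpoint. As a rough sketch, it can be read as a single predicate. The enum and function names below are hypothetical and exist only for illustration; the four inputs mirror isRedo, IsBinaryUpgrade, the fork number, and the temp-relation test from the listing.

```c
#include <stdbool.h>

/* Hypothetical fork numbering for this sketch; only "main or not" matters. */
enum sketch_fork
{
    SKETCH_MAIN_FORK = 0,
    SKETCH_OTHER_FORK = 1
};

/*
 * Returns true when the first segment may be unlinked immediately, and false
 * when it should be truncated and left in place until after the next
 * checkpoint so that its relfilenumber cannot be reassigned too early.
 */
static bool
can_unlink_first_segment_now(bool is_redo, bool is_binary_upgrade,
                             enum sketch_fork fork, bool is_temp)
{
    return is_redo || is_binary_upgrade ||
        fork != SKETCH_MAIN_FORK || is_temp;
}
```

Only the main fork of a regular relation, outside redo and binary upgrade, takes the deferred path; every other combination unlinks at once.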
                                467                 :                : /*
                                468                 :                :  * mdextend() -- Add a block to the specified relation.
                                469                 :                :  *
                                470                 :                :  * The semantics are nearly the same as mdwrite(): write at the
                                471                 :                :  * specified position.  However, this is to be used for the case of
                                472                 :                :  * extending a relation (i.e., blocknum is at or beyond the current
                                473                 :                :  * EOF).  Note that we assume writing a block beyond current EOF
                                474                 :                :  * causes intervening file space to become filled with zeroes.
                                475                 :                :  */
                                476                 :                : void
 6235 heikki.linnakangas@i      477                 :         117525 : mdextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
                                478                 :                :          const void *buffer, bool skipFsync)
                                479                 :                : {
                                480                 :                :     off_t       seekpos;
                                481                 :                :     int         nbytes;
                                482                 :                :     MdfdVec    *v;
                                483                 :                : 
                                484                 :                :     /* If this build supports direct I/O, the buffer must be I/O aligned. */
                                485                 :                :     if (PG_O_DIRECT != 0 && PG_IO_ALIGN_SIZE <= BLCKSZ)
  882 tmunro@postgresql.or      486         [ -  + ]:         117525 :         Assert((uintptr_t) buffer == TYPEALIGN(PG_IO_ALIGN_SIZE, buffer));
                                487                 :                : 
                                488                 :                :     /* This assert is too expensive to have on normally ... */
                                489                 :                : #ifdef CHECK_WRITE_VS_EXTEND
                                490                 :                :     Assert(blocknum >= mdnblocks(reln, forknum));
                                491                 :                : #endif
                                492                 :                : 
                                493                 :                :     /*
                                494                 :                :      * If a relation manages to grow to 2^32-1 blocks, refuse to extend it any
                                495                 :                :      * more --- we mustn't create a block whose number actually is
                                496                 :                :      * InvalidBlockNumber.  (Note that this failure should be unreachable
                                497                 :                :      * because of upstream checks in bufmgr.c.)
                                498                 :                :      */
 6821 tgl@sss.pgh.pa.us         499         [ -  + ]:         117525 :     if (blocknum == InvalidBlockNumber)
 6821 tgl@sss.pgh.pa.us         500         [ #  # ]:UBC           0 :         ereport(ERROR,
                                501                 :                :                 (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                                502                 :                :                  errmsg("cannot extend file \"%s\" beyond %u blocks",
                                503                 :                :                         relpath(reln->smgr_rlocator, forknum).str,
                                504                 :                :                         InvalidBlockNumber)));
                                505                 :                : 
 5503 rhaas@postgresql.org      506                 :CBC      117525 :     v = _mdfd_getseg(reln, forknum, blocknum, skipFsync, EXTENSION_CREATE);
                                507                 :                : 
 2999 tgl@sss.pgh.pa.us         508                 :         117525 :     seekpos = (off_t) BLCKSZ * (blocknum % ((BlockNumber) RELSEG_SIZE));
                                509                 :                : 
 6389                           510         [ -  + ]:         117525 :     Assert(seekpos < (off_t) BLCKSZ * RELSEG_SIZE);
                                511                 :                : 
 2495 tmunro@postgresql.or      512         [ -  + ]:         117525 :     if ((nbytes = FileWrite(v->mdfd_vfd, buffer, BLCKSZ, seekpos, WAIT_EVENT_DATA_FILE_EXTEND)) != BLCKSZ)
                                513                 :                :     {
 6821 tgl@sss.pgh.pa.us         514         [ #  # ]:UBC           0 :         if (nbytes < 0)
                                515         [ #  # ]:              0 :             ereport(ERROR,
                                516                 :                :                     (errcode_for_file_access(),
                                517                 :                :                      errmsg("could not extend file \"%s\": %m",
                                518                 :                :                             FilePathName(v->mdfd_vfd)),
                                519                 :                :                      errhint("Check free disk space.")));
                                520                 :                :         /* short write: complain appropriately */
                                521         [ #  # ]:              0 :         ereport(ERROR,
                                522                 :                :                 (errcode(ERRCODE_DISK_FULL),
                                523                 :                :                  errmsg("could not extend file \"%s\": wrote only %d of %d bytes at block %u",
                                524                 :                :                         FilePathName(v->mdfd_vfd),
                                525                 :                :                         nbytes, BLCKSZ, blocknum),
                                526                 :                :                  errhint("Check free disk space.")));
                                527                 :                :     }
                                528                 :                : 
 5503 rhaas@postgresql.org      529   [ +  +  +  - ]:CBC      117525 :     if (!skipFsync && !SmgrIsTemp(reln))
 6235 heikki.linnakangas@i      530                 :             30 :         register_dirty_segment(reln, forknum, v);
                                531                 :                : 
                                532         [ -  + ]:         117525 :     Assert(_mdnblocks(reln, forknum, v) <= ((BlockNumber) RELSEG_SIZE));
10651 scrappy@hub.org           533                 :         117525 : }
                                534                 :                : 
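The seek position computed in mdextend() is the block's byte offset within its segment file. The sketch below works through that arithmetic under two assumptions that are not stated in this listing: a block size of 8192 bytes and 131072 blocks per segment, which are common build defaults. The type and function names are hypothetical, and the segment-number calculation is an assumption about how the segment is chosen, since that step happens outside the code shown here.

```c
#include <stdint.h>

#define SKETCH_BLCKSZ       8192u   /* assumed block size in bytes */
#define SKETCH_RELSEG_SIZE  131072u /* assumed blocks per segment file */

typedef struct
{
    uint32_t    segno;      /* which segment file holds the block */
    int64_t     seekpos;    /* byte offset of the block in that segment */
} sketch_block_location;

static sketch_block_location
locate_block(uint32_t blocknum)
{
    sketch_block_location loc;

    loc.segno = blocknum / SKETCH_RELSEG_SIZE;
    /* Widen before multiplying so the byte offset cannot wrap at 32 bits. */
    loc.seekpos = (int64_t) SKETCH_BLCKSZ * (blocknum % SKETCH_RELSEG_SIZE);

    return loc;
}
```

With these assumed sizes, block 131073 lands in segment 1 at byte offset 8192, and the offset always stays below SKETCH_BLCKSZ * SKETCH_RELSEG_SIZE, which is what the Assert following the seekpos computation checks.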
                                535                 :                : /*
                                536                 :                :  * mdzeroextend() -- Add new zeroed out blocks to the specified relation.
                                537                 :                :  *
                                538                 :                :  * Similar to mdextend(), except the relation can be extended by multiple
                                539                 :                :  * blocks at once and the added blocks will be filled with zeroes.
                                540                 :                :  */
                                541                 :                : void
  885 andres@anarazel.de        542                 :         207816 : mdzeroextend(SMgrRelation reln, ForkNumber forknum,
                                543                 :                :              BlockNumber blocknum, int nblocks, bool skipFsync)
                                544                 :                : {
                                545                 :                :     MdfdVec    *v;
                                546                 :         207816 :     BlockNumber curblocknum = blocknum;
                                547                 :         207816 :     int         remblocks = nblocks;
                                548                 :                : 
                                549         [ -  + ]:         207816 :     Assert(nblocks > 0);
                                550                 :                : 
                                551                 :                :     /* This assert is too expensive to have on normally ... */
                                552                 :                : #ifdef CHECK_WRITE_VS_EXTEND
                                553                 :                :     Assert(blocknum >= mdnblocks(reln, forknum));
                                554                 :                : #endif
                                555                 :                : 
                                556                 :                :     /*
                                557                 :                :      * If a relation manages to grow to 2^32-1 blocks, refuse to extend it any
                                558                 :                :      * more --- we mustn't create a block whose number actually is
                                559                 :                :      * InvalidBlockNumber or larger.
                                560                 :                :      */
                                561         [ -  + ]:         207816 :     if ((uint64) blocknum + nblocks >= (uint64) InvalidBlockNumber)
  885 andres@anarazel.de        562         [ #  # ]:UBC           0 :         ereport(ERROR,
                                563                 :                :                 (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                                564                 :                :                  errmsg("cannot extend file \"%s\" beyond %u blocks",
                                565                 :                :                         relpath(reln->smgr_rlocator, forknum).str,
                                566                 :                :                         InvalidBlockNumber)));
                                567                 :                : 
  885 andres@anarazel.de        568         [ +  + ]:CBC      415632 :     while (remblocks > 0)
                                569                 :                :     {
  841 tgl@sss.pgh.pa.us         570                 :         207816 :         BlockNumber segstartblock = curblocknum % ((BlockNumber) RELSEG_SIZE);
  885 andres@anarazel.de        571                 :         207816 :         off_t       seekpos = (off_t) BLCKSZ * segstartblock;
                                572                 :                :         int         numblocks;
                                573                 :                : 
                                574         [ -  + ]:         207816 :         if (segstartblock + remblocks > RELSEG_SIZE)
  885 andres@anarazel.de        575                 :UBC           0 :             numblocks = RELSEG_SIZE - segstartblock;
                                576                 :                :         else
  885 andres@anarazel.de        577                 :CBC      207816 :             numblocks = remblocks;
                                578                 :                : 
                                579                 :         207816 :         v = _mdfd_getseg(reln, forknum, curblocknum, skipFsync, EXTENSION_CREATE);
                                580                 :                : 
                                581         [ -  + ]:         207816 :         Assert(segstartblock < RELSEG_SIZE);
                                582         [ -  + ]:         207816 :         Assert(segstartblock + numblocks <= RELSEG_SIZE);
                                583                 :                : 
                                584                 :                :         /*
                                585                 :                :          * If available and useful, use posix_fallocate() (via
                                586                 :                :          * FileFallocate()) to extend the relation. That's often more
                                587                 :                :          * efficient than using write(), as it commonly won't cause the kernel
                                588                 :                :          * to allocate page cache space for the extended pages.
                                589                 :                :          *
                                590                 :                :          * However, we don't use FileFallocate() for small extensions, as it
                                591                 :                :          * defeats delayed allocation on some filesystems. It's unclear where
                                592                 :                :          * that decision should be made, though. For now just use a cutoff of
                                593                 :                :          * 8; anything between 4 and 8 worked OK in some local testing.
                                594                 :                :          */
                                595         [ +  + ]:         207816 :         if (numblocks > 8)
                                596                 :                :         {
                                597                 :                :             int         ret;
                                598                 :                : 
                                599                 :            526 :             ret = FileFallocate(v->mdfd_vfd,
                                600                 :                :                                 seekpos, (off_t) BLCKSZ * numblocks,
                                601                 :                :                                 WAIT_EVENT_DATA_FILE_EXTEND);
                                602         [ -  + ]:            526 :             if (ret != 0)
                                603                 :                :             {
  885 andres@anarazel.de        604         [ #  # ]:UBC           0 :                 ereport(ERROR,
                                605                 :                :                         errcode_for_file_access(),
                                606                 :                :                         errmsg("could not extend file \"%s\" with FileFallocate(): %m",
                                607                 :                :                                FilePathName(v->mdfd_vfd)),
                                608                 :                :                         errhint("Check free disk space."));
                                609                 :                :             }
                                610                 :                :         }
                                611                 :                :         else
                                612                 :                :         {
                                613                 :                :             int         ret;
                                614                 :                : 
                                615                 :                :             /*
                                616                 :                :              * Even if we don't want to use fallocate, we can still extend a
                                617                 :                :              * bit more efficiently than writing each 8kB block individually.
                                618                 :                :              * pg_pwrite_zeros() (via FileZero()) uses pg_pwritev_with_retry()
                                619                 :                :              * to avoid multiple writes or needing a zeroed buffer for the
                                620                 :                :              * whole length of the extension.
                                621                 :                :              */
  885 andres@anarazel.de        622                 :CBC      207290 :             ret = FileZero(v->mdfd_vfd,
                                623                 :                :                            seekpos, (off_t) BLCKSZ * numblocks,
                                624                 :                :                            WAIT_EVENT_DATA_FILE_EXTEND);
                                625         [ -  + ]:         207290 :             if (ret < 0)
  885 andres@anarazel.de        626         [ #  # ]:UBC           0 :                 ereport(ERROR,
                                627                 :                :                         errcode_for_file_access(),
                                628                 :                :                         errmsg("could not extend file \"%s\": %m",
                                629                 :                :                                FilePathName(v->mdfd_vfd)),
                                630                 :                :                         errhint("Check free disk space."));
                                631                 :                :         }
                                632                 :                : 
  885 andres@anarazel.de        633   [ +  -  +  + ]:CBC      207816 :         if (!skipFsync && !SmgrIsTemp(reln))
                                634                 :         195368 :             register_dirty_segment(reln, forknum, v);
                                635                 :                : 
                                636         [ -  + ]:         207816 :         Assert(_mdnblocks(reln, forknum, v) <= ((BlockNumber) RELSEG_SIZE));
                                637                 :                : 
                                638                 :         207816 :         remblocks -= numblocks;
                                639                 :         207816 :         curblocknum += numblocks;
                                640                 :                :     }
                                641                 :         207816 : }
                                642                 :                : 
                                643                 :                : /*
                                644                 :                :  * mdopenfork() -- Open one fork of the specified relation.
                                645                 :                :  *
                                646                 :                :  * Note we only open the first segment, when there are multiple segments.
                                647                 :                :  *
                                648                 :                :  * If the first segment is not present, either ereport or return NULL according
                                649                 :                :  * to "behavior".  We treat EXTENSION_CREATE the same as EXTENSION_FAIL;
                                650                 :                :  * EXTENSION_CREATE means it's OK to extend an existing relation, not to
                                651                 :                :  * invent one out of whole cloth.
                                652                 :                :  */
                                653                 :                : static MdfdVec *
 2243 tmunro@postgresql.or      654                 :        3295318 : mdopenfork(SMgrRelation reln, ForkNumber forknum, int behavior)
                                655                 :                : {
                                656                 :                :     MdfdVec    *mdfd;
                                657                 :                :     RelPathStr  path;
                                658                 :                :     File        fd;
                                659                 :                : 
                                660                 :                :     /* No work if already open */
 3285 andres@anarazel.de        661         [ +  + ]:        3295318 :     if (reln->md_num_open_segs[forknum] > 0)
                                662                 :        2231027 :         return &reln->md_seg_fds[forknum][0];
                                663                 :                : 
 1158 rhaas@postgresql.org      664                 :        1064291 :     path = relpath(reln->smgr_rlocator, forknum);
                                665                 :                : 
  193 andres@anarazel.de        666                 :        1064291 :     fd = PathNameOpenFile(path.str, _mdfd_open_flags());
                                667                 :                : 
10226 bruce@momjian.us          668         [ +  + ]:        1064291 :     if (fd < 0)
                                669                 :                :     {
 2413 akapila@postgresql.o      670         [ +  + ]:         351454 :         if ((behavior & EXTENSION_RETURN_NULL) &&
                                671         [ +  - ]:         351432 :             FILE_POSSIBLY_DELETED(errno))
                                672                 :         351432 :             return NULL;
                                673         [ +  - ]:             22 :         ereport(ERROR,
                                674                 :                :                 (errcode_for_file_access(),
                                675                 :                :                  errmsg("could not open file \"%s\": %m", path.str)));
                                676                 :                :     }
                                677                 :                : 
 3285 andres@anarazel.de        678                 :         712837 :     _fdvec_resize(reln, forknum, 1);
                                679                 :         712837 :     mdfd = &reln->md_seg_fds[forknum][0];
 7768 tgl@sss.pgh.pa.us         680                 :         712837 :     mdfd->mdfd_vfd = fd;
                                681                 :         712837 :     mdfd->mdfd_segno = 0;
                                682                 :                : 
 6235 heikki.linnakangas@i      683         [ -  + ]:         712837 :     Assert(_mdnblocks(reln, forknum, mdfd) <= ((BlockNumber) RELSEG_SIZE));
                                684                 :                : 
 7768 tgl@sss.pgh.pa.us         685                 :         712837 :     return mdfd;
                                686                 :                : }
                                687                 :                : 
                                688                 :                : /*
                                689                 :                :  * mdopen() -- Initialize newly-opened relation.
                                690                 :                :  */
                                691                 :                : void
 2243 tmunro@postgresql.or      692                 :         947375 : mdopen(SMgrRelation reln)
                                693                 :                : {
                                694                 :                :     /* mark it not open */
                                695         [ +  + ]:        4736875 :     for (int forknum = 0; forknum <= MAX_FORKNUM; forknum++)
                                696                 :        3789500 :         reln->md_num_open_segs[forknum] = 0;
                                697                 :         947375 : }
                                698                 :                : 
                                699                 :                : /*
                                700                 :                :  * mdclose() -- Close the specified relation, if it isn't closed already.
                                701                 :                :  */
                                702                 :                : void
 6235 heikki.linnakangas@i      703                 :        3397071 : mdclose(SMgrRelation reln, ForkNumber forknum)
                                704                 :                : {
 3285 andres@anarazel.de        705                 :        3397071 :     int         nopensegs = reln->md_num_open_segs[forknum];
                                706                 :                : 
                                707                 :                :     /* No work if already closed */
                                708         [ +  + ]:        3397071 :     if (nopensegs == 0)
 6821 tgl@sss.pgh.pa.us         709                 :        2887229 :         return;
                                710                 :                : 
                                711                 :                :     /* close segments starting from the end */
 3285 andres@anarazel.de        712         [ +  + ]:        1019684 :     while (nopensegs > 0)
                                713                 :                :     {
                                714                 :         509842 :         MdfdVec    *v = &reln->md_seg_fds[forknum][nopensegs - 1];
                                715                 :                : 
 2066 noah@leadboat.com         716                 :         509842 :         FileClose(v->mdfd_vfd);
                                717                 :         509842 :         _fdvec_resize(reln, forknum, nopensegs - 1);
 3285 andres@anarazel.de        718                 :         509842 :         nopensegs--;
                                719                 :                :     }
                                720                 :                : }
                                721                 :                : 
                                722                 :                : /*
                                723                 :                :  * mdprefetch() -- Initiate asynchronous read of the specified blocks of a relation.
                                724                 :                :  */
                                725                 :                : bool
  630 tmunro@postgresql.or      726                 :           8006 : mdprefetch(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
                                727                 :                :            int nblocks)
                                728                 :                : {
                                729                 :                : #ifdef USE_PREFETCH
                                730                 :                : 
  882                           731         [ -  + ]:           8006 :     Assert((io_direct_flags & IO_DIRECT_DATA) == 0);
                                732                 :                : 
  630                           733         [ -  + ]:           8006 :     if ((uint64) blocknum + nblocks > (uint64) MaxBlockNumber + 1)
 1977 tmunro@postgresql.or      734                 :UBC           0 :         return false;
                                735                 :                : 
  630 tmunro@postgresql.or      736         [ +  + ]:CBC       16012 :     while (nblocks > 0)
                                737                 :                :     {
                                738                 :                :         off_t       seekpos;
                                739                 :                :         MdfdVec    *v;
                                740                 :                :         int         nblocks_this_segment;
                                741                 :                : 
                                742                 :           8006 :         v = _mdfd_getseg(reln, forknum, blocknum, false,
                                743         [ +  + ]:           8006 :                          InRecovery ? EXTENSION_RETURN_NULL : EXTENSION_FAIL);
                                744         [ -  + ]:           8006 :         if (v == NULL)
  630 tmunro@postgresql.or      745                 :UBC           0 :             return false;
                                746                 :                : 
  630 tmunro@postgresql.or      747                 :CBC        8006 :         seekpos = (off_t) BLCKSZ * (blocknum % ((BlockNumber) RELSEG_SIZE));
                                748                 :                : 
                                749         [ -  + ]:           8006 :         Assert(seekpos < (off_t) BLCKSZ * RELSEG_SIZE);
                                750                 :                : 
                                751                 :           8006 :         nblocks_this_segment =
                                752                 :           8006 :             Min(nblocks,
                                753                 :                :                 RELSEG_SIZE - (blocknum % ((BlockNumber) RELSEG_SIZE)));
                                754                 :                : 
                                755                 :           8006 :         (void) FilePrefetch(v->mdfd_vfd, seekpos, BLCKSZ * nblocks_this_segment,
                                756                 :                :                             WAIT_EVENT_DATA_FILE_PREFETCH);
                                757                 :                : 
                                758                 :           8006 :         blocknum += nblocks_this_segment;
                                759                 :           8006 :         nblocks -= nblocks_this_segment;
                                760                 :                :     }
                                761                 :                : #endif                          /* USE_PREFETCH */
                                762                 :                : 
 1977                           763                 :           8006 :     return true;
                                764                 :                : }
                                765                 :                : 
                                766                 :                : /*
                                767                 :                :  * Convert an array of buffer addresses into an array of iovec objects, and
                                768                 :                :  * return the number that were required.  'iov' must have enough space for up
                                769                 :                :  * to 'nblocks' elements, but the number used may be less depending on
                                770                 :                :  * merging.  In the case of a run of fully contiguous buffers, a single iovec
                                771                 :                :  * will be populated that can be handled as a plain non-vectored I/O.
                                772                 :                :  */
                                773                 :                : static int
  628                           774                 :        1803440 : buffers_to_iovec(struct iovec *iov, void **buffers, int nblocks)
                                775                 :                : {
                                776                 :                :     struct iovec *iovp;
                                777                 :                :     int         iovcnt;
                                778                 :                : 
                                779         [ -  + ]:        1803440 :     Assert(nblocks >= 1);
                                780                 :                : 
                                781                 :                :     /* If this build supports direct I/O, buffers must be I/O aligned. */
                                782         [ +  + ]:        3766814 :     for (int i = 0; i < nblocks; ++i)
                                783                 :                :     {
                                784                 :                :         if (PG_O_DIRECT != 0 && PG_IO_ALIGN_SIZE <= BLCKSZ)
                                785         [ -  + ]:        1963374 :             Assert((uintptr_t) buffers[i] ==
                                786                 :                :                    TYPEALIGN(PG_IO_ALIGN_SIZE, buffers[i]));
                                787                 :                :     }
                                788                 :                : 
                                789                 :                :     /* Start the first iovec off with the first buffer. */
                                790                 :        1803440 :     iovp = &iov[0];
                                791                 :        1803440 :     iovp->iov_base = buffers[0];
                                792                 :        1803440 :     iovp->iov_len = BLCKSZ;
                                793                 :        1803440 :     iovcnt = 1;
                                794                 :                : 
                                795                 :                :     /* Try to merge the rest. */
                                796         [ +  + ]:        1963374 :     for (int i = 1; i < nblocks; ++i)
                                797                 :                :     {
                                798                 :         159934 :         void       *buffer = buffers[i];
                                799                 :                : 
                                800         [ +  + ]:         159934 :         if (((char *) iovp->iov_base + iovp->iov_len) == buffer)
                                801                 :                :         {
                                802                 :                :             /* Contiguous with the last iovec. */
                                803                 :         158698 :             iovp->iov_len += BLCKSZ;
                                804                 :                :         }
                                805                 :                :         else
                                806                 :                :         {
                                807                 :                :             /* Need a new iovec. */
                                808                 :           1236 :             iovp++;
                                809                 :           1236 :             iovp->iov_base = buffer;
                                810                 :           1236 :             iovp->iov_len = BLCKSZ;
                                811                 :           1236 :             iovcnt++;
                                812                 :                :         }
                                813                 :                :     }
                                814                 :                : 
                                815                 :        1803440 :     return iovcnt;
                                816                 :                : }
                                817                 :                : 
                                818                 :                : /*
                                819                 :                :  * mdmaxcombine() -- Return the maximum number of total blocks that can be
                                820                 :                :  *               combined with an IO starting at blocknum.
                                821                 :                :  */
                                822                 :                : uint32
  333 andres@anarazel.de        823                 :          31935 : mdmaxcombine(SMgrRelation reln, ForkNumber forknum,
                                824                 :                :              BlockNumber blocknum)
                                825                 :                : {
                                826                 :                :     BlockNumber segoff;
                                827                 :                : 
                                828                 :          31935 :     segoff = blocknum % ((BlockNumber) RELSEG_SIZE);
                                829                 :                : 
                                830                 :          31935 :     return RELSEG_SIZE - segoff;
                                831                 :                : }
                                832                 :                : 
                                833                 :                : /*
                                834                 :                :  * mdreadv() -- Read the specified blocks from a relation.
                                835                 :                :  */
                                836                 :                : void
  628 tmunro@postgresql.or      837                 :            598 : mdreadv(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
                                838                 :                :         void **buffers, BlockNumber nblocks)
                                839                 :                : {
                                840         [ +  + ]:           1196 :     while (nblocks > 0)
                                841                 :                :     {
                                842                 :                :         struct iovec iov[PG_IOV_MAX];
                                843                 :                :         int         iovcnt;
                                844                 :                :         off_t       seekpos;
                                845                 :                :         int         nbytes;
                                846                 :                :         MdfdVec    *v;
                                847                 :                :         BlockNumber nblocks_this_segment;
                                848                 :                :         size_t      transferred_this_segment;
                                849                 :                :         size_t      size_this_segment;
                                850                 :                : 
                                851                 :            598 :         v = _mdfd_getseg(reln, forknum, blocknum, false,
                                852                 :                :                          EXTENSION_FAIL | EXTENSION_CREATE_RECOVERY);
                                853                 :                : 
                                854                 :            598 :         seekpos = (off_t) BLCKSZ * (blocknum % ((BlockNumber) RELSEG_SIZE));
                                855                 :                : 
                                856         [ -  + ]:            598 :         Assert(seekpos < (off_t) BLCKSZ * RELSEG_SIZE);
                                857                 :                : 
                                858                 :            598 :         nblocks_this_segment =
                                859                 :            598 :             Min(nblocks,
                                860                 :                :                 RELSEG_SIZE - (blocknum % ((BlockNumber) RELSEG_SIZE)));
                                861                 :            598 :         nblocks_this_segment = Min(nblocks_this_segment, lengthof(iov));
                                862                 :                : 
  333 andres@anarazel.de        863         [ -  + ]:            598 :         if (nblocks_this_segment != nblocks)
  333 andres@anarazel.de        864         [ #  # ]:UBC           0 :             elog(ERROR, "read crosses segment boundary");
                                865                 :                : 
  628 tmunro@postgresql.or      866                 :CBC         598 :         iovcnt = buffers_to_iovec(iov, buffers, nblocks_this_segment);
                                867                 :            598 :         size_this_segment = nblocks_this_segment * BLCKSZ;
                                868                 :            598 :         transferred_this_segment = 0;
                                869                 :                : 
                                870                 :                :         /*
                                871                 :                :          * Inner loop to continue after a short read.  We'll keep going until
                                872                 :                :          * we hit EOF rather than assuming that a short read means we hit the
                                873                 :                :          * end.
                                874                 :                :          */
                                875                 :                :         for (;;)
                                876                 :                :         {
  628 tmunro@postgresql.or      877                 :UBC           0 :             TRACE_POSTGRESQL_SMGR_MD_READ_START(forknum, blocknum,
                                878                 :                :                                                 reln->smgr_rlocator.locator.spcOid,
                                879                 :                :                                                 reln->smgr_rlocator.locator.dbOid,
                                880                 :                :                                                 reln->smgr_rlocator.locator.relNumber,
                                881                 :                :                                                 reln->smgr_rlocator.backend);
  628 tmunro@postgresql.or      882                 :CBC         598 :             nbytes = FileReadV(v->mdfd_vfd, iov, iovcnt, seekpos,
                                883                 :                :                                WAIT_EVENT_DATA_FILE_READ);
                                884                 :                :             TRACE_POSTGRESQL_SMGR_MD_READ_DONE(forknum, blocknum,
                                885                 :                :                                                reln->smgr_rlocator.locator.spcOid,
                                886                 :                :                                                reln->smgr_rlocator.locator.dbOid,
                                887                 :                :                                                reln->smgr_rlocator.locator.relNumber,
                                888                 :                :                                                reln->smgr_rlocator.backend,
                                889                 :                :                                                nbytes,
                                890                 :                :                                                size_this_segment - transferred_this_segment);
                                891                 :                : 
                                892                 :                : #ifdef SIMULATE_SHORT_READ
                                893                 :                :             nbytes = Min(nbytes, 4096);
                                894                 :                : #endif
                                895                 :                : 
                                896         [ -  + ]:            598 :             if (nbytes < 0)
  628 tmunro@postgresql.or      897         [ #  # ]:UBC           0 :                 ereport(ERROR,
                                898                 :                :                         (errcode_for_file_access(),
                                899                 :                :                          errmsg("could not read blocks %u..%u in file \"%s\": %m",
                                900                 :                :                                 blocknum,
                                901                 :                :                                 blocknum + nblocks_this_segment - 1,
                                902                 :                :                                 FilePathName(v->mdfd_vfd))));
                                903                 :                : 
  628 tmunro@postgresql.or      904         [ -  + ]:CBC         598 :             if (nbytes == 0)
                                905                 :                :             {
                                906                 :                :                 /*
                                907                 :                :                  * We are at or past EOF, or we read a partial block at EOF.
                                908                 :                :                  * Normally this is an error; upper levels should never try to
                                909                 :                :                  * read a nonexistent block.  However, if zero_damaged_pages
                                910                 :                :                  * is ON or we are InRecovery, we should instead return zeroes
                                911                 :                :                  * without complaining.  This allows, for example, the case of
                                912                 :                :                  * trying to update a block that was later truncated away.
                                913                 :                :                  *
                                914                 :                :                  * NB: We think that this codepath is unreachable in recovery
                                915                 :                :                  * and incomplete with zero_damaged_pages, as missing segments
                                916                 :                :                  * are not created. Putting blocks into the buffer-pool that
                                917                 :                :                  * do not exist on disk is rather problematic, as they will not
                                918                 :                :                  * be found by scans that rely on smgrnblocks(), since they are
                                919                 :                :                  * beyond EOF. It also can cause weird problems with relation
                                920                 :                :                  * extension, as relation extension does not expect blocks
                                921                 :                :                  * beyond EOF to exist.
                                922                 :                :                  *
                                923                 :                :                  * Therefore we do not want to copy the logic into
                                924                 :                :                  * mdstartreadv(), where it would have to be more complicated
                                925                 :                :                  * due to potential differences in the zero_damaged_pages
                                926                 :                :                  * setting between the definer and completer of IO.
                                927                 :                :                  *
                                928                 :                :                  * For PG 18, we are putting an Assert(false) in mdreadv()
                                929                 :                :                  * (triggering failures in assertion-enabled builds, but
                                930                 :                :                  * continuing to work in production builds). Afterwards we
                                931                 :                :                  * plan to remove this code entirely.
                                932                 :                :                  */
  628 tmunro@postgresql.or      933   [ #  #  #  # ]:UBC           0 :                 if (zero_damaged_pages || InRecovery)
                                934                 :                :                 {
  158 andres@anarazel.de        935                 :              0 :                     Assert(false);  /* see comment above */
                                936                 :                : 
                                937                 :                :                     for (BlockNumber i = transferred_this_segment / BLCKSZ;
                                938                 :                :                          i < nblocks_this_segment;
                                939                 :                :                          ++i)
                                940                 :                :                         memset(buffers[i], 0, BLCKSZ);
                                941                 :                :                     break;
                                942                 :                :                 }
                                943                 :                :                 else
  628 tmunro@postgresql.or      944         [ #  # ]:              0 :                     ereport(ERROR,
                                945                 :                :                             (errcode(ERRCODE_DATA_CORRUPTED),
                                946                 :                :                              errmsg("could not read blocks %u..%u in file \"%s\": read only %zu of %zu bytes",
                                947                 :                :                                     blocknum,
                                948                 :                :                                     blocknum + nblocks_this_segment - 1,
                                949                 :                :                                     FilePathName(v->mdfd_vfd),
                                950                 :                :                                     transferred_this_segment,
                                951                 :                :                                     size_this_segment)));
                                952                 :                :             }
                                953                 :                : 
                                954                 :                :             /* One loop should usually be enough. */
  628 tmunro@postgresql.or      955                 :CBC         598 :             transferred_this_segment += nbytes;
                                956         [ -  + ]:            598 :             Assert(transferred_this_segment <= size_this_segment);
                                957         [ +  - ]:            598 :             if (transferred_this_segment == size_this_segment)
                                958                 :            598 :                 break;
                                959                 :                : 
                                960                 :                :             /* Adjust position and vectors after a short read. */
  628 tmunro@postgresql.or      961                 :UBC           0 :             seekpos += nbytes;
                                962                 :              0 :             iovcnt = compute_remaining_iovec(iov, iov, iovcnt, nbytes);
                                963                 :                :         }
                                964                 :                : 
  628 tmunro@postgresql.or      965                 :CBC         598 :         nblocks -= nblocks_this_segment;
                                966                 :            598 :         buffers += nblocks_this_segment;
                                967                 :            598 :         blocknum += nblocks_this_segment;
                                968                 :                :     }
10651 scrappy@hub.org           969                 :            598 : }
                                970                 :                : 
                                971                 :                : /*
                                972                 :                :  * mdstartreadv() -- Asynchronous version of mdreadv().
                                973                 :                :  */
                                974                 :                : void
  161 andres@anarazel.de        975                 :        1245727 : mdstartreadv(PgAioHandle *ioh,
                                976                 :                :              SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
                                977                 :                :              void **buffers, BlockNumber nblocks)
                                978                 :                : {
                                979                 :                :     off_t       seekpos;
                                980                 :                :     MdfdVec    *v;
                                981                 :                :     BlockNumber nblocks_this_segment;
                                982                 :                :     struct iovec *iov;
                                983                 :                :     int         iovcnt;
                                984                 :                :     int         ret;
                                985                 :                : 
                                986                 :        1245727 :     v = _mdfd_getseg(reln, forknum, blocknum, false,
                                987                 :                :                      EXTENSION_FAIL | EXTENSION_CREATE_RECOVERY);
                                988                 :                : 
                                989                 :        1245712 :     seekpos = (off_t) BLCKSZ * (blocknum % ((BlockNumber) RELSEG_SIZE));
                                990                 :                : 
                                991         [ -  + ]:        1245712 :     Assert(seekpos < (off_t) BLCKSZ * RELSEG_SIZE);
                                992                 :                : 
                                993                 :        1245712 :     nblocks_this_segment =
                                994                 :        1245712 :         Min(nblocks,
                                995                 :                :             RELSEG_SIZE - (blocknum % ((BlockNumber) RELSEG_SIZE)));
                                996                 :                : 
                                997         [ -  + ]:        1245712 :     if (nblocks_this_segment != nblocks)
  161 andres@anarazel.de        998         [ #  # ]:UBC           0 :         elog(ERROR, "read crossing segment boundary");
                                999                 :                : 
  161 andres@anarazel.de       1000                 :CBC     1245712 :     iovcnt = pgaio_io_get_iovec(ioh, &iov);
                               1001                 :                : 
                               1002         [ -  + ]:        1245712 :     Assert(nblocks <= iovcnt);
                               1003                 :                : 
                               1004                 :        1245712 :     iovcnt = buffers_to_iovec(iov, buffers, nblocks_this_segment);
                               1005                 :                : 
                               1006         [ -  + ]:        1245712 :     Assert(iovcnt <= nblocks_this_segment);
                               1007                 :                : 
                               1008         [ +  + ]:        1245712 :     if (!(io_direct_flags & IO_DIRECT_DATA))
                               1009                 :        1244305 :         pgaio_io_set_flag(ioh, PGAIO_HF_BUFFERED);
                               1010                 :                : 
                               1011                 :        1245712 :     pgaio_io_set_target_smgr(ioh,
                               1012                 :                :                              reln,
                               1013                 :                :                              forknum,
                               1014                 :                :                              blocknum,
                               1015                 :                :                              nblocks,
                               1016                 :                :                              false);
                               1017                 :        1245712 :     pgaio_io_register_callbacks(ioh, PGAIO_HCB_MD_READV, 0);
                               1018                 :                : 
                               1019                 :        1245712 :     ret = FileStartReadV(ioh, v->mdfd_vfd, iovcnt, seekpos, WAIT_EVENT_DATA_FILE_READ);
                               1020         [ -  + ]:        1245712 :     if (ret != 0)
  161 andres@anarazel.de       1021         [ #  # ]:UBC           0 :         ereport(ERROR,
                               1022                 :                :                 (errcode_for_file_access(),
                               1023                 :                :                  errmsg("could not start reading blocks %u..%u in file \"%s\": %m",
                               1024                 :                :                         blocknum,
                               1025                 :                :                         blocknum + nblocks_this_segment - 1,
                               1026                 :                :                         FilePathName(v->mdfd_vfd))));
                               1027                 :                : 
                               1028                 :                :     /*
                               1029                 :                :      * The error checks corresponding to the post-read checks in mdreadv() are
                               1030                 :                :      * in md_readv_complete().
                               1031                 :                :      *
                               1032                 :                :      * However, we chose, at least for now, not to implement the
                               1033                 :                :      * zero_damaged_pages logic present in mdreadv(). As outlined in mdreadv(),
                               1034                 :                :      * that logic is rather problematic, and we want to get rid of it. Here,
                               1035                 :                :      * equivalent logic would have to be more complicated due to potential
                               1036                 :                :      * differences in the zero_damaged_pages setting between the definer and
                               1037                 :                :      * completer of IO.
                               1038                 :                :      */
  161 andres@anarazel.de       1039                 :CBC     1245712 : }
                               1040                 :                : 
                               1041                 :                : /*
                               1042                 :                :  * mdwritev() -- Write the supplied blocks at the appropriate location.
                               1043                 :                :  *
                               1044                 :                :  * This is to be used only for updating already-existing blocks of a
                               1045                 :                :  * relation (ie, those before the current EOF).  To extend a relation,
                               1046                 :                :  * use mdextend().
                               1047                 :                :  */
                               1048                 :                : void
  628 tmunro@postgresql.or     1049                 :         557130 : mdwritev(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
                               1050                 :                :          const void **buffers, BlockNumber nblocks, bool skipFsync)
                               1051                 :                : {
                               1052                 :                :     /* This assert is too expensive to have on normally ... */
                               1053                 :                : #ifdef CHECK_WRITE_VS_EXTEND
                               1054                 :                :     Assert((uint64) blocknum + (uint64) nblocks <= (uint64) mdnblocks(reln, forknum));
                               1055                 :                : #endif
                               1056                 :                : 
                               1057         [ +  + ]:        1114260 :     while (nblocks > 0)
                               1058                 :                :     {
                               1059                 :                :         struct iovec iov[PG_IOV_MAX];
                               1060                 :                :         int         iovcnt;
                               1061                 :                :         off_t       seekpos;
                               1062                 :                :         int         nbytes;
                               1063                 :                :         MdfdVec    *v;
                               1064                 :                :         BlockNumber nblocks_this_segment;
                               1065                 :                :         size_t      transferred_this_segment;
                               1066                 :                :         size_t      size_this_segment;
                               1067                 :                : 
                               1068                 :         557130 :         v = _mdfd_getseg(reln, forknum, blocknum, skipFsync,
                               1069                 :                :                          EXTENSION_FAIL | EXTENSION_CREATE_RECOVERY);
                               1070                 :                : 
                               1071                 :         557130 :         seekpos = (off_t) BLCKSZ * (blocknum % ((BlockNumber) RELSEG_SIZE));
                               1072                 :                : 
                               1073         [ -  + ]:         557130 :         Assert(seekpos < (off_t) BLCKSZ * RELSEG_SIZE);
                               1074                 :                : 
                               1075                 :         557130 :         nblocks_this_segment =
                               1076                 :         557130 :             Min(nblocks,
                               1077                 :                :                 RELSEG_SIZE - (blocknum % ((BlockNumber) RELSEG_SIZE)));
                               1078                 :         557130 :         nblocks_this_segment = Min(nblocks_this_segment, lengthof(iov));
                               1079                 :                : 
  333 andres@anarazel.de       1080         [ -  + ]:         557130 :         if (nblocks_this_segment != nblocks)
  333 andres@anarazel.de       1081         [ #  # ]:UBC           0 :             elog(ERROR, "write crosses segment boundary");
                               1082                 :                : 
  628 tmunro@postgresql.or     1083                 :CBC      557130 :         iovcnt = buffers_to_iovec(iov, (void **) buffers, nblocks_this_segment);
                               1084                 :         557130 :         size_this_segment = nblocks_this_segment * BLCKSZ;
                               1085                 :         557130 :         transferred_this_segment = 0;
                               1086                 :                : 
                               1087                 :                :         /*
                               1088                 :                :          * Inner loop to continue after a short write.  If the reason is that
                               1089                 :                :          * we're out of disk space, a future attempt should get an ENOSPC
                               1090                 :                :          * error from the kernel.
                               1091                 :                :          */
                               1092                 :                :         for (;;)
                               1093                 :                :         {
  628 tmunro@postgresql.or     1094                 :UBC           0 :             TRACE_POSTGRESQL_SMGR_MD_WRITE_START(forknum, blocknum,
                               1095                 :                :                                                  reln->smgr_rlocator.locator.spcOid,
                               1096                 :                :                                                  reln->smgr_rlocator.locator.dbOid,
                               1097                 :                :                                                  reln->smgr_rlocator.locator.relNumber,
                               1098                 :                :                                                  reln->smgr_rlocator.backend);
  628 tmunro@postgresql.or     1099                 :CBC      557130 :             nbytes = FileWriteV(v->mdfd_vfd, iov, iovcnt, seekpos,
                               1100                 :                :                                 WAIT_EVENT_DATA_FILE_WRITE);
                               1101                 :                :             TRACE_POSTGRESQL_SMGR_MD_WRITE_DONE(forknum, blocknum,
                               1102                 :                :                                                 reln->smgr_rlocator.locator.spcOid,
                               1103                 :                :                                                 reln->smgr_rlocator.locator.dbOid,
                               1104                 :                :                                                 reln->smgr_rlocator.locator.relNumber,
                               1105                 :                :                                                 reln->smgr_rlocator.backend,
                               1106                 :                :                                                 nbytes,
                               1107                 :                :                                                 size_this_segment - transferred_this_segment);
                               1108                 :                : 
                               1109                 :                : #ifdef SIMULATE_SHORT_WRITE
                               1110                 :                :             nbytes = Min(nbytes, 4096);
                               1111                 :                : #endif
                               1112                 :                : 
                               1113         [ -  + ]:         557130 :             if (nbytes < 0)
                               1114                 :                :             {
  628 tmunro@postgresql.or     1115                 :UBC           0 :                 bool        enospc = errno == ENOSPC;
                               1116                 :                : 
                               1117   [ #  #  #  # ]:              0 :                 ereport(ERROR,
                               1118                 :                :                         (errcode_for_file_access(),
                               1119                 :                :                          errmsg("could not write blocks %u..%u in file \"%s\": %m",
                               1120                 :                :                                 blocknum,
                               1121                 :                :                                 blocknum + nblocks_this_segment - 1,
                               1122                 :                :                                 FilePathName(v->mdfd_vfd)),
                               1123                 :                :                          enospc ? errhint("Check free disk space.") : 0));
                               1124                 :                :             }
                               1125                 :                : 
                               1126                 :                :             /* One loop should usually be enough. */
  628 tmunro@postgresql.or     1127                 :CBC      557130 :             transferred_this_segment += nbytes;
                               1128         [ -  + ]:         557130 :             Assert(transferred_this_segment <= size_this_segment);
                               1129         [ +  - ]:         557130 :             if (transferred_this_segment == size_this_segment)
                               1130                 :         557130 :                 break;
                               1131                 :                : 
                               1132                 :                :             /* Adjust position and iovecs after a short write. */
  628 tmunro@postgresql.or     1133                 :UBC           0 :             seekpos += nbytes;
                               1134                 :              0 :             iovcnt = compute_remaining_iovec(iov, iov, iovcnt, nbytes);
                               1135                 :                :         }
                               1136                 :                : 
  628 tmunro@postgresql.or     1137   [ +  +  +  + ]:CBC      557130 :         if (!skipFsync && !SmgrIsTemp(reln))
                               1138                 :         552710 :             register_dirty_segment(reln, forknum, v);
                               1139                 :                : 
                               1140                 :         557130 :         nblocks -= nblocks_this_segment;
                               1141                 :         557130 :         buffers += nblocks_this_segment;
                               1142                 :         557130 :         blocknum += nblocks_this_segment;
                               1143                 :                :     }
 9281 tgl@sss.pgh.pa.us        1144                 :         557130 : }
                               1145                 :                : 
                               1146                 :                : 
                               1147                 :                : /*
                               1148                 :                :  * mdwriteback() -- Tell the kernel to write pages back to storage.
                               1149                 :                :  *
                               1150                 :                :  * This accepts a range of blocks because flushing several pages at once is
                               1151                 :                :  * considerably more efficient than doing so individually.
                               1152                 :                :  */
                               1153                 :                : void
  841 peter@eisentraut.org     1154                 :UBC           0 : mdwriteback(SMgrRelation reln, ForkNumber forknum,
                               1155                 :                :             BlockNumber blocknum, BlockNumber nblocks)
                               1156                 :                : {
                               1157         [ #  # ]:              0 :     Assert((io_direct_flags & IO_DIRECT_DATA) == 0);
                               1158                 :                : 
                               1159                 :                :     /*
                               1160                 :                :      * Issue flush requests in as few requests as possible; have to split at
                               1161                 :                :      * segment boundaries though, since those are actually separate files.
                               1162                 :                :      */
                               1163         [ #  # ]:              0 :     while (nblocks > 0)
                               1164                 :                :     {
                               1165                 :              0 :         BlockNumber nflush = nblocks;
                               1166                 :                :         off_t       seekpos;
                               1167                 :                :         MdfdVec    *v;
                               1168                 :                :         int         segnum_start,
                               1169                 :                :                     segnum_end;
                               1170                 :                : 
                               1171                 :              0 :         v = _mdfd_getseg(reln, forknum, blocknum, true /* not used */ ,
                               1172                 :                :                          EXTENSION_DONT_OPEN);
                               1173                 :                : 
                               1174                 :                :         /*
                               1175                 :                :          * We might be flushing buffers of already removed relations; that's
                               1176                 :                :          * ok, just ignore that case.  If the segment file wasn't open already
                               1177                 :                :          * (ie from a recent mdwritev()), then we don't want to re-open it, to
                               1178                 :                :          * avoid a race with PROCSIGNAL_BARRIER_SMGRRELEASE that might leave
                               1179                 :                :          * us with a descriptor to a file that is about to be unlinked.
                               1180                 :                :          */
                               1181         [ #  # ]:              0 :         if (!v)
                               1182                 :              0 :             return;
                               1183                 :                : 
                               1184                 :                :         /* compute the segment number containing the first block */
                               1185                 :              0 :         segnum_start = blocknum / RELSEG_SIZE;
                               1186                 :                : 
                               1187                 :                :         /* compute number of desired writes within the current segment */
                               1188                 :              0 :         segnum_end = (blocknum + nblocks - 1) / RELSEG_SIZE;
                               1189         [ #  # ]:              0 :         if (segnum_start != segnum_end)
                               1190                 :              0 :             nflush = RELSEG_SIZE - (blocknum % ((BlockNumber) RELSEG_SIZE));
                               1191                 :                : 
                               1192         [ #  # ]:              0 :         Assert(nflush >= 1);
                               1193         [ #  # ]:              0 :         Assert(nflush <= nblocks);
                               1194                 :                : 
                               1195                 :              0 :         seekpos = (off_t) BLCKSZ * (blocknum % ((BlockNumber) RELSEG_SIZE));
                               1196                 :                : 
                               1197                 :              0 :         FileWriteback(v->mdfd_vfd, seekpos, (off_t) BLCKSZ * nflush, WAIT_EVENT_DATA_FILE_FLUSH);
                               1198                 :                : 
                               1199                 :              0 :         nblocks -= nflush;
                               1200                 :              0 :         blocknum += nflush;
                               1201                 :                :     }
                               1202                 :                : }
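
The loop above splits a writeback request at segment boundaries. The following is a minimal standalone sketch of that arithmetic, not code from md.c: DEMO_RELSEG_SIZE, DemoBlockNumber and the demo_* functions are hypothetical names, the segment size is deliberately tiny, and it assumes nflush starts out equal to nblocks, as the assertions in the loop suggest.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for RELSEG_SIZE; chosen small for readability. */
#define DEMO_RELSEG_SIZE 8

typedef uint32_t DemoBlockNumber;

/*
 * Return how many of the nblocks blocks starting at blocknum lie in the
 * segment that contains blocknum.  Callers must pass nblocks >= 1.
 */
static DemoBlockNumber
demo_blocks_in_first_segment(DemoBlockNumber blocknum, DemoBlockNumber nblocks)
{
    DemoBlockNumber nflush = nblocks;
    DemoBlockNumber segnum_start = blocknum / DEMO_RELSEG_SIZE;
    DemoBlockNumber segnum_end = (blocknum + nblocks - 1) / DEMO_RELSEG_SIZE;

    /* The range crosses a boundary: stop at the end of the first segment. */
    if (segnum_start != segnum_end)
        nflush = DEMO_RELSEG_SIZE - (blocknum % DEMO_RELSEG_SIZE);

    return nflush;
}

/* Count the per-segment chunks that a request is split into. */
static int
demo_count_chunks(DemoBlockNumber blocknum, DemoBlockNumber nblocks)
{
    int         chunks = 0;

    while (nblocks > 0)
    {
        DemoBlockNumber nflush = demo_blocks_in_first_segment(blocknum, nblocks);

        nblocks -= nflush;
        blocknum += nflush;
        chunks++;
    }
    return chunks;
}
```

With an 8-block segment, a request for 12 blocks starting at block 6 is split into chunks of 2, 8 and 2 blocks, one per segment touched.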
                               1203                 :                : 
                               1204                 :                : /*
                               1205                 :                :  * mdnblocks() -- Get the number of blocks stored in a relation.
                               1206                 :                :  *
                               1207                 :                :  * Important side effect: all active segments of the relation are opened
                               1208                 :                :  * and added to the md_seg_fds array.  If this routine has not been
                               1209                 :                :  * called, then only segments up to the last one actually touched
                               1210                 :                :  * are present in the array.
                               1211                 :                :  */
                               1212                 :                : BlockNumber
 6235 heikki.linnakangas@i     1213                 :CBC     2125614 : mdnblocks(SMgrRelation reln, ForkNumber forknum)
                               1214                 :                : {
                               1215                 :                :     MdfdVec    *v;
                               1216                 :                :     BlockNumber nblocks;
                               1217                 :                :     BlockNumber segno;
                               1218                 :                : 
 1829 bruce@momjian.us         1219                 :        2125614 :     mdopenfork(reln, forknum, EXTENSION_FAIL);
                               1220                 :                : 
                               1221                 :                :     /* mdopenfork has opened the first segment */
 3285 andres@anarazel.de       1222         [ -  + ]:        2125595 :     Assert(reln->md_num_open_segs[forknum] > 0);
                               1223                 :                : 
                               1224                 :                :     /*
                               1225                 :                :      * Start from the last open segment, to avoid redundant seeks.  We have
                               1226                 :                :      * previously verified that the earlier segments are exactly RELSEG_SIZE
                               1227                 :                :      * long, and it's useless to recheck that each time.
                               1228                 :                :      *
                               1229                 :                :      * NOTE: this assumption could only be wrong if another backend has
                               1230                 :                :      * truncated the relation.  We rely on higher code levels to handle that
                               1231                 :                :      * scenario by closing and re-opening the md fd, which is handled via
                               1232                 :                :      * relcache flush.  (Since the checkpointer doesn't participate in
                               1233                 :                :      * relcache flush, it could have segment entries for inactive segments;
                               1234                 :                :      * that's OK because the checkpointer never needs to compute relation
                               1235                 :                :      * size.)
                               1236                 :                :      */
                               1237                 :        2125595 :     segno = reln->md_num_open_segs[forknum] - 1;
                               1238                 :        2125595 :     v = &reln->md_seg_fds[forknum][segno];
                               1239                 :                : 
                               1240                 :                :     for (;;)
                               1241                 :                :     {
 6235 heikki.linnakangas@i     1242                 :        2125595 :         nblocks = _mdnblocks(reln, forknum, v);
 8837 tgl@sss.pgh.pa.us        1243         [ -  + ]:        2125595 :         if (nblocks > ((BlockNumber) RELSEG_SIZE))
 8080 tgl@sss.pgh.pa.us        1244         [ #  # ]:UBC           0 :             elog(FATAL, "segment too big");
 8837 tgl@sss.pgh.pa.us        1245         [ +  - ]:CBC     2125595 :         if (nblocks < ((BlockNumber) RELSEG_SIZE))
                               1246                 :        2125595 :             return (segno * ((BlockNumber) RELSEG_SIZE)) + nblocks;
                               1247                 :                : 
                               1248                 :                :         /*
                               1249                 :                :          * If segment is exactly RELSEG_SIZE, advance to next one.
                               1250                 :                :          */
 8885 tgl@sss.pgh.pa.us        1251                 :UBC           0 :         segno++;
                               1252                 :                : 
                               1253                 :                :         /*
                               1254                 :                :          * We used to pass O_CREAT here, but that has the disadvantage that it
                               1255                 :                :          * might create a segment which has vanished through some operating
                               1256                 :                :          * system misadventure.  In such a case, creating the segment here
                               1257                 :                :          * undermines _mdfd_getseg's attempts to notice and report an error
                               1258                 :                :          * upon access to a missing segment.
                               1259                 :                :          */
 3285 andres@anarazel.de       1260                 :              0 :         v = _mdfd_openseg(reln, forknum, segno, 0);
                               1261         [ #  # ]:              0 :         if (v == NULL)
                               1262                 :              0 :             return segno * ((BlockNumber) RELSEG_SIZE);
                               1263                 :                :     }
                               1264                 :                : }
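
The size computation in mdnblocks() can be illustrated with a small standalone sketch. It is an assumption-laden model rather than the real implementation: the segment files are replaced by a hypothetical array of per-segment block counts, DEMO_RELSEG_SIZE and demo_nblocks are invented names, and the "segment too big" error path is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for RELSEG_SIZE; chosen small for readability. */
#define DEMO_RELSEG_SIZE 8

/*
 * seg_blocks[i] models the block count of segment file i, and nsegs is the
 * number of segment files that exist.  start_seg models the last segment
 * that is already open; all earlier segments are assumed to be full.
 */
static uint32_t
demo_nblocks(const uint32_t *seg_blocks, size_t nsegs, size_t start_seg)
{
    size_t      segno = start_seg;

    for (;;)
    {
        uint32_t    nblocks = seg_blocks[segno];

        /* A partially filled segment is the last active one. */
        if (nblocks < DEMO_RELSEG_SIZE)
            return (uint32_t) (segno * DEMO_RELSEG_SIZE) + nblocks;

        /* Segment is full: move on; a missing next segment ends the scan. */
        segno++;
        if (segno >= nsegs)
            return (uint32_t) (segno * DEMO_RELSEG_SIZE);
    }
}
```

Starting from the last open segment gives the same answer as starting from segment 0, which is why the real function can skip the earlier segments.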
                               1265                 :                : 
                               1266                 :                : /*
                               1267                 :                :  * mdtruncate() -- Truncate relation to specified number of blocks.
                               1268                 :                :  *
                               1269                 :                :  * Guaranteed not to allocate memory, so it can be used in a critical section.
                               1270                 :                :  * Caller must have called smgrnblocks() to obtain curnblk while holding a
                               1271                 :                :  * sufficient lock to prevent a change in relation size, and not used any smgr
                               1272                 :                :  * functions for this relation or handled interrupts in between.  This makes
                               1273                 :                :  * sure we have opened all active segments, so that the truncate loop will get
                               1274                 :                :  * them all!
                               1275                 :                :  */
                               1276                 :                : void
  260 tmunro@postgresql.or     1277                 :CBC         907 : mdtruncate(SMgrRelation reln, ForkNumber forknum,
                               1278                 :                :            BlockNumber curnblk, BlockNumber nblocks)
                               1279                 :                : {
                               1280                 :                :     BlockNumber priorblocks;
                               1281                 :                :     int         curopensegs;
                               1282                 :                : 
 8837 tgl@sss.pgh.pa.us        1283         [ -  + ]:            907 :     if (nblocks > curnblk)
                               1284                 :                :     {
                               1285                 :                :         /* Bogus request ... but no complaint if InRecovery */
 6821 tgl@sss.pgh.pa.us        1286         [ #  # ]:UBC           0 :         if (InRecovery)
                               1287                 :              0 :             return;
                               1288         [ #  # ]:              0 :         ereport(ERROR,
                               1289                 :                :                 (errmsg("could not truncate file \"%s\" to %u blocks: it's only %u blocks now",
                               1290                 :                :                         relpath(reln->smgr_rlocator, forknum).str,
                               1291                 :                :                         nblocks, curnblk)));
                               1292                 :                :     }
 9501 tgl@sss.pgh.pa.us        1293         [ +  + ]:CBC         907 :     if (nblocks == curnblk)
 6821                          1294                 :            371 :         return;                 /* no work */
                               1295                 :                : 
                               1296                 :                :     /*
                               1297                 :                :      * Truncate segments, starting at the last one. Starting at the end makes
                               1298                 :                :      * managing the memory for the fd array easier, should there be errors.
                               1299                 :                :      */
 3285 andres@anarazel.de       1300                 :            536 :     curopensegs = reln->md_num_open_segs[forknum];
                               1301         [ +  + ]:           1072 :     while (curopensegs > 0)
                               1302                 :                :     {
                               1303                 :                :         MdfdVec    *v;
                               1304                 :                : 
                               1305                 :            536 :         priorblocks = (curopensegs - 1) * RELSEG_SIZE;
                               1306                 :                : 
                               1307                 :            536 :         v = &reln->md_seg_fds[forknum][curopensegs - 1];
                               1308                 :                : 
 9501 tgl@sss.pgh.pa.us        1309         [ -  + ]:            536 :         if (priorblocks > nblocks)
                               1310                 :                :         {
                               1311                 :                :             /*
                               1312                 :                :              * This segment is no longer active. We truncate the file, but do
                               1313                 :                :              * not delete it, for reasons explained in the header comments.
                               1314                 :                :              */
 3094 rhaas@postgresql.org     1315         [ #  # ]:UBC           0 :             if (FileTruncate(v->mdfd_vfd, 0, WAIT_EVENT_DATA_FILE_TRUNCATE) < 0)
 6821 tgl@sss.pgh.pa.us        1316         [ #  # ]:              0 :                 ereport(ERROR,
                               1317                 :                :                         (errcode_for_file_access(),
                               1318                 :                :                          errmsg("could not truncate file \"%s\": %m",
                               1319                 :                :                                 FilePathName(v->mdfd_vfd))));
                               1320                 :                : 
 5503 rhaas@postgresql.org     1321         [ #  # ]:              0 :             if (!SmgrIsTemp(reln))
 6235 heikki.linnakangas@i     1322                 :              0 :                 register_dirty_segment(reln, forknum, v);
                               1323                 :                : 
                               1324                 :                :             /* we never drop the 1st segment */
 3285 andres@anarazel.de       1325         [ #  # ]:              0 :             Assert(v != &reln->md_seg_fds[forknum][0]);
                               1326                 :                : 
                               1327                 :              0 :             FileClose(v->mdfd_vfd);
                               1328                 :              0 :             _fdvec_resize(reln, forknum, curopensegs - 1);
                               1329                 :                :         }
 8837 tgl@sss.pgh.pa.us        1330         [ +  - ]:CBC         536 :         else if (priorblocks + ((BlockNumber) RELSEG_SIZE) > nblocks)
                               1331                 :                :         {
                               1332                 :                :             /*
                               1333                 :                :              * This is the last segment we want to keep. Truncate the file to
                               1334                 :                :              * the right length. NOTE: if nblocks is exactly a multiple K of
                               1335                 :                :              * RELSEG_SIZE, we will truncate the K+1st segment to 0 length but
                               1336                 :                :              * keep it. This adheres to the invariant given in the header
                               1337                 :                :              * comments.
                               1338                 :                :              */
 8717 bruce@momjian.us         1339                 :            536 :             BlockNumber lastsegblocks = nblocks - priorblocks;
                               1340                 :                : 
 3094 rhaas@postgresql.org     1341         [ -  + ]:            536 :             if (FileTruncate(v->mdfd_vfd, (off_t) lastsegblocks * BLCKSZ, WAIT_EVENT_DATA_FILE_TRUNCATE) < 0)
 6821 tgl@sss.pgh.pa.us        1342         [ #  # ]:UBC           0 :                 ereport(ERROR,
                               1343                 :                :                         (errcode_for_file_access(),
                               1344                 :                :                          errmsg("could not truncate file \"%s\" to %u blocks: %m",
                               1345                 :                :                                 FilePathName(v->mdfd_vfd),
                               1346                 :                :                                 nblocks)));
 5503 rhaas@postgresql.org     1347         [ +  + ]:CBC         536 :             if (!SmgrIsTemp(reln))
 6235 heikki.linnakangas@i     1348                 :            372 :                 register_dirty_segment(reln, forknum, v);
                               1349                 :                :         }
                               1350                 :                :         else
                               1351                 :                :         {
                               1352                 :                :             /*
                               1353                 :                :              * We still need this segment, so nothing to do for this and any
                               1354                 :                :              * earlier segment.
                               1355                 :                :              */
 3285 andres@anarazel.de       1356                 :UBC           0 :             break;
                               1357                 :                :         }
 3285 andres@anarazel.de       1358                 :CBC         536 :         curopensegs--;
                               1359                 :                :     }
                               1360                 :                : }
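
The three branches of the mdtruncate() loop can be summarized as a per-segment decision. The sketch below is illustrative only: DemoSegAction, DemoSegPlan, demo_truncate_plan and DEMO_RELSEG_SIZE are hypothetical names, and the function only computes what would happen to a segment instead of touching any file.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for RELSEG_SIZE; chosen small for readability. */
#define DEMO_RELSEG_SIZE 8

typedef enum
{
    DEMO_SEG_EMPTY_AND_CLOSE,   /* inactive: truncate to 0 and close it */
    DEMO_SEG_SHORTEN,           /* last kept segment: truncate to new_len */
    DEMO_SEG_KEEP               /* still fully needed: leave untouched */
} DemoSegAction;

typedef struct
{
    DemoSegAction action;
    uint32_t    new_len;        /* resulting length of the segment, in blocks */
} DemoSegPlan;

/* Decide what happens to segment segno when truncating to nblocks blocks. */
static DemoSegPlan
demo_truncate_plan(uint32_t segno, uint32_t nblocks)
{
    DemoSegPlan plan;
    uint32_t    priorblocks = segno * DEMO_RELSEG_SIZE;

    if (priorblocks > nblocks)
    {
        plan.action = DEMO_SEG_EMPTY_AND_CLOSE;
        plan.new_len = 0;
    }
    else if (priorblocks + DEMO_RELSEG_SIZE > nblocks)
    {
        plan.action = DEMO_SEG_SHORTEN;
        plan.new_len = nblocks - priorblocks;
    }
    else
    {
        plan.action = DEMO_SEG_KEEP;
        plan.new_len = DEMO_RELSEG_SIZE;
    }
    return plan;
}
```

Truncating to 16 blocks with 8-block segments shows the case described in the NOTE above: segment 2 (the K+1st) is shortened to length 0 but kept, segment 3 is emptied and closed, and segment 1 is left alone.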
                               1361                 :                : 
                               1362                 :                : /*
                               1363                 :                :  * mdregistersync() -- Mark whole relation as needing fsync
                               1364                 :                :  */
                               1365                 :                : void
  561 heikki.linnakangas@i     1366                 :          24653 : mdregistersync(SMgrRelation reln, ForkNumber forknum)
                               1367                 :                : {
                               1368                 :                :     int         segno;
                               1369                 :                :     int         min_inactive_seg;
                               1370                 :                : 
                               1371                 :                :     /*
                               1372                 :                :      * NOTE: mdnblocks makes sure we have opened all active segments, so that
                               1373                 :                :      * the loop below will get them all!
                               1374                 :                :      */
                               1375                 :          24653 :     mdnblocks(reln, forknum);
                               1376                 :                : 
                               1377                 :          24653 :     min_inactive_seg = segno = reln->md_num_open_segs[forknum];
                               1378                 :                : 
                               1379                 :                :     /*
                               1380                 :                :      * Temporarily open inactive segments, then close them after sync.  There
                               1381                 :                :      * may be some inactive segments left open after an error, but that is
                               1382                 :                :      * harmless.  We don't bother to clean them up, since that would risk
                               1383                 :                :      * further trouble.  The next mdclose() will soon close them.
                               1384                 :                :      */
                               1385         [ -  + ]:          24653 :     while (_mdfd_openseg(reln, forknum, segno, 0) != NULL)
  561 heikki.linnakangas@i     1386                 :UBC           0 :         segno++;
                               1387                 :                : 
  561 heikki.linnakangas@i     1388         [ +  + ]:CBC       49306 :     while (segno > 0)
                               1389                 :                :     {
                               1390                 :          24653 :         MdfdVec    *v = &reln->md_seg_fds[forknum][segno - 1];
                               1391                 :                : 
                               1392                 :          24653 :         register_dirty_segment(reln, forknum, v);
                               1393                 :                : 
                               1394                 :                :         /* Close inactive segments immediately */
                               1395         [ -  + ]:          24653 :         if (segno > min_inactive_seg)
                               1396                 :                :         {
  561 heikki.linnakangas@i     1397                 :UBC           0 :             FileClose(v->mdfd_vfd);
                               1398                 :              0 :             _fdvec_resize(reln, forknum, segno - 1);
                               1399                 :                :         }
                               1400                 :                : 
  561 heikki.linnakangas@i     1401                 :CBC       24653 :         segno--;
                               1402                 :                :     }
                               1403                 :          24653 : }
                               1404                 :                : 
                               1405                 :                : /*
                               1406                 :                :  * mdimmedsync() -- Immediately sync a relation to stable storage.
                               1407                 :                :  *
                               1408                 :                :  * Note that only writes already issued are synced; this routine knows
                               1409                 :                :  * nothing of dirty buffers that may exist inside the buffer manager.  We
                               1410                 :                :  * sync active and inactive segments; smgrDoPendingSyncs() relies on this.
                               1411                 :                :  * Consider a relation skipping WAL.  Suppose a checkpoint syncs blocks of
                               1412                 :                :  * some segment, then mdtruncate() renders that segment inactive.  If we
                               1413                 :                :  * crash before the next checkpoint syncs the newly-inactive segment, that
                               1414                 :                :  * segment may survive recovery, reintroducing unwanted data into the table.
                               1415                 :                :  */
                               1416                 :                : void
 6235                          1417                 :             23 : mdimmedsync(SMgrRelation reln, ForkNumber forknum)
                               1418                 :                : {
                               1419                 :                :     int         segno;
                               1420                 :                :     int         min_inactive_seg;
                               1421                 :                : 
                               1422                 :                :     /*
                               1423                 :                :      * NOTE: mdnblocks makes sure we have opened all active segments, so that
                               1424                 :                :      * the loop below will get them all!
                               1425                 :                :      */
 5262 peter_e@gmx.net          1426                 :             23 :     mdnblocks(reln, forknum);
                               1427                 :                : 
 1981 noah@leadboat.com        1428                 :             23 :     min_inactive_seg = segno = reln->md_num_open_segs[forknum];
                               1429                 :                : 
                               1430                 :                :     /*
                               1431                 :                :      * Temporarily open inactive segments, then close them after sync.  There
                               1432                 :                :      * may be some inactive segments left open after an fsync() error, but
                               1433                 :                :      * that is harmless.  We don't bother to clean them up, since that would
                               1434                 :                :      * risk further trouble.  The next mdclose() will soon close them.
                               1435                 :                :      */
                               1436         [ -  + ]:             23 :     while (_mdfd_openseg(reln, forknum, segno, 0) != NULL)
 1981 noah@leadboat.com        1437                 :UBC           0 :         segno++;
                               1438                 :                : 
 3285 andres@anarazel.de       1439         [ +  + ]:CBC          46 :     while (segno > 0)
                               1440                 :                :     {
                               1441                 :             23 :         MdfdVec    *v = &reln->md_seg_fds[forknum][segno - 1];
                               1442                 :                : 
                               1443                 :                :         /*
                               1444                 :                :          * fsyncs done through mdimmedsync() should be tracked in a different
                               1445                 :                :          * IOContext than those done through mdsyncfiletag() to differentiate
                               1446                 :                :          * between unavoidable client backend fsyncs (e.g. those done during
                               1447                 :                :          * index build) and those which ideally would have been done by the
                               1448                 :                :          * checkpointer. Since other IO operations bypassing the buffer
                               1449                 :                :          * manager could also be tracked in such an IOContext, wait until
                               1450                 :                :          * these are also tracked to track immediate fsyncs.
                               1451                 :                :          */
 3094 rhaas@postgresql.org     1452         [ -  + ]:             23 :         if (FileSync(v->mdfd_vfd, WAIT_EVENT_DATA_FILE_IMMEDIATE_SYNC) < 0)
 2483 tmunro@postgresql.or     1453         [ #  # ]:UBC           0 :             ereport(data_sync_elevel(ERROR),
                               1454                 :                :                     (errcode_for_file_access(),
                               1455                 :                :                      errmsg("could not fsync file \"%s\": %m",
                               1456                 :                :                             FilePathName(v->mdfd_vfd))));
                               1457                 :                : 
                               1458                 :                :         /* Close inactive segments immediately */
 1981 noah@leadboat.com        1459         [ -  + ]:CBC          23 :         if (segno > min_inactive_seg)
                               1460                 :                :         {
 1981 noah@leadboat.com        1461                 :UBC           0 :             FileClose(v->mdfd_vfd);
                               1462                 :              0 :             _fdvec_resize(reln, forknum, segno - 1);
                               1463                 :                :         }
                               1464                 :                : 
 3285 andres@anarazel.de       1465                 :CBC          23 :         segno--;
                               1466                 :                :     }
 7766 tgl@sss.pgh.pa.us        1467                 :             23 : }
                               1468                 :                : 
                               1469                 :                : int
  161 andres@anarazel.de       1470                 :         448402 : mdfd(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum, uint32 *off)
                               1471                 :                : {
                               1472                 :         448402 :     MdfdVec    *v = mdopenfork(reln, forknum, EXTENSION_FAIL);
                               1473                 :                : 
                               1474                 :         448402 :     v = _mdfd_getseg(reln, forknum, blocknum, false,
                               1475                 :                :                      EXTENSION_FAIL);
                               1476                 :                : 
                               1477                 :         448402 :     *off = (off_t) BLCKSZ * (blocknum % ((BlockNumber) RELSEG_SIZE));
                               1478                 :                : 
                               1479         [ -  + ]:         448402 :     Assert(*off < (off_t) BLCKSZ * RELSEG_SIZE);
                               1480                 :                : 
                               1481                 :         448402 :     return FileGetRawDesc(v->mdfd_vfd);
                               1482                 :                : }
                               1483                 :                : 
                               1484                 :                : /*
                               1485                 :                :  * register_dirty_segment() -- Mark a relation segment as needing fsync
                               1486                 :                :  *
                               1487                 :                :  * If there is a local pending-ops table, just make an entry in it for
                               1488                 :                :  * ProcessSyncRequests to process later.  Otherwise, try to pass off the
                               1489                 :                :  * fsync request to the checkpointer process.  If that fails, just do the
                               1490                 :                :  * fsync locally before returning (we hope this will not happen often
                               1491                 :                :  * enough to be a performance problem).
                               1492                 :                :  */
                               1493                 :                : static void
 6235 heikki.linnakangas@i     1494                 :         925660 : register_dirty_segment(SMgrRelation reln, ForkNumber forknum, MdfdVec *seg)
                               1495                 :                : {
                               1496                 :                :     FileTag     tag;
                               1497                 :                : 
 1158 rhaas@postgresql.org     1498                 :         925660 :     INIT_MD_FILETAG(tag, reln->smgr_rlocator.locator, forknum, seg->mdfd_segno);
                               1499                 :                : 
                               1500                 :                :     /* Temp relations should never be fsync'd */
 4799 tgl@sss.pgh.pa.us        1501         [ -  + ]:         925660 :     Assert(!SmgrIsTemp(reln));
                               1502                 :                : 
 2347 tmunro@postgresql.or     1503         [ +  + ]:         925660 :     if (!RegisterSyncRequest(&tag, SYNC_REQUEST, false /* retryOnError */ ))
                               1504                 :                :     {
                               1505                 :                :         instr_time  io_start;
                               1506                 :                : 
  883 andres@anarazel.de       1507         [ -  + ]:            243 :         ereport(DEBUG1,
                               1508                 :                :                 (errmsg_internal("could not forward fsync request because request queue is full")));
                               1509                 :                : 
  192 michael@paquier.xyz      1510                 :            243 :         io_start = pgstat_prepare_io_time(track_io_timing);
                               1511                 :                : 
  883 andres@anarazel.de       1512         [ -  + ]:            243 :         if (FileSync(seg->mdfd_vfd, WAIT_EVENT_DATA_FILE_SYNC) < 0)
  883 andres@anarazel.de       1513         [ #  # ]:UBC           0 :             ereport(data_sync_elevel(ERROR),
                               1514                 :                :                     (errcode_for_file_access(),
                               1515                 :                :                      errmsg("could not fsync file \"%s\": %m",
                               1516                 :                :                             FilePathName(seg->mdfd_vfd))));
                               1517                 :                : 
                               1518                 :                :         /*
                               1519                 :                :          * We have no way of knowing if the current IOContext is
                               1520                 :                :          * IOCONTEXT_NORMAL or IOCONTEXT_[BULKREAD, BULKWRITE, VACUUM] at this
                               1521                 :                :          * point, so count the fsync as being in the IOCONTEXT_NORMAL
                               1522                 :                :          * IOContext. This is probably okay, because the number of backend
                               1523                 :                :          * fsyncs doesn't say anything about the efficacy of the
                               1524                 :                :          * BufferAccessStrategy. And counting both fsyncs done in
                               1525                 :                :          * IOCONTEXT_NORMAL and IOCONTEXT_[BULKREAD, BULKWRITE, VACUUM] under
                               1526                 :                :          * IOCONTEXT_NORMAL is likely clearer when investigating the number of
                               1527                 :                :          * backend fsyncs.
                               1528                 :                :          */
  883 andres@anarazel.de       1529                 :CBC         243 :         pgstat_count_io_op_time(IOOBJECT_RELATION, IOCONTEXT_NORMAL,
                               1530                 :                :                                 IOOP_FSYNC, io_start, 1, 0);
                               1531                 :                :     }
10651 scrappy@hub.org          1532                 :         925660 : }
                               1533                 :                : 
                               1534                 :                : /*
                               1535                 :                :  * register_unlink_segment() -- Schedule a file to be deleted after next checkpoint
                               1536                 :                :  */
                               1537                 :                : static void
 1158 rhaas@postgresql.org     1538                 :          34463 : register_unlink_segment(RelFileLocatorBackend rlocator, ForkNumber forknum,
                               1539                 :                :                         BlockNumber segno)
                               1540                 :                : {
                               1541                 :                :     FileTag     tag;
                               1542                 :                : 
                               1543                 :          34463 :     INIT_MD_FILETAG(tag, rlocator.locator, forknum, segno);
                               1544                 :                : 
                               1545                 :                :     /* Should never be used with temp relations */
                               1546         [ -  + ]:          34463 :     Assert(!RelFileLocatorBackendIsTemp(rlocator));
                               1547                 :                : 
 2347 tmunro@postgresql.or     1548                 :          34463 :     RegisterSyncRequest(&tag, SYNC_UNLINK_REQUEST, true /* retryOnError */ );
 6505 tgl@sss.pgh.pa.us        1549                 :          34463 : }
                               1550                 :                : 
                               1551                 :                : /*
                               1552                 :                :  * register_forget_request() -- forget any fsyncs for a relation fork's segment
                               1553                 :                :  */
                               1554                 :                : static void
 1158 rhaas@postgresql.org     1555                 :         133673 : register_forget_request(RelFileLocatorBackend rlocator, ForkNumber forknum,
                               1556                 :                :                         BlockNumber segno)
                               1557                 :                : {
                               1558                 :                :     FileTag     tag;
                               1559                 :                : 
                               1560                 :         133673 :     INIT_MD_FILETAG(tag, rlocator.locator, forknum, segno);
                               1561                 :                : 
 2347 tmunro@postgresql.or     1562                 :         133673 :     RegisterSyncRequest(&tag, SYNC_FORGET_REQUEST, true /* retryOnError */ );
 6807 tgl@sss.pgh.pa.us        1563                 :         133673 : }
                               1564                 :                : 
                               1565                 :                : /*
                               1566                 :                :  * ForgetDatabaseSyncRequests -- forget any fsyncs and unlinks for a DB
                               1567                 :                :  */
                               1568                 :                : void
 2347 tmunro@postgresql.or     1569                 :             59 : ForgetDatabaseSyncRequests(Oid dbid)
                               1570                 :                : {
                               1571                 :                :     FileTag     tag;
                               1572                 :                :     RelFileLocator rlocator;
                               1573                 :                : 
 1158 rhaas@postgresql.org     1574                 :             59 :     rlocator.dbOid = dbid;
                               1575                 :             59 :     rlocator.spcOid = 0;
                               1576                 :             59 :     rlocator.relNumber = 0;
                               1577                 :                : 
                               1578                 :             59 :     INIT_MD_FILETAG(tag, rlocator, InvalidForkNumber, InvalidBlockNumber);
                               1579                 :                : 
 2347 tmunro@postgresql.or     1580                 :             59 :     RegisterSyncRequest(&tag, SYNC_FILTER_REQUEST, true /* retryOnError */ );
 9079 vadim4o@yahoo.com        1581                 :             59 : }
                               1582                 :                : 
                               1583                 :                : /*
                               1584                 :                :  * DropRelationFiles -- drop files of all given relations
                               1585                 :                :  */
                               1586                 :                : void
 1158 rhaas@postgresql.org     1587                 :           2620 : DropRelationFiles(RelFileLocator *delrels, int ndelrels, bool isRedo)
                               1588                 :                : {
                               1589                 :                :     SMgrRelation *srels;
                               1590                 :                :     int         i;
                               1591                 :                : 
 2620 fujii@postgresql.org     1592                 :           2620 :     srels = palloc(sizeof(SMgrRelation) * ndelrels);
                               1593         [ +  + ]:          10159 :     for (i = 0; i < ndelrels; i++)
                               1594                 :                :     {
  552 heikki.linnakangas@i     1595                 :           7539 :         SMgrRelation srel = smgropen(delrels[i], INVALID_PROC_NUMBER);
                               1596                 :                : 
 2620 fujii@postgresql.org     1597         [ +  + ]:           7539 :         if (isRedo)
                               1598                 :                :         {
                               1599                 :                :             ForkNumber  fork;
                               1600                 :                : 
                               1601         [ +  + ]:          37575 :             for (fork = 0; fork <= MAX_FORKNUM; fork++)
                               1602                 :          30060 :                 XLogDropRelation(delrels[i], fork);
                               1603                 :                :         }
                               1604                 :           7539 :         srels[i] = srel;
                               1605                 :                :     }
                               1606                 :                : 
                               1607                 :           2620 :     smgrdounlinkall(srels, ndelrels, isRedo);
                               1608                 :                : 
 2355 tomas.vondra@postgre     1609         [ +  + ]:          10159 :     for (i = 0; i < ndelrels; i++)
 2620 fujii@postgresql.org     1610                 :           7539 :         smgrclose(srels[i]);
                               1611                 :           2620 :     pfree(srels);
                               1612                 :           2620 : }
                               1613                 :                : 
                               1614                 :                : 
                               1615                 :                : /*
                               1616                 :                :  * _fdvec_resize() -- Resize the fork's open segments array
                               1617                 :                :  */
                               1618                 :                : static void
 3285 andres@anarazel.de       1619                 :        1378544 : _fdvec_resize(SMgrRelation reln,
                               1620                 :                :               ForkNumber forknum,
                               1621                 :                :               int nseg)
                               1622                 :                : {
                               1623         [ +  + ]:        1378544 :     if (nseg == 0)
                               1624                 :                :     {
                               1625         [ +  - ]:         509842 :         if (reln->md_num_open_segs[forknum] > 0)
                               1626                 :                :         {
                               1627                 :         509842 :             pfree(reln->md_seg_fds[forknum]);
                               1628                 :         509842 :             reln->md_seg_fds[forknum] = NULL;
                               1629                 :                :         }
                               1630                 :                :     }
                               1631         [ +  - ]:         868702 :     else if (reln->md_num_open_segs[forknum] == 0)
                               1632                 :                :     {
                               1633                 :         868702 :         reln->md_seg_fds[forknum] =
                               1634                 :         868702 :             MemoryContextAlloc(MdCxt, sizeof(MdfdVec) * nseg);
                               1635                 :                :     }
  260 tmunro@postgresql.or     1636         [ #  # ]:UBC           0 :     else if (nseg > reln->md_num_open_segs[forknum])
                               1637                 :                :     {
                               1638                 :                :         /*
                               1639                 :                :          * It doesn't seem worthwhile complicating the code to amortize
                               1640                 :                :          * repalloc() calls.  Those are far faster than PathNameOpenFile() or
                               1641                 :                :          * FileClose(), and the memory context internally will sometimes avoid
                               1642                 :                :          * doing an actual reallocation.
                               1643                 :                :          */
 3285 andres@anarazel.de       1644                 :              0 :         reln->md_seg_fds[forknum] =
                               1645                 :              0 :             repalloc(reln->md_seg_fds[forknum],
                               1646                 :                :                      sizeof(MdfdVec) * nseg);
                               1647                 :                :     }
                               1648                 :                :     else
                               1649                 :                :     {
                               1650                 :                :         /*
                               1651                 :                :          * We don't reallocate a smaller array, because we want mdtruncate()
                               1652                 :                :          * to be able to promise that it won't allocate memory, so that it is
                               1653                 :                :          * allowed in a critical section.  This means that a bit of space in
                               1654                 :                :          * the array is now wasted, until the next time we add a segment and
                               1655                 :                :          * reallocate.
                               1656                 :                :          */
                               1657                 :                :     }
                               1658                 :                : 
 3285 andres@anarazel.de       1659                 :CBC     1378544 :     reln->md_num_open_segs[forknum] = nseg;
10334 vadim4o@yahoo.com        1660                 :        1378544 : }
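The resize policy described in the comments above (allocate on first use, grow on demand, never shrink the allocation, free only when the count drops to zero) can be sketched with plain malloc/realloc. This is an illustrative stand-in under stated assumptions: `SketchVec` and `sketch_resize` are hypothetical names, and the real code uses PostgreSQL memory contexts rather than the C allocator.

```c
#include <stdlib.h>

/* Hypothetical stand-in for one fork's array of open segments. */
typedef struct SketchVec
{
    int *items;  /* backing array, NULL when nothing is open */
    int  nopen;  /* number of entries currently considered open */
} SketchVec;

/* Returns 0 on success, -1 if an allocation failed. */
static int
sketch_resize(SketchVec *v, int n)
{
    if (n == 0)
    {
        /* Dropping to zero releases the array entirely. */
        free(v->items);
        v->items = NULL;
    }
    else if (v->nopen == 0)
    {
        /* First use: allocate exactly what is needed. */
        v->items = malloc(sizeof(int) * (size_t) n);
        if (v->items == NULL)
            return -1;
    }
    else if (n > v->nopen)
    {
        /* Growing: reallocate, keeping the old pointer on failure. */
        int *p = realloc(v->items, sizeof(int) * (size_t) n);

        if (p == NULL)
            return -1;
        v->items = p;
    }
    /* else: shrinking keeps the existing allocation, so no memory call is made */

    v->nopen = n;
    return 0;
}
```

The shrink branch deliberately does nothing but update the count, which mirrors the stated goal that truncation must not allocate memory.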
                               1661                 :                : 
                               1662                 :                : /*
                               1663                 :                :  * Return the filename for the specified segment of the relation. The
                               1664                 :                :  * path is returned by value in an MdPathStr; nothing is palloc'd.
                               1665                 :                :  */
                               1666                 :                : static MdPathStr
 5876 heikki.linnakangas@i     1667                 :          24688 : _mdfd_segpath(SMgrRelation reln, ForkNumber forknum, BlockNumber segno)
                               1668                 :                : {
                               1669                 :                :     RelPathStr  path;
                               1670                 :                :     MdPathStr   fullpath;
                               1671                 :                : 
 1158 rhaas@postgresql.org     1672                 :          24688 :     path = relpath(reln->smgr_rlocator, forknum);
                               1673                 :                : 
10226 bruce@momjian.us         1674         [ +  - ]:          24688 :     if (segno > 0)
  193 andres@anarazel.de       1675                 :          24688 :         sprintf(fullpath.str, "%s.%u", path.str, segno);
                               1676                 :                :     else
  193 andres@anarazel.de       1677                 :UBC           0 :         strcpy(fullpath.str, path.str);
                               1678                 :                : 
 5876 heikki.linnakangas@i     1679                 :CBC       24688 :     return fullpath;
                               1680                 :                : }
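The naming rule in _mdfd_segpath() is that segment 0 uses the bare relation path and every later segment appends ".N". A minimal sketch of that rule follows; `sketch_segpath` is a hypothetical helper, the path used in the usage note is made up for illustration, and the sketch uses snprintf with a caller-supplied buffer rather than the fixed-size MdPathStr of the real code.

```c
#include <stdio.h>
#include <string.h>

/*
 * Write the file name for segment "segno" of the relation at "relpath"
 * into dst.  Returns the snprintf() result, so a value >= dstlen means
 * the name was truncated.
 */
static int
sketch_segpath(char *dst, size_t dstlen, const char *relpath, unsigned segno)
{
    if (segno > 0)
        return snprintf(dst, dstlen, "%s.%u", relpath, segno);

    /* The first segment has no numeric suffix. */
    return snprintf(dst, dstlen, "%s", relpath);
}
```

For a hypothetical relation path `base/5/16384`, segment 0 maps to `base/5/16384` and segment 3 maps to `base/5/16384.3`.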
                               1681                 :                : 
                               1682                 :                : /*
                               1683                 :                :  * Open the specified segment of the relation,
                               1684                 :                :  * and make a MdfdVec object for it.  Returns NULL on failure.
                               1685                 :                :  */
                               1686                 :                : static MdfdVec *
                               1687                 :          24676 : _mdfd_openseg(SMgrRelation reln, ForkNumber forknum, BlockNumber segno,
                               1688                 :                :               int oflags)
                               1689                 :                : {
                               1690                 :                :     MdfdVec    *v;
                               1691                 :                :     File        fd;
                               1692                 :                :     MdPathStr   fullpath;
                               1693                 :                : 
                               1694                 :          24676 :     fullpath = _mdfd_segpath(reln, forknum, segno);
                               1695                 :                : 
                               1696                 :                :     /* open the file */
  193 andres@anarazel.de       1697                 :          24676 :     fd = PathNameOpenFile(fullpath.str, _mdfd_open_flags() | oflags);
                               1698                 :                : 
10226 bruce@momjian.us         1699         [ +  - ]:          24676 :     if (fd < 0)
 7913 neilc@samurai.com        1700                 :          24676 :         return NULL;
                               1701                 :                : 
                               1702                 :                :     /*
                               1703                 :                :      * Segments are always opened in order from lowest to highest, so we must
                               1704                 :                :      * be adding a new one at the end.
                               1705                 :                :      */
 2049 tmunro@postgresql.or     1706         [ #  # ]:UBC           0 :     Assert(segno == reln->md_num_open_segs[forknum]);
                               1707                 :                : 
                               1708                 :              0 :     _fdvec_resize(reln, forknum, segno + 1);
                               1709                 :                : 
                               1710                 :                :     /* fill the entry */
 3285 andres@anarazel.de       1711                 :              0 :     v = &reln->md_seg_fds[forknum][segno];
10226 bruce@momjian.us         1712                 :              0 :     v->mdfd_vfd = fd;
 7768 tgl@sss.pgh.pa.us        1713                 :              0 :     v->mdfd_segno = segno;
                               1714                 :                : 
 6235 heikki.linnakangas@i     1715         [ #  # ]:              0 :     Assert(_mdnblocks(reln, forknum, v) <= ((BlockNumber) RELSEG_SIZE));
                               1716                 :                : 
                               1717                 :                :     /* all done */
 9867 bruce@momjian.us         1718                 :              0 :     return v;
                               1719                 :                : }
                               1720                 :                : 
                               1721                 :                : /*
                               1722                 :                :  * _mdfd_getseg() -- Find the segment of the relation holding the
                               1723                 :                :  *                   specified block.
                               1724                 :                :  *
                               1725                 :                :  * If the segment doesn't exist, we ereport, return NULL, or create the
                               1726                 :                :  * segment, according to "behavior".  Note: skipFsync is only used in the
                               1727                 :                :  * EXTENSION_CREATE case.
                               1728                 :                :  */
                               1729                 :                : static MdfdVec *
 6235 heikki.linnakangas@i     1730                 :CBC     2585204 : _mdfd_getseg(SMgrRelation reln, ForkNumber forknum, BlockNumber blkno,
                               1731                 :                :              bool skipFsync, int behavior)
                               1732                 :                : {
                               1733                 :                :     MdfdVec    *v;
                               1734                 :                :     BlockNumber targetseg;
                               1735                 :                :     BlockNumber nextsegno;
                               1736                 :                : 
                               1737                 :                :     /* some way to handle non-existent segments needs to be specified */
 3412 andres@anarazel.de       1738         [ -  + ]:        2585204 :     Assert(behavior &
                               1739                 :                :            (EXTENSION_FAIL | EXTENSION_CREATE | EXTENSION_RETURN_NULL |
                               1740                 :                :             EXTENSION_DONT_OPEN));
                               1741                 :                : 
 6821 tgl@sss.pgh.pa.us        1742                 :        2585204 :     targetseg = blkno / ((BlockNumber) RELSEG_SIZE);
                               1743                 :                : 
                               1744                 :                :     /* if an existing and opened segment, we're done */
 3285 andres@anarazel.de       1745         [ +  + ]:        2585204 :     if (targetseg < reln->md_num_open_segs[forknum])
                               1746                 :                :     {
                               1747                 :        2363147 :         v = &reln->md_seg_fds[forknum][targetseg];
                               1748                 :        2363147 :         return v;
                               1749                 :                :     }
                               1750                 :                : 
                               1751                 :                :     /* The caller only wants the segment if we already had it open. */
 1218 tmunro@postgresql.or     1752         [ -  + ]:         222057 :     if (behavior & EXTENSION_DONT_OPEN)
 1218 tmunro@postgresql.or     1753                 :UBC           0 :         return NULL;
                               1754                 :                : 
                               1755                 :                :     /*
                               1756                 :                :      * The target segment is not yet open. Iterate over all the segments
                               1757                 :                :      * between the last opened and the target segment. This way missing
                               1758                 :                :      * segments either raise an error, or get created (according to
                               1759                 :                :      * 'behavior'). Start with either the last opened, or the first segment if
                               1760                 :                :      * none was opened before.
                               1761                 :                :      */
 3285 andres@anarazel.de       1762         [ +  + ]:CBC      222057 :     if (reln->md_num_open_segs[forknum] > 0)
                               1763                 :             12 :         v = &reln->md_seg_fds[forknum][reln->md_num_open_segs[forknum] - 1];
                               1764                 :                :     else
                               1765                 :                :     {
 2243 tmunro@postgresql.or     1766                 :         222045 :         v = mdopenfork(reln, forknum, behavior);
 3285 andres@anarazel.de       1767         [ -  + ]:         222042 :         if (!v)
 3285 andres@anarazel.de       1768                 :UBC           0 :             return NULL;        /* if behavior & EXTENSION_RETURN_NULL */
                               1769                 :                :     }
                               1770                 :                : 
 3285 andres@anarazel.de       1771                 :CBC      222054 :     for (nextsegno = reln->md_num_open_segs[forknum];
                               1772         [ +  + ]:         222054 :          nextsegno <= targetseg; nextsegno++)
                               1773                 :                :     {
                               1774                 :             12 :         BlockNumber nblocks = _mdnblocks(reln, forknum, v);
                               1775                 :             12 :         int         flags = 0;
                               1776                 :                : 
                               1777         [ -  + ]:             12 :         Assert(nextsegno == v->mdfd_segno + 1);
                               1778                 :                : 
                               1779         [ -  + ]:             12 :         if (nblocks > ((BlockNumber) RELSEG_SIZE))
 3285 andres@anarazel.de       1780         [ #  # ]:UBC           0 :             elog(FATAL, "segment too big");
                               1781                 :                : 
 3285 andres@anarazel.de       1782         [ +  - ]:CBC          12 :         if ((behavior & EXTENSION_CREATE) ||
                               1783   [ -  +  -  - ]:             12 :             (InRecovery && (behavior & EXTENSION_CREATE_RECOVERY)))
                               1784                 :                :         {
                               1785                 :                :             /*
                               1786                 :                :              * Normally we will create new segments only if authorized by the
                               1787                 :                :              * caller (i.e., we are doing mdextend()).  But when doing WAL
                               1788                 :                :              * recovery, create segments anyway; this allows cases such as
                               1789                 :                :              * replaying WAL data that has a write into a high-numbered
                               1790                 :                :              * segment of a relation that was later deleted. We want to go
                               1791                 :                :              * ahead and create the segments so we can finish out the replay.
                               1792                 :                :              *
                               1793                 :                :              * We have to maintain the invariant that segments before the last
                               1794                 :                :              * active segment are of size RELSEG_SIZE; therefore, if
                               1795                 :                :              * extending, pad them out with zeroes if needed.  (This only
                               1796                 :                :              * matters if in recovery, or if the caller is extending the
                               1797                 :                :              * relation discontiguously, but that can happen in hash indexes.)
                               1798                 :                :              */
 3285 andres@anarazel.de       1799         [ #  # ]:UBC           0 :             if (nblocks < ((BlockNumber) RELSEG_SIZE))
                               1800                 :                :             {
  882 tmunro@postgresql.or     1801                 :              0 :                 char       *zerobuf = palloc_aligned(BLCKSZ, PG_IO_ALIGN_SIZE,
                               1802                 :                :                                                      MCXT_ALLOC_ZERO);
                               1803                 :                : 
 3285 andres@anarazel.de       1804                 :              0 :                 mdextend(reln, forknum,
                               1805                 :              0 :                          nextsegno * ((BlockNumber) RELSEG_SIZE) - 1,
                               1806                 :                :                          zerobuf, skipFsync);
                               1807                 :              0 :                 pfree(zerobuf);
                               1808                 :                :             }
                               1809                 :              0 :             flags = O_CREAT;
                               1810                 :                :         }
  266 tmunro@postgresql.or     1811         [ +  - ]:CBC          12 :         else if (nblocks < ((BlockNumber) RELSEG_SIZE))
                               1812                 :                :         {
                               1813                 :                :             /*
                               1814                 :                :              * When not extending, only open the next segment if the current
                               1815                 :                :              * one is exactly RELSEG_SIZE.  If not (this branch), either
                               1816                 :                :              * return NULL or fail.
                               1817                 :                :              */
 3285 andres@anarazel.de       1818         [ -  + ]:             12 :             if (behavior & EXTENSION_RETURN_NULL)
                               1819                 :                :             {
                               1820                 :                :                 /*
                               1821                 :                :                  * Some callers discern between reasons for _mdfd_getseg()
                               1822                 :                :                  * returning NULL based on errno. As there's no failing
                               1823                 :                :                  * syscall involved in this case, explicitly set errno to
                               1824                 :                :                  * ENOENT, as that seems the closest interpretation.
                               1825                 :                :                  */
 3285 andres@anarazel.de       1826                 :UBC           0 :                 errno = ENOENT;
                               1827                 :              0 :                 return NULL;
                               1828                 :                :             }
                               1829                 :                : 
 3285 andres@anarazel.de       1830         [ +  - ]:CBC          12 :             ereport(ERROR,
                               1831                 :                :                     (errcode_for_file_access(),
                               1832                 :                :                      errmsg("could not open file \"%s\" (target block %u): previous segment is only %u blocks",
                               1833                 :                :                             _mdfd_segpath(reln, forknum, nextsegno).str,
                               1834                 :                :                             blkno, nblocks)));
                               1835                 :                :         }
                               1836                 :                : 
 3285 andres@anarazel.de       1837                 :UBC           0 :         v = _mdfd_openseg(reln, forknum, nextsegno, flags);
                               1838                 :                : 
                               1839         [ #  # ]:              0 :         if (v == NULL)
                               1840                 :                :         {
                               1841         [ #  # ]:              0 :             if ((behavior & EXTENSION_RETURN_NULL) &&
                               1842         [ #  # ]:              0 :                 FILE_POSSIBLY_DELETED(errno))
                               1843                 :              0 :                 return NULL;
                               1844         [ #  # ]:              0 :             ereport(ERROR,
                               1845                 :                :                     (errcode_for_file_access(),
                               1846                 :                :                      errmsg("could not open file \"%s\" (target block %u): %m",
                               1847                 :                :                             _mdfd_segpath(reln, forknum, nextsegno).str,
                               1848                 :                :                             blkno)));
                               1849                 :                :         }
                               1850                 :                :     }
                               1851                 :                : 
 9867 bruce@momjian.us         1852                 :CBC      222042 :     return v;
                               1853                 :                : }
                               1854                 :                : 
                               1855                 :                : /*
                               1856                 :                :  * Get number of blocks present in a single disk file
                               1857                 :                :  */
                               1858                 :                : static BlockNumber
 6235 heikki.linnakangas@i     1859                 :        3163785 : _mdnblocks(SMgrRelation reln, ForkNumber forknum, MdfdVec *seg)
                               1860                 :                : {
                               1861                 :                :     off_t       len;
                               1862                 :                : 
 2495 tmunro@postgresql.or     1863                 :        3163785 :     len = FileSize(seg->mdfd_vfd);
 9278 bruce@momjian.us         1864         [ -  + ]:        3163785 :     if (len < 0)
 6821 tgl@sss.pgh.pa.us        1865         [ #  # ]:UBC           0 :         ereport(ERROR,
                               1866                 :                :                 (errcode_for_file_access(),
                               1867                 :                :                  errmsg("could not seek to end of file \"%s\": %m",
                               1868                 :                :                         FilePathName(seg->mdfd_vfd))));
                               1869                 :                :     /* note that this calculation will ignore any partial block at EOF */
 6821 tgl@sss.pgh.pa.us        1870                 :CBC     3163785 :     return (BlockNumber) (len / BLCKSZ);
                               1871                 :                : }
                               1872                 :                : 
                               1873                 :                : /*
                               1874                 :                :  * Sync a file to disk, given a file tag.  Write the path into an output
                               1875                 :                :  * buffer so the caller can use it in error messages.
                               1876                 :                :  *
                               1877                 :                :  * Return 0 on success, -1 on failure, with errno set.
                               1878                 :                :  */
                               1879                 :                : int
 2347 tmunro@postgresql.or     1880                 :UBC           0 : mdsyncfiletag(const FileTag *ftag, char *path)
                               1881                 :                : {
  552 heikki.linnakangas@i     1882                 :              0 :     SMgrRelation reln = smgropen(ftag->rlocator, INVALID_PROC_NUMBER);
                               1883                 :                :     File        file;
                               1884                 :                :     instr_time  io_start;
                               1885                 :                :     bool        need_to_close;
                               1886                 :                :     int         result,
                               1887                 :                :                 save_errno;
                               1888                 :                : 
                               1889                 :                :     /* See if we already have the file open, or need to open it. */
 2093 tmunro@postgresql.or     1890         [ #  # ]:              0 :     if (ftag->segno < reln->md_num_open_segs[ftag->forknum])
                               1891                 :                :     {
                               1892                 :              0 :         file = reln->md_seg_fds[ftag->forknum][ftag->segno].mdfd_vfd;
                               1893                 :              0 :         strlcpy(path, FilePathName(file), MAXPGPATH);
                               1894                 :              0 :         need_to_close = false;
                               1895                 :                :     }
                               1896                 :                :     else
                               1897                 :                :     {
                               1898                 :                :         MdPathStr   p;
                               1899                 :                : 
                               1900                 :              0 :         p = _mdfd_segpath(reln, ftag->forknum, ftag->segno);
  193 andres@anarazel.de       1901                 :              0 :         strlcpy(path, p.str, MD_PATH_STR_MAXLEN);
                               1902                 :                : 
  882 tmunro@postgresql.or     1903                 :              0 :         file = PathNameOpenFile(path, _mdfd_open_flags());
 2093                          1904         [ #  # ]:              0 :         if (file < 0)
                               1905                 :              0 :             return -1;
                               1906                 :              0 :         need_to_close = true;
                               1907                 :                :     }
                               1908                 :                : 
  192 michael@paquier.xyz      1909                 :              0 :     io_start = pgstat_prepare_io_time(track_io_timing);
                               1910                 :                : 
                               1911                 :                :     /* Sync the file. */
 2093 tmunro@postgresql.or     1912                 :              0 :     result = FileSync(file, WAIT_EVENT_DATA_FILE_SYNC);
                               1913                 :              0 :     save_errno = errno;
                               1914                 :                : 
                               1915         [ #  # ]:              0 :     if (need_to_close)
                               1916                 :              0 :         FileClose(file);
                               1917                 :                : 
  883 andres@anarazel.de       1918                 :              0 :     pgstat_count_io_op_time(IOOBJECT_RELATION, IOCONTEXT_NORMAL,
                               1919                 :                :                             IOOP_FSYNC, io_start, 1, 0);
                               1920                 :                : 
 2093 tmunro@postgresql.or     1921                 :              0 :     errno = save_errno;
                               1922                 :              0 :     return result;
                               1923                 :                : }
                               1924                 :                : 
                               1925                 :                : /*
                               1926                 :                :  * Unlink a file, given a file tag.  Write the path into an output
                               1927                 :                :  * buffer so the caller can use it in error messages.
                               1928                 :                :  *
                               1929                 :                :  * Return 0 on success, -1 on failure, with errno set.
                               1930                 :                :  */
                               1931                 :                : int
 2347 tmunro@postgresql.or     1932                 :CBC       32525 : mdunlinkfiletag(const FileTag *ftag, char *path)
                               1933                 :                : {
                               1934                 :                :     RelPathStr  p;
                               1935                 :                : 
                               1936                 :                :     /* Compute the path. */
 1158 rhaas@postgresql.org     1937                 :          32525 :     p = relpathperm(ftag->rlocator, MAIN_FORKNUM);
  193 andres@anarazel.de       1938                 :          32525 :     strlcpy(path, p.str, MAXPGPATH);
                               1939                 :                : 
                               1940                 :                :     /* Try to unlink the file. */
 2347 tmunro@postgresql.or     1941                 :          32525 :     return unlink(path);
                               1942                 :                : }
                               1943                 :                : 
                               1944                 :                : /*
                               1945                 :                :  * Check if a given candidate request matches a given tag, when processing
                               1946                 :                :  * a SYNC_FILTER_REQUEST request.  This will be called for all pending
                               1947                 :                :  * requests to find out whether to forget them.
                               1948                 :                :  */
                               1949                 :                : bool
                               1950                 :           7804 : mdfiletagmatches(const FileTag *ftag, const FileTag *candidate)
                               1951                 :                : {
                               1952                 :                :     /*
                               1953                 :                :      * For now we only use filter requests as a way to drop all scheduled
                               1954                 :                :      * callbacks relating to a given database, when dropping the database.
                               1955                 :                :      * We'll return true for all candidates that have the same database OID as
                               1956                 :                :      * the ftag from the SYNC_FILTER_REQUEST request, so they're forgotten.
                               1957                 :                :      */
 1158 rhaas@postgresql.org     1958                 :           7804 :     return ftag->rlocator.dbOid == candidate->rlocator.dbOid;
                               1959                 :                : }
                               1960                 :                : 
                               1961                 :                : /*
                               1962                 :                :  * AIO completion callback for mdstartreadv().
                               1963                 :                :  */
                               1964                 :                : static PgAioResult
  161 andres@anarazel.de       1965                 :        1142229 : md_readv_complete(PgAioHandle *ioh, PgAioResult prior_result, uint8 cb_data)
                               1966                 :                : {
                               1967                 :        1142229 :     PgAioTargetData *td = pgaio_io_get_target_data(ioh);
                               1968                 :        1142229 :     PgAioResult result = prior_result;
                               1969                 :                : 
                               1970         [ -  + ]:        1142229 :     if (prior_result.result < 0)
                               1971                 :                :     {
  161 andres@anarazel.de       1972                 :UBC           0 :         result.status = PGAIO_RS_ERROR;
                               1973                 :              0 :         result.id = PGAIO_HCB_MD_READV;
                               1974                 :                :         /* For "hard" errors, track the error number in error_data */
                               1975                 :              0 :         result.error_data = -prior_result.result;
                               1976                 :              0 :         result.result = 0;
                               1977                 :                : 
                               1978                 :                :         /*
                               1979                 :                :          * Immediately log a message about the IO error, but only to the
                               1980                 :                :          * server log. The reason to do so immediately is that the originator
                               1981                 :                :          * might not process the query result immediately (because it is busy
                               1982                 :                :          * doing another part of query processing) or at all (e.g. if it was
                               1983                 :                :          * cancelled or errored out due to another IO also failing).  The
                               1984                 :                :          * definer of the IO will emit an ERROR when processing the IO's
                               1985                 :                :          * results.
                               1986                 :                :          */
                               1987                 :              0 :         pgaio_result_report(result, td, LOG_SERVER_ONLY);
                               1988                 :                : 
                               1989                 :              0 :         return result;
                               1990                 :                :     }
                               1991                 :                : 
                               1992                 :                :     /*
                               1993                 :                :      * As explained above smgrstartreadv(), the smgr API operates on the level
                               1994                 :                :      * of blocks, rather than bytes. Convert.
                               1995                 :                :      */
  161 andres@anarazel.de       1996                 :CBC     1142229 :     result.result /= BLCKSZ;
                               1997                 :                : 
                               1998         [ -  + ]:        1142229 :     Assert(result.result <= td->smgr.nblocks);
                               1999                 :                : 
                               2000         [ -  + ]:        1142229 :     if (result.result == 0)
                               2001                 :                :     {
                               2002                 :                :         /* consider 0 blocks read a failure */
  161 andres@anarazel.de       2003                 :UBC           0 :         result.status = PGAIO_RS_ERROR;
                               2004                 :              0 :         result.id = PGAIO_HCB_MD_READV;
                               2005                 :              0 :         result.error_data = 0;
                               2006                 :                : 
                               2007                 :                :         /* see comment above the "hard error" case */
                               2008                 :              0 :         pgaio_result_report(result, td, LOG_SERVER_ONLY);
                               2009                 :                : 
                               2010                 :              0 :         return result;
                               2011                 :                :     }
                               2012                 :                : 
  161 andres@anarazel.de       2013         [ +  - ]:CBC     1142229 :     if (result.status != PGAIO_RS_ERROR &&
                               2014         [ -  + ]:        1142229 :         result.result < td->smgr.nblocks)
                               2015                 :                :     {
                               2016                 :                :         /* partial reads should be retried at upper level */
  161 andres@anarazel.de       2017                 :UBC           0 :         result.status = PGAIO_RS_PARTIAL;
                               2018                 :              0 :         result.id = PGAIO_HCB_MD_READV;
                               2019                 :                :     }
                               2020                 :                : 
  161 andres@anarazel.de       2021                 :CBC     1142229 :     return result;
                               2022                 :                : }
                               2023                 :                : 
                               2024                 :                : /*
                               2025                 :                :  * AIO error reporting callback for mdstartreadv().
                               2026                 :                :  *
                               2027                 :                :  * Errors are encoded as follows:
                               2028                 :                :  * - PgAioResult.error_data != 0 encodes IO that failed with that errno
                               2029                 :                :  * - PgAioResult.error_data == 0 encodes IO that didn't read all data
                               2030                 :                :  */
                               2031                 :                : static void
  161 andres@anarazel.de       2032                 :UBC           0 : md_readv_report(PgAioResult result, const PgAioTargetData *td, int elevel)
                               2033                 :                : {
                               2034                 :                :     RelPathStr  path;
                               2035                 :                : 
                               2036         [ #  # ]:              0 :     path = relpathbackend(td->smgr.rlocator,
                               2037                 :                :                           td->smgr.is_temp ? MyProcNumber : INVALID_PROC_NUMBER,
                               2038                 :                :                           td->smgr.forkNum);
                               2039                 :                : 
                               2040         [ #  # ]:              0 :     if (result.error_data != 0)
                               2041                 :                :     {
                               2042                 :                :         /* for errcode_for_file_access() and %m */
                               2043                 :              0 :         errno = result.error_data;
                               2044                 :                : 
                               2045         [ #  # ]:              0 :         ereport(elevel,
                               2046                 :                :                 errcode_for_file_access(),
                               2047                 :                :                 errmsg("could not read blocks %u..%u in file \"%s\": %m",
                               2048                 :                :                        td->smgr.blockNum,
                               2049                 :                :                        td->smgr.blockNum + td->smgr.nblocks - 1,
                               2050                 :                :                        path.str));
                               2051                 :                :     }
                               2052                 :                :     else
                               2053                 :                :     {
                               2054                 :                :         /*
                               2055                 :                :          * NB: This will typically only be output in debug messages, while
                               2056                 :                :          * retrying a partial IO.
                               2057                 :                :          */
                               2058         [ #  # ]:              0 :         ereport(elevel,
                               2059                 :                :                 errcode(ERRCODE_DATA_CORRUPTED),
                               2060                 :                :                 errmsg("could not read blocks %u..%u in file \"%s\": read only %zu of %zu bytes",
                               2061                 :                :                        td->smgr.blockNum,
                               2062                 :                :                        td->smgr.blockNum + td->smgr.nblocks - 1,
                               2063                 :                :                        path.str,
                               2064                 :                :                        result.result * (size_t) BLCKSZ,
                               2065                 :                :                        td->smgr.nblocks * (size_t) BLCKSZ));
                               2066                 :                :     }
                               2067                 :              0 : }
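
Note on the result encoding used by md_readv_complete() and md_readv_report() above: a negative raw result is treated as a hard error with the errno kept in error_data, a byte count is converted to whole blocks, zero blocks is treated as an error with error_data == 0, and fewer blocks than requested is flagged as partial. The following is a minimal standalone sketch of that classification only, not the PostgreSQL implementation. The names SketchStatus, SketchResult, and sketch_readv_complete are hypothetical, and BLCKSZ is assumed to be 8192 (the usual default, but configurable at build time).

```c
#include <stddef.h>

/* Assumption: default PostgreSQL block size; real builds may differ. */
#define BLCKSZ 8192

/* Hypothetical stand-ins for PGAIO_RS_* status values. */
typedef enum
{
    SKETCH_RS_OK,
    SKETCH_RS_PARTIAL,
    SKETCH_RS_ERROR
} SketchStatus;

/* Hypothetical, simplified stand-in for PgAioResult. */
typedef struct
{
    SketchStatus status;
    int          error_data;    /* errno for hard errors, 0 for short reads */
    int          result;        /* number of whole blocks read */
} SketchResult;

/*
 * Classify a raw read result.  raw_result is either a negative errno
 * or a byte count; nblocks is the number of blocks that were requested.
 */
static SketchResult
sketch_readv_complete(long raw_result, int nblocks)
{
    SketchResult r = {SKETCH_RS_OK, 0, 0};

    if (raw_result < 0)
    {
        /* "hard" error: remember the errno, report zero blocks */
        r.status = SKETCH_RS_ERROR;
        r.error_data = (int) -raw_result;
        return r;
    }

    /* convert bytes to blocks; a trailing partial block is dropped */
    r.result = (int) (raw_result / BLCKSZ);

    if (r.result == 0)
    {
        /* zero blocks read counts as a failure, with no errno */
        r.status = SKETCH_RS_ERROR;
        return r;
    }

    if (r.result < nblocks)
        r.status = SKETCH_RS_PARTIAL;   /* caller is expected to retry */

    return r;
}
```

Under these assumptions, a caller could tell the two error flavours apart the same way md_readv_report() does: error_data != 0 means a failed system call, while error_data == 0 means the read came back short.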
        

Generated by: LCOV version 2.4-beta