/*-------------------------------------------------------------------------
 * worker.c
 *	   PostgreSQL logical replication worker (apply)
 *
 * Copyright (c) 2016-2025, PostgreSQL Global Development Group
 *
 * IDENTIFICATION
 *	  src/backend/replication/logical/worker.c
 *
 * NOTES
 *	  This file contains the worker which applies logical changes as they
 *	  come from the remote logical replication stream.
 *
 *	  The main apply worker is started by the logical replication worker
 *	  launcher for every enabled subscription in a database. It uses the
 *	  walsender protocol to communicate with the publisher.
 *
 *	  This module includes server-facing code and shares the libpqwalreceiver
 *	  module with walreceiver for the libpq-specific functionality.
 *
 *
 * STREAMED TRANSACTIONS
 * ---------------------
 * Streamed transactions (large transactions exceeding a memory limit on the
 * upstream) are applied using one of two approaches:
 *
 * 1) Write to temporary files and apply when the final commit arrives
 *
 * This approach is used when the user has set the subscription's streaming
 * option to on.
 *
 * Unlike the regular (non-streamed) case, handling streamed transactions has
 * to deal with aborts of both the toplevel transaction and subtransactions.
 * This is achieved by tracking offsets for subtransactions, which are then
 * used to truncate the file with serialized changes.
 *
 * The files are placed in the temporary-file directory by default, and the
 * filenames include both the XID of the toplevel transaction and the OID of
 * the subscription. This is necessary so that different workers processing a
 * remote transaction with the same XID don't interfere with each other.
 *
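To illustrate why both identifiers appear in the name, here is a minimal
standalone sketch. The exact format string is an assumption for
illustration; the authoritative version lives in changes_filename() below.

```c
// Sketch only: derive a per-(subscription, xid) file name so that two
// workers handling remote transactions with the same XID use distinct
// files. The "%u-%u.changes" pattern is assumed for illustration.
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef unsigned int Oid;
typedef unsigned int TransactionId;

static void
sketch_changes_filename(char *path, size_t len, Oid subid, TransactionId xid)
{
	// Embedding the subscription OID disambiguates identical remote XIDs.
	snprintf(path, len, "%u-%u.changes", subid, xid);
}
```

Two subscriptions applying the same remote XID therefore serialize into
different files and cannot clobber each other's spooled changes.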
 * We use BufFiles instead of plain temporary files because (a) the BufFile
 * infrastructure supports temporary files that exceed the OS file size
 * limit, (b) it provides automatic cleanup on error, and (c) it allows the
 * files to survive across local transactions so that they can be opened and
 * closed at stream start and stop. We use the FileSet infrastructure
 * because without it the files would be deleted as soon as they are closed,
 * and keeping the stream files open across start/stop stream would consume
 * a lot of memory (more than 8K for each BufFile, and there could be many
 * such BufFiles as the subscriber can receive start/stop streams for
 * multiple transactions before getting the commit). Moreover, without
 * FileSet we would also need to invent a new way to pass filenames to the
 * BufFile APIs so that the desired file can be reopened across multiple
 * stream-open calls for the same transaction.
 *
 * 2) Parallel apply workers.
 *
 * This approach is used when the user has set the subscription's streaming
 * option to parallel. See logical/applyparallelworker.c for information about
 * this approach.
 *
 * TWO_PHASE TRANSACTIONS
 * ----------------------
 * Two-phase transactions are replayed at prepare and then committed or
 * rolled back at commit prepared and rollback prepared respectively. It is
 * possible for a prepared transaction to arrive at the apply worker while a
 * tablesync is busy doing the initial copy. In this case, the apply worker
 * skips all the prepared operations (e.g. inserts) while the tablesync is
 * still busy (see the condition in should_apply_changes_for_rel). The
 * tablesync worker might not see such a prepared transaction because, say,
 * it was prior to the initial consistent point, though it might see some
 * later commits. The tablesync worker will then exit without doing anything
 * for the prepared transaction skipped by the apply worker, as its sync
 * location will already be ahead of the apply worker's current location.
 * This would lead to an "empty prepare", because later when the apply
 * worker processes the commit prepared, there is nothing in it (the inserts
 * were skipped earlier).
 *
 * To avoid this and similar confusion around prepares, the subscription's
 * two_phase commit is enabled only after the initial sync is over. The
 * two_phase option has been implemented as a tri-state with values
 * DISABLED, PENDING, and ENABLED.
 *
 * Even if the user specifies they want a subscription with two_phase = on,
 * internally it will start with a tri-state of PENDING, which only becomes
 * ENABLED after all tablesync initializations are completed - i.e. when all
 * tablesync workers have reached their READY state. In other words, the
 * value PENDING is only a temporary state for subscription start-up.
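The PENDING-to-ENABLED promotion can be modeled in a few lines. This is a
standalone sketch using plain enums and boolean READY flags (the real check
walks pg_subscription_rel states); the names here are hypothetical.

```c
// Minimal model: PENDING flips to ENABLED only once every tablesync has
// reached READY; DISABLED and ENABLED are stable states.
#include <assert.h>
#include <stdbool.h>

typedef enum
{
	SKETCH_TWOPHASE_DISABLED,
	SKETCH_TWOPHASE_PENDING,
	SKETCH_TWOPHASE_ENABLED
} SketchTwoPhaseState;

static SketchTwoPhaseState
sketch_promote_two_phase(SketchTwoPhaseState cur, int nrels,
						 const bool rel_ready[])
{
	if (cur != SKETCH_TWOPHASE_PENDING)
		return cur;
	for (int i = 0; i < nrels; i++)
		if (!rel_ready[i])
			return SKETCH_TWOPHASE_PENDING;	// some tablesync not READY yet
	return SKETCH_TWOPHASE_ENABLED;
}
```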
 *
 * Until two_phase is properly available (ENABLED) the subscription will
 * behave as if two_phase = off. When the apply worker detects that all
 * tablesyncs have become READY (while the tri-state was PENDING) it will
 * restart the apply worker process. This happens in
 * process_syncing_tables_for_apply.
 *
 * When the (re-started) apply worker finds that all tablesyncs are READY
 * for a two_phase tri-state of PENDING, it starts streaming messages with
 * the two_phase option, which in turn enables the decoding of two-phase
 * commits at the publisher. Then, it updates the tri-state value from
 * PENDING to ENABLED. Now, it is possible that during the time two_phase
 * was not enabled, the publisher (replication server) skipped some
 * prepares, but we ensure that such prepares are sent along with commit
 * prepared; see ReorderBufferFinishPrepared.
 *
 * If the subscription has no tables then a two_phase tri-state of PENDING
 * is left unchanged. This lets the user still do an ALTER SUBSCRIPTION
 * REFRESH PUBLICATION which might otherwise be disallowed (see below).
 *
 * If ever a user needs to be aware of the tri-state value, they can fetch
 * it from the pg_subscription catalog (see column subtwophasestate).
 *
 * Finally, to avoid the problems mentioned in the previous paragraphs from
 * any subsequent (not READY) tablesyncs (which would need the two_phase
 * option toggled from 'on' to 'off' and then back to 'on'), there is a
 * restriction on ALTER SUBSCRIPTION REFRESH PUBLICATION. This command is
 * not permitted when the two_phase tri-state is ENABLED, except when
 * copy_data = false.
 *
 * We can get a prepare for the same GID more than once in the genuine case
 * where we have defined multiple subscriptions for publications on the same
 * server and the prepared transaction has operations on tables subscribed
 * to by those subscriptions. In such cases, if we used the GID sent by the
 * publisher, one of the prepares would be successful and the others would
 * fail, in which case the server would send them again. This can lead to a
 * deadlock if the user has set synchronous_standby_names for all the
 * subscriptions on the subscriber. To avoid such deadlocks, we generate a
 * unique GID (consisting of the subscription oid and the xid of the
 * prepared transaction) for each prepared transaction on the subscriber.
 *
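A standalone sketch of that deadlock-avoidance scheme: derive the local GID
from (subscription oid, remote xid) instead of reusing the remote GID. The
"pg_gid_%u_%u" format here is an assumption for illustration only.

```c
// Sketch: subscriber-local GID derived from (subscription oid, xid), so
// two subscriptions receiving the same remote prepared transaction get
// distinct local GIDs and neither PREPARE fails on a collision.
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef unsigned int Oid;
typedef unsigned int TransactionId;

static void
sketch_twophase_gid(char *gid, size_t szgid, Oid subid, TransactionId xid)
{
	snprintf(gid, szgid, "pg_gid_%u_%u", subid, xid);
}
```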
 * FAILOVER
 * ----------------------
 * The logical slot on the primary can be synced to the standby by specifying
 * failover = true when creating the subscription. Enabling failover allows us
 * to smoothly transition to the promoted standby, ensuring that we can
 * subscribe to the new primary without losing any data.
 *
 * RETAIN DEAD TUPLES
 * ----------------------
 * Each apply worker that has enabled the retain_dead_tuples option maintains
 * a non-removable transaction ID (oldest_nonremovable_xid) in shared memory
 * to prevent dead rows from being removed prematurely when the apply worker
 * still needs them to detect update_deleted conflicts. Additionally, this
 * helps to retain the required commit_ts module information, which further
 * helps to detect update_origin_differs and delete_origin_differs conflicts
 * reliably, as otherwise, vacuum freeze could remove the required
 * information.
 *
 * The logical replication launcher manages an internal replication slot
 * named "pg_conflict_detection". It asynchronously aggregates the
 * non-removable transaction ID from all apply workers to determine the
 * appropriate xmin for the slot, thereby retaining necessary tuples.
 *
 * The non-removable transaction ID in the apply worker is advanced to the
 * oldest running transaction ID once all concurrent transactions on the
 * publisher have been applied and flushed locally. The process involves:
 *
 * - RDT_GET_CANDIDATE_XID:
 *   Call GetOldestActiveTransactionId() to take oldestRunningXid as the
 *   candidate xid.
 *
 * - RDT_REQUEST_PUBLISHER_STATUS:
 *   Send a message to the walsender requesting the publisher status, which
 *   includes the latest WAL write position and information about
 *   transactions that are in the commit phase.
 *
 * - RDT_WAIT_FOR_PUBLISHER_STATUS:
 *   Wait for the status from the walsender. After receiving the first
 *   status, do not proceed if there are concurrent remote transactions that
 *   are still in the commit phase. These transactions might have been
 *   assigned an earlier commit timestamp but have not yet written the
 *   commit WAL record. Continue to request the publisher status
 *   (RDT_REQUEST_PUBLISHER_STATUS) until all these transactions have
 *   completed.
 *
 * - RDT_WAIT_FOR_LOCAL_FLUSH:
 *   Advance the non-removable transaction ID if the current flush location
 *   has reached or surpassed the last received WAL position.
 *
 * - RDT_STOP_CONFLICT_INFO_RETENTION:
 *   This phase is required only when max_retention_duration is defined. We
 *   enter this phase if the wait time in either the
 *   RDT_WAIT_FOR_PUBLISHER_STATUS or RDT_WAIT_FOR_LOCAL_FLUSH phase exceeds
 *   the configured max_retention_duration. In this phase,
 *   pg_subscription.subretentionactive is updated to false within a new
 *   transaction, and oldest_nonremovable_xid is set to InvalidTransactionId.
 *
 * The overall state progression is: GET_CANDIDATE_XID ->
 * REQUEST_PUBLISHER_STATUS -> WAIT_FOR_PUBLISHER_STATUS -> (loop to
 * REQUEST_PUBLISHER_STATUS till concurrent remote transactions end) ->
 * WAIT_FOR_LOCAL_FLUSH -> loop back to GET_CANDIDATE_XID.
 *
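The documented ordering can be captured as a tiny state machine. This is a
standalone model only (the retention-stop phase is omitted); the real
transitions live in process_rdt_phase_transition() and the helpers below,
and the names here are hypothetical.

```c
// Minimal model of the documented phase progression, including the loop
// back to REQUEST_PUBLISHER_STATUS while concurrent remote transactions in
// the commit phase remain.
#include <assert.h>
#include <stdbool.h>

typedef enum
{
	SK_GET_CANDIDATE_XID,
	SK_REQUEST_PUBLISHER_STATUS,
	SK_WAIT_FOR_PUBLISHER_STATUS,
	SK_WAIT_FOR_LOCAL_FLUSH
} SketchRdtPhase;

static SketchRdtPhase
sketch_next_phase(SketchRdtPhase cur, bool concurrent_remote_xacts,
				  bool local_flush_caught_up)
{
	switch (cur)
	{
		case SK_GET_CANDIDATE_XID:
			return SK_REQUEST_PUBLISHER_STATUS;
		case SK_REQUEST_PUBLISHER_STATUS:
			return SK_WAIT_FOR_PUBLISHER_STATUS;
		case SK_WAIT_FOR_PUBLISHER_STATUS:
			// Loop back while transactions in the commit phase remain.
			return concurrent_remote_xacts ? SK_REQUEST_PUBLISHER_STATUS
										   : SK_WAIT_FOR_LOCAL_FLUSH;
		case SK_WAIT_FOR_LOCAL_FLUSH:
			// Advance only once the local flush reaches the remote position.
			return local_flush_caught_up ? SK_GET_CANDIDATE_XID
										 : SK_WAIT_FOR_LOCAL_FLUSH;
	}
	return cur;
}
```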
 * Retaining the dead tuples for this period is sufficient for ensuring
 * eventual consistency using the last-update-wins strategy, as dead tuples
 * are useful for detecting conflicts only during the application of
 * concurrent transactions from remote nodes. After applying and flushing
 * all remote transactions that occurred concurrently with the tuple DELETE,
 * any subsequent UPDATE from a remote node should have a later timestamp.
 * In such cases, it is acceptable to detect an update_missing scenario and
 * convert the UPDATE to an INSERT when applying it. But for concurrent
 * remote transactions with earlier timestamps than the DELETE, detecting
 * update_deleted is necessary, as the UPDATEs in remote transactions should
 * be ignored if their timestamp is earlier than that of the dead tuples.
 *
 * Note that advancing the non-removable transaction ID is not supported if
 * the publisher is also a physical standby. This is because the logical
 * walsender on the standby can only get the WAL replay position, but there
 * may be more WAL still being replicated from the primary, and that WAL
 * could have earlier commit timestamps.
 *
 * Similarly, when the publisher has subscribed to another publisher,
 * information necessary for conflict detection cannot be retained for
 * changes from origins other than the publisher. This is because the
 * publisher lacks information on concurrent transactions of the other
 * publishers to which it subscribes. As the information on concurrent
 * transactions is unavailable beyond the subscriber's immediate publishers,
 * the non-removable transaction ID might be advanced prematurely before
 * changes from other origins have been fully applied.
 *
 * XXX Retaining information for changes from other origins might be
 * possible by requesting the subscription on that origin to enable
 * retain_dead_tuples and fetching the conflict detection slot.xmin along
 * with the publisher's status. In the RDT_WAIT_FOR_PUBLISHER_STATUS phase,
 * the apply worker could wait for the remote slot's xmin to reach the
 * oldest active transaction ID, ensuring that all transactions from other
 * origins have been applied on the publisher, thereby getting the latest
 * WAL position that includes all concurrent changes. However, this approach
 * may impact performance, so it might not be worth the effort.
 *
 * XXX It seems feasible to get the latest commit's WAL location from the
 * publisher and wait till that is applied. However, we can't do that
 * because commit timestamps can regress, as a commit with a later LSN is
 * not guaranteed to have a later timestamp than those with earlier LSNs.
 * Having said that, even if that were possible, it wouldn't improve
 * performance much, as the apply side always lags and moves slowly compared
 * with the transactions on the publisher.
 *-------------------------------------------------------------------------
 */

#include "postgres.h"

#include <sys/stat.h>
#include <unistd.h>

#include "access/commit_ts.h"
#include "access/table.h"
#include "access/tableam.h"
#include "access/twophase.h"
#include "access/xact.h"
#include "catalog/indexing.h"
#include "catalog/pg_inherits.h"
#include "catalog/pg_subscription.h"
#include "catalog/pg_subscription_rel.h"
#include "commands/subscriptioncmds.h"
#include "commands/tablecmds.h"
#include "commands/trigger.h"
#include "executor/executor.h"
#include "executor/execPartition.h"
#include "libpq/pqformat.h"
#include "miscadmin.h"
#include "optimizer/optimizer.h"
#include "parser/parse_relation.h"
#include "pgstat.h"
#include "postmaster/bgworker.h"
#include "postmaster/interrupt.h"
#include "postmaster/walwriter.h"
#include "replication/conflict.h"
#include "replication/logicallauncher.h"
#include "replication/logicalproto.h"
#include "replication/logicalrelation.h"
#include "replication/logicalworker.h"
#include "replication/origin.h"
#include "replication/slot.h"
#include "replication/walreceiver.h"
#include "replication/worker_internal.h"
#include "rewrite/rewriteHandler.h"
#include "storage/buffile.h"
#include "storage/ipc.h"
#include "storage/lmgr.h"
#include "storage/procarray.h"
#include "tcop/tcopprot.h"
#include "utils/acl.h"
#include "utils/dynahash.h"
#include "utils/guc.h"
#include "utils/inval.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/pg_lsn.h"
#include "utils/rel.h"
#include "utils/rls.h"
#include "utils/snapmgr.h"
#include "utils/syscache.h"
#include "utils/usercontext.h"

#define NAPTIME_PER_CYCLE 1000	/* max sleep time between cycles (1s) */

typedef struct FlushPosition
{
	dlist_node	node;
	XLogRecPtr	local_end;
	XLogRecPtr	remote_end;
} FlushPosition;

static dlist_head lsn_mapping = DLIST_STATIC_INIT(lsn_mapping);

typedef struct ApplyExecutionData
{
	EState	   *estate;			/* executor state, used to track resources */

	LogicalRepRelMapEntry *targetRel;	/* replication target rel */
	ResultRelInfo *targetRelInfo;	/* ResultRelInfo for same */

	/* These fields are used when the target relation is partitioned: */
	ModifyTableState *mtstate;	/* dummy ModifyTable state */
	PartitionTupleRouting *proute;	/* partition routing info */
} ApplyExecutionData;

/* Struct for saving and restoring apply errcontext information */
typedef struct ApplyErrorCallbackArg
{
	LogicalRepMsgType command;	/* 0 if invalid */
	LogicalRepRelMapEntry *rel;

	/* Remote node information */
	int			remote_attnum;	/* -1 if invalid */
	TransactionId remote_xid;
	XLogRecPtr	finish_lsn;
	char	   *origin_name;
} ApplyErrorCallbackArg;

/*
 * The action to be taken for the changes in the transaction.
 *
 * TRANS_LEADER_APPLY:
 * This action means that we are in the leader apply worker or table sync
 * worker. The changes of the transaction are either directly applied or
 * are read from temporary files (for streaming transactions) and then
 * applied by the worker.
 *
 * TRANS_LEADER_SERIALIZE:
 * This action means that we are in the leader apply worker or table sync
 * worker. Changes are written to temporary files and then applied when the
 * final commit arrives.
 *
 * TRANS_LEADER_SEND_TO_PARALLEL:
 * This action means that we are in the leader apply worker and need to send
 * the changes to the parallel apply worker.
 *
 * TRANS_LEADER_PARTIAL_SERIALIZE:
 * This action means that we are in the leader apply worker and have sent some
 * changes directly to the parallel apply worker and the remaining changes are
 * serialized to a file, due to timeout while sending data. The parallel apply
 * worker will apply these serialized changes when the final commit arrives.
 *
 * We can't use TRANS_LEADER_SERIALIZE for this case because, in addition to
 * serializing changes, the leader worker also needs to serialize the
 * STREAM_XXX message to a file, and wait for the parallel apply worker to
 * finish the transaction when processing the transaction finish command. So
 * this new action was introduced to keep the code and logic clear.
 *
 * TRANS_PARALLEL_APPLY:
 * This action means that we are in the parallel apply worker and changes of
 * the transaction are applied directly by the worker.
 */
typedef enum
{
	/* The action for non-streaming transactions. */
	TRANS_LEADER_APPLY,

	/* Actions for streaming transactions. */
	TRANS_LEADER_SERIALIZE,
	TRANS_LEADER_SEND_TO_PARALLEL,
	TRANS_LEADER_PARTIAL_SERIALIZE,
	TRANS_PARALLEL_APPLY,
} TransApplyAction;

/*
 * The phases involved in advancing the non-removable transaction ID.
 *
 * See comments atop worker.c for details of the transition between these
 * phases.
 */
typedef enum
{
	RDT_GET_CANDIDATE_XID,
	RDT_REQUEST_PUBLISHER_STATUS,
	RDT_WAIT_FOR_PUBLISHER_STATUS,
	RDT_WAIT_FOR_LOCAL_FLUSH,
	RDT_STOP_CONFLICT_INFO_RETENTION
} RetainDeadTuplesPhase;

/*
 * Critical information for managing phase transitions within the
 * RetainDeadTuplesPhase.
 */
typedef struct RetainDeadTuplesData
{
	RetainDeadTuplesPhase phase;	/* current phase */
	XLogRecPtr	remote_lsn;		/* WAL write position on the publisher */

	/*
	 * Oldest transaction ID that was in the commit phase on the publisher.
	 * Use FullTransactionId to prevent issues with transaction ID
	 * wraparound, where a new remote_oldestxid could falsely appear to
	 * originate from the past and block advancement.
	 */
	FullTransactionId remote_oldestxid;

	/*
	 * Next transaction ID to be assigned on the publisher. Use
	 * FullTransactionId for consistency and to allow straightforward
	 * comparisons with remote_oldestxid.
	 */
	FullTransactionId remote_nextxid;

	TimestampTz reply_time;		/* when the publisher responds with status */

	/*
	 * Publisher transaction ID whose completion must be awaited before
	 * entering the final phase (RDT_WAIT_FOR_LOCAL_FLUSH). Use
	 * FullTransactionId for the same reason as remote_nextxid.
	 */
	FullTransactionId remote_wait_for;

	TransactionId candidate_xid;	/* candidate for the non-removable
									 * transaction ID */
	TimestampTz flushpos_update_time;	/* when the remote flush position was
										 * updated in final phase
										 * (RDT_WAIT_FOR_LOCAL_FLUSH) */

	long		table_sync_wait_time;	/* time spent waiting for table sync
										 * to finish */

	/*
	 * The following fields are used to determine the timing for the next
	 * round of transaction ID advancement.
	 */
	TimestampTz last_recv_time; /* when the last message was received */
	TimestampTz candidate_xid_time; /* when the candidate_xid is decided */
	int			xid_advance_interval;	/* how much time (ms) to wait before
										 * attempting to advance the
										 * non-removable transaction ID */
} RetainDeadTuplesData;

/*
 * The minimum (100ms) and maximum (3 minutes) intervals for advancing
 * non-removable transaction IDs. The maximum interval is a bit arbitrary but
 * is sufficient to not cause any undue network traffic.
 */
#define MIN_XID_ADVANCE_INTERVAL 100
#define MAX_XID_ADVANCE_INTERVAL 180000
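A doubling back-off between these bounds can be sketched as below. This is
an illustrative model only; the authoritative logic, which also takes
wal_receiver_status_interval into account, is in
adjust_xid_advance_interval(), and the names here are hypothetical.

```c
// Sketch: reset to the minimum when a new candidate xid is found,
// otherwise double the wait, clamped at the maximum, to avoid useless
// wakeups while no progress is possible.
#include <assert.h>
#include <stdbool.h>

enum
{
	SKETCH_MIN_INTERVAL = 100,		// ms
	SKETCH_MAX_INTERVAL = 180000	// ms
};

static int
sketch_next_xid_advance_interval(int cur, bool new_xid_found)
{
	if (new_xid_found)
		return SKETCH_MIN_INTERVAL;	// progress made: poll quickly again
	cur *= 2;
	return cur > SKETCH_MAX_INTERVAL ? SKETCH_MAX_INTERVAL : cur;
}
```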

/* errcontext tracker */
static ApplyErrorCallbackArg apply_error_callback_arg =
{
	.command = 0,
	.rel = NULL,
	.remote_attnum = -1,
	.remote_xid = InvalidTransactionId,
	.finish_lsn = InvalidXLogRecPtr,
	.origin_name = NULL,
};

ErrorContextCallback *apply_error_context_stack = NULL;

MemoryContext ApplyMessageContext = NULL;
MemoryContext ApplyContext = NULL;

/* per stream context for streaming transactions */
static MemoryContext LogicalStreamingContext = NULL;

WalReceiverConn *LogRepWorkerWalRcvConn = NULL;

Subscription *MySubscription = NULL;
static bool MySubscriptionValid = false;

static List *on_commit_wakeup_workers_subids = NIL;

bool		in_remote_transaction = false;
static XLogRecPtr remote_final_lsn = InvalidXLogRecPtr;

/* fields valid only when processing streamed transaction */
static bool in_streamed_transaction = false;

static TransactionId stream_xid = InvalidTransactionId;

/*
 * The number of changes applied by the parallel apply worker during one
 * streaming block.
 */
static uint32 parallel_stream_nchanges = 0;

/* Are we initializing an apply worker? */
bool		InitializingApplyWorker = false;

/*
 * We enable skipping all data modification changes (INSERT, UPDATE, etc.)
 * for the subscription if the remote transaction's finish LSN matches the
 * subskiplsn. Once we start skipping changes, we don't stop until we have
 * skipped all changes of the transaction, even if pg_subscription is
 * updated and MySubscription->skiplsn gets changed or reset in the
 * meantime. Also, in streaming transaction cases (streaming = on), we don't
 * skip receiving and spooling the changes, since we decide whether or not
 * to skip applying the changes when starting to apply them. The subskiplsn
 * is cleared after successfully skipping the transaction or applying a
 * non-empty transaction. The latter prevents a mistakenly specified
 * subskiplsn from being left behind. Note that we cannot skip streaming
 * transactions when using parallel apply workers, because we cannot get the
 * finish LSN before applying the changes. So, we don't start a parallel
 * apply worker when the finish LSN is set by the user.
 */
static XLogRecPtr skip_xact_finish_lsn = InvalidXLogRecPtr;
#define is_skipping_changes() (unlikely(!XLogRecPtrIsInvalid(skip_xact_finish_lsn)))

/* BufFile handle of the current streaming file */
static BufFile *stream_fd = NULL;

/*
 * The remote WAL position that has been applied and flushed locally. We
 * record and use this information both while sending feedback to the server
 * and while advancing oldest_nonremovable_xid.
 */
static XLogRecPtr last_flushpos = InvalidXLogRecPtr;

typedef struct SubXactInfo
{
	TransactionId xid;			/* XID of the subxact */
	int			fileno;			/* file number in the buffile */
	off_t		offset;			/* offset in the file */
} SubXactInfo;

/* Sub-transaction data for the current streaming transaction */
typedef struct ApplySubXactData
{
	uint32		nsubxacts;		/* number of sub-transactions */
	uint32		nsubxacts_max;	/* current capacity of subxacts */
	TransactionId subxact_last; /* xid of the last sub-transaction */
	SubXactInfo *subxacts;		/* sub-xact offset in changes file */
} ApplySubXactData;

static ApplySubXactData subxact_data = {0, 0, InvalidTransactionId, NULL};

static inline void subxact_filename(char *path, Oid subid, TransactionId xid);
static inline void changes_filename(char *path, Oid subid, TransactionId xid);

/*
 * Information about subtransactions of a given toplevel transaction.
 */
static void subxact_info_write(Oid subid, TransactionId xid);
static void subxact_info_read(Oid subid, TransactionId xid);
static void subxact_info_add(TransactionId xid);
static inline void cleanup_subxact_info(void);

/*
 * Serialize and deserialize changes for a toplevel transaction.
 */
static void stream_open_file(Oid subid, TransactionId xid,
							 bool first_segment);
static void stream_write_change(char action, StringInfo s);
static void stream_open_and_write_change(TransactionId xid, char action, StringInfo s);
static void stream_close_file(void);

static void send_feedback(XLogRecPtr recvpos, bool force, bool requestReply);

static void maybe_advance_nonremovable_xid(RetainDeadTuplesData *rdt_data,
										   bool status_received);
static bool can_advance_nonremovable_xid(RetainDeadTuplesData *rdt_data);
static void process_rdt_phase_transition(RetainDeadTuplesData *rdt_data,
										 bool status_received);
static void get_candidate_xid(RetainDeadTuplesData *rdt_data);
static void request_publisher_status(RetainDeadTuplesData *rdt_data);
static void wait_for_publisher_status(RetainDeadTuplesData *rdt_data,
									  bool status_received);
static void wait_for_local_flush(RetainDeadTuplesData *rdt_data);
static bool should_stop_conflict_info_retention(RetainDeadTuplesData *rdt_data);
static void stop_conflict_info_retention(RetainDeadTuplesData *rdt_data);
static void reset_retention_data_fields(RetainDeadTuplesData *rdt_data);
static void adjust_xid_advance_interval(RetainDeadTuplesData *rdt_data,
										bool new_xid_found);

static void apply_handle_commit_internal(LogicalRepCommitData *commit_data);
static void apply_handle_insert_internal(ApplyExecutionData *edata,
										 ResultRelInfo *relinfo,
										 TupleTableSlot *remoteslot);
static void apply_handle_update_internal(ApplyExecutionData *edata,
										 ResultRelInfo *relinfo,
										 TupleTableSlot *remoteslot,
										 LogicalRepTupleData *newtup,
										 Oid localindexoid);
static void apply_handle_delete_internal(ApplyExecutionData *edata,
										 ResultRelInfo *relinfo,
										 TupleTableSlot *remoteslot,
										 Oid localindexoid);
static bool FindReplTupleInLocalRel(ApplyExecutionData *edata, Relation localrel,
									LogicalRepRelation *remoterel,
									Oid localidxoid,
									TupleTableSlot *remoteslot,
									TupleTableSlot **localslot);
static bool FindDeletedTupleInLocalRel(Relation localrel,
									   Oid localidxoid,
									   TupleTableSlot *remoteslot,
									   TransactionId *delete_xid,
									   RepOriginId *delete_origin,
									   TimestampTz *delete_time);
static void apply_handle_tuple_routing(ApplyExecutionData *edata,
									   TupleTableSlot *remoteslot,
									   LogicalRepTupleData *newtup,
									   CmdType operation);

/* Functions for skipping changes */
static void maybe_start_skipping_changes(XLogRecPtr finish_lsn);
static void stop_skipping_changes(void);
static void clear_subscription_skip_lsn(XLogRecPtr finish_lsn);

/* Functions for apply error callback */
static inline void set_apply_error_context_xact(TransactionId xid, XLogRecPtr lsn);
static inline void reset_apply_error_context_info(void);

static TransApplyAction get_transaction_apply_action(TransactionId xid,
													 ParallelApplyWorkerInfo **winfo);

static void replorigin_reset(int code, Datum arg);

/*
 * Form the origin name for the subscription.
 *
 * This is a common function for tablesync and other workers. Tablesync
 * workers must pass a valid relid. Other callers must pass relid =
 * InvalidOid.
 *
 * Return the name in the supplied buffer.
 */
void
ReplicationOriginNameForLogicalRep(Oid suboid, Oid relid,
								   char *originname, Size szoriginname)
{
	if (OidIsValid(relid))
	{
		/* Replication origin name for tablesync workers. */
		snprintf(originname, szoriginname, "pg_%u_%u", suboid, relid);
	}
	else
	{
		/* Replication origin name for non-tablesync workers. */
		snprintf(originname, szoriginname, "pg_%u", suboid);
	}
}
642 : :
643 : : /*
644 : : * Should this worker apply changes for given relation.
645 : : *
646 : : * This is mainly needed for initial relation data sync as that runs in
647 : : * separate worker process running in parallel and we need some way to skip
648 : : * changes coming to the leader apply worker during the sync of a table.
649 : : *
650 : : * Note we need a less-than-or-equal comparison for the SYNCDONE state
651 : : * because it might hold the position of the end of the initial slot's
652 : : * consistent point WAL record + 1 (i.e. the start of the next record), and
653 : : * that next record can be the COMMIT of the transaction we are now
654 : : * processing (which is what we set remote_final_lsn to in apply_handle_begin).
655 : : *
656 : : * Note that for streaming transactions that are being applied in the parallel
657 : : * apply worker, we disallow applying changes if the target table in the
658 : : * subscription is not in the READY state, because we cannot decide whether to
659 : : * apply the change as we won't know remote_final_lsn by that time.
660 : : *
661 : : * We already checked this in pa_can_start() before assigning the
662 : : * streaming transaction to the parallel worker, but it also needs to be
663 : : * checked here because if the user executes ALTER SUBSCRIPTION ... REFRESH
664 : : * PUBLICATION in parallel, the new table can be added to pg_subscription_rel
665 : : * while applying this transaction.
666 : : */
667 : : static bool
3089 peter_e@gmx.net 668 : 148168 : should_apply_changes_for_rel(LogicalRepRelMapEntry *rel)
669 : : {
746 akapila@postgresql.o 670 [ - + + - : 148168 : switch (MyLogicalRepWorker->type)
- ]
671 : : {
746 akapila@postgresql.o 672 :LBC (10) : case WORKERTYPE_TABLESYNC:
673 : (10) : return MyLogicalRepWorker->relid == rel->localreloid;
674 : :
746 akapila@postgresql.o 675 :CBC 68420 : case WORKERTYPE_PARALLEL_APPLY:
676 : : /* We don't synchronize rels that are in an unknown state. */
677 [ - + ]: 68420 : if (rel->state != SUBREL_STATE_READY &&
746 akapila@postgresql.o 678 [ # # ]:UBC 0 : rel->state != SUBREL_STATE_UNKNOWN)
679 [ # # ]: 0 : ereport(ERROR,
680 : : (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
681 : : errmsg("logical replication parallel apply worker for subscription \"%s\" will stop",
682 : : MySubscription->name),
683 : : errdetail("Cannot handle streamed replication transactions using parallel apply workers until all tables have been synchronized.")));
684 : :
746 akapila@postgresql.o 685 :CBC 68420 : return rel->state == SUBREL_STATE_READY;
686 : :
687 : 79748 : case WORKERTYPE_APPLY:
688 [ + + ]: 79813 : return (rel->state == SUBREL_STATE_READY ||
689 [ + + ]: 65 : (rel->state == SUBREL_STATE_SYNCDONE &&
690 [ + - ]: 16 : rel->statelsn <= remote_final_lsn));
691 : :
746 akapila@postgresql.o 692 :UBC 0 : case WORKERTYPE_UNKNOWN:
693 : : /* Should never happen. */
694 [ # # ]: 0 : elog(ERROR, "Unknown worker type");
695 : : }
696 : :
697 : 0 : return false; /* dummy for compiler */
698 : : }
699 : :
700 : : /*
701 : : * Begin one step (one INSERT, UPDATE, etc) of a replication transaction.
702 : : *
703 : : * Start a transaction, if this is the first step (else we keep using the
704 : : * existing transaction).
705 : : * Also provide a global snapshot and ensure we run in ApplyMessageContext.
706 : : */
707 : : static void
1549 tgl@sss.pgh.pa.us 708 :CBC 148625 : begin_replication_step(void)
709 : : {
710 : 148625 : SetCurrentStatementStartTimestamp();
711 : :
712 [ + + ]: 148625 : if (!IsTransactionState())
713 : : {
714 : 955 : StartTransactionCommand();
715 : 955 : maybe_reread_subscription();
716 : : }
717 : :
718 : 148622 : PushActiveSnapshot(GetTransactionSnapshot());
719 : :
3042 peter_e@gmx.net 720 : 148622 : MemoryContextSwitchTo(ApplyMessageContext);
1549 tgl@sss.pgh.pa.us 721 : 148622 : }
722 : :
723 : : /*
724 : : * Finish up one step of a replication transaction.
725 : : * Callers of begin_replication_step() must also call this.
726 : : *
727 : : * We don't close out the transaction here, but we should increment
728 : : * the command counter to make the effects of this step visible.
729 : : */
730 : : static void
731 : 148574 : end_replication_step(void)
732 : : {
733 : 148574 : PopActiveSnapshot();
734 : :
735 : 148574 : CommandCounterIncrement();
3152 peter_e@gmx.net 736 : 148574 : }
737 : :
738 : : /*
739 : : * Handle streamed transactions for both the leader apply worker and the
740 : : * parallel apply workers.
741 : : *
742 : : * In the streaming case (receiving a block of the streamed transaction), for
743 : : * serialize mode, simply redirect it to a file for the proper toplevel
744 : : * transaction, and for parallel mode, the leader apply worker will send the
745 : : * changes to parallel apply workers and the parallel apply worker will define
746 : : * savepoints if needed. (LOGICAL_REP_MSG_RELATION or LOGICAL_REP_MSG_TYPE
747 : : * messages will be applied by both leader apply worker and parallel apply
748 : : * workers).
749 : : *
750 : : * Returns true for streamed transactions (when the change is either serialized
751 : : * to file or sent to parallel apply worker), false otherwise (regular mode or
752 : : * needs to be processed by parallel apply worker).
753 : : *
754 : : * Exception: If the message being processed is LOGICAL_REP_MSG_RELATION
755 : : * or LOGICAL_REP_MSG_TYPE, return false even if the message needs to be sent
756 : : * to a parallel apply worker.
757 : : */
758 : : static bool
1745 akapila@postgresql.o 759 : 324492 : handle_streamed_transaction(LogicalRepMsgType action, StringInfo s)
760 : : {
761 : : TransactionId current_xid;
762 : : ParallelApplyWorkerInfo *winfo;
763 : : TransApplyAction apply_action;
764 : : StringInfoData original_msg;
765 : :
971 766 : 324492 : apply_action = get_transaction_apply_action(stream_xid, &winfo);
767 : :
768 : : /* not in streaming mode */
769 [ + + ]: 324492 : if (apply_action == TRANS_LEADER_APPLY)
1829 770 : 80139 : return false;
771 : :
772 [ - + ]: 244353 : Assert(TransactionIdIsValid(stream_xid));
773 : :
774 : : /*
775 : : * The parallel apply worker needs the xid in this message to decide
776 : : * whether to define a savepoint, so save the original message that has
777 : : * not moved the cursor after the xid. We will serialize this message to a
778 : : * file in PARTIAL_SERIALIZE mode.
779 : : */
971 780 : 244353 : original_msg = *s;
781 : :
782 : : /*
783 : : * We should have received the XID of the subxact as the first part of the
784 : : * message, so extract it.
785 : : */
786 : 244353 : current_xid = pq_getmsgint(s, 4);
787 : :
788 [ - + ]: 244353 : if (!TransactionIdIsValid(current_xid))
1547 tgl@sss.pgh.pa.us 789 [ # # ]:UBC 0 : ereport(ERROR,
790 : : (errcode(ERRCODE_PROTOCOL_VIOLATION),
791 : : errmsg_internal("invalid transaction ID in streamed replication transaction")));
792 : :
971 akapila@postgresql.o 793 [ + + + + :CBC 244353 : switch (apply_action)
- ]
794 : : {
795 : 102513 : case TRANS_LEADER_SERIALIZE:
796 [ - + ]: 102513 : Assert(stream_fd);
797 : :
798 : : /* Add the new subxact to the array (unless already there). */
799 : 102513 : subxact_info_add(current_xid);
800 : :
801 : : /* Write the change to the current file */
802 : 102513 : stream_write_change(action, s);
803 : 102513 : return true;
804 : :
805 : 68386 : case TRANS_LEADER_SEND_TO_PARALLEL:
806 [ - + ]: 68386 : Assert(winfo);
807 : :
808 : : /*
809 : : * XXX The publisher side doesn't always send relation/type update
810 : : * messages after the streaming transaction, so also update the
811 : : * relation/type in leader apply worker. See function
812 : : * cleanup_rel_sync_cache.
813 : : */
814 [ + - ]: 68386 : if (pa_send_data(winfo, s->len, s->data))
815 [ + + + - ]: 68386 : return (action != LOGICAL_REP_MSG_RELATION &&
816 : : action != LOGICAL_REP_MSG_TYPE);
817 : :
818 : : /*
819 : : * Switch to serialize mode when we are not able to send the
820 : : * change to parallel apply worker.
821 : : */
971 akapila@postgresql.o 822 :UBC 0 : pa_switch_to_partial_serialize(winfo, false);
823 : :
824 : : /* fall through */
971 akapila@postgresql.o 825 :CBC 5006 : case TRANS_LEADER_PARTIAL_SERIALIZE:
826 : 5006 : stream_write_change(action, &original_msg);
827 : :
828 : : /* Same reason as TRANS_LEADER_SEND_TO_PARALLEL case. */
829 [ + + + - ]: 5006 : return (action != LOGICAL_REP_MSG_RELATION &&
830 : : action != LOGICAL_REP_MSG_TYPE);
831 : :
832 : 68448 : case TRANS_PARALLEL_APPLY:
833 : 68448 : parallel_stream_nchanges += 1;
834 : :
835 : : /* Define a savepoint for a subxact if needed. */
836 : 68448 : pa_start_subtrans(current_xid, stream_xid);
837 : 68448 : return false;
838 : :
971 akapila@postgresql.o 839 :UBC 0 : default:
866 msawada@postgresql.o 840 [ # # ]: 0 : elog(ERROR, "unexpected apply action: %d", (int) apply_action);
841 : : return false; /* silence compiler warning */
842 : : }
843 : : }
844 : :
845 : : /*
846 : : * Executor state preparation for evaluation of constraint expressions,
847 : : * indexes and triggers for the specified relation.
848 : : *
849 : : * Note that the caller must open and close any indexes to be updated.
850 : : */
851 : : static ApplyExecutionData *
1568 tgl@sss.pgh.pa.us 852 :CBC 148090 : create_edata_for_relation(LogicalRepRelMapEntry *rel)
853 : : {
854 : : ApplyExecutionData *edata;
855 : : EState *estate;
856 : : RangeTblEntry *rte;
915 857 : 148090 : List *perminfos = NIL;
858 : : ResultRelInfo *resultRelInfo;
859 : :
1568 860 : 148090 : edata = (ApplyExecutionData *) palloc0(sizeof(ApplyExecutionData));
861 : 148090 : edata->targetRel = rel;
862 : :
863 : 148090 : edata->estate = estate = CreateExecutorState();
864 : :
3152 peter_e@gmx.net 865 : 148090 : rte = makeNode(RangeTblEntry);
866 : 148090 : rte->rtekind = RTE_RELATION;
867 : 148090 : rte->relid = RelationGetRelid(rel->localrel);
868 : 148090 : rte->relkind = rel->localrel->rd_rel->relkind;
2533 tgl@sss.pgh.pa.us 869 : 148090 : rte->rellockmode = AccessShareLock;
870 : :
915 871 : 148090 : addRTEPermissionInfo(&perminfos, rte);
872 : :
211 amitlan@postgresql.o 873 : 148090 : ExecInitRangeTable(estate, list_make1(rte), perminfos,
874 : : bms_make_singleton(1));
875 : :
1568 tgl@sss.pgh.pa.us 876 : 148090 : edata->targetRelInfo = resultRelInfo = makeNode(ResultRelInfo);
877 : :
878 : : /*
879 : : * Use Relation opened by logicalrep_rel_open() instead of opening it
880 : : * again.
881 : : */
882 : 148090 : InitResultRelInfo(resultRelInfo, rel->localrel, 1, NULL, 0);
883 : :
884 : : /*
885 : : * We put the ResultRelInfo in the es_opened_result_relations list, even
886 : : * though we don't populate the es_result_relations array. That's a bit
887 : : * bogus, but it's enough to make ExecGetTriggerResultRel() find them.
888 : : *
889 : : * ExecOpenIndices() is not called here either, each execution path doing
890 : : * an apply operation being responsible for that.
891 : : */
1598 michael@paquier.xyz 892 : 148090 : estate->es_opened_result_relations =
1568 tgl@sss.pgh.pa.us 893 : 148090 : lappend(estate->es_opened_result_relations, resultRelInfo);
894 : :
2845 simon@2ndQuadrant.co 895 : 148090 : estate->es_output_cid = GetCurrentCommandId(true);
896 : :
897 : : /* Prepare to catch AFTER triggers. */
3109 peter_e@gmx.net 898 : 148090 : AfterTriggerBeginQuery();
899 : :
900 : : /* other fields of edata remain NULL for now */
901 : :
1568 tgl@sss.pgh.pa.us 902 : 148090 : return edata;
903 : : }
904 : :
905 : : /*
906 : : * Finish any operations related to the executor state created by
907 : : * create_edata_for_relation().
908 : : */
909 : : static void
910 : 148052 : finish_edata(ApplyExecutionData *edata)
911 : : {
912 : 148052 : EState *estate = edata->estate;
913 : :
914 : : /* Handle any queued AFTER triggers. */
1598 michael@paquier.xyz 915 : 148052 : AfterTriggerEndQuery(estate);
916 : :
917 : : /* Shut down tuple routing, if any was done. */
1568 tgl@sss.pgh.pa.us 918 [ + + ]: 148052 : if (edata->proute)
919 : 74 : ExecCleanupTupleRouting(edata->mtstate, edata->proute);
920 : :
921 : : /*
922 : : * Cleanup. It might seem that we should call ExecCloseResultRelations()
923 : : * here, but we intentionally don't. It would close the rel we added to
924 : : * es_opened_result_relations above, which is wrong because we took no
925 : : * corresponding refcount. We rely on ExecCleanupTupleRouting() to close
926 : : * any other relations opened during execution.
927 : : */
1598 michael@paquier.xyz 928 : 148052 : ExecResetTupleTable(estate->es_tupleTable, false);
929 : 148052 : FreeExecutorState(estate);
1568 tgl@sss.pgh.pa.us 930 : 148052 : pfree(edata);
1598 michael@paquier.xyz 931 : 148052 : }
932 : :
933 : : /*
934 : : * Evaluates default values for columns that we cannot map to remote
935 : : * relation columns.
936 : : *
937 : : * This allows us to support tables which have more columns on the downstream
938 : : * than on the upstream.
939 : : */
940 : : static void
3152 peter_e@gmx.net 941 : 75830 : slot_fill_defaults(LogicalRepRelMapEntry *rel, EState *estate,
942 : : TupleTableSlot *slot)
943 : : {
944 : 75830 : TupleDesc desc = RelationGetDescr(rel->localrel);
945 : 75830 : int num_phys_attrs = desc->natts;
946 : : int i;
947 : : int attnum,
948 : 75830 : num_defaults = 0;
949 : : int *defmap;
950 : : ExprState **defexprs;
951 : : ExprContext *econtext;
952 : :
953 [ + - ]: 75830 : econtext = GetPerTupleExprContext(estate);
954 : :
955 : : /* We got all the data via replication, no need to evaluate anything. */
956 [ + + ]: 75830 : if (num_phys_attrs == rel->remoterel.natts)
957 : 35685 : return;
958 : :
959 : 40145 : defmap = (int *) palloc(num_phys_attrs * sizeof(int));
960 : 40145 : defexprs = (ExprState **) palloc(num_phys_attrs * sizeof(ExprState *));
961 : :
2089 michael@paquier.xyz 962 [ - + ]: 40145 : Assert(rel->attrmap->maplen == num_phys_attrs);
3152 peter_e@gmx.net 963 [ + + ]: 210663 : for (attnum = 0; attnum < num_phys_attrs; attnum++)
964 : : {
965 : : Expr *defexpr;
966 : :
2352 peter@eisentraut.org 967 [ + - + + ]: 170518 : if (TupleDescAttr(desc, attnum)->attisdropped || TupleDescAttr(desc, attnum)->attgenerated)
3152 peter_e@gmx.net 968 : 9 : continue;
969 : :
2089 michael@paquier.xyz 970 [ + + ]: 170509 : if (rel->attrmap->attnums[attnum] >= 0)
3152 peter_e@gmx.net 971 : 92268 : continue;
972 : :
973 : 78241 : defexpr = (Expr *) build_column_default(rel->localrel, attnum + 1);
974 : :
975 [ + + ]: 78241 : if (defexpr != NULL)
976 : : {
977 : : /* Run the expression through planner */
978 : 70131 : defexpr = expression_planner(defexpr);
979 : :
980 : : /* Initialize executable expression in copycontext */
981 : 70131 : defexprs[num_defaults] = ExecInitExpr(defexpr, NULL);
982 : 70131 : defmap[num_defaults] = attnum;
983 : 70131 : num_defaults++;
984 : : }
985 : : }
986 : :
987 [ + + ]: 110276 : for (i = 0; i < num_defaults; i++)
988 : 70131 : slot->tts_values[defmap[i]] =
989 : 70131 : ExecEvalExpr(defexprs[i], econtext, &slot->tts_isnull[defmap[i]]);
990 : : }
991 : :
992 : : /*
993 : : * Store tuple data into slot.
994 : : *
995 : : * Incoming data can be either text or binary format.
996 : : */
997 : : static void
1876 tgl@sss.pgh.pa.us 998 : 148104 : slot_store_data(TupleTableSlot *slot, LogicalRepRelMapEntry *rel,
999 : : LogicalRepTupleData *tupleData)
1000 : : {
3034 bruce@momjian.us 1001 : 148104 : int natts = slot->tts_tupleDescriptor->natts;
1002 : : int i;
1003 : :
3152 peter_e@gmx.net 1004 : 148104 : ExecClearTuple(slot);
1005 : :
1006 : : /* Call the "in" function for each non-dropped, non-null attribute */
2089 michael@paquier.xyz 1007 [ - + ]: 148104 : Assert(natts == rel->attrmap->maplen);
3152 peter_e@gmx.net 1008 [ + + ]: 657726 : for (i = 0; i < natts; i++)
1009 : : {
2939 andres@anarazel.de 1010 : 509622 : Form_pg_attribute att = TupleDescAttr(slot->tts_tupleDescriptor, i);
2089 michael@paquier.xyz 1011 : 509622 : int remoteattnum = rel->attrmap->attnums[i];
1012 : :
1876 tgl@sss.pgh.pa.us 1013 [ + + + + ]: 509622 : if (!att->attisdropped && remoteattnum >= 0)
3152 peter_e@gmx.net 1014 : 302705 : {
1876 tgl@sss.pgh.pa.us 1015 : 302705 : StringInfo colvalue = &tupleData->colvalues[remoteattnum];
1016 : :
1874 1017 [ - + ]: 302705 : Assert(remoteattnum < tupleData->ncols);
1018 : :
1019 : : /* Set attnum for error callback */
1471 akapila@postgresql.o 1020 : 302705 : apply_error_callback_arg.remote_attnum = remoteattnum;
1021 : :
1876 tgl@sss.pgh.pa.us 1022 [ + + ]: 302705 : if (tupleData->colstatus[remoteattnum] == LOGICALREP_COLUMN_TEXT)
1023 : : {
1024 : : Oid typinput;
1025 : : Oid typioparam;
1026 : :
1027 : 142336 : getTypeInputInfo(att->atttypid, &typinput, &typioparam);
1028 : 284672 : slot->tts_values[i] =
1029 : 142336 : OidInputFunctionCall(typinput, colvalue->data,
1030 : : typioparam, att->atttypmod);
1031 : 142336 : slot->tts_isnull[i] = false;
1032 : : }
1033 [ + + ]: 160369 : else if (tupleData->colstatus[remoteattnum] == LOGICALREP_COLUMN_BINARY)
1034 : : {
1035 : : Oid typreceive;
1036 : : Oid typioparam;
1037 : :
1038 : : /*
1039 : : * In some code paths we may be asked to re-parse the same
1040 : : * tuple data. Reset the StringInfo's cursor so that works.
1041 : : */
1042 : 110035 : colvalue->cursor = 0;
1043 : :
1044 : 110035 : getTypeBinaryInputInfo(att->atttypid, &typreceive, &typioparam);
1045 : 220070 : slot->tts_values[i] =
1046 : 110035 : OidReceiveFunctionCall(typreceive, colvalue,
1047 : : typioparam, att->atttypmod);
1048 : :
1049 : : /* Trouble if it didn't eat the whole buffer */
1050 [ - + ]: 110035 : if (colvalue->cursor != colvalue->len)
1876 tgl@sss.pgh.pa.us 1051 [ # # ]:UBC 0 : ereport(ERROR,
1052 : : (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
1053 : : errmsg("incorrect binary data format in logical replication column %d",
1054 : : remoteattnum + 1)));
1876 tgl@sss.pgh.pa.us 1055 :CBC 110035 : slot->tts_isnull[i] = false;
1056 : : }
1057 : : else
1058 : : {
1059 : : /*
1060 : : * NULL value from remote. (We don't expect to see
1061 : : * LOGICALREP_COLUMN_UNCHANGED here, but if we do, treat it as
1062 : : * NULL.)
1063 : : */
1064 : 50334 : slot->tts_values[i] = (Datum) 0;
1065 : 50334 : slot->tts_isnull[i] = true;
1066 : : }
1067 : :
1068 : : /* Reset attnum for error callback */
1471 akapila@postgresql.o 1069 : 302705 : apply_error_callback_arg.remote_attnum = -1;
1070 : : }
1071 : : else
1072 : : {
1073 : : /*
1074 : : * We assign NULL to dropped attributes and missing values
1075 : : * (missing values should be later filled using
1076 : : * slot_fill_defaults).
1077 : : */
3152 peter_e@gmx.net 1078 : 206917 : slot->tts_values[i] = (Datum) 0;
1079 : 206917 : slot->tts_isnull[i] = true;
1080 : : }
1081 : : }
1082 : :
1083 : 148104 : ExecStoreVirtualTuple(slot);
1084 : 148104 : }
1085 : :
1086 : : /*
1087 : : * Replace updated columns with data from the LogicalRepTupleData struct.
1088 : : * This is somewhat similar to heap_modify_tuple but also calls the type
1089 : : * input functions on the user data.
1090 : : *
1091 : : * "slot" is filled with a copy of the tuple in "srcslot", replacing
1092 : : * columns provided in "tupleData" and leaving others as-is.
1093 : : *
1094 : : * Caution: unreplaced pass-by-ref columns in "slot" will point into the
1095 : : * storage for "srcslot". This is OK for current usage, but someday we may
1096 : : * need to materialize "slot" at the end to make it independent of "srcslot".
1097 : : */
1098 : : static void
1876 tgl@sss.pgh.pa.us 1099 : 31924 : slot_modify_data(TupleTableSlot *slot, TupleTableSlot *srcslot,
1100 : : LogicalRepRelMapEntry *rel,
1101 : : LogicalRepTupleData *tupleData)
1102 : : {
3034 bruce@momjian.us 1103 : 31924 : int natts = slot->tts_tupleDescriptor->natts;
1104 : : int i;
1105 : :
1106 : : /* We'll fill "slot" with a virtual tuple, so we must start with ... */
3152 peter_e@gmx.net 1107 : 31924 : ExecClearTuple(slot);
1108 : :
1109 : : /*
1110 : : * Copy all the column data from srcslot, so that we'll have valid values
1111 : : * for unreplaced columns.
1112 : : */
2115 tgl@sss.pgh.pa.us 1113 [ - + ]: 31924 : Assert(natts == srcslot->tts_tupleDescriptor->natts);
1114 : 31924 : slot_getallattrs(srcslot);
1115 : 31924 : memcpy(slot->tts_values, srcslot->tts_values, natts * sizeof(Datum));
1116 : 31924 : memcpy(slot->tts_isnull, srcslot->tts_isnull, natts * sizeof(bool));
1117 : :
1118 : : /* Call the "in" function for each replaced attribute */
2089 michael@paquier.xyz 1119 [ - + ]: 31924 : Assert(natts == rel->attrmap->maplen);
3152 peter_e@gmx.net 1120 [ + + ]: 159280 : for (i = 0; i < natts; i++)
1121 : : {
2939 andres@anarazel.de 1122 : 127356 : Form_pg_attribute att = TupleDescAttr(slot->tts_tupleDescriptor, i);
2089 michael@paquier.xyz 1123 : 127356 : int remoteattnum = rel->attrmap->attnums[i];
1124 : :
2864 peter_e@gmx.net 1125 [ + + ]: 127356 : if (remoteattnum < 0)
3152 1126 : 58519 : continue;
1127 : :
1874 tgl@sss.pgh.pa.us 1128 [ - + ]: 68837 : Assert(remoteattnum < tupleData->ncols);
1129 : :
1876 1130 [ + + ]: 68837 : if (tupleData->colstatus[remoteattnum] != LOGICALREP_COLUMN_UNCHANGED)
1131 : : {
1132 : 68834 : StringInfo colvalue = &tupleData->colvalues[remoteattnum];
1133 : :
1134 : : /* Set attnum for error callback */
1471 akapila@postgresql.o 1135 : 68834 : apply_error_callback_arg.remote_attnum = remoteattnum;
1136 : :
1876 tgl@sss.pgh.pa.us 1137 [ + + ]: 68834 : if (tupleData->colstatus[remoteattnum] == LOGICALREP_COLUMN_TEXT)
1138 : : {
1139 : : Oid typinput;
1140 : : Oid typioparam;
1141 : :
1142 : 25430 : getTypeInputInfo(att->atttypid, &typinput, &typioparam);
1143 : 50860 : slot->tts_values[i] =
1144 : 25430 : OidInputFunctionCall(typinput, colvalue->data,
1145 : : typioparam, att->atttypmod);
1146 : 25430 : slot->tts_isnull[i] = false;
1147 : : }
1148 [ + + ]: 43404 : else if (tupleData->colstatus[remoteattnum] == LOGICALREP_COLUMN_BINARY)
1149 : : {
1150 : : Oid typreceive;
1151 : : Oid typioparam;
1152 : :
1153 : : /*
1154 : : * In some code paths we may be asked to re-parse the same
1155 : : * tuple data. Reset the StringInfo's cursor so that works.
1156 : : */
1157 : 43356 : colvalue->cursor = 0;
1158 : :
1159 : 43356 : getTypeBinaryInputInfo(att->atttypid, &typreceive, &typioparam);
1160 : 86712 : slot->tts_values[i] =
1161 : 43356 : OidReceiveFunctionCall(typreceive, colvalue,
1162 : : typioparam, att->atttypmod);
1163 : :
1164 : : /* Trouble if it didn't eat the whole buffer */
1165 [ - + ]: 43356 : if (colvalue->cursor != colvalue->len)
1876 tgl@sss.pgh.pa.us 1166 [ # # ]:UBC 0 : ereport(ERROR,
1167 : : (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
1168 : : errmsg("incorrect binary data format in logical replication column %d",
1169 : : remoteattnum + 1)));
1876 tgl@sss.pgh.pa.us 1170 :CBC 43356 : slot->tts_isnull[i] = false;
1171 : : }
1172 : : else
1173 : : {
1174 : : /* must be LOGICALREP_COLUMN_NULL */
1175 : 48 : slot->tts_values[i] = (Datum) 0;
1176 : 48 : slot->tts_isnull[i] = true;
1177 : : }
1178 : :
1179 : : /* Reset attnum for error callback */
1471 akapila@postgresql.o 1180 : 68834 : apply_error_callback_arg.remote_attnum = -1;
1181 : : }
1182 : : }
1183 : :
1184 : : /* And finally, declare that "slot" contains a valid virtual tuple */
3152 peter_e@gmx.net 1185 : 31924 : ExecStoreVirtualTuple(slot);
1186 : 31924 : }
1187 : :
1188 : : /*
1189 : : * Handle BEGIN message.
1190 : : */
1191 : : static void
1192 : 481 : apply_handle_begin(StringInfo s)
1193 : : {
1194 : : LogicalRepBeginData begin_data;
1195 : :
1196 : : /* There must not be an active streaming transaction. */
963 akapila@postgresql.o 1197 [ - + ]: 481 : Assert(!TransactionIdIsValid(stream_xid));
1198 : :
3152 peter_e@gmx.net 1199 : 481 : logicalrep_read_begin(s, &begin_data);
1278 akapila@postgresql.o 1200 : 481 : set_apply_error_context_xact(begin_data.xid, begin_data.final_lsn);
1201 : :
3089 peter_e@gmx.net 1202 : 481 : remote_final_lsn = begin_data.final_lsn;
1203 : :
1264 akapila@postgresql.o 1204 : 481 : maybe_start_skipping_changes(begin_data.final_lsn);
1205 : :
3152 peter_e@gmx.net 1206 : 481 : in_remote_transaction = true;
1207 : :
1208 : 481 : pgstat_report_activity(STATE_RUNNING, NULL);
1209 : 481 : }
1210 : :
1211 : : /*
1212 : : * Handle COMMIT message.
1213 : : *
1214 : : * TODO, support tracking of multiple origins
1215 : : */
1216 : : static void
1217 : 432 : apply_handle_commit(StringInfo s)
1218 : : {
1219 : : LogicalRepCommitData commit_data;
1220 : :
1221 : 432 : logicalrep_read_commit(s, &commit_data);
1222 : :
1547 tgl@sss.pgh.pa.us 1223 [ - + ]: 432 : if (commit_data.commit_lsn != remote_final_lsn)
1547 tgl@sss.pgh.pa.us 1224 [ # # ]:UBC 0 : ereport(ERROR,
1225 : : (errcode(ERRCODE_PROTOCOL_VIOLATION),
1226 : : errmsg_internal("incorrect commit LSN %X/%08X in commit message (expected %X/%08X)",
1227 : : LSN_FORMAT_ARGS(commit_data.commit_lsn),
1228 : : LSN_FORMAT_ARGS(remote_final_lsn))));
1229 : :
1499 akapila@postgresql.o 1230 :CBC 432 : apply_handle_commit_internal(&commit_data);
1231 : :
1232 : : /* Process any tables that are being synchronized in parallel. */
3089 peter_e@gmx.net 1233 : 432 : process_syncing_tables(commit_data.end_lsn);
1234 : :
3152 1235 : 432 : pgstat_report_activity(STATE_IDLE, NULL);
1471 akapila@postgresql.o 1236 : 432 : reset_apply_error_context_info();
3152 peter_e@gmx.net 1237 : 432 : }
1238 : :
1239 : : /*
1240 : : * Handle BEGIN PREPARE message.
1241 : : */
1242 : : static void
1515 akapila@postgresql.o 1243 : 16 : apply_handle_begin_prepare(StringInfo s)
1244 : : {
1245 : : LogicalRepPreparedTxnData begin_data;
1246 : :
1247 : : /* Tablesync should never receive prepare. */
1248 [ - + ]: 16 : if (am_tablesync_worker())
1515 akapila@postgresql.o 1249 [ # # ]:UBC 0 : ereport(ERROR,
1250 : : (errcode(ERRCODE_PROTOCOL_VIOLATION),
1251 : : errmsg_internal("tablesync worker received a BEGIN PREPARE message")));
1252 : :
1253 : : /* There must not be an active streaming transaction. */
963 akapila@postgresql.o 1254 [ - + ]:CBC 16 : Assert(!TransactionIdIsValid(stream_xid));
1255 : :
1515 1256 : 16 : logicalrep_read_begin_prepare(s, &begin_data);
1278 1257 : 16 : set_apply_error_context_xact(begin_data.xid, begin_data.prepare_lsn);
1258 : :
1515 1259 : 16 : remote_final_lsn = begin_data.prepare_lsn;
1260 : :
1264 1261 : 16 : maybe_start_skipping_changes(begin_data.prepare_lsn);
1262 : :
1515 1263 : 16 : in_remote_transaction = true;
1264 : :
1265 : 16 : pgstat_report_activity(STATE_RUNNING, NULL);
1266 : 16 : }
1267 : :
1268 : : /*
1269 : : * Common function to prepare the GID.
1270 : : */
1271 : : static void
1500 1272 : 23 : apply_handle_prepare_internal(LogicalRepPreparedTxnData *prepare_data)
1273 : : {
1274 : : char gid[GIDSIZE];
1275 : :
1276 : : /*
1277 : : * Compute unique GID for two_phase transactions. We don't use GID of
1278 : : * prepared transaction sent by server as that can lead to deadlock when
1279 : : * we have multiple subscriptions from same node point to publications on
1280 : : * the same node. See comments atop worker.c
1281 : : */
1282 : 23 : TwoPhaseTransactionGid(MySubscription->oid, prepare_data->xid,
1283 : : gid, sizeof(gid));
1284 : :
1285 : : /*
1286 : : * BeginTransactionBlock is necessary to balance the EndTransactionBlock
1287 : : * called within the PrepareTransactionBlock below.
1288 : : */
971 1289 [ + - ]: 23 : if (!IsTransactionBlock())
1290 : : {
1291 : 23 : BeginTransactionBlock();
1292 : 23 : CommitTransactionCommand(); /* Completes the preceding Begin command. */
1293 : : }
1294 : :
1295 : : /*
1296 : : * Update origin state so we can restart streaming from correct position
1297 : : * in case of crash.
1298 : : */
1500 1299 : 23 : replorigin_session_origin_lsn = prepare_data->end_lsn;
1300 : 23 : replorigin_session_origin_timestamp = prepare_data->prepare_time;
1301 : :
1302 : 23 : PrepareTransactionBlock(gid);
1303 : 23 : }
1304 : :
1305 : : /*
1306 : : * Handle PREPARE message.
1307 : : */
1308 : : static void
1515 1309 : 15 : apply_handle_prepare(StringInfo s)
1310 : : {
1311 : : LogicalRepPreparedTxnData prepare_data;
1312 : :
1313 : 15 : logicalrep_read_prepare(s, &prepare_data);
1314 : :
1315 [ - + ]: 15 : if (prepare_data.prepare_lsn != remote_final_lsn)
1515 akapila@postgresql.o 1316 [ # # ]:UBC 0 : ereport(ERROR,
1317 : : (errcode(ERRCODE_PROTOCOL_VIOLATION),
1318 : : errmsg_internal("incorrect prepare LSN %X/%08X in prepare message (expected %X/%08X)",
1319 : : LSN_FORMAT_ARGS(prepare_data.prepare_lsn),
1320 : : LSN_FORMAT_ARGS(remote_final_lsn))));
1321 : :
1322 : : /*
1323 : : * Unlike commit, we always prepare the transaction here, even if no
1324 : : * change happened in this transaction or all changes were skipped. It
1325 : : * is done this way because at commit prepared time we won't know
1326 : : * whether we skipped preparing the transaction for those reasons.
1327 : : *
1328 : : * XXX We could optimize by first checking at commit prepared time
1329 : : * whether we prepared the transaction, but that doesn't seem
1330 : : * worthwhile because such cases shouldn't be common.
1331 : : */
1515 akapila@postgresql.o 1332 :CBC 15 : begin_replication_step();
1333 : :
1500 1334 : 15 : apply_handle_prepare_internal(&prepare_data);
1335 : :
1515 1336 : 15 : end_replication_step();
1337 : 15 : CommitTransactionCommand();
1338 : 14 : pgstat_report_stat(false);
1339 : :
1340 : : /*
1341 : : * It is okay not to set the local_end LSN for the prepare because we
1342 : : * always flush the prepare record. So, we can send the acknowledgment of
1343 : : * the remote_end LSN as soon as prepare is finished.
1344 : : *
1345 : : * XXX For the sake of consistency with commit, we could have set it with
1346 : : * the LSN of prepare but as of now we don't track that value similar to
1347 : : * XactLastCommitEnd, and adding it for this purpose doesn't seem worth
1348 : : * it.
1349 : : */
393 1350 : 14 : store_flush_position(prepare_data.end_lsn, InvalidXLogRecPtr);
1351 : :
1515 1352 : 14 : in_remote_transaction = false;
1353 : :
1354 : : /* Process any tables that are being synchronized in parallel. */
1355 : 14 : process_syncing_tables(prepare_data.end_lsn);
1356 : :
1357 : : /*
1358 : : * Since we have already prepared the transaction, in a case where the
1359 : : * server crashes before clearing the subskiplsn, it will be left but the
1360 : : * transaction won't be resent. But that's okay because it's a rare case
1361 : : * and the subskiplsn will be cleared when finishing the next transaction.
1362 : : */
1264 1363 : 14 : stop_skipping_changes();
1364 : 14 : clear_subscription_skip_lsn(prepare_data.prepare_lsn);
1365 : :
1515 1366 : 14 : pgstat_report_activity(STATE_IDLE, NULL);
1471 1367 : 14 : reset_apply_error_context_info();
1515 1368 : 14 : }
1369 : :
1370 : : /*
1371 : : * Handle a COMMIT PREPARED of a previously PREPARED transaction.
1372 : : *
1373 : : * Note that we don't need to wait here if the transaction was prepared in a
1374 : : * parallel apply worker. In that case, we have already waited for the prepare
1375 : : * to finish in apply_handle_stream_prepare() which will ensure all the
1376 : : * operations in that transaction have happened in the subscriber, so no
1377 : : * concurrent transaction can cause deadlock or transaction dependency issues.
1378 : : */
1379 : : static void
1380 : 20 : apply_handle_commit_prepared(StringInfo s)
1381 : : {
1382 : : LogicalRepCommitPreparedTxnData prepare_data;
1383 : : char gid[GIDSIZE];
1384 : :
1385 : 20 : logicalrep_read_commit_prepared(s, &prepare_data);
1278 1386 : 20 : set_apply_error_context_xact(prepare_data.xid, prepare_data.commit_lsn);
1387 : :
1388 : : /* Compute GID for two_phase transactions. */
1515 1389 : 20 : TwoPhaseTransactionGid(MySubscription->oid, prepare_data.xid,
1390 : : gid, sizeof(gid));
1391 : :
1392 : : /* There is no transaction when COMMIT PREPARED is called */
1393 : 20 : begin_replication_step();
1394 : :
1395 : : /*
1396 : : * Update origin state so we can restart streaming from correct position
1397 : : * in case of crash.
1398 : : */
1399 : 20 : replorigin_session_origin_lsn = prepare_data.end_lsn;
1400 : 20 : replorigin_session_origin_timestamp = prepare_data.commit_time;
1401 : :
1402 : 20 : FinishPreparedTransaction(gid, true);
1403 : 20 : end_replication_step();
1404 : 20 : CommitTransactionCommand();
1405 : 20 : pgstat_report_stat(false);
1406 : :
971 1407 : 20 : store_flush_position(prepare_data.end_lsn, XactLastCommitEnd);
1515 1408 : 20 : in_remote_transaction = false;
1409 : :
1410 : : /* Process any tables that are being synchronized in parallel. */
1411 : 20 : process_syncing_tables(prepare_data.end_lsn);
1412 : :
1264 1413 : 20 : clear_subscription_skip_lsn(prepare_data.end_lsn);
1414 : :
1515 1415 : 20 : pgstat_report_activity(STATE_IDLE, NULL);
1471 1416 : 20 : reset_apply_error_context_info();
1515 1417 : 20 : }
1418 : :
1419 : : /*
1420 : : * Handle a ROLLBACK PREPARED of a previously PREPARED TRANSACTION.
1421 : : *
1422 : : * Note that we don't need to wait here if the transaction was prepared in a
1423 : : * parallel apply worker. In that case, we have already waited for the prepare
1424 : : * to finish in apply_handle_stream_prepare(), which ensures all the
1425 : : * operations in that transaction have been applied on the subscriber, so no
1426 : : * concurrent transaction can cause deadlock or transaction dependency issues.
1427 : : */
1428 : : static void
1429 : 5 : apply_handle_rollback_prepared(StringInfo s)
1430 : : {
1431 : : LogicalRepRollbackPreparedTxnData rollback_data;
1432 : : char gid[GIDSIZE];
1433 : :
1434 : 5 : logicalrep_read_rollback_prepared(s, &rollback_data);
1278 1435 : 5 : set_apply_error_context_xact(rollback_data.xid, rollback_data.rollback_end_lsn);
1436 : :
1437 : : /* Compute GID for two_phase transactions. */
1515 1438 : 5 : TwoPhaseTransactionGid(MySubscription->oid, rollback_data.xid,
1439 : : gid, sizeof(gid));
1440 : :
1441 : : /*
1442 : : * It is possible that we haven't received the prepare because it occurred
1443 : : * before the walsender reached a consistent point, or because two_phase
1444 : : * was not yet enabled at that time; in such cases, we need to skip the
1445 : : * rollback prepared.
1446 : : */
1447 [ + - ]: 5 : if (LookupGXact(gid, rollback_data.prepare_end_lsn,
1448 : : rollback_data.prepare_time))
1449 : : {
1450 : : /*
1451 : : * Update origin state so we can restart streaming from the correct
1452 : : * position in case of a crash.
1453 : : */
1454 : 5 : replorigin_session_origin_lsn = rollback_data.rollback_end_lsn;
1455 : 5 : replorigin_session_origin_timestamp = rollback_data.rollback_time;
1456 : :
1457 : : /* There is no transaction when ABORT/ROLLBACK PREPARED is called */
1458 : 5 : begin_replication_step();
1459 : 5 : FinishPreparedTransaction(gid, false);
1460 : 5 : end_replication_step();
1461 : 5 : CommitTransactionCommand();
1462 : :
1264 1463 : 5 : clear_subscription_skip_lsn(rollback_data.rollback_end_lsn);
1464 : : }
1465 : :
1515 1466 : 5 : pgstat_report_stat(false);
1467 : :
1468 : : /*
1469 : : * It is okay not to set the local_end LSN for the rollback of prepared
1470 : : * transaction because we always flush the WAL record for it. See
1471 : : * apply_handle_prepare.
1472 : : */
393 1473 : 5 : store_flush_position(rollback_data.rollback_end_lsn, InvalidXLogRecPtr);
1515 1474 : 5 : in_remote_transaction = false;
1475 : :
1476 : : /* Process any tables that are being synchronized in parallel. */
1477 : 5 : process_syncing_tables(rollback_data.rollback_end_lsn);
1478 : :
1479 : 5 : pgstat_report_activity(STATE_IDLE, NULL);
1471 1480 : 5 : reset_apply_error_context_info();
1515 1481 : 5 : }
1482 : :
1483 : : /*
1484 : : * Handle STREAM PREPARE.
1485 : : */
1486 : : static void
1494 1487 : 11 : apply_handle_stream_prepare(StringInfo s)
1488 : : {
1489 : : LogicalRepPreparedTxnData prepare_data;
1490 : : ParallelApplyWorkerInfo *winfo;
1491 : : TransApplyAction apply_action;
1492 : :
1493 : : /* Save the message before it is consumed. */
971 1494 : 11 : StringInfoData original_msg = *s;
1495 : :
1494 1496 [ - + ]: 11 : if (in_streamed_transaction)
1494 akapila@postgresql.o 1497 [ # # ]:UBC 0 : ereport(ERROR,
1498 : : (errcode(ERRCODE_PROTOCOL_VIOLATION),
1499 : : errmsg_internal("STREAM PREPARE message without STREAM STOP")));
1500 : :
1501 : : /* A tablesync worker should never receive a prepare. */
1494 akapila@postgresql.o 1502 [ - + ]:CBC 11 : if (am_tablesync_worker())
1494 akapila@postgresql.o 1503 [ # # ]:UBC 0 : ereport(ERROR,
1504 : : (errcode(ERRCODE_PROTOCOL_VIOLATION),
1505 : : errmsg_internal("tablesync worker received a STREAM PREPARE message")));
1506 : :
1494 akapila@postgresql.o 1507 :CBC 11 : logicalrep_read_stream_prepare(s, &prepare_data);
1278 1508 : 11 : set_apply_error_context_xact(prepare_data.xid, prepare_data.prepare_lsn);
1509 : :
971 1510 : 11 : apply_action = get_transaction_apply_action(prepare_data.xid, &winfo);
1511 : :
1512 [ + + + + : 11 : switch (apply_action)
- ]
1513 : : {
963 1514 : 5 : case TRANS_LEADER_APPLY:
1515 : :
1516 : : /*
1517 : : * The transaction has been serialized to file, so replay all the
1518 : : * spooled operations.
1519 : : */
971 1520 : 5 : apply_spooled_messages(MyLogicalRepWorker->stream_fileset,
1521 : : prepare_data.xid, prepare_data.prepare_lsn);
1522 : :
1523 : : /* Mark the transaction as prepared. */
1524 : 5 : apply_handle_prepare_internal(&prepare_data);
1525 : :
1526 : 5 : CommitTransactionCommand();
1527 : :
1528 : : /*
1529 : : * It is okay not to set the local_end LSN for the prepare because
1530 : : * we always flush the prepare record. See apply_handle_prepare.
1531 : : */
393 1532 : 5 : store_flush_position(prepare_data.end_lsn, InvalidXLogRecPtr);
1533 : :
971 1534 : 5 : in_remote_transaction = false;
1535 : :
1536 : : /* Unlink the files with serialized changes and subxact info. */
1537 : 5 : stream_cleanup_files(MyLogicalRepWorker->subid, prepare_data.xid);
1538 : :
1539 [ - + ]: 5 : elog(DEBUG1, "finished processing the STREAM PREPARE command");
1540 : 5 : break;
1541 : :
1542 : 2 : case TRANS_LEADER_SEND_TO_PARALLEL:
1543 [ - + ]: 2 : Assert(winfo);
1544 : :
1545 [ + - ]: 2 : if (pa_send_data(winfo, s->len, s->data))
1546 : : {
1547 : : /* Finish processing the streaming transaction. */
1548 : 2 : pa_xact_finish(winfo, prepare_data.end_lsn);
1549 : 2 : break;
1550 : : }
1551 : :
1552 : : /*
1553 : : * Switch to serialize mode when we are not able to send the
1554 : : * change to the parallel apply worker.
1555 : : */
971 akapila@postgresql.o 1556 :UBC 0 : pa_switch_to_partial_serialize(winfo, true);
1557 : :
1558 : : /* fall through */
971 akapila@postgresql.o 1559 :CBC 1 : case TRANS_LEADER_PARTIAL_SERIALIZE:
1560 [ - + ]: 1 : Assert(winfo);
1561 : :
1562 : 1 : stream_open_and_write_change(prepare_data.xid,
1563 : : LOGICAL_REP_MSG_STREAM_PREPARE,
1564 : : &original_msg);
1565 : :
1566 : 1 : pa_set_fileset_state(winfo->shared, FS_SERIALIZE_DONE);
1567 : :
1568 : : /* Finish processing the streaming transaction. */
1569 : 1 : pa_xact_finish(winfo, prepare_data.end_lsn);
1570 : 1 : break;
1571 : :
1572 : 3 : case TRANS_PARALLEL_APPLY:
1573 : :
1574 : : /*
1575 : : * If the parallel apply worker is applying spooled messages then
1576 : : * close the file before preparing.
1577 : : */
1578 [ + + ]: 3 : if (stream_fd)
1579 : 1 : stream_close_file();
1580 : :
1581 : 3 : begin_replication_step();
1582 : :
1583 : : /* Mark the transaction as prepared. */
1584 : 3 : apply_handle_prepare_internal(&prepare_data);
1585 : :
1586 : 3 : end_replication_step();
1587 : :
1588 : 3 : CommitTransactionCommand();
1589 : :
1590 : : /*
1591 : : * It is okay not to set the local_end LSN for the prepare because
1592 : : * we always flush the prepare record. See apply_handle_prepare.
1593 : : */
393 1594 : 3 : MyParallelShared->last_commit_end = InvalidXLogRecPtr;
1595 : :
971 1596 : 3 : pa_set_xact_state(MyParallelShared, PARALLEL_TRANS_FINISHED);
1597 : 3 : pa_unlock_transaction(MyParallelShared->xid, AccessExclusiveLock);
1598 : :
1599 : 3 : pa_reset_subtrans();
1600 : :
1601 [ + + ]: 3 : elog(DEBUG1, "finished processing the STREAM PREPARE command");
1602 : 3 : break;
1603 : :
971 akapila@postgresql.o 1604 :UBC 0 : default:
963 1605 [ # # ]: 0 : elog(ERROR, "unexpected apply action: %d", (int) apply_action);
1606 : : break;
1607 : : }
1608 : :
971 akapila@postgresql.o 1609 :CBC 11 : pgstat_report_stat(false);
1610 : :
1611 : : /* Process any tables that are being synchronized in parallel. */
1494 1612 : 11 : process_syncing_tables(prepare_data.end_lsn);
1613 : :
1614 : : /*
1615 : : * Similar to the prepare case, the subskiplsn could be left set after a
1616 : : * server crash, but that's okay. See the comments in apply_handle_prepare().
1617 : : */
1264 1618 : 11 : stop_skipping_changes();
1619 : 11 : clear_subscription_skip_lsn(prepare_data.prepare_lsn);
1620 : :
1494 1621 : 11 : pgstat_report_activity(STATE_IDLE, NULL);
1622 : :
1471 1623 : 11 : reset_apply_error_context_info();
1494 1624 : 11 : }
1625 : :
1626 : : /*
1627 : : * Handle ORIGIN message.
1628 : : *
1629 : : * TODO, support tracking of multiple origins
1630 : : */
1631 : : static void
3152 peter_e@gmx.net 1632 : 7 : apply_handle_origin(StringInfo s)
1633 : : {
1634 : : /*
1635 : : * An ORIGIN message can only come inside a streaming transaction or a
1636 : : * remote transaction, and before any actual writes.
1637 : : */
1829 akapila@postgresql.o 1638 [ + + ]: 7 : if (!in_streamed_transaction &&
1639 [ + - - + ]: 10 : (!in_remote_transaction ||
1640 [ - - ]: 5 : (IsTransactionState() && !am_tablesync_worker())))
3152 peter_e@gmx.net 1641 [ # # ]:UBC 0 : ereport(ERROR,
1642 : : (errcode(ERRCODE_PROTOCOL_VIOLATION),
1643 : : errmsg_internal("ORIGIN message sent out of order")));
3152 peter_e@gmx.net 1644 :CBC 7 : }
1645 : :
1646 : : /*
1647 : : * Initialize fileset (if not already done).
1648 : : *
1649 : : * Create a new file when first_segment is true, otherwise open the existing
1650 : : * file.
1651 : : */
1652 : : void
971 akapila@postgresql.o 1653 : 363 : stream_start_internal(TransactionId xid, bool first_segment)
1654 : : {
1655 : 363 : begin_replication_step();
1656 : :
1657 : : /*
1658 : : * Initialize the worker's stream_fileset if we haven't yet. This will be
1659 : : * used for the entire duration of the worker, so create it in a permanent
1660 : : * context. We create this on the very first streaming message from any
1661 : : * transaction and then use it for this and other streaming transactions.
1662 : : * We could create the fileset at worker start instead, but then we
1663 : : * wouldn't be sure that it would ever be used.
1664 : : */
1665 [ + + ]: 363 : if (!MyLogicalRepWorker->stream_fileset)
1666 : : {
1667 : : MemoryContext oldctx;
1668 : :
1669 : 14 : oldctx = MemoryContextSwitchTo(ApplyContext);
1670 : :
1671 : 14 : MyLogicalRepWorker->stream_fileset = palloc(sizeof(FileSet));
1672 : 14 : FileSetInit(MyLogicalRepWorker->stream_fileset);
1673 : :
1674 : 14 : MemoryContextSwitchTo(oldctx);
1675 : : }
1676 : :
1677 : : /* Open the spool file for this transaction. */
1678 : 363 : stream_open_file(MyLogicalRepWorker->subid, xid, first_segment);
1679 : :
1680 : : /* If this is not the first segment, open existing subxact file. */
1681 [ + + ]: 363 : if (!first_segment)
1682 : 331 : subxact_info_read(MyLogicalRepWorker->subid, xid);
1683 : :
1684 : 363 : end_replication_step();
1685 : 363 : }
1686 : :
1687 : : /*
1688 : : * Handle STREAM START message.
1689 : : */
1690 : : static void
1829 1691 : 857 : apply_handle_stream_start(StringInfo s)
1692 : : {
1693 : : bool first_segment;
1694 : : ParallelApplyWorkerInfo *winfo;
1695 : : TransApplyAction apply_action;
1696 : :
1697 : : /* Save the message before it is consumed. */
971 1698 : 857 : StringInfoData original_msg = *s;
1699 : :
1547 tgl@sss.pgh.pa.us 1700 [ - + ]: 857 : if (in_streamed_transaction)
1547 tgl@sss.pgh.pa.us 1701 [ # # ]:UBC 0 : ereport(ERROR,
1702 : : (errcode(ERRCODE_PROTOCOL_VIOLATION),
1703 : : errmsg_internal("duplicate STREAM START message")));
1704 : :
1705 : : /* There must not be an active streaming transaction. */
963 akapila@postgresql.o 1706 [ - + ]:CBC 857 : Assert(!TransactionIdIsValid(stream_xid));
1707 : :
1708 : : /* notify handle methods we're processing a remote transaction */
1829 1709 : 857 : in_streamed_transaction = true;
1710 : :
1711 : : /* extract XID of the top-level transaction */
1712 : 857 : stream_xid = logicalrep_read_stream_start(s, &first_segment);
1713 : :
1547 tgl@sss.pgh.pa.us 1714 [ - + ]: 857 : if (!TransactionIdIsValid(stream_xid))
1547 tgl@sss.pgh.pa.us 1715 [ # # ]:UBC 0 : ereport(ERROR,
1716 : : (errcode(ERRCODE_PROTOCOL_VIOLATION),
1717 : : errmsg_internal("invalid transaction ID in streamed replication transaction")));
1718 : :
1278 akapila@postgresql.o 1719 :CBC 857 : set_apply_error_context_xact(stream_xid, InvalidXLogRecPtr);
1720 : :
1721 : : /* Try to allocate a worker for the streaming transaction. */
971 1722 [ + + ]: 857 : if (first_segment)
1723 : 82 : pa_allocate_worker(stream_xid);
1724 : :
1725 : 857 : apply_action = get_transaction_apply_action(stream_xid, &winfo);
1726 : :
1727 [ + + + + : 857 : switch (apply_action)
- ]
1728 : : {
1729 : 343 : case TRANS_LEADER_SERIALIZE:
1730 : :
1731 : : /*
1732 : : * Function stream_start_internal starts a transaction. This
1733 : : * transaction will be committed at stream stop, unless this is a
1734 : : * tablesync worker, in which case it will be committed after
1735 : : * processing all the messages. We need this transaction for
1736 : : * handling the BufFile, used for serializing the streaming data
1737 : : * and subxact info.
1738 : : */
1739 : 343 : stream_start_internal(stream_xid, first_segment);
1740 : 343 : break;
1741 : :
1742 : 251 : case TRANS_LEADER_SEND_TO_PARALLEL:
1743 [ - + ]: 251 : Assert(winfo);
1744 : :
1745 : : /*
1746 : : * Once we start serializing the changes, the parallel apply
1747 : : * worker will wait on the stream lock, which the leader holds
1748 : : * until the end of the transaction. So, we don't need to release
1749 : : * the lock or increment the stream count in that case.
1750 : : */
1751 [ + + ]: 251 : if (pa_send_data(winfo, s->len, s->data))
1752 : : {
1753 : : /*
1754 : : * Unlock the shared object lock so that the parallel apply
1755 : : * worker can continue to receive changes.
1756 : : */
1757 [ + + ]: 247 : if (!first_segment)
1758 : 224 : pa_unlock_stream(winfo->shared->xid, AccessExclusiveLock);
1759 : :
1760 : : /*
1761 : : * Increment the number of streaming blocks waiting to be
1762 : : * processed by parallel apply worker.
1763 : : */
1764 : 247 : pg_atomic_add_fetch_u32(&winfo->shared->pending_stream_count, 1);
1765 : :
1766 : : /* Cache the parallel apply worker for this transaction. */
1767 : 247 : pa_set_stream_apply_worker(winfo);
1768 : 247 : break;
1769 : : }
1770 : :
1771 : : /*
1772 : : * Switch to serialize mode when we are not able to send the
1773 : : * change to the parallel apply worker.
1774 : : */
1775 : 4 : pa_switch_to_partial_serialize(winfo, !first_segment);
1776 : :
1777 : : /* fall through */
1778 : 15 : case TRANS_LEADER_PARTIAL_SERIALIZE:
1779 [ - + ]: 15 : Assert(winfo);
1780 : :
1781 : : /*
1782 : : * Open the spool file unless it was already opened when switching
1783 : : * to serialize mode. The transaction started in
1784 : : * stream_start_internal will be committed on the stream stop.
1785 : : */
1786 [ + + ]: 15 : if (apply_action != TRANS_LEADER_SEND_TO_PARALLEL)
1787 : 11 : stream_start_internal(stream_xid, first_segment);
1788 : :
1789 : 15 : stream_write_change(LOGICAL_REP_MSG_STREAM_START, &original_msg);
1790 : :
1791 : : /* Cache the parallel apply worker for this transaction. */
1792 : 15 : pa_set_stream_apply_worker(winfo);
1793 : 15 : break;
1794 : :
1795 : 252 : case TRANS_PARALLEL_APPLY:
1796 [ + + ]: 252 : if (first_segment)
1797 : : {
1798 : : /* Hold the lock until the end of the transaction. */
1799 : 27 : pa_lock_transaction(MyParallelShared->xid, AccessExclusiveLock);
1800 : 27 : pa_set_xact_state(MyParallelShared, PARALLEL_TRANS_STARTED);
1801 : :
1802 : : /*
1803 : : * Signal the leader apply worker, as it may be waiting for
1804 : : * us.
1805 : : */
1806 : 27 : logicalrep_worker_wakeup(MyLogicalRepWorker->subid, InvalidOid);
1807 : : }
1808 : :
1809 : 252 : parallel_stream_nchanges = 0;
1810 : 252 : break;
1811 : :
971 akapila@postgresql.o 1812 :UBC 0 : default:
963 1813 [ # # ]: 0 : elog(ERROR, "unexpected apply action: %d", (int) apply_action);
1814 : : break;
1815 : : }
1816 : :
971 akapila@postgresql.o 1817 :CBC 857 : pgstat_report_activity(STATE_RUNNING, NULL);
1829 1818 : 857 : }
1819 : :
1820 : : /*
1821 : : * Update the information about subxacts and close the file.
1822 : : *
1823 : : * This function should only be called after stream_start_internal has been
1824 : : * called.
1825 : : */
1826 : : void
971 1827 : 363 : stream_stop_internal(TransactionId xid)
1828 : : {
1829 : : /*
1830 : : * Serialize information about subxacts for the toplevel transaction, then
1831 : : * close the stream messages spool file.
1832 : : */
1833 : 363 : subxact_info_write(MyLogicalRepWorker->subid, xid);
1829 1834 : 363 : stream_close_file();
1835 : :
1836 : : /* We must be in a valid transaction state */
1837 [ - + ]: 363 : Assert(IsTransactionState());
1838 : :
1839 : : /* Commit the per-stream transaction */
1667 1840 : 363 : CommitTransactionCommand();
1841 : :
1842 : : /* Reset per-stream context */
1829 1843 : 363 : MemoryContextReset(LogicalStreamingContext);
1844 : 363 : }
1845 : :
1846 : : /*
1847 : : * Handle STREAM STOP message.
1848 : : */
1849 : : static void
971 1850 : 856 : apply_handle_stream_stop(StringInfo s)
1851 : : {
1852 : : ParallelApplyWorkerInfo *winfo;
1853 : : TransApplyAction apply_action;
1854 : :
1855 [ - + ]: 856 : if (!in_streamed_transaction)
1547 tgl@sss.pgh.pa.us 1856 [ # # ]:UBC 0 : ereport(ERROR,
1857 : : (errcode(ERRCODE_PROTOCOL_VIOLATION),
1858 : : errmsg_internal("STREAM STOP message without STREAM START")));
1859 : :
971 akapila@postgresql.o 1860 :CBC 856 : apply_action = get_transaction_apply_action(stream_xid, &winfo);
1861 : :
1862 [ + + + + : 856 : switch (apply_action)
- ]
1863 : : {
1864 : 343 : case TRANS_LEADER_SERIALIZE:
1865 : 343 : stream_stop_internal(stream_xid);
1866 : 343 : break;
1867 : :
1868 : 247 : case TRANS_LEADER_SEND_TO_PARALLEL:
1869 [ - + ]: 247 : Assert(winfo);
1870 : :
1871 : : /*
1872 : : * Lock before sending the STREAM_STOP message so that the leader
1873 : : * can hold the lock first and the parallel apply worker will wait
1874 : : * for leader to release the lock. See Locking Considerations atop
1875 : : * applyparallelworker.c.
1876 : : */
1877 : 247 : pa_lock_stream(winfo->shared->xid, AccessExclusiveLock);
1878 : :
1879 [ + - ]: 247 : if (pa_send_data(winfo, s->len, s->data))
1880 : : {
1881 : 247 : pa_set_stream_apply_worker(NULL);
1882 : 247 : break;
1883 : : }
1884 : :
1885 : : /*
1886 : : * Switch to serialize mode when we are not able to send the
1887 : : * change to the parallel apply worker.
1888 : : */
971 akapila@postgresql.o 1889 :UBC 0 : pa_switch_to_partial_serialize(winfo, true);
1890 : :
1891 : : /* fall through */
971 akapila@postgresql.o 1892 :CBC 15 : case TRANS_LEADER_PARTIAL_SERIALIZE:
1893 : 15 : stream_write_change(LOGICAL_REP_MSG_STREAM_STOP, s);
1894 : 15 : stream_stop_internal(stream_xid);
1895 : 15 : pa_set_stream_apply_worker(NULL);
1896 : 15 : break;
1897 : :
1898 : 251 : case TRANS_PARALLEL_APPLY:
1899 [ + + ]: 251 : elog(DEBUG1, "applied %u changes in the streaming chunk",
1900 : : parallel_stream_nchanges);
1901 : :
1902 : : /*
1903 : : * By the time the parallel apply worker is processing the changes
1904 : : * in the current streaming block, the leader apply worker may have
1905 : : * sent multiple streaming blocks. This can lead to the parallel apply
1906 : : * worker starting to wait even when there are more chunks of the
1907 : : * stream in the queue. So, try to lock only if there is no message left
1908 : : * in the queue. See Locking Considerations atop
1909 : : * applyparallelworker.c.
1910 : : *
1911 : : * Note that here we have a race condition where we can start
1912 : : * waiting even when there are pending streaming chunks. This can
1913 : : * happen if the leader sends another streaming block and acquires
1914 : : * the stream lock again after the parallel apply worker checks
1915 : : * that there is no pending streaming block and before it actually
1916 : : * starts waiting on a lock. We can handle this case by not
1917 : : * allowing the leader to increment the stream block count during
1918 : : * the time the parallel apply worker acquires the lock, but it is not
1919 : : * clear whether that is worth the complexity.
1920 : : *
1921 : : * Now, if this missed chunk contains rollback to savepoint, then
1922 : : * there is a risk of deadlock which probably shouldn't happen
1923 : : * after restart.
1924 : : */
1925 : 251 : pa_decr_and_wait_stream_block();
1926 : 249 : break;
1927 : :
971 akapila@postgresql.o 1928 :UBC 0 : default:
963 1929 [ # # ]: 0 : elog(ERROR, "unexpected apply action: %d", (int) apply_action);
1930 : : break;
1931 : : }
1932 : :
971 akapila@postgresql.o 1933 :CBC 854 : in_streamed_transaction = false;
963 1934 : 854 : stream_xid = InvalidTransactionId;
1935 : :
1936 : : /*
1937 : : * The parallel apply worker could be in a transaction in which case we
1938 : : * need to report the state as STATE_IDLEINTRANSACTION.
1939 : : */
971 1940 [ + + ]: 854 : if (IsTransactionOrTransactionBlock())
1941 : 249 : pgstat_report_activity(STATE_IDLEINTRANSACTION, NULL);
1942 : : else
1943 : 605 : pgstat_report_activity(STATE_IDLE, NULL);
1944 : :
1945 : 854 : reset_apply_error_context_info();
1946 : 854 : }
1947 : :
1948 : : /*
1949 : : * Helper function to handle STREAM ABORT message when the transaction was
1950 : : * serialized to file.
1951 : : */
1952 : : static void
1953 : 14 : stream_abort_internal(TransactionId xid, TransactionId subxid)
1954 : : {
1955 : : /*
1956 : : * If the two XIDs are the same, it's in fact an abort of the toplevel
1957 : : * xact, so just delete the files with the serialized info.
1958 : : */
1829 1959 [ + + ]: 14 : if (xid == subxid)
1960 : 1 : stream_cleanup_files(MyLogicalRepWorker->subid, xid);
1961 : : else
1962 : : {
1963 : : /*
1964 : : * OK, so it's a subxact. We need to read the subxact file for the
1965 : : * toplevel transaction, determine the offset tracked for the subxact,
1966 : : * and truncate the file with changes. We also remove the subxacts
1967 : : * with higher offsets (or rather higher XIDs).
1968 : : *
1969 : : * We intentionally scan the array from the tail, because we're likely
1970 : : * aborting a change for the most recent subtransactions.
1971 : : *
1972 : : * We can't use binary search here as subxact XIDs won't necessarily
1973 : : * arrive in sorted order; consider the case where we have released
1974 : : * the savepoints for multiple subtransactions and then performed a
1975 : : * rollback to savepoint for one of the earlier
1976 : : * sub-transactions.
1977 : : */
1978 : : int64 i;
1979 : : int64 subidx;
1980 : : BufFile *fd;
1981 : 13 : bool found = false;
1982 : : char path[MAXPGPATH];
1983 : :
1984 : 13 : subidx = -1;
1549 tgl@sss.pgh.pa.us 1985 : 13 : begin_replication_step();
1829 akapila@postgresql.o 1986 : 13 : subxact_info_read(MyLogicalRepWorker->subid, xid);
1987 : :
1988 [ + + ]: 15 : for (i = subxact_data.nsubxacts; i > 0; i--)
1989 : : {
1990 [ + + ]: 11 : if (subxact_data.subxacts[i - 1].xid == subxid)
1991 : : {
1992 : 9 : subidx = (i - 1);
1993 : 9 : found = true;
1994 : 9 : break;
1995 : : }
1996 : : }
1997 : :
1998 : : /*
1999 : : * If it's an empty sub-transaction then we will not find the subxid
2000 : : * here, so just clean up the subxact info and return.
2001 : : */
2002 [ + + ]: 13 : if (!found)
2003 : : {
2004 : : /* Cleanup the subxact info */
2005 : 4 : cleanup_subxact_info();
1549 tgl@sss.pgh.pa.us 2006 : 4 : end_replication_step();
1667 akapila@postgresql.o 2007 : 4 : CommitTransactionCommand();
1829 2008 : 4 : return;
2009 : : }
2010 : :
2011 : : /* open the changes file */
971 2012 : 9 : changes_filename(path, MyLogicalRepWorker->subid, xid);
2013 : 9 : fd = BufFileOpenFileSet(MyLogicalRepWorker->stream_fileset, path,
2014 : : O_RDWR, false);
2015 : :
2016 : : /* OK, truncate the file at the right offset */
2017 : 9 : BufFileTruncateFileSet(fd, subxact_data.subxacts[subidx].fileno,
2018 : 9 : subxact_data.subxacts[subidx].offset);
2019 : 9 : BufFileClose(fd);
2020 : :
2021 : : /* discard the subxacts added later */
2022 : 9 : subxact_data.nsubxacts = subidx;
2023 : :
2024 : : /* write the updated subxact list */
2025 : 9 : subxact_info_write(MyLogicalRepWorker->subid, xid);
2026 : :
2027 : 9 : end_replication_step();
2028 : 9 : CommitTransactionCommand();
2029 : : }
2030 : : }
2031 : :
2032 : : /*
2033 : : * Handle STREAM ABORT message.
2034 : : */
2035 : : static void
2036 : 38 : apply_handle_stream_abort(StringInfo s)
2037 : : {
2038 : : TransactionId xid;
2039 : : TransactionId subxid;
2040 : : LogicalRepStreamAbortData abort_data;
2041 : : ParallelApplyWorkerInfo *winfo;
2042 : : TransApplyAction apply_action;
2043 : :
2044 : : /* Save the message before it is consumed. */
2045 : 38 : StringInfoData original_msg = *s;
2046 : : bool toplevel_xact;
2047 : :
2048 [ - + ]: 38 : if (in_streamed_transaction)
971 akapila@postgresql.o 2049 [ # # ]:UBC 0 : ereport(ERROR,
2050 : : (errcode(ERRCODE_PROTOCOL_VIOLATION),
2051 : : errmsg_internal("STREAM ABORT message without STREAM STOP")));
2052 : :
2053 : : /* We receive abort information only when we can apply in parallel. */
971 akapila@postgresql.o 2054 :CBC 38 : logicalrep_read_stream_abort(s, &abort_data,
2055 : 38 : MyLogicalRepWorker->parallel_apply);
2056 : :
2057 : 38 : xid = abort_data.xid;
2058 : 38 : subxid = abort_data.subxid;
2059 : 38 : toplevel_xact = (xid == subxid);
2060 : :
2061 : 38 : set_apply_error_context_xact(subxid, abort_data.abort_lsn);
2062 : :
2063 : 38 : apply_action = get_transaction_apply_action(xid, &winfo);
2064 : :
2065 [ + + + + : 38 : switch (apply_action)
- ]
2066 : : {
963 2067 : 14 : case TRANS_LEADER_APPLY:
2068 : :
2069 : : /*
2070 : : * We are in the leader apply worker and the transaction has been
2071 : : * serialized to file.
2072 : : */
971 2073 : 14 : stream_abort_internal(xid, subxid);
2074 : :
2075 [ - + ]: 14 : elog(DEBUG1, "finished processing the STREAM ABORT command");
2076 : 14 : break;
2077 : :
2078 : 10 : case TRANS_LEADER_SEND_TO_PARALLEL:
2079 [ - + ]: 10 : Assert(winfo);
2080 : :
2081 : : /*
2082 : : * For the case of aborting the subtransaction, we increment the
2083 : : * number of streaming blocks and take the lock again before
2084 : : * sending the STREAM_ABORT to ensure that the parallel apply
2085 : : * worker will wait on the lock for the next set of changes after
2086 : : * processing the STREAM_ABORT message if it is not already
2087 : : * waiting for a STREAM_STOP message.
2088 : : *
2089 : : * It is important to perform this locking before sending the
2090 : : * STREAM_ABORT message so that the leader can hold the lock first
2091 : : * and the parallel apply worker will wait for the leader to
2092 : : * release the lock. This is the same as what we do in
2093 : : * apply_handle_stream_stop. See Locking Considerations atop
2094 : : * applyparallelworker.c.
2095 : : */
2096 [ + + ]: 10 : if (!toplevel_xact)
2097 : : {
2098 : 9 : pa_unlock_stream(xid, AccessExclusiveLock);
2099 : 9 : pg_atomic_add_fetch_u32(&winfo->shared->pending_stream_count, 1);
2100 : 9 : pa_lock_stream(xid, AccessExclusiveLock);
2101 : : }
2102 : :
2103 [ + - ]: 10 : if (pa_send_data(winfo, s->len, s->data))
2104 : : {
2105 : : /*
2106 : : * Unlike STREAM_COMMIT and STREAM_PREPARE, we don't need to
2107 : : * wait here for the parallel apply worker to finish as that
2108 : : * is not required to maintain the commit order and won't have
2109 : : * the risk of failures due to transaction dependencies and
2110 : : * deadlocks. However, it is possible that before the parallel
2111 : : * worker finishes and we clear the worker info, an xid
2112 : : * wraparound happens on the upstream and a new transaction
2113 : : * with the same xid appears, which can lead to duplicate
2114 : : * entries in ParallelApplyTxnHash. Yet another problem could
2115 : : * be that we may have serialized the changes in partial
2116 : : * serialize mode and the file containing xact changes may
2117 : : * already exist, and after xid wraparound trying to create
2118 : : * the file for the same xid can lead to an error. To avoid
2119 : : * these problems, we decide to wait for the aborts to finish.
2120 : : *
2121 : : * Note that it is okay not to update the flush position for
2122 : : * aborts, as in the worst case that means such a transaction
2123 : : * won't be sent again after a restart.
2124 : : */
2125 [ + + ]: 10 : if (toplevel_xact)
2126 : 1 : pa_xact_finish(winfo, InvalidXLogRecPtr);
2127 : :
2128 : 10 : break;
2129 : : }
2130 : :
2131 : : /*
2132 : : * Switch to serialize mode when we are not able to send the
2133 : : * change to parallel apply worker.
2134 : : */
971 akapila@postgresql.o 2135 :UBC 0 : pa_switch_to_partial_serialize(winfo, true);
2136 : :
2137 : : /* fall through */
971 akapila@postgresql.o 2138 :CBC 2 : case TRANS_LEADER_PARTIAL_SERIALIZE:
2139 [ - + ]: 2 : Assert(winfo);
2140 : :
2141 : : /*
2142 : : * Parallel apply worker might have applied some changes, so write
2143 : : * the STREAM_ABORT message so that it can rollback the
2144 : : * subtransaction if needed.
2145 : : */
2146 : 2 : stream_open_and_write_change(xid, LOGICAL_REP_MSG_STREAM_ABORT,
2147 : : &original_msg);
2148 : :
2149 [ + + ]: 2 : if (toplevel_xact)
2150 : : {
2151 : 1 : pa_set_fileset_state(winfo->shared, FS_SERIALIZE_DONE);
2152 : 1 : pa_xact_finish(winfo, InvalidXLogRecPtr);
2153 : : }
2154 : 2 : break;
2155 : :
2156 : 12 : case TRANS_PARALLEL_APPLY:
2157 : :
2158 : : /*
2159 : : * If the parallel apply worker is applying spooled messages then
2160 : : * close the file before aborting.
2161 : : */
2162 [ + + + + ]: 12 : if (toplevel_xact && stream_fd)
2163 : 1 : stream_close_file();
2164 : :
2165 : 12 : pa_stream_abort(&abort_data);
2166 : :
2167 : : /*
2168 : : * We need to wait after processing rollback to savepoint for the
2169 : : * next set of changes.
2170 : : *
2171 : : * We have a race condition here due to which we can start waiting
2172 : : * even when there are more chunks of the stream in the queue. See
2173 : : * apply_handle_stream_stop.
2174 : : */
2175 [ + + ]: 12 : if (!toplevel_xact)
2176 : 10 : pa_decr_and_wait_stream_block();
2177 : :
2178 [ + + ]: 12 : elog(DEBUG1, "finished processing the STREAM ABORT command");
2179 : 12 : break;
2180 : :
971 akapila@postgresql.o 2181 :UBC 0 : default:
963 2182 [ # # ]: 0 : elog(ERROR, "unexpected apply action: %d", (int) apply_action);
2183 : : break;
2184 : : }
2185 : :
971 akapila@postgresql.o 2186 :CBC 38 : reset_apply_error_context_info();
2187 : 38 : }
2188 : :
2189 : : /*
2190 : : * Ensure that the passed location is fileset's end.
2191 : : */
2192 : : static void
2193 : 4 : ensure_last_message(FileSet *stream_fileset, TransactionId xid, int fileno,
2194 : : off_t offset)
2195 : : {
2196 : : char path[MAXPGPATH];
2197 : : BufFile *fd;
2198 : : int last_fileno;
2199 : : off_t last_offset;
2200 : :
2201 [ - + ]: 4 : Assert(!IsTransactionState());
2202 : :
2203 : 4 : begin_replication_step();
2204 : :
2205 : 4 : changes_filename(path, MyLogicalRepWorker->subid, xid);
2206 : :
2207 : 4 : fd = BufFileOpenFileSet(stream_fileset, path, O_RDONLY, false);
2208 : :
2209 : 4 : BufFileSeek(fd, 0, 0, SEEK_END);
2210 : 4 : BufFileTell(fd, &last_fileno, &last_offset);
2211 : :
2212 : 4 : BufFileClose(fd);
2213 : :
2214 : 4 : end_replication_step();
2215 : :
2216 [ + - - + ]: 4 : if (last_fileno != fileno || last_offset != offset)
971 akapila@postgresql.o 2217 [ # # ]:UBC 0 : elog(ERROR, "unexpected message left in streaming transaction's changes file \"%s\"",
2218 : : path);
1829 akapila@postgresql.o 2219 :CBC 4 : }
2220 : :
2221 : : /*
2222 : : * Common spoolfile processing.
2223 : : */
2224 : : void
971 2225 : 31 : apply_spooled_messages(FileSet *stream_fileset, TransactionId xid,
2226 : : XLogRecPtr lsn)
2227 : : {
2228 : : int nchanges;
2229 : : char path[MAXPGPATH];
1829 2230 : 31 : char *buffer = NULL;
2231 : : MemoryContext oldcxt;
2232 : : ResourceOwner oldowner;
2233 : : int fileno;
2234 : : off_t offset;
2235 : :
971 2236 [ + + ]: 31 : if (!am_parallel_apply_worker())
2237 : 27 : maybe_start_skipping_changes(lsn);
2238 : :
2239 : : /* Make sure we have an open transaction */
1549 tgl@sss.pgh.pa.us 2240 : 31 : begin_replication_step();
2241 : :
2242 : : /*
2243 : : * Allocate file handle and memory required to process all the messages in
2244 : : * TopTransactionContext to avoid them getting reset after each message is
2245 : : * processed.
2246 : : */
1829 akapila@postgresql.o 2247 : 31 : oldcxt = MemoryContextSwitchTo(TopTransactionContext);
2248 : :
2249 : : /* Open the spool file for the committed/prepared transaction */
2250 : 31 : changes_filename(path, MyLogicalRepWorker->subid, xid);
2251 [ - + ]: 31 : elog(DEBUG1, "replaying changes from file \"%s\"", path);
2252 : :
2253 : : /*
2254 : : * Make sure the file is owned by the toplevel transaction so that the
2255 : : * file will not be accidentally closed when aborting a subtransaction.
2256 : : */
971 2257 : 31 : oldowner = CurrentResourceOwner;
2258 : 31 : CurrentResourceOwner = TopTransactionResourceOwner;
2259 : :
2260 : 31 : stream_fd = BufFileOpenFileSet(stream_fileset, path, O_RDONLY, false);
2261 : :
2262 : 31 : CurrentResourceOwner = oldowner;
2263 : :
1829 2264 : 31 : buffer = palloc(BLCKSZ);
2265 : :
2266 : 31 : MemoryContextSwitchTo(oldcxt);
2267 : :
1500 2268 : 31 : remote_final_lsn = lsn;
2269 : :
2270 : : /*
2271 : : * Make sure the apply_dispatch handler methods are aware we're in a
2272 : : * remote transaction.
2273 : : */
1829 2274 : 31 : in_remote_transaction = true;
2275 : 31 : pgstat_report_activity(STATE_RUNNING, NULL);
2276 : :
1549 tgl@sss.pgh.pa.us 2277 : 31 : end_replication_step();
2278 : :
2279 : : /*
2280 : : * Read the entries one by one and pass them through the same logic as in
2281 : : * apply_dispatch.
2282 : : */
1829 akapila@postgresql.o 2283 : 31 : nchanges = 0;
2284 : : while (true)
2285 : 88470 : {
2286 : : StringInfoData s2;
2287 : : size_t nbytes;
2288 : : int len;
2289 : :
2290 [ - + ]: 88501 : CHECK_FOR_INTERRUPTS();
2291 : :
2292 : : /* read length of the on-disk record */
964 peter@eisentraut.org 2293 : 88501 : nbytes = BufFileReadMaybeEOF(stream_fd, &len, sizeof(len), true);
2294 : :
2295 : : /* have we reached end of the file? */
1829 akapila@postgresql.o 2296 [ + + ]: 88501 : if (nbytes == 0)
2297 : 26 : break;
2298 : :
2299 : : /* do we have a correct length? */
1547 tgl@sss.pgh.pa.us 2300 [ - + ]: 88475 : if (len <= 0)
1547 tgl@sss.pgh.pa.us 2301 [ # # ]:UBC 0 : elog(ERROR, "incorrect length %d in streaming transaction's changes file \"%s\"",
2302 : : len, path);
2303 : :
2304 : : /* make sure we have a sufficiently large buffer */
1829 akapila@postgresql.o 2305 :CBC 88475 : buffer = repalloc(buffer, len);
2306 : :
2307 : : /* and finally read the data into the buffer */
964 peter@eisentraut.org 2308 : 88475 : BufFileReadExact(stream_fd, buffer, len);
2309 : :
971 akapila@postgresql.o 2310 : 88475 : BufFileTell(stream_fd, &fileno, &offset);
2311 : :
2312 : : /* init a stringinfo using the buffer and call apply_dispatch */
669 drowley@postgresql.o 2313 : 88475 : initReadOnlyStringInfo(&s2, buffer, len);
2314 : :
2315 : : /* Ensure we are reading the data into our memory context. */
1829 akapila@postgresql.o 2316 : 88475 : oldcxt = MemoryContextSwitchTo(ApplyMessageContext);
2317 : :
2318 : 88475 : apply_dispatch(&s2);
2319 : :
2320 : 88474 : MemoryContextReset(ApplyMessageContext);
2321 : :
2322 : 88474 : MemoryContextSwitchTo(oldcxt);
2323 : :
2324 : 88474 : nchanges++;
2325 : :
2326 : : /*
2327 : : * It is possible the file has been closed because we have processed
2328 : : * a transaction end message such as stream_commit, in which case
2329 : : * that must be the last message.
2330 : : */
971 2331 [ + + ]: 88474 : if (!stream_fd)
2332 : : {
2333 : 4 : ensure_last_message(stream_fileset, xid, fileno, offset);
2334 : 4 : break;
2335 : : }
2336 : :
1829 2337 [ + + ]: 88470 : if (nchanges % 1000 == 0)
1547 tgl@sss.pgh.pa.us 2338 [ - + ]: 84 : elog(DEBUG1, "replayed %d changes from file \"%s\"",
2339 : : nchanges, path);
2340 : : }
2341 : :
971 akapila@postgresql.o 2342 [ + + ]: 30 : if (stream_fd)
2343 : 26 : stream_close_file();
2344 : :
1829 2345 [ - + ]: 30 : elog(DEBUG1, "replayed %d (all) changes from file \"%s\"",
2346 : : nchanges, path);
2347 : :
1500 2348 : 30 : return;
2349 : : }
2350 : :
2351 : : /*
2352 : : * Handle STREAM COMMIT message.
2353 : : */
2354 : : static void
2355 : 61 : apply_handle_stream_commit(StringInfo s)
2356 : : {
2357 : : TransactionId xid;
2358 : : LogicalRepCommitData commit_data;
2359 : : ParallelApplyWorkerInfo *winfo;
2360 : : TransApplyAction apply_action;
2361 : :
2362 : : /* Save the message before it is consumed. */
971 2363 : 61 : StringInfoData original_msg = *s;
2364 : :
1500 2365 [ - + ]: 61 : if (in_streamed_transaction)
1500 akapila@postgresql.o 2366 [ # # ]:UBC 0 : ereport(ERROR,
2367 : : (errcode(ERRCODE_PROTOCOL_VIOLATION),
2368 : : errmsg_internal("STREAM COMMIT message without STREAM STOP")));
2369 : :
1500 akapila@postgresql.o 2370 :CBC 61 : xid = logicalrep_read_stream_commit(s, &commit_data);
1278 2371 : 61 : set_apply_error_context_xact(xid, commit_data.commit_lsn);
2372 : :
971 2373 : 61 : apply_action = get_transaction_apply_action(xid, &winfo);
2374 : :
2375 [ + + + + : 61 : switch (apply_action)
- ]
2376 : : {
963 2377 : 22 : case TRANS_LEADER_APPLY:
2378 : :
2379 : : /*
2380 : : * The transaction has been serialized to file, so replay all the
2381 : : * spooled operations.
2382 : : */
971 2383 : 22 : apply_spooled_messages(MyLogicalRepWorker->stream_fileset, xid,
2384 : : commit_data.commit_lsn);
2385 : :
2386 : 21 : apply_handle_commit_internal(&commit_data);
2387 : :
2388 : : /* Unlink the files with serialized changes and subxact info. */
2389 : 21 : stream_cleanup_files(MyLogicalRepWorker->subid, xid);
2390 : :
2391 [ - + ]: 21 : elog(DEBUG1, "finished processing the STREAM COMMIT command");
2392 : 21 : break;
2393 : :
2394 : 18 : case TRANS_LEADER_SEND_TO_PARALLEL:
2395 [ - + ]: 18 : Assert(winfo);
2396 : :
2397 [ + - ]: 18 : if (pa_send_data(winfo, s->len, s->data))
2398 : : {
2399 : : /* Finish processing the streaming transaction. */
2400 : 18 : pa_xact_finish(winfo, commit_data.end_lsn);
2401 : 17 : break;
2402 : : }
2403 : :
2404 : : /*
2405 : : * Switch to serialize mode when we are not able to send the
2406 : : * change to the parallel apply worker.
2407 : : */
971 akapila@postgresql.o 2408 :UBC 0 : pa_switch_to_partial_serialize(winfo, true);
2409 : :
2410 : : /* fall through */
971 akapila@postgresql.o 2411 :CBC 2 : case TRANS_LEADER_PARTIAL_SERIALIZE:
2412 [ - + ]: 2 : Assert(winfo);
2413 : :
2414 : 2 : stream_open_and_write_change(xid, LOGICAL_REP_MSG_STREAM_COMMIT,
2415 : : &original_msg);
2416 : :
2417 : 2 : pa_set_fileset_state(winfo->shared, FS_SERIALIZE_DONE);
2418 : :
2419 : : /* Finish processing the streaming transaction. */
2420 : 2 : pa_xact_finish(winfo, commit_data.end_lsn);
2421 : 2 : break;
2422 : :
2423 : 19 : case TRANS_PARALLEL_APPLY:
2424 : :
2425 : : /*
2426 : : * If the parallel apply worker is applying spooled messages, then
2427 : : * close the file before committing.
2428 : : */
2429 [ + + ]: 19 : if (stream_fd)
2430 : 2 : stream_close_file();
2431 : :
2432 : 19 : apply_handle_commit_internal(&commit_data);
2433 : :
2434 : 19 : MyParallelShared->last_commit_end = XactLastCommitEnd;
2435 : :
2436 : : /*
2437 : : * It is important to set the transaction state as finished before
2438 : : * releasing the lock. See pa_wait_for_xact_finish.
2439 : : */
2440 : 19 : pa_set_xact_state(MyParallelShared, PARALLEL_TRANS_FINISHED);
2441 : 19 : pa_unlock_transaction(xid, AccessExclusiveLock);
2442 : :
2443 : 19 : pa_reset_subtrans();
2444 : :
2445 [ + + ]: 19 : elog(DEBUG1, "finished processing the STREAM COMMIT command");
2446 : 19 : break;
2447 : :
971 akapila@postgresql.o 2448 :UBC 0 : default:
963 2449 [ # # ]: 0 : elog(ERROR, "unexpected apply action: %d", (int) apply_action);
2450 : : break;
2451 : : }
2452 : :
2453 : : /* Process any tables that are being synchronized in parallel. */
1744 akapila@postgresql.o 2454 :CBC 59 : process_syncing_tables(commit_data.end_lsn);
2455 : :
1829 2456 : 59 : pgstat_report_activity(STATE_IDLE, NULL);
2457 : :
1471 2458 : 59 : reset_apply_error_context_info();
1829 2459 : 59 : }
2460 : :
2461 : : /*
2462 : : * Helper function for apply_handle_commit and apply_handle_stream_commit.
2463 : : */
2464 : : static void
1499 2465 : 472 : apply_handle_commit_internal(LogicalRepCommitData *commit_data)
2466 : : {
1264 2467 [ + + ]: 472 : if (is_skipping_changes())
2468 : : {
2469 : 2 : stop_skipping_changes();
2470 : :
2471 : : /*
2472 : : * Start a new transaction to clear the subskiplsn, if not started
2473 : : * yet.
2474 : : */
2475 [ + + ]: 2 : if (!IsTransactionState())
2476 : 1 : StartTransactionCommand();
2477 : : }
2478 : :
1667 2479 [ + - ]: 472 : if (IsTransactionState())
2480 : : {
2481 : : /*
2482 : : * The transaction is either non-empty or skipped, so we clear the
2483 : : * subskiplsn.
2484 : : */
1264 2485 : 472 : clear_subscription_skip_lsn(commit_data->commit_lsn);
2486 : :
2487 : : /*
2488 : : * Update origin state so we can restart streaming from the correct
2489 : : * position in case of a crash.
2490 : : */
1744 2491 : 472 : replorigin_session_origin_lsn = commit_data->end_lsn;
2492 : 472 : replorigin_session_origin_timestamp = commit_data->committime;
2493 : :
2494 : 472 : CommitTransactionCommand();
2495 : :
971 2496 [ + + ]: 472 : if (IsTransactionBlock())
2497 : : {
2498 : 4 : EndTransactionBlock(false);
2499 : 4 : CommitTransactionCommand();
2500 : : }
2501 : :
1744 2502 : 472 : pgstat_report_stat(false);
2503 : :
971 2504 : 472 : store_flush_position(commit_data->end_lsn, XactLastCommitEnd);
2505 : : }
2506 : : else
2507 : : {
2508 : : /* Process any invalidation messages that might have accumulated. */
1744 akapila@postgresql.o 2509 :UBC 0 : AcceptInvalidationMessages();
2510 : 0 : maybe_reread_subscription();
2511 : : }
2512 : :
1744 akapila@postgresql.o 2513 :CBC 472 : in_remote_transaction = false;
2514 : 472 : }
2515 : :
2516 : : /*
2517 : : * Handle RELATION message.
2518 : : *
2519 : : * Note we don't do validation against the local schema here. That
2520 : : * validation is postponed until the first change for the given relation
2521 : : * arrives, as we only care about it when applying changes for the
2522 : : * relation anyway, and we do less locking this way.
2523 : : */
2524 : : static void
3152 peter_e@gmx.net 2525 : 462 : apply_handle_relation(StringInfo s)
2526 : : {
2527 : : LogicalRepRelation *rel;
2528 : :
1745 akapila@postgresql.o 2529 [ + + ]: 462 : if (handle_streamed_transaction(LOGICAL_REP_MSG_RELATION, s))
1829 2530 : 36 : return;
2531 : :
3152 peter_e@gmx.net 2532 : 426 : rel = logicalrep_read_rel(s);
2533 : 426 : logicalrep_relmap_update(rel);
2534 : :
2535 : : /* Also reset all entries in the partition map that refer to remoterel. */
1178 akapila@postgresql.o 2536 : 426 : logicalrep_partmap_reset_relmap(rel);
2537 : : }
2538 : :
2539 : : /*
2540 : : * Handle TYPE message.
2541 : : *
2542 : : * This implementation pays no attention to TYPE messages; we expect the user
2543 : : * to have set things up so that the incoming data is acceptable to the input
2544 : : * functions for the locally subscribed tables. Hence, we just read and
2545 : : * discard the message.
2546 : : */
2547 : : static void
3152 peter_e@gmx.net 2548 : 18 : apply_handle_type(StringInfo s)
2549 : : {
2550 : : LogicalRepTyp typ;
2551 : :
1745 akapila@postgresql.o 2552 [ - + ]: 18 : if (handle_streamed_transaction(LOGICAL_REP_MSG_TYPE, s))
1829 akapila@postgresql.o 2553 :UBC 0 : return;
2554 : :
3152 peter_e@gmx.net 2555 :CBC 18 : logicalrep_read_typ(s, &typ);
2556 : : }
2557 : :
2558 : : /*
2559 : : * Check that we (the subscription owner) have sufficient privileges on the
2560 : : * target relation to perform the given operation.
2561 : : */
2562 : : static void
1338 jdavis@postgresql.or 2563 : 220361 : TargetPrivilegesCheck(Relation rel, AclMode mode)
2564 : : {
2565 : : Oid relid;
2566 : : AclResult aclresult;
2567 : :
2568 : 220361 : relid = RelationGetRelid(rel);
2569 : 220361 : aclresult = pg_class_aclcheck(relid, GetUserId(), mode);
2570 [ + + ]: 220361 : if (aclresult != ACLCHECK_OK)
2571 : 9 : aclcheck_error(aclresult,
2572 : 9 : get_relkind_objtype(rel->rd_rel->relkind),
2573 : 9 : get_rel_name(relid));
2574 : :
2575 : : /*
2576 : : * We lack the infrastructure to honor RLS policies. It might be possible
2577 : : * to add such infrastructure here, but tablesync workers lack it, too, so
2578 : : * we don't bother. RLS does not ordinarily apply to TRUNCATE commands,
2579 : : * but it seems dangerous to replicate a TRUNCATE and then refuse to
2580 : : * replicate subsequent INSERTs, so we forbid all commands the same.
2581 : : */
2582 [ + + ]: 220352 : if (check_enable_rls(relid, InvalidOid, false) == RLS_ENABLED)
2583 [ + - ]: 4 : ereport(ERROR,
2584 : : (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
2585 : : errmsg("user \"%s\" cannot replicate into relation with row-level security enabled: \"%s\"",
2586 : : GetUserNameFromId(GetUserId(), true),
2587 : : RelationGetRelationName(rel))));
2588 : 220348 : }
2589 : :
2590 : : /*
2591 : : * Handle INSERT message.
2592 : : */
2593 : :
2594 : : static void
3152 peter_e@gmx.net 2595 : 185897 : apply_handle_insert(StringInfo s)
2596 : : {
2597 : : LogicalRepRelMapEntry *rel;
2598 : : LogicalRepTupleData newtup;
2599 : : LogicalRepRelId relid;
2600 : : UserContext ucxt;
2601 : : ApplyExecutionData *edata;
2602 : : EState *estate;
2603 : : TupleTableSlot *remoteslot;
2604 : : MemoryContext oldctx;
2605 : : bool run_as_owner;
2606 : :
2607 : : /*
2608 : : * Quick return if we are skipping data modification changes or handling
2609 : : * streamed transactions.
2610 : : */
1264 akapila@postgresql.o 2611 [ + + + + ]: 361793 : if (is_skipping_changes() ||
2612 : 175896 : handle_streamed_transaction(LOGICAL_REP_MSG_INSERT, s))
1829 2613 : 110056 : return;
2614 : :
1549 tgl@sss.pgh.pa.us 2615 : 75890 : begin_replication_step();
2616 : :
3152 peter_e@gmx.net 2617 : 75888 : relid = logicalrep_read_insert(s, &newtup);
2618 : 75888 : rel = logicalrep_rel_open(relid, RowExclusiveLock);
3089 2619 [ + + ]: 75879 : if (!should_apply_changes_for_rel(rel))
2620 : : {
2621 : : /*
2622 : : * The relation can't become interesting in the middle of the
2623 : : * transaction so it's safe to unlock it.
2624 : : */
2625 : 49 : logicalrep_rel_close(rel, RowExclusiveLock);
1549 tgl@sss.pgh.pa.us 2626 : 49 : end_replication_step();
3089 peter_e@gmx.net 2627 : 49 : return;
2628 : : }
2629 : :
2630 : : /*
2631 : : * Make sure that any user-supplied code runs as the table owner, unless
2632 : : * the user has opted out of that behavior.
2633 : : */
886 rhaas@postgresql.org 2634 : 75830 : run_as_owner = MySubscription->runasowner;
2635 [ + + ]: 75830 : if (!run_as_owner)
2636 : 75821 : SwitchToUntrustedUser(rel->localrel->rd_rel->relowner, &ucxt);
2637 : :
2638 : : /* Set relation for error callback */
1471 akapila@postgresql.o 2639 : 75830 : apply_error_callback_arg.rel = rel;
2640 : :
2641 : : /* Initialize the executor state. */
1568 tgl@sss.pgh.pa.us 2642 : 75830 : edata = create_edata_for_relation(rel);
2643 : 75830 : estate = edata->estate;
2759 andres@anarazel.de 2644 : 75830 : remoteslot = ExecInitExtraTupleSlot(estate,
2487 2645 : 75830 : RelationGetDescr(rel->localrel),
2646 : : &TTSOpsVirtual);
2647 : :
2648 : : /* Process and store remote tuple in the slot */
3152 peter_e@gmx.net 2649 [ - + ]: 75830 : oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
1876 tgl@sss.pgh.pa.us 2650 : 75830 : slot_store_data(remoteslot, rel, &newtup);
3152 peter_e@gmx.net 2651 : 75830 : slot_fill_defaults(rel, estate, remoteslot);
2652 : 75830 : MemoryContextSwitchTo(oldctx);
2653 : :
2654 : : /* For a partitioned table, insert the tuple into a partition. */
1979 peter@eisentraut.org 2655 [ + + ]: 75830 : if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
1568 tgl@sss.pgh.pa.us 2656 : 58 : apply_handle_tuple_routing(edata,
2657 : : remoteslot, NULL, CMD_INSERT);
2658 : : else
2659 : : {
199 2660 : 75772 : ResultRelInfo *relinfo = edata->targetRelInfo;
2661 : :
151 akapila@postgresql.o 2662 : 75772 : ExecOpenIndices(relinfo, false);
199 tgl@sss.pgh.pa.us 2663 : 75772 : apply_handle_insert_internal(edata, relinfo, remoteslot);
2664 : 75755 : ExecCloseIndices(relinfo);
2665 : : }
2666 : :
1568 2667 : 75799 : finish_edata(edata);
2668 : :
2669 : : /* Reset relation for error callback */
1471 akapila@postgresql.o 2670 : 75799 : apply_error_callback_arg.rel = NULL;
2671 : :
886 rhaas@postgresql.org 2672 [ + + ]: 75799 : if (!run_as_owner)
2673 : 75794 : RestoreUserContext(&ucxt);
2674 : :
3152 peter_e@gmx.net 2675 : 75799 : logicalrep_rel_close(rel, NoLock);
2676 : :
1549 tgl@sss.pgh.pa.us 2677 : 75799 : end_replication_step();
2678 : : }
2679 : :
2680 : : /*
2681 : : * Workhorse for apply_handle_insert()
2682 : : * relinfo is for the relation we're actually inserting into
2683 : : * (could be a child partition of edata->targetRelInfo)
2684 : : */
2685 : : static void
1568 2686 : 75831 : apply_handle_insert_internal(ApplyExecutionData *edata,
2687 : : ResultRelInfo *relinfo,
2688 : : TupleTableSlot *remoteslot)
2689 : : {
2690 : 75831 : EState *estate = edata->estate;
2691 : :
2692 : : /* Caller should have opened indexes already. */
199 2693 [ + + + + : 75831 : Assert(relinfo->ri_IndexRelationDescs != NULL ||
- + ]
2694 : : !relinfo->ri_RelationDesc->rd_rel->relhasindex ||
2695 : : RelationGetIndexList(relinfo->ri_RelationDesc) == NIL);
2696 : :
2697 : : /* Caller will not have done this bit. */
2698 [ - + ]: 75831 : Assert(relinfo->ri_onConflictArbiterIndexes == NIL);
382 akapila@postgresql.o 2699 : 75831 : InitConflictIndexes(relinfo);
2700 : :
2701 : : /* Do the insert. */
1338 jdavis@postgresql.or 2702 : 75831 : TargetPrivilegesCheck(relinfo->ri_RelationDesc, ACL_INSERT);
1788 heikki.linnakangas@i 2703 : 75823 : ExecSimpleRelationInsert(relinfo, estate, remoteslot);
1992 peter@eisentraut.org 2704 : 75800 : }
2705 : :
2706 : : /*
2707 : : * Check if the logical replication relation is updatable and throw
2708 : : * appropriate error if it isn't.
2709 : : */
2710 : : static void
3152 peter_e@gmx.net 2711 : 72291 : check_relation_updatable(LogicalRepRelMapEntry *rel)
2712 : : {
2713 : : /*
2714 : : * For partitioned tables, we only need to care if the target partition is
2715 : : * updatable (aka has PK or RI defined for it).
2716 : : */
1173 akapila@postgresql.o 2717 [ + + ]: 72291 : if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
2718 : 30 : return;
2719 : :
2720 : : /* Updatable, no error. */
3152 peter_e@gmx.net 2721 [ + - ]: 72261 : if (rel->updatable)
2722 : 72261 : return;
2723 : :
2724 : : /*
2725 : : * We are in error mode so it's fine this is somewhat slow. It's better to
2726 : : * give user correct error.
2727 : : */
3152 peter_e@gmx.net 2728 [ # # ]:UBC 0 : if (OidIsValid(GetRelationIdentityOrPK(rel->localrel)))
2729 : : {
2730 [ # # ]: 0 : ereport(ERROR,
2731 : : (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
2732 : : errmsg("publisher did not send replica identity column "
2733 : : "expected by the logical replication target relation \"%s.%s\"",
2734 : : rel->remoterel.nspname, rel->remoterel.relname)));
2735 : : }
2736 : :
2737 [ # # ]: 0 : ereport(ERROR,
2738 : : (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
2739 : : errmsg("logical replication target relation \"%s.%s\" has "
2740 : : "neither REPLICA IDENTITY index nor PRIMARY "
2741 : : "KEY and published relation does not have "
2742 : : "REPLICA IDENTITY FULL",
2743 : : rel->remoterel.nspname, rel->remoterel.relname)));
2744 : : }
2745 : :
2746 : : /*
2747 : : * Handle UPDATE message.
2748 : : *
2749 : : * TODO: FDW support
2750 : : */
2751 : : static void
3152 peter_e@gmx.net 2752 :CBC 66165 : apply_handle_update(StringInfo s)
2753 : : {
2754 : : LogicalRepRelMapEntry *rel;
2755 : : LogicalRepRelId relid;
2756 : : UserContext ucxt;
2757 : : ApplyExecutionData *edata;
2758 : : EState *estate;
2759 : : LogicalRepTupleData oldtup;
2760 : : LogicalRepTupleData newtup;
2761 : : bool has_oldtup;
2762 : : TupleTableSlot *remoteslot;
2763 : : RTEPermissionInfo *target_perminfo;
2764 : : MemoryContext oldctx;
2765 : : bool run_as_owner;
2766 : :
2767 : : /*
2768 : : * Quick return if we are skipping data modification changes or handling
2769 : : * streamed transactions.
2770 : : */
1264 akapila@postgresql.o 2771 [ + + + + ]: 132327 : if (is_skipping_changes() ||
2772 : 66162 : handle_streamed_transaction(LOGICAL_REP_MSG_UPDATE, s))
1829 2773 : 34223 : return;
2774 : :
1549 tgl@sss.pgh.pa.us 2775 : 31942 : begin_replication_step();
2776 : :
3152 peter_e@gmx.net 2777 : 31941 : relid = logicalrep_read_update(s, &has_oldtup, &oldtup,
2778 : : &newtup);
2779 : 31941 : rel = logicalrep_rel_open(relid, RowExclusiveLock);
3089 2780 [ - + ]: 31941 : if (!should_apply_changes_for_rel(rel))
2781 : : {
2782 : : /*
2783 : : * The relation can't become interesting in the middle of the
2784 : : * transaction so it's safe to unlock it.
2785 : : */
3089 peter_e@gmx.net 2786 :UBC 0 : logicalrep_rel_close(rel, RowExclusiveLock);
1549 tgl@sss.pgh.pa.us 2787 : 0 : end_replication_step();
3089 peter_e@gmx.net 2788 : 0 : return;
2789 : : }
2790 : :
2791 : : /* Set relation for error callback */
1471 akapila@postgresql.o 2792 :CBC 31941 : apply_error_callback_arg.rel = rel;
2793 : :
2794 : : /* Check if we can do the update. */
3152 peter_e@gmx.net 2795 : 31941 : check_relation_updatable(rel);
2796 : :
2797 : : /*
2798 : : * Make sure that any user-supplied code runs as the table owner, unless
2799 : : * the user has opted out of that behavior.
2800 : : */
886 rhaas@postgresql.org 2801 : 31941 : run_as_owner = MySubscription->runasowner;
2802 [ + + ]: 31941 : if (!run_as_owner)
2803 : 31938 : SwitchToUntrustedUser(rel->localrel->rd_rel->relowner, &ucxt);
2804 : :
2805 : : /* Initialize the executor state. */
1568 tgl@sss.pgh.pa.us 2806 : 31940 : edata = create_edata_for_relation(rel);
2807 : 31940 : estate = edata->estate;
2759 andres@anarazel.de 2808 : 31940 : remoteslot = ExecInitExtraTupleSlot(estate,
2487 2809 : 31940 : RelationGetDescr(rel->localrel),
2810 : : &TTSOpsVirtual);
2811 : :
2812 : : /*
2813 : : * Populate updatedCols so that per-column triggers can fire, and so
2815 : : * the executor can correctly pass down the indexUnchanged hint. This could
2815 : : * include more columns than were actually changed on the publisher
2816 : : * because the logical replication protocol doesn't contain that
2817 : : * information. But it would for example exclude columns that only exist
2818 : : * on the subscriber, since we are not touching those.
2819 : : */
1005 alvherre@alvh.no-ip. 2820 : 31940 : target_perminfo = list_nth(estate->es_rteperminfos, 0);
2070 peter@eisentraut.org 2821 [ + + ]: 159324 : for (int i = 0; i < remoteslot->tts_tupleDescriptor->natts; i++)
2822 : : {
1874 tgl@sss.pgh.pa.us 2823 : 127384 : Form_pg_attribute att = TupleDescAttr(remoteslot->tts_tupleDescriptor, i);
2824 : 127384 : int remoteattnum = rel->attrmap->attnums[i];
2825 : :
2826 [ + + + + ]: 127384 : if (!att->attisdropped && remoteattnum >= 0)
2827 : : {
2828 [ - + ]: 68867 : Assert(remoteattnum < newtup.ncols);
2829 [ + + ]: 68867 : if (newtup.colstatus[remoteattnum] != LOGICALREP_COLUMN_UNCHANGED)
1005 alvherre@alvh.no-ip. 2830 : 68864 : target_perminfo->updatedCols =
2831 : 68864 : bms_add_member(target_perminfo->updatedCols,
2832 : : i + 1 - FirstLowInvalidHeapAttributeNumber);
2833 : : }
2834 : : }
2835 : :
2836 : : /* Build the search tuple. */
3152 peter_e@gmx.net 2837 [ - + ]: 31940 : oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
1876 tgl@sss.pgh.pa.us 2838 : 31940 : slot_store_data(remoteslot, rel,
2839 [ + + ]: 31940 : has_oldtup ? &oldtup : &newtup);
3152 peter_e@gmx.net 2840 : 31940 : MemoryContextSwitchTo(oldctx);
2841 : :
2842 : : /* For a partitioned table, apply update to correct partition. */
1979 peter@eisentraut.org 2843 [ + + ]: 31940 : if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
1568 tgl@sss.pgh.pa.us 2844 : 13 : apply_handle_tuple_routing(edata,
2845 : : remoteslot, &newtup, CMD_UPDATE);
2846 : : else
2847 : 31927 : apply_handle_update_internal(edata, edata->targetRelInfo,
2848 : : remoteslot, &newtup, rel->localindexoid);
2849 : :
2850 : 31933 : finish_edata(edata);
2851 : :
2852 : : /* Reset relation for error callback */
1471 akapila@postgresql.o 2853 : 31933 : apply_error_callback_arg.rel = NULL;
2854 : :
886 rhaas@postgresql.org 2855 [ + + ]: 31933 : if (!run_as_owner)
2856 : 31931 : RestoreUserContext(&ucxt);
2857 : :
1992 peter@eisentraut.org 2858 : 31933 : logicalrep_rel_close(rel, NoLock);
2859 : :
1549 tgl@sss.pgh.pa.us 2860 : 31933 : end_replication_step();
2861 : : }
2862 : :
2863 : : /*
2864 : : * Workhorse for apply_handle_update()
2865 : : * relinfo is for the relation we're actually updating in
2866 : : * (could be a child partition of edata->targetRelInfo)
2867 : : */
2868 : : static void
1568 2869 : 31927 : apply_handle_update_internal(ApplyExecutionData *edata,
2870 : : ResultRelInfo *relinfo,
2871 : : TupleTableSlot *remoteslot,
2872 : : LogicalRepTupleData *newtup,
2873 : : Oid localindexoid)
2874 : : {
2875 : 31927 : EState *estate = edata->estate;
2876 : 31927 : LogicalRepRelMapEntry *relmapentry = edata->targetRel;
1992 peter@eisentraut.org 2877 : 31927 : Relation localrel = relinfo->ri_RelationDesc;
2878 : : EPQState epqstate;
166 akapila@postgresql.o 2879 : 31927 : TupleTableSlot *localslot = NULL;
2880 : 31927 : ConflictTupleInfo conflicttuple = {0};
2881 : : bool found;
2882 : : MemoryContext oldctx;
2883 : :
841 tgl@sss.pgh.pa.us 2884 : 31927 : EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1, NIL);
151 akapila@postgresql.o 2885 : 31927 : ExecOpenIndices(relinfo, false);
2886 : :
774 msawada@postgresql.o 2887 : 31927 : found = FindReplTupleInLocalRel(edata, localrel,
2888 : : &relmapentry->remoterel,
2889 : : localindexoid,
2890 : : remoteslot, &localslot);
2891 : :
2892 : : /*
2893 : : * Tuple found.
2894 : : *
2895 : : * Note this will fail if there are other conflicting unique indexes.
2896 : : */
3152 peter_e@gmx.net 2897 [ + + ]: 31922 : if (found)
2898 : : {
2899 : : /*
2900 : : * Report the conflict if the tuple was modified by a different
2901 : : * origin.
2902 : : */
166 akapila@postgresql.o 2903 [ + + ]: 31913 : if (GetTupleTransactionInfo(localslot, &conflicttuple.xmin,
2904 : 2 : &conflicttuple.origin, &conflicttuple.ts) &&
2905 [ + - ]: 2 : conflicttuple.origin != replorigin_session_origin)
2906 : : {
2907 : : TupleTableSlot *newslot;
2908 : :
2909 : : /* Store the new tuple for conflict reporting */
382 2910 : 2 : newslot = table_slot_create(localrel, &estate->es_tupleTable);
2911 : 2 : slot_store_data(newslot, relmapentry, newtup);
2912 : :
166 2913 : 2 : conflicttuple.slot = localslot;
2914 : :
373 2915 : 2 : ReportApplyConflict(estate, relinfo, LOG, CT_UPDATE_ORIGIN_DIFFERS,
2916 : : remoteslot, newslot,
166 2917 : 2 : list_make1(&conflicttuple));
2918 : : }
2919 : :
2920 : : /* Process and store remote tuple in the slot */
3152 peter_e@gmx.net 2921 [ + - ]: 31913 : oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
1876 tgl@sss.pgh.pa.us 2922 : 31913 : slot_modify_data(remoteslot, localslot, relmapentry, newtup);
3152 peter_e@gmx.net 2923 : 31913 : MemoryContextSwitchTo(oldctx);
2924 : :
2925 : 31913 : EvalPlanQualSetSlot(&epqstate, remoteslot);
2926 : :
382 akapila@postgresql.o 2927 : 31913 : InitConflictIndexes(relinfo);
2928 : :
2929 : : /* Do the actual update. */
1338 jdavis@postgresql.or 2930 : 31913 : TargetPrivilegesCheck(relinfo->ri_RelationDesc, ACL_UPDATE);
1788 heikki.linnakangas@i 2931 : 31913 : ExecSimpleRelationUpdate(relinfo, estate, &epqstate, localslot,
2932 : : remoteslot);
2933 : : }
2934 : : else
2935 : : {
2936 : : ConflictType type;
382 akapila@postgresql.o 2937 : 9 : TupleTableSlot *newslot = localslot;
2938 : :
2939 : : /*
2940 : : * Detecting whether the tuple was recently deleted or never existed
2941 : : * is crucial to avoid misleading the user during conflict handling.
2942 : : */
33 akapila@postgresql.o 2943 [ + + ]:GNC 9 : if (FindDeletedTupleInLocalRel(localrel, localindexoid, remoteslot,
2944 : : &conflicttuple.xmin,
2945 : : &conflicttuple.origin,
2946 : 2 : &conflicttuple.ts) &&
2947 [ + - ]: 2 : conflicttuple.origin != replorigin_session_origin)
2948 : 2 : type = CT_UPDATE_DELETED;
2949 : : else
2950 : 7 : type = CT_UPDATE_MISSING;
2951 : :
2952 : : /* Store the new tuple for conflict reporting */
382 akapila@postgresql.o 2953 :CBC 9 : slot_store_data(newslot, relmapentry, newtup);
2954 : :
2955 : : /*
2956 : : * The tuple to be updated could not be found or was deleted. Do
2957 : : * nothing except for emitting a log message.
2958 : : */
33 akapila@postgresql.o 2959 :GNC 9 : ReportApplyConflict(estate, relinfo, LOG, type, remoteslot, newslot,
2960 : 9 : list_make1(&conflicttuple));
2961 : : }
2962 : :
2963 : : /* Cleanup. */
1992 peter@eisentraut.org 2964 :CBC 31920 : ExecCloseIndices(relinfo);
3152 peter_e@gmx.net 2965 : 31920 : EvalPlanQualEnd(&epqstate);
2966 : 31920 : }
2967 : :
2968 : : /*
2969 : : * Handle DELETE message.
2970 : : *
2971 : : * TODO: FDW support
2972 : : */
2973 : : static void
2974 : 81935 : apply_handle_delete(StringInfo s)
2975 : : {
2976 : : LogicalRepRelMapEntry *rel;
2977 : : LogicalRepTupleData oldtup;
2978 : : LogicalRepRelId relid;
2979 : : UserContext ucxt;
2980 : : ApplyExecutionData *edata;
2981 : : EState *estate;
2982 : : TupleTableSlot *remoteslot;
2983 : : MemoryContext oldctx;
2984 : : bool run_as_owner;
2985 : :
2986 : : /*
2987 : : * Quick return if we are skipping data modification changes or handling
2988 : : * streamed transactions.
2989 : : */
1264 akapila@postgresql.o 2990 [ + - + + ]: 163870 : if (is_skipping_changes() ||
2991 : 81935 : handle_streamed_transaction(LOGICAL_REP_MSG_DELETE, s))
1829 2992 : 41615 : return;
2993 : :
1549 tgl@sss.pgh.pa.us 2994 : 40320 : begin_replication_step();
2995 : :
3152 peter_e@gmx.net 2996 : 40320 : relid = logicalrep_read_delete(s, &oldtup);
2997 : 40320 : rel = logicalrep_rel_open(relid, RowExclusiveLock);
3089 2998 [ - + ]: 40320 : if (!should_apply_changes_for_rel(rel))
2999 : : {
3000 : : /*
3001 : : * The relation can't become interesting in the middle of the
3002 : : * transaction so it's safe to unlock it.
3003 : : */
3089 peter_e@gmx.net 3004 :UBC 0 : logicalrep_rel_close(rel, RowExclusiveLock);
1549 tgl@sss.pgh.pa.us 3005 : 0 : end_replication_step();
3089 peter_e@gmx.net 3006 : 0 : return;
3007 : : }
3008 : :
3009 : : /* Set relation for error callback */
1471 akapila@postgresql.o 3010 :CBC 40320 : apply_error_callback_arg.rel = rel;
3011 : :
3012 : : /* Check if we can do the delete. */
3152 peter_e@gmx.net 3013 : 40320 : check_relation_updatable(rel);
3014 : :
3015 : : /*
3016 : : * Make sure that any user-supplied code runs as the table owner, unless
3017 : : * the user has opted out of that behavior.
3018 : : */
886 rhaas@postgresql.org 3019 : 40320 : run_as_owner = MySubscription->runasowner;
3020 [ + + ]: 40320 : if (!run_as_owner)
3021 : 40318 : SwitchToUntrustedUser(rel->localrel->rd_rel->relowner, &ucxt);
3022 : :
3023 : : /* Initialize the executor state. */
1568 tgl@sss.pgh.pa.us 3024 : 40320 : edata = create_edata_for_relation(rel);
3025 : 40320 : estate = edata->estate;
2759 andres@anarazel.de 3026 : 40320 : remoteslot = ExecInitExtraTupleSlot(estate,
2487 3027 : 40320 : RelationGetDescr(rel->localrel),
3028 : : &TTSOpsVirtual);
3029 : :
3030 : : /* Build the search tuple. */
3152 peter_e@gmx.net 3031 [ - + ]: 40320 : oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
1876 tgl@sss.pgh.pa.us 3032 : 40320 : slot_store_data(remoteslot, rel, &oldtup);
3152 peter_e@gmx.net 3033 : 40320 : MemoryContextSwitchTo(oldctx);
3034 : :
3035 : : /* For a partitioned table, apply delete to correct partition. */
1979 peter@eisentraut.org 3036 [ + + ]: 40320 : if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
1568 tgl@sss.pgh.pa.us 3037 : 17 : apply_handle_tuple_routing(edata,
3038 : : remoteslot, NULL, CMD_DELETE);
3039 : : else
3040 : : {
199 3041 : 40303 : ResultRelInfo *relinfo = edata->targetRelInfo;
3042 : :
3043 : 40303 : ExecOpenIndices(relinfo, false);
3044 : 40303 : apply_handle_delete_internal(edata, relinfo,
3045 : : remoteslot, rel->localindexoid);
3046 : 40303 : ExecCloseIndices(relinfo);
3047 : : }
3048 : :
1568 3049 : 40320 : finish_edata(edata);
3050 : :
3051 : : /* Reset relation for error callback */
1471 akapila@postgresql.o 3052 : 40320 : apply_error_callback_arg.rel = NULL;
3053 : :
886 rhaas@postgresql.org 3054 [ + + ]: 40320 : if (!run_as_owner)
3055 : 40318 : RestoreUserContext(&ucxt);
3056 : :
1992 peter@eisentraut.org 3057 : 40320 : logicalrep_rel_close(rel, NoLock);
3058 : :
1549 tgl@sss.pgh.pa.us 3059 : 40320 : end_replication_step();
3060 : : }
3061 : :
3062 : : /*
3063 : : * Workhorse for apply_handle_delete()
3064 : : * relinfo is for the relation we're actually deleting from
3065 : : * (could be a child partition of edata->targetRelInfo)
3066 : : */
3067 : : static void
1568 3068 : 40320 : apply_handle_delete_internal(ApplyExecutionData *edata,
3069 : : ResultRelInfo *relinfo,
3070 : : TupleTableSlot *remoteslot,
3071 : : Oid localindexoid)
3072 : : {
3073 : 40320 : EState *estate = edata->estate;
1992 peter@eisentraut.org 3074 : 40320 : Relation localrel = relinfo->ri_RelationDesc;
1568 tgl@sss.pgh.pa.us 3075 : 40320 : LogicalRepRelation *remoterel = &edata->targetRel->remoterel;
3076 : : EPQState epqstate;
3077 : : TupleTableSlot *localslot;
166 akapila@postgresql.o 3078 : 40320 : ConflictTupleInfo conflicttuple = {0};
3079 : : bool found;
3080 : :
841 tgl@sss.pgh.pa.us 3081 : 40320 : EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1, NIL);
3082 : :
3083 : : /* Caller should have opened indexes already. */
199 3084 [ + + + + : 40320 : Assert(relinfo->ri_IndexRelationDescs != NULL ||
- + ]
3085 : : !localrel->rd_rel->relhasindex ||
3086 : : RelationGetIndexList(localrel) == NIL);
3087 : :
774 msawada@postgresql.o 3088 : 40320 : found = FindReplTupleInLocalRel(edata, localrel, remoterel, localindexoid,
3089 : : remoteslot, &localslot);
3090 : :
3091 : : /* If found, delete it. */
3152 peter_e@gmx.net 3092 [ + + ]: 40320 : if (found)
3093 : : {
3094 : : /*
3095 : : * Report the conflict if the tuple was modified by a different
3096 : : * origin.
3097 : : */
166 akapila@postgresql.o 3098 [ + + ]: 40311 : if (GetTupleTransactionInfo(localslot, &conflicttuple.xmin,
3099 : 5 : &conflicttuple.origin, &conflicttuple.ts) &&
3100 [ + + ]: 5 : conflicttuple.origin != replorigin_session_origin)
3101 : : {
3102 : 4 : conflicttuple.slot = localslot;
373 3103 : 4 : ReportApplyConflict(estate, relinfo, LOG, CT_DELETE_ORIGIN_DIFFERS,
3104 : : remoteslot, NULL,
166 3105 : 4 : list_make1(&conflicttuple));
3106 : : }
3107 : :
3152 peter_e@gmx.net 3108 : 40311 : EvalPlanQualSetSlot(&epqstate, localslot);
3109 : :
3110 : : /* Do the actual delete. */
1338 jdavis@postgresql.or 3111 : 40311 : TargetPrivilegesCheck(relinfo->ri_RelationDesc, ACL_DELETE);
1788 heikki.linnakangas@i 3112 : 40311 : ExecSimpleRelationDelete(relinfo, estate, &epqstate, localslot);
3113 : : }
3114 : : else
3115 : : {
3116 : : /*
3117 : : * The tuple to be deleted could not be found. Do nothing except for
3118 : : * emitting a log message.
3119 : : */
382 akapila@postgresql.o 3120 : 9 : ReportApplyConflict(estate, relinfo, LOG, CT_DELETE_MISSING,
166 3121 : 9 : remoteslot, NULL, list_make1(&conflicttuple));
3122 : : }
3123 : :
3124 : : /* Cleanup. */
3152 peter_e@gmx.net 3125 : 40320 : EvalPlanQualEnd(&epqstate);
3126 : 40320 : }
3127 : :
3128 : : /*
3129 : : * Try to find a tuple received from the publication side (in 'remoteslot') in
3130 : : * the corresponding local relation using the replica identity index,
3131 : : * primary key, another suitable index, or, if needed, a sequential scan.
3132 : : *
3133 : : * Local tuple, if found, is returned in '*localslot'.
3134 : : */
3135 : : static bool
774 msawada@postgresql.o 3136 : 72260 : FindReplTupleInLocalRel(ApplyExecutionData *edata, Relation localrel,
3137 : : LogicalRepRelation *remoterel,
3138 : : Oid localidxoid,
3139 : : TupleTableSlot *remoteslot,
3140 : : TupleTableSlot **localslot)
3141 : : {
3142 : 72260 : EState *estate = edata->estate;
3143 : : bool found;
3144 : :
3145 : : /*
3146 : : * Regardless of the top-level operation, we're performing a read here, so
3147 : : * check for SELECT privileges.
3148 : : */
1337 jdavis@postgresql.or 3149 : 72260 : TargetPrivilegesCheck(localrel, ACL_SELECT);
3150 : :
1984 peter@eisentraut.org 3151 : 72255 : *localslot = table_slot_create(localrel, &estate->es_tupleTable);
3152 : :
906 akapila@postgresql.o 3153 [ + + - + ]: 72255 : Assert(OidIsValid(localidxoid) ||
3154 : : (remoterel->replident == REPLICA_IDENTITY_FULL));
3155 : :
3156 [ + + ]: 72255 : if (OidIsValid(localidxoid))
3157 : : {
3158 : : #ifdef USE_ASSERT_CHECKING
774 msawada@postgresql.o 3159 : 72105 : Relation idxrel = index_open(localidxoid, AccessShareLock);
3160 : :
3161 : : /* Index must be PK, RI, or usable for REPLICA IDENTITY FULL tables */
360 akapila@postgresql.o 3162 [ + + + - : 72105 : Assert(GetRelationIdentityOrPK(localrel) == localidxoid ||
- + ]
3163 : : (remoterel->replident == REPLICA_IDENTITY_FULL &&
3164 : : IsIndexUsableForReplicaIdentityFull(idxrel,
3165 : : edata->targetRel->attrmap)));
774 msawada@postgresql.o 3166 : 72105 : index_close(idxrel, AccessShareLock);
3167 : : #endif
3168 : :
906 akapila@postgresql.o 3169 : 72105 : found = RelationFindReplTupleByIndex(localrel, localidxoid,
3170 : : LockTupleExclusive,
3171 : : remoteslot, *localslot);
3172 : : }
3173 : : else
1984 peter@eisentraut.org 3174 : 150 : found = RelationFindReplTupleSeq(localrel, LockTupleExclusive,
3175 : : remoteslot, *localslot);
3176 : :
3177 : 72255 : return found;
3178 : : }
3179 : :
3180 : : /*
3181 : : * Determine whether the index can reliably locate the deleted tuple in the
3182 : : * local relation.
3183 : : *
3184 : : * An index may exclude deleted tuples if it was re-indexed or re-created during
3185 : : * change application. Therefore, an index is considered usable only if the
3186 : : * conflict detection slot.xmin (conflict_detection_xmin) is greater than the
3187 : : * index tuple's xmin. This ensures that any tuples deleted prior to the index
3188 : : * creation or re-indexing are not relevant for conflict detection in the
3189 : : * current apply worker.
3190 : : *
3191 : : * Note that indexes may also be excluded if they were modified by other DDL
3192 : : * operations, such as ALTER INDEX. However, this is acceptable, as the
3193 : : * likelihood of such DDL changes coinciding with the need to scan dead
3194 : : * tuples for update_deleted detection is low.
3195 : : */
3196 : : static bool
33 akapila@postgresql.o 3197 :GNC 1 : IsIndexUsableForFindingDeletedTuple(Oid localindexoid,
3198 : : TransactionId conflict_detection_xmin)
3199 : : {
3200 : : HeapTuple index_tuple;
3201 : : TransactionId index_xmin;
3202 : :
3203 : 1 : index_tuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(localindexoid));
3204 : :
3205 [ - + ]: 1 : if (!HeapTupleIsValid(index_tuple)) /* should not happen */
33 akapila@postgresql.o 3206 [ # # ]:UNC 0 : elog(ERROR, "cache lookup failed for index %u", localindexoid);
3207 : :
3208 : : /*
3209 : : * No need to check for a frozen transaction ID, as
3210 : : * TransactionIdPrecedes() manages it internally, treating it as falling
3211 : : * behind the conflict_detection_xmin.
3212 : : */
33 akapila@postgresql.o 3213 :GNC 1 : index_xmin = HeapTupleHeaderGetXmin(index_tuple->t_data);
3214 : :
3215 : 1 : ReleaseSysCache(index_tuple);
3216 : :
3217 : 1 : return TransactionIdPrecedes(index_xmin, conflict_detection_xmin);
3218 : : }
3219 : :
3220 : : /*
3221 : : * Attempts to locate a deleted tuple in the local relation that matches the
3222 : : * values of the tuple received from the publication side (in 'remoteslot').
3223 : : * The search is performed using either the replica identity index, primary
3224 : : * key, other available index, or a sequential scan if necessary.
3225 : : *
3226 : : * Returns true if the deleted tuple is found. If found, the transaction ID,
3227 : : * origin, and commit timestamp of the deletion are stored in '*delete_xid',
3228 : : * '*delete_origin', and '*delete_time' respectively.
3229 : : */
3230 : : static bool
3231 : 11 : FindDeletedTupleInLocalRel(Relation localrel, Oid localidxoid,
3232 : : TupleTableSlot *remoteslot,
3233 : : TransactionId *delete_xid, RepOriginId *delete_origin,
3234 : : TimestampTz *delete_time)
3235 : : {
3236 : : TransactionId oldestxmin;
3237 : :
3238 : : /*
3239 : : * Return false if either dead tuples are not retained or commit timestamp
3240 : : * data is not available.
3241 : : */
3242 [ + + - + ]: 11 : if (!MySubscription->retaindeadtuples || !track_commit_timestamp)
3243 : 9 : return false;
3244 : :
3245 : : /*
3246 : : * For conflict detection, we use the leader worker's
3247 : : * oldest_nonremovable_xid value instead of invoking
3248 : : * GetOldestNonRemovableTransactionId() or using the conflict detection
3249 : : * slot's xmin. The oldest_nonremovable_xid acts as a threshold to
3250 : : * identify tuples that were recently deleted. These deleted tuples are no
3251 : : * longer visible to concurrent transactions. However, if a remote update
3252 : : * matches such a tuple, we log an update_deleted conflict.
3253 : : *
3254 : : * While GetOldestNonRemovableTransactionId() and slot.xmin may return
3255 : : * transaction IDs older than oldest_nonremovable_xid, for our current
3256 : : * purpose, it is acceptable to treat tuples deleted by transactions prior
3257 : : * to oldest_nonremovable_xid as update_missing conflicts.
3258 : : */
4 3259 [ + - ]: 2 : if (am_leader_apply_worker())
3260 : : {
3261 : 2 : oldestxmin = MyLogicalRepWorker->oldest_nonremovable_xid;
3262 : : }
3263 : : else
3264 : : {
3265 : : LogicalRepWorker *leader;
3266 : :
3267 : : /*
3268 : : * Obtain the information from the leader apply worker as only the
3269 : : * leader manages conflict retention (see
3270 : : * maybe_advance_nonremovable_xid() for details).
3271 : : */
4 akapila@postgresql.o 3272 :UNC 0 : LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
3273 : 0 : leader = logicalrep_worker_find(MyLogicalRepWorker->subid,
3274 : : InvalidOid, false);
3275 : :
3276 [ # # ]: 0 : SpinLockAcquire(&leader->relmutex);
3277 : 0 : oldestxmin = leader->oldest_nonremovable_xid;
3278 : 0 : SpinLockRelease(&leader->relmutex);
3279 : 0 : LWLockRelease(LogicalRepWorkerLock);
3280 : : }
3281 : :
3282 : : /*
3283 : : * Return false if the leader apply worker has stopped retaining
3284 : : * information for detecting conflicts. This implies that update_deleted
3285 : : * can no longer be reliably detected.
3286 : : */
4 akapila@postgresql.o 3287 [ - + ]:GNC 2 : if (!TransactionIdIsValid(oldestxmin))
4 akapila@postgresql.o 3288 :UNC 0 : return false;
3289 : :
33 akapila@postgresql.o 3290 [ + + + - ]:GNC 3 : if (OidIsValid(localidxoid) &&
3291 : 1 : IsIndexUsableForFindingDeletedTuple(localidxoid, oldestxmin))
3292 : 1 : return RelationFindDeletedTupleInfoByIndex(localrel, localidxoid,
3293 : : remoteslot, oldestxmin,
3294 : : delete_xid, delete_origin,
3295 : : delete_time);
3296 : : else
3297 : 1 : return RelationFindDeletedTupleInfoSeq(localrel, remoteslot,
3298 : : oldestxmin, delete_xid,
3299 : : delete_origin, delete_time);
3300 : : }
3301 : :
3302 : : /*
3303 : : * This handles insert, update, delete on a partitioned table.
3304 : : */
3305 : : static void
1568 tgl@sss.pgh.pa.us 3306 :CBC 88 : apply_handle_tuple_routing(ApplyExecutionData *edata,
3307 : : TupleTableSlot *remoteslot,
3308 : : LogicalRepTupleData *newtup,
3309 : : CmdType operation)
3310 : : {
3311 : 88 : EState *estate = edata->estate;
3312 : 88 : LogicalRepRelMapEntry *relmapentry = edata->targetRel;
3313 : 88 : ResultRelInfo *relinfo = edata->targetRelInfo;
1979 peter@eisentraut.org 3314 : 88 : Relation parentrel = relinfo->ri_RelationDesc;
3315 : : ModifyTableState *mtstate;
3316 : : PartitionTupleRouting *proute;
3317 : : ResultRelInfo *partrelinfo;
3318 : : Relation partrel;
3319 : : TupleTableSlot *remoteslot_part;
3320 : : TupleConversionMap *map;
3321 : : MemoryContext oldctx;
1173 akapila@postgresql.o 3322 : 88 : LogicalRepRelMapEntry *part_entry = NULL;
3323 : 88 : AttrMap *attrmap = NULL;
3324 : :
3325 : : /* ModifyTableState is needed for ExecFindPartition(). */
1568 tgl@sss.pgh.pa.us 3326 : 88 : edata->mtstate = mtstate = makeNode(ModifyTableState);
1979 peter@eisentraut.org 3327 : 88 : mtstate->ps.plan = NULL;
3328 : 88 : mtstate->ps.state = estate;
3329 : 88 : mtstate->operation = operation;
3330 : 88 : mtstate->resultRelInfo = relinfo;
3331 : :
3332 : : /* ... as is PartitionTupleRouting. */
1568 tgl@sss.pgh.pa.us 3333 : 88 : edata->proute = proute = ExecSetupPartitionTupleRouting(estate, parentrel);
3334 : :
3335 : : /*
3336 : : * Find the partition to which the "search tuple" belongs.
3337 : : */
1979 peter@eisentraut.org 3338 [ - + ]: 88 : Assert(remoteslot != NULL);
3339 [ + - ]: 88 : oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
3340 : 88 : partrelinfo = ExecFindPartition(mtstate, relinfo, proute,
3341 : : remoteslot, estate);
3342 [ - + ]: 88 : Assert(partrelinfo != NULL);
3343 : 88 : partrel = partrelinfo->ri_RelationDesc;
3344 : :
3345 : : /*
3346 : : * Check for supported relkind. We need this since partitions might be of
3347 : : * unsupported relkinds; and the set of partitions can change, so checking
3348 : : * at CREATE/ALTER SUBSCRIPTION would be insufficient.
3349 : : */
1039 tgl@sss.pgh.pa.us 3350 : 88 : CheckSubscriptionRelkind(partrel->rd_rel->relkind,
3351 : 88 : get_namespace_name(RelationGetNamespace(partrel)),
3352 : 88 : RelationGetRelationName(partrel));
3353 : :
3354 : : /*
3355 : : * To perform any of the operations below, the tuple must match the
3356 : : * partition's rowtype. Convert if needed or just copy, using a dedicated
3357 : : * slot to store the tuple in any case.
3358 : : */
1783 heikki.linnakangas@i 3359 : 88 : remoteslot_part = partrelinfo->ri_PartitionTupleSlot;
1979 peter@eisentraut.org 3360 [ + + ]: 88 : if (remoteslot_part == NULL)
3361 : 55 : remoteslot_part = table_slot_create(partrel, &estate->es_tupleTable);
1009 alvherre@alvh.no-ip. 3362 : 88 : map = ExecGetRootToChildMap(partrelinfo, estate);
1979 peter@eisentraut.org 3363 [ + + ]: 88 : if (map != NULL)
3364 : : {
1173 akapila@postgresql.o 3365 : 33 : attrmap = map->attrMap;
3366 : 33 : remoteslot_part = execute_attr_map_slot(attrmap, remoteslot,
3367 : : remoteslot_part);
3368 : : }
3369 : : else
3370 : : {
1979 peter@eisentraut.org 3371 : 55 : remoteslot_part = ExecCopySlot(remoteslot_part, remoteslot);
3372 : 55 : slot_getallattrs(remoteslot_part);
3373 : : }
3374 : 88 : MemoryContextSwitchTo(oldctx);
3375 : :
3376 : : /* Check if we can do the update or delete on the leaf partition. */
1173 akapila@postgresql.o 3377 [ + + + + ]: 88 : if (operation == CMD_UPDATE || operation == CMD_DELETE)
3378 : : {
3379 : 30 : part_entry = logicalrep_partition_open(relmapentry, partrel,
3380 : : attrmap);
3381 : 30 : check_relation_updatable(part_entry);
3382 : : }
3383 : :
1979 peter@eisentraut.org 3384 [ + + + - ]: 88 : switch (operation)
3385 : : {
3386 : 58 : case CMD_INSERT:
1568 tgl@sss.pgh.pa.us 3387 : 58 : apply_handle_insert_internal(edata, partrelinfo,
3388 : : remoteslot_part);
1979 peter@eisentraut.org 3389 : 44 : break;
3390 : :
3391 : 17 : case CMD_DELETE:
1568 tgl@sss.pgh.pa.us 3392 : 17 : apply_handle_delete_internal(edata, partrelinfo,
3393 : : remoteslot_part,
3394 : : part_entry->localindexoid);
1979 peter@eisentraut.org 3395 : 17 : break;
3396 : :
3397 : 13 : case CMD_UPDATE:
3398 : :
3399 : : /*
3400 : : * For UPDATE, depending on whether or not the updated tuple
3401 : : * satisfies the partition's constraint, perform a simple UPDATE
3402 : : * of the partition or move the updated tuple into a different
3403 : : * suitable partition.
3404 : : */
3405 : : {
3406 : : TupleTableSlot *localslot;
3407 : : ResultRelInfo *partrelinfo_new;
3408 : : Relation partrel_new;
3409 : : bool found;
3410 : : EPQState epqstate;
166 akapila@postgresql.o 3411 : 13 : ConflictTupleInfo conflicttuple = {0};
3412 : :
3413 : : /* Get the matching local tuple from the partition. */
774 msawada@postgresql.o 3414 : 13 : found = FindReplTupleInLocalRel(edata, partrel,
3415 : : &part_entry->remoterel,
3416 : : part_entry->localindexoid,
3417 : : remoteslot_part, &localslot);
1548 tgl@sss.pgh.pa.us 3418 [ + + ]: 13 : if (!found)
3419 : : {
3420 : : ConflictType type;
382 akapila@postgresql.o 3421 : 2 : TupleTableSlot *newslot = localslot;
3422 : :
3423 : : /*
3424 : : * Detecting whether the tuple was recently deleted or
3425 : : * never existed is crucial to avoid misleading the user
3426 : : * during conflict handling.
3427 : : */
33 akapila@postgresql.o 3428 [ - + ]:GNC 2 : if (FindDeletedTupleInLocalRel(partrel,
3429 : : part_entry->localindexoid,
3430 : : remoteslot_part,
3431 : : &conflicttuple.xmin,
3432 : : &conflicttuple.origin,
33 akapila@postgresql.o 3433 :UNC 0 : &conflicttuple.ts) &&
3434 [ # # ]: 0 : conflicttuple.origin != replorigin_session_origin)
3435 : 0 : type = CT_UPDATE_DELETED;
3436 : : else
33 akapila@postgresql.o 3437 :GNC 2 : type = CT_UPDATE_MISSING;
3438 : :
3439 : : /* Store the new tuple for conflict reporting */
382 akapila@postgresql.o 3440 :CBC 2 : slot_store_data(newslot, part_entry, newtup);
3441 : :
3442 : : /*
3443 : : * The tuple to be updated could not be found or was
3444 : : * deleted. Do nothing except for emitting a log message.
3445 : : */
166 3446 : 2 : ReportApplyConflict(estate, partrelinfo, LOG,
3447 : : type, remoteslot_part, newslot,
33 akapila@postgresql.o 3448 :GNC 2 : list_make1(&conflicttuple));
3449 : :
1548 tgl@sss.pgh.pa.us 3450 :CBC 2 : return;
3451 : : }
3452 : :
3453 : : /*
3454 : : * Report the conflict if the tuple was modified by a
3455 : : * different origin.
3456 : : */
166 akapila@postgresql.o 3457 [ + + ]: 11 : if (GetTupleTransactionInfo(localslot, &conflicttuple.xmin,
3458 : : &conflicttuple.origin,
3459 : 1 : &conflicttuple.ts) &&
3460 [ + - ]: 1 : conflicttuple.origin != replorigin_session_origin)
3461 : : {
3462 : : TupleTableSlot *newslot;
3463 : :
3464 : : /* Store the new tuple for conflict reporting */
382 3465 : 1 : newslot = table_slot_create(partrel, &estate->es_tupleTable);
3466 : 1 : slot_store_data(newslot, part_entry, newtup);
3467 : :
166 3468 : 1 : conflicttuple.slot = localslot;
3469 : :
373 3470 : 1 : ReportApplyConflict(estate, partrelinfo, LOG, CT_UPDATE_ORIGIN_DIFFERS,
3471 : : remoteslot_part, newslot,
166 3472 : 1 : list_make1(&conflicttuple));
3473 : : }
3474 : :
3475 : : /*
3476 : : * Apply the update to the local tuple, putting the result in
3477 : : * remoteslot_part.
3478 : : */
1548 tgl@sss.pgh.pa.us 3479 [ + - ]: 11 : oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
3480 : 11 : slot_modify_data(remoteslot_part, localslot, part_entry,
3481 : : newtup);
3482 : 11 : MemoryContextSwitchTo(oldctx);
3483 : :
401 akapila@postgresql.o 3484 : 11 : EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1, NIL);
3485 : :
3486 : : /*
3487 : : * Does the updated tuple still satisfy the current
3488 : : * partition's constraint?
3489 : : */
1816 tgl@sss.pgh.pa.us 3490 [ + - + + ]: 22 : if (!partrel->rd_rel->relispartition ||
1979 peter@eisentraut.org 3491 : 11 : ExecPartitionCheck(partrelinfo, remoteslot_part, estate,
3492 : : false))
3493 : : {
3494 : : /*
3495 : : * Yes, so simply UPDATE the partition. We don't call
3496 : : * apply_handle_update_internal() here, which would
3497 : : * normally do the following work, to avoid repeating some
3498 : : * work already done above to find the local tuple in the
3499 : : * partition.
3500 : : */
382 akapila@postgresql.o 3501 : 10 : InitConflictIndexes(partrelinfo);
3502 : :
1979 peter@eisentraut.org 3503 : 10 : EvalPlanQualSetSlot(&epqstate, remoteslot_part);
1338 jdavis@postgresql.or 3504 : 10 : TargetPrivilegesCheck(partrelinfo->ri_RelationDesc,
3505 : : ACL_UPDATE);
1788 heikki.linnakangas@i 3506 : 10 : ExecSimpleRelationUpdate(partrelinfo, estate, &epqstate,
3507 : : localslot, remoteslot_part);
3508 : : }
3509 : : else
3510 : : {
3511 : : /* Move the tuple into the new partition. */
3512 : :
3513 : : /*
3514 : : * New partition will be found using tuple routing, which
3515 : : * can only occur via the parent table. We might need to
3516 : : * convert the tuple to the parent's rowtype. Note that
3517 : : * this is the tuple found in the partition, not the
3518 : : * original search tuple received by this function.
3519 : : */
1979 peter@eisentraut.org 3520 [ + - ]: 1 : if (map)
3521 : : {
3522 : : TupleConversionMap *PartitionToRootMap =
841 tgl@sss.pgh.pa.us 3523 : 1 : convert_tuples_by_name(RelationGetDescr(partrel),
3524 : : RelationGetDescr(parentrel));
3525 : :
3526 : : remoteslot =
1979 peter@eisentraut.org 3527 : 1 : execute_attr_map_slot(PartitionToRootMap->attrMap,
3528 : : remoteslot_part, remoteslot);
3529 : : }
3530 : : else
3531 : : {
1979 peter@eisentraut.org 3532 :UBC 0 : remoteslot = ExecCopySlot(remoteslot, remoteslot_part);
3533 : 0 : slot_getallattrs(remoteslot);
3534 : : }
3535 : :
3536 : : /* Find the new partition. */
1979 peter@eisentraut.org 3537 [ + - ]:CBC 1 : oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
3538 : 1 : partrelinfo_new = ExecFindPartition(mtstate, relinfo,
3539 : : proute, remoteslot,
3540 : : estate);
3541 : 1 : MemoryContextSwitchTo(oldctx);
3542 [ - + ]: 1 : Assert(partrelinfo_new != partrelinfo);
1039 tgl@sss.pgh.pa.us 3543 : 1 : partrel_new = partrelinfo_new->ri_RelationDesc;
3544 : :
3545 : : /* Check that new partition also has supported relkind. */
3546 : 1 : CheckSubscriptionRelkind(partrel_new->rd_rel->relkind,
3547 : 1 : get_namespace_name(RelationGetNamespace(partrel_new)),
3548 : 1 : RelationGetRelationName(partrel_new));
3549 : :
3550 : : /* DELETE old tuple found in the old partition. */
401 akapila@postgresql.o 3551 : 1 : EvalPlanQualSetSlot(&epqstate, localslot);
3552 : 1 : TargetPrivilegesCheck(partrelinfo->ri_RelationDesc, ACL_DELETE);
3553 : 1 : ExecSimpleRelationDelete(partrelinfo, estate, &epqstate, localslot);
3554 : :
3555 : : /* INSERT new tuple into the new partition. */
3556 : :
3557 : : /*
3558 : : * Convert the replacement tuple to match the destination
3559 : : * partition rowtype.
3560 : : */
1979 peter@eisentraut.org 3561 [ + - ]: 1 : oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
1783 heikki.linnakangas@i 3562 : 1 : remoteslot_part = partrelinfo_new->ri_PartitionTupleSlot;
1979 peter@eisentraut.org 3563 [ + - ]: 1 : if (remoteslot_part == NULL)
1039 tgl@sss.pgh.pa.us 3564 : 1 : remoteslot_part = table_slot_create(partrel_new,
3565 : : &estate->es_tupleTable);
1009 alvherre@alvh.no-ip. 3566 : 1 : map = ExecGetRootToChildMap(partrelinfo_new, estate);
1979 peter@eisentraut.org 3567 [ - + ]: 1 : if (map != NULL)
3568 : : {
1979 peter@eisentraut.org 3569 :UBC 0 : remoteslot_part = execute_attr_map_slot(map->attrMap,
3570 : : remoteslot,
3571 : : remoteslot_part);
3572 : : }
3573 : : else
3574 : : {
1979 peter@eisentraut.org 3575 :CBC 1 : remoteslot_part = ExecCopySlot(remoteslot_part,
3576 : : remoteslot);
3577 : 1 : slot_getallattrs(remoteslot);
3578 : : }
3579 : 1 : MemoryContextSwitchTo(oldctx);
1568 tgl@sss.pgh.pa.us 3580 : 1 : apply_handle_insert_internal(edata, partrelinfo_new,
3581 : : remoteslot_part);
3582 : : }
3583 : :
401 akapila@postgresql.o 3584 : 11 : EvalPlanQualEnd(&epqstate);
3585 : : }
1979 peter@eisentraut.org 3586 : 11 : break;
3587 : :
1979 peter@eisentraut.org 3588 :UBC 0 : default:
3589 [ # # ]: 0 : elog(ERROR, "unrecognized CmdType: %d", (int) operation);
3590 : : break;
3591 : : }
3592 : : }
3593 : :
3594 : : /*
3595 : : * Handle TRUNCATE message.
3596 : : *
3597 : : * TODO: FDW support
3598 : : */
3599 : : static void
2709 peter_e@gmx.net 3600 :CBC 19 : apply_handle_truncate(StringInfo s)
3601 : : {
2690 tgl@sss.pgh.pa.us 3602 : 19 : bool cascade = false;
3603 : 19 : bool restart_seqs = false;
3604 : 19 : List *remote_relids = NIL;
3605 : 19 : List *remote_rels = NIL;
3606 : 19 : List *rels = NIL;
1979 peter@eisentraut.org 3607 : 19 : List *part_rels = NIL;
2690 tgl@sss.pgh.pa.us 3608 : 19 : List *relids = NIL;
3609 : 19 : List *relids_logged = NIL;
3610 : : ListCell *lc;
1569 3611 : 19 : LOCKMODE lockmode = AccessExclusiveLock;
3612 : :
3613 : : /*
3614 : : * Quick return if we are skipping data modification changes or handling
3615 : : * streamed transactions.
3616 : : */
1264 akapila@postgresql.o 3617 [ + - - + ]: 38 : if (is_skipping_changes() ||
3618 : 19 : handle_streamed_transaction(LOGICAL_REP_MSG_TRUNCATE, s))
1829 akapila@postgresql.o 3619 :UBC 0 : return;
3620 : :
1549 tgl@sss.pgh.pa.us 3621 :CBC 19 : begin_replication_step();
3622 : :
2709 peter_e@gmx.net 3623 : 19 : remote_relids = logicalrep_read_truncate(s, &cascade, &restart_seqs);
3624 : :
3625 [ + - + + : 47 : foreach(lc, remote_relids)
+ + ]
3626 : : {
3627 : 28 : LogicalRepRelId relid = lfirst_oid(lc);
3628 : : LogicalRepRelMapEntry *rel;
3629 : :
1569 akapila@postgresql.o 3630 : 28 : rel = logicalrep_rel_open(relid, lockmode);
2709 peter_e@gmx.net 3631 [ - + ]: 28 : if (!should_apply_changes_for_rel(rel))
3632 : : {
3633 : : /*
3634 : : * The relation can't become interesting in the middle of the
3635 : : * transaction so it's safe to unlock it.
3636 : : */
1569 akapila@postgresql.o 3637 :UBC 0 : logicalrep_rel_close(rel, lockmode);
2709 peter_e@gmx.net 3638 : 0 : continue;
3639 : : }
3640 : :
2709 peter_e@gmx.net 3641 :CBC 28 : remote_rels = lappend(remote_rels, rel);
1338 jdavis@postgresql.or 3642 : 28 : TargetPrivilegesCheck(rel->localrel, ACL_TRUNCATE);
2709 peter_e@gmx.net 3643 : 28 : rels = lappend(rels, rel->localrel);
3644 : 28 : relids = lappend_oid(relids, rel->localreloid);
3645 [ - + - - : 28 : if (RelationIsLogicallyLogged(rel->localrel))
- - - - -
- - - -
- ]
2693 peter_e@gmx.net 3646 :UBC 0 : relids_logged = lappend_oid(relids_logged, rel->localreloid);
3647 : :
3648 : : /*
3649 : : * Truncate partitions if we got a message to truncate a partitioned
3650 : : * table.
3651 : : */
1979 peter@eisentraut.org 3652 [ + + ]:CBC 28 : if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
3653 : : {
3654 : : ListCell *child;
3655 : 4 : List *children = find_all_inheritors(rel->localreloid,
3656 : : lockmode,
3657 : : NULL);
3658 : :
3659 [ + - + + : 15 : foreach(child, children)
+ + ]
3660 : : {
3661 : 11 : Oid childrelid = lfirst_oid(child);
3662 : : Relation childrel;
3663 : :
3664 [ + + ]: 11 : if (list_member_oid(relids, childrelid))
3665 : 4 : continue;
3666 : :
3667 : : /* find_all_inheritors already got lock */
3668 : 7 : childrel = table_open(childrelid, NoLock);
3669 : :
3670 : : /*
3671 : : * Ignore temp tables of other backends. See similar code in
3672 : : * ExecuteTruncate().
3673 : : */
3674 [ - + - - ]: 7 : if (RELATION_IS_OTHER_TEMP(childrel))
3675 : : {
1569 akapila@postgresql.o 3676 :UBC 0 : table_close(childrel, lockmode);
1979 peter@eisentraut.org 3677 : 0 : continue;
3678 : : }
3679 : :
1338 jdavis@postgresql.or 3680 :CBC 7 : TargetPrivilegesCheck(childrel, ACL_TRUNCATE);
1979 peter@eisentraut.org 3681 : 7 : rels = lappend(rels, childrel);
3682 : 7 : part_rels = lappend(part_rels, childrel);
3683 : 7 : relids = lappend_oid(relids, childrelid);
3684 : : /* Log this relation only if needed for logical decoding */
3685 [ - + - - : 7 : if (RelationIsLogicallyLogged(childrel))
- - - - -
- - - -
- ]
1979 peter@eisentraut.org 3686 :UBC 0 : relids_logged = lappend_oid(relids_logged, childrelid);
3687 : : }
3688 : : }
3689 : : }
3690 : :
3691 : : /*
3692 : : * Even if we used CASCADE on the upstream primary we explicitly default
3693 : : * to replaying changes without further cascading. This might be later
3694 : : * changeable with a user specified option.
3695 : : *
3696 : : * MySubscription->runasowner tells us whether we want to execute
3697 : : * replication actions as the subscription owner; the last argument to
3698 : : * TruncateGuts tells it whether we want to switch to the table owner.
3699 : : * Those are exactly opposite conditions.
3700 : : */
1612 fujii@postgresql.org 3701 :CBC 19 : ExecuteTruncateGuts(rels,
3702 : : relids,
3703 : : relids_logged,
3704 : : DROP_RESTRICT,
3705 : : restart_seqs,
886 rhaas@postgresql.org 3706 : 19 : !MySubscription->runasowner);
2709 peter_e@gmx.net 3707 [ + - + + : 47 : foreach(lc, remote_rels)
+ + ]
3708 : : {
3709 : 28 : LogicalRepRelMapEntry *rel = lfirst(lc);
3710 : :
3711 : 28 : logicalrep_rel_close(rel, NoLock);
3712 : : }
1979 peter@eisentraut.org 3713 [ + + + + : 26 : foreach(lc, part_rels)
+ + ]
3714 : : {
3715 : 7 : Relation rel = lfirst(lc);
3716 : :
3717 : 7 : table_close(rel, NoLock);
3718 : : }
3719 : :
1549 tgl@sss.pgh.pa.us 3720 : 19 : end_replication_step();
3721 : : }
3722 : :
3723 : :
3724 : : /*
3725 : : * Logical replication protocol message dispatcher.
3726 : : */
3727 : : void
3152 peter_e@gmx.net 3728 : 337295 : apply_dispatch(StringInfo s)
3729 : : {
1769 akapila@postgresql.o 3730 : 337295 : LogicalRepMsgType action = pq_getmsgbyte(s);
3731 : : LogicalRepMsgType saved_command;
3732 : :
3733 : : /*
3734 : : * Set the current command being applied. Since this function can be
3735 : : * called recursively when applying spooled changes, save the current
3736 : : * command.
3737 : : */
1471 3738 : 337295 : saved_command = apply_error_callback_arg.command;
3739 : 337295 : apply_error_callback_arg.command = action;
3740 : :
3152 peter_e@gmx.net 3741 [ + + + + : 337295 : switch (action)
+ + + + +
- + + + +
+ + + + +
- ]
3742 : : {
1769 akapila@postgresql.o 3743 : 481 : case LOGICAL_REP_MSG_BEGIN:
3152 peter_e@gmx.net 3744 : 481 : apply_handle_begin(s);
1471 akapila@postgresql.o 3745 : 481 : break;
3746 : :
1769 3747 : 432 : case LOGICAL_REP_MSG_COMMIT:
3152 peter_e@gmx.net 3748 : 432 : apply_handle_commit(s);
1471 akapila@postgresql.o 3749 : 432 : break;
3750 : :
1769 3751 : 185897 : case LOGICAL_REP_MSG_INSERT:
3152 peter_e@gmx.net 3752 : 185897 : apply_handle_insert(s);
1471 akapila@postgresql.o 3753 : 185855 : break;
3754 : :
1769 3755 : 66165 : case LOGICAL_REP_MSG_UPDATE:
3152 peter_e@gmx.net 3756 : 66165 : apply_handle_update(s);
1471 akapila@postgresql.o 3757 : 66156 : break;
3758 : :
1769 3759 : 81935 : case LOGICAL_REP_MSG_DELETE:
3152 peter_e@gmx.net 3760 : 81935 : apply_handle_delete(s);
1471 akapila@postgresql.o 3761 : 81935 : break;
3762 : :
1769 3763 : 19 : case LOGICAL_REP_MSG_TRUNCATE:
2709 peter_e@gmx.net 3764 : 19 : apply_handle_truncate(s);
1471 akapila@postgresql.o 3765 : 19 : break;
3766 : :
1769 3767 : 462 : case LOGICAL_REP_MSG_RELATION:
3152 peter_e@gmx.net 3768 : 462 : apply_handle_relation(s);
1471 akapila@postgresql.o 3769 : 462 : break;
3770 : :
1769 3771 : 18 : case LOGICAL_REP_MSG_TYPE:
3152 peter_e@gmx.net 3772 : 18 : apply_handle_type(s);
1471 akapila@postgresql.o 3773 : 18 : break;
3774 : :
1769 3775 : 7 : case LOGICAL_REP_MSG_ORIGIN:
3152 peter_e@gmx.net 3776 : 7 : apply_handle_origin(s);
1471 akapila@postgresql.o 3777 : 7 : break;
3778 : :
1614 akapila@postgresql.o 3779 :UBC 0 : case LOGICAL_REP_MSG_MESSAGE:
3780 : :
3781 : : /*
3782 : : * Logical replication does not use generic logical messages yet.
3783 : : * Although, it could be used by other applications that use this
3784 : : * output plugin.
3785 : : */
1471 3786 : 0 : break;
3787 : :
1769 akapila@postgresql.o 3788 :CBC 857 : case LOGICAL_REP_MSG_STREAM_START:
1829 3789 : 857 : apply_handle_stream_start(s);
1471 3790 : 857 : break;
3791 : :
1479 3792 : 856 : case LOGICAL_REP_MSG_STREAM_STOP:
1829 3793 : 856 : apply_handle_stream_stop(s);
1471 3794 : 854 : break;
3795 : :
1769 3796 : 38 : case LOGICAL_REP_MSG_STREAM_ABORT:
1829 3797 : 38 : apply_handle_stream_abort(s);
1471 3798 : 38 : break;
3799 : :
1769 3800 : 61 : case LOGICAL_REP_MSG_STREAM_COMMIT:
1829 3801 : 61 : apply_handle_stream_commit(s);
1471 3802 : 59 : break;
3803 : :
1515 3804 : 16 : case LOGICAL_REP_MSG_BEGIN_PREPARE:
3805 : 16 : apply_handle_begin_prepare(s);
1471 3806 : 16 : break;
3807 : :
1515 3808 : 15 : case LOGICAL_REP_MSG_PREPARE:
3809 : 15 : apply_handle_prepare(s);
1471 3810 : 14 : break;
3811 : :
1515 3812 : 20 : case LOGICAL_REP_MSG_COMMIT_PREPARED:
3813 : 20 : apply_handle_commit_prepared(s);
1471 3814 : 20 : break;
3815 : :
1515 3816 : 5 : case LOGICAL_REP_MSG_ROLLBACK_PREPARED:
3817 : 5 : apply_handle_rollback_prepared(s);
1471 3818 : 5 : break;
3819 : :
1494 3820 : 11 : case LOGICAL_REP_MSG_STREAM_PREPARE:
3821 : 11 : apply_handle_stream_prepare(s);
1471 3822 : 11 : break;
3823 : :
1471 akapila@postgresql.o 3824 :UBC 0 : default:
3825 [ # # ]: 0 : ereport(ERROR,
3826 : : (errcode(ERRCODE_PROTOCOL_VIOLATION),
3827 : : errmsg("invalid logical replication message type \"??? (%d)\"", action)));
3828 : : }
3829 : :
3830 : : /* Reset the current command */
1471 akapila@postgresql.o 3831 :CBC 337239 : apply_error_callback_arg.command = saved_command;
3152 peter_e@gmx.net 3832 : 337239 : }
3833 : :
3834 : : /*
3835 : : * Figure out which write/flush positions to report to the walsender process.
3836 : : *
3837 : : * We can't simply report back the last LSN the walsender sent us because the
3838 : : * local transaction might not yet be flushed to disk locally. Instead we
3839 : : * build a list that associates local with remote LSNs for every commit. When
3840 : : * reporting back the flush position to the sender we iterate that list and
3841 : : * check which entries on it are already locally flushed. Those we can report
3842 : : * as having been flushed.
3843 : : *
3844 : : * *have_pending_txes is set to true if there are outstanding transactions that
3845 : : * need to be flushed.
3846 : : */
3847 : : static void
3848 : 18651 : get_flush_position(XLogRecPtr *write, XLogRecPtr *flush,
3849 : : bool *have_pending_txes)
3850 : : {
3851 : : dlist_mutable_iter iter;
1401 rhaas@postgresql.org 3852 : 18651 : XLogRecPtr local_flush = GetFlushRecPtr(NULL);
3853 : :
3152 peter_e@gmx.net 3854 : 18651 : *write = InvalidXLogRecPtr;
3855 : 18651 : *flush = InvalidXLogRecPtr;
3856 : :
3857 [ + - + + ]: 19155 : dlist_foreach_modify(iter, &lsn_mapping)
3858 : : {
3859 : 4520 : FlushPosition *pos =
841 tgl@sss.pgh.pa.us 3860 : 4520 : dlist_container(FlushPosition, node, iter.cur);
3861 : :
3152 peter_e@gmx.net 3862 : 4520 : *write = pos->remote_end;
3863 : :
3864 [ + + ]: 4520 : if (pos->local_end <= local_flush)
3865 : : {
3866 : 504 : *flush = pos->remote_end;
3867 : 504 : dlist_delete(iter.cur);
3868 : 504 : pfree(pos);
3869 : : }
3870 : : else
3871 : : {
3872 : : /*
3873 : : * Don't want to uselessly iterate over the rest of the list, which
3874 : : * could potentially be long. Instead, get the last element and
3875 : : * grab the write position from there.
3876 : : */
3877 : 4016 : pos = dlist_tail_element(FlushPosition, node,
3878 : : &lsn_mapping);
3879 : 4016 : *write = pos->remote_end;
3880 : 4016 : *have_pending_txes = true;
3881 : 4016 : return;
3882 : : }
3883 : : }
3884 : :
3885 : 14635 : *have_pending_txes = !dlist_is_empty(&lsn_mapping);
3886 : : }
3887 : :
3888 : : /*
3889 : : * Store current remote/local lsn pair in the tracking list.
3890 : : */
3891 : : void
971 akapila@postgresql.o 3892 : 538 : store_flush_position(XLogRecPtr remote_lsn, XLogRecPtr local_lsn)
3893 : : {
3894 : : FlushPosition *flushpos;
3895 : :
3896 : : /*
3897 : : * Skip for parallel apply workers, because the lsn_mapping is maintained
3898 : : * by the leader apply worker.
3899 : : */
3900 [ + + ]: 538 : if (am_parallel_apply_worker())
3901 : 19 : return;
3902 : :
3903 : : /* Need to do this in permanent context */
3042 peter_e@gmx.net 3904 : 519 : MemoryContextSwitchTo(ApplyContext);
3905 : :
3906 : : /* Track commit lsn */
3152 3907 : 519 : flushpos = (FlushPosition *) palloc(sizeof(FlushPosition));
971 akapila@postgresql.o 3908 : 519 : flushpos->local_end = local_lsn;
3152 peter_e@gmx.net 3909 : 519 : flushpos->remote_end = remote_lsn;
3910 : :
3911 : 519 : dlist_push_tail(&lsn_mapping, &flushpos->node);
3042 3912 : 519 : MemoryContextSwitchTo(ApplyMessageContext);
3913 : : }
3914 : :
3915 : :
3916 : : /* Update statistics of the worker. */
3917 : : static void
3152 3918 : 187434 : UpdateWorkerStats(XLogRecPtr last_lsn, TimestampTz send_time, bool reply)
3919 : : {
3920 : 187434 : MyLogicalRepWorker->last_lsn = last_lsn;
3921 : 187434 : MyLogicalRepWorker->last_send_time = send_time;
3922 : 187434 : MyLogicalRepWorker->last_recv_time = GetCurrentTimestamp();
3923 [ + + ]: 187434 : if (reply)
3924 : : {
3925 : 2348 : MyLogicalRepWorker->reply_lsn = last_lsn;
3926 : 2348 : MyLogicalRepWorker->reply_time = send_time;
3927 : : }
3928 : 187434 : }
3929 : :
3930 : : /*
3931 : : * Apply main loop.
3932 : : */
3933 : : static void
3089 3934 : 402 : LogicalRepApplyLoop(XLogRecPtr last_received)
3935 : : {
2150 michael@paquier.xyz 3936 : 402 : TimestampTz last_recv_timestamp = GetCurrentTimestamp();
1828 tgl@sss.pgh.pa.us 3937 : 402 : bool ping_sent = false;
3938 : : TimeLineID tli;
3939 : : ErrorContextCallback errcallback;
45 akapila@postgresql.o 3940 :GNC 402 : RetainDeadTuplesData rdt_data = {0};
3941 : :
3942 : : /*
3943 : : * Init the ApplyMessageContext which we clean up after each replication
3944 : : * protocol message.
3945 : : */
3042 peter_e@gmx.net 3946 :CBC 402 : ApplyMessageContext = AllocSetContextCreate(ApplyContext,
3947 : : "ApplyMessageContext",
3948 : : ALLOCSET_DEFAULT_SIZES);
3949 : :
3950 : : /*
3951 : : * This memory context is used for per-stream data when the streaming mode
3952 : : * is enabled. This context is reset on each stream stop.
3953 : : */
1829 akapila@postgresql.o 3954 : 402 : LogicalStreamingContext = AllocSetContextCreate(ApplyContext,
3955 : : "LogicalStreamingContext",
3956 : : ALLOCSET_DEFAULT_SIZES);
3957 : :
3958 : : /* mark as idle, before starting to loop */
3152 peter_e@gmx.net 3959 : 402 : pgstat_report_activity(STATE_IDLE, NULL);
3960 : :
3961 : : /*
3962 : : * Push apply error context callback. Fields will be filled while applying
3963 : : * a change.
3964 : : */
1471 akapila@postgresql.o 3965 : 402 : errcallback.callback = apply_error_callback;
3966 : 402 : errcallback.previous = error_context_stack;
3967 : 402 : error_context_stack = &errcallback;
971 3968 : 402 : apply_error_context_stack = error_context_stack;
3969 : :
3970 : : /* This outer loop iterates once per wait. */
3971 : : for (;;)
3152 peter_e@gmx.net 3972 : 15752 : {
3973 : 16154 : pgsocket fd = PGINVALID_SOCKET;
3974 : : int rc;
3975 : : int len;
3976 : 16154 : char *buf = NULL;
3977 : 16154 : bool endofstream = false;
3978 : : long wait_time;
3979 : :
3018 3980 [ - + ]: 16154 : CHECK_FOR_INTERRUPTS();
3981 : :
3042 3982 : 16154 : MemoryContextSwitchTo(ApplyMessageContext);
3983 : :
1578 alvherre@alvh.no-ip. 3984 : 16154 : len = walrcv_receive(LogRepWorkerWalRcvConn, &buf, &fd);
3985 : :
3152 peter_e@gmx.net 3986 [ + + ]: 16136 : if (len != 0)
3987 : : {
3988 : : /* Loop to process all available data (without blocking). */
3989 : : for (;;)
3990 : : {
3991 [ - + ]: 202591 : CHECK_FOR_INTERRUPTS();
3992 : :
3993 [ + + ]: 202591 : if (len == 0)
3994 : : {
3995 : 15148 : break;
3996 : : }
3997 [ + + ]: 187443 : else if (len < 0)
3998 : : {
3999 [ + - ]: 9 : ereport(LOG,
4000 : : (errmsg("data stream from publisher has ended")));
4001 : 9 : endofstream = true;
4002 : 9 : break;
4003 : : }
4004 : : else
4005 : : {
4006 : : int c;
4007 : : StringInfoData s;
4008 : :
822 akapila@postgresql.o 4009 [ - + ]: 187434 : if (ConfigReloadPending)
4010 : : {
822 akapila@postgresql.o 4011 :UBC 0 : ConfigReloadPending = false;
4012 : 0 : ProcessConfigFile(PGC_SIGHUP);
4013 : : }
4014 : :
4015 : : /* Reset timeout. */
3152 peter_e@gmx.net 4016 :CBC 187434 : last_recv_timestamp = GetCurrentTimestamp();
4017 : 187434 : ping_sent = false;
4018 : :
45 akapila@postgresql.o 4019 :GNC 187434 : rdt_data.last_recv_time = last_recv_timestamp;
4020 : :
4021 : : /* Ensure we are reading the data into our memory context. */
3042 peter_e@gmx.net 4022 :CBC 187434 : MemoryContextSwitchTo(ApplyMessageContext);
4023 : :
681 drowley@postgresql.o 4024 : 187434 : initReadOnlyStringInfo(&s, buf, len);
4025 : :
3152 peter_e@gmx.net 4026 : 187434 : c = pq_getmsgbyte(&s);
4027 : :
31 nathan@postgresql.or 4028 [ + + ]:GNC 187434 : if (c == PqReplMsg_WALData)
4029 : : {
4030 : : XLogRecPtr start_lsn;
4031 : : XLogRecPtr end_lsn;
4032 : : TimestampTz send_time;
4033 : :
3152 peter_e@gmx.net 4034 :CBC 184876 : start_lsn = pq_getmsgint64(&s);
4035 : 184876 : end_lsn = pq_getmsgint64(&s);
3117 tgl@sss.pgh.pa.us 4036 : 184876 : send_time = pq_getmsgint64(&s);
4037 : :
3152 peter_e@gmx.net 4038 [ + + ]: 184876 : if (last_received < start_lsn)
4039 : 149049 : last_received = start_lsn;
4040 : :
4041 [ - + ]: 184876 : if (last_received < end_lsn)
3152 peter_e@gmx.net 4042 :UBC 0 : last_received = end_lsn;
4043 : :
3152 peter_e@gmx.net 4044 :CBC 184876 : UpdateWorkerStats(last_received, send_time, false);
4045 : :
4046 : 184876 : apply_dispatch(&s);
4047 : :
45 akapila@postgresql.o 4048 :GNC 184823 : maybe_advance_nonremovable_xid(&rdt_data, false);
4049 : : }
31 nathan@postgresql.or 4050 [ + + ]: 2558 : else if (c == PqReplMsg_Keepalive)
4051 : : {
4052 : : XLogRecPtr end_lsn;
4053 : : TimestampTz timestamp;
4054 : : bool reply_requested;
4055 : :
3089 peter_e@gmx.net 4056 :CBC 2348 : end_lsn = pq_getmsgint64(&s);
3117 tgl@sss.pgh.pa.us 4057 : 2348 : timestamp = pq_getmsgint64(&s);
3152 peter_e@gmx.net 4058 : 2348 : reply_requested = pq_getmsgbyte(&s);
4059 : :
3089 4060 [ + + ]: 2348 : if (last_received < end_lsn)
4061 : 964 : last_received = end_lsn;
4062 : :
4063 : 2348 : send_feedback(last_received, reply_requested, false);
4064 : :
45 akapila@postgresql.o 4065 :GNC 2348 : maybe_advance_nonremovable_xid(&rdt_data, false);
4066 : :
3152 peter_e@gmx.net 4067 :CBC 2348 : UpdateWorkerStats(last_received, timestamp, true);
4068 : : }
31 nathan@postgresql.or 4069 [ + - ]:GNC 210 : else if (c == PqReplMsg_PrimaryStatusUpdate)
4070 : : {
45 akapila@postgresql.o 4071 : 210 : rdt_data.remote_lsn = pq_getmsgint64(&s);
4072 : 210 : rdt_data.remote_oldestxid = FullTransactionIdFromU64((uint64) pq_getmsgint64(&s));
4073 : 210 : rdt_data.remote_nextxid = FullTransactionIdFromU64((uint64) pq_getmsgint64(&s));
4074 : 210 : rdt_data.reply_time = pq_getmsgint64(&s);
4075 : :
4076 : : /*
4077 : : * This should never happen, see
4078 : : * ProcessStandbyPSRequestMessage. But if it happens
4079 : : * due to a bug, we don't want to proceed as it can
4080 : : * incorrectly advance oldest_nonremovable_xid.
4081 : : */
4082 [ - + ]: 210 : if (XLogRecPtrIsInvalid(rdt_data.remote_lsn))
45 akapila@postgresql.o 4083 [ # # ]:UNC 0 : elog(ERROR, "cannot get the latest WAL position from the publisher");
4084 : :
45 akapila@postgresql.o 4085 :GNC 210 : maybe_advance_nonremovable_xid(&rdt_data, true);
4086 : :
4087 : 210 : UpdateWorkerStats(last_received, rdt_data.reply_time, false);
4088 : : }
4089 : : /* other message types are purposefully ignored */
4090 : :
3042 peter_e@gmx.net 4091 :CBC 187381 : MemoryContextReset(ApplyMessageContext);
4092 : : }
4093 : :
1578 alvherre@alvh.no-ip. 4094 : 187381 : len = walrcv_receive(LogRepWorkerWalRcvConn, &buf, &fd);
4095 : : }
4096 : : }
4097 : :
4098 : : /* confirm all writes so far */
2989 tgl@sss.pgh.pa.us 4099 : 16083 : send_feedback(last_received, false, false);
4100 : :
4101 : : /* Reset the timestamp if no message was received */
45 akapila@postgresql.o 4102 :GNC 16083 : rdt_data.last_recv_time = 0;
4103 : :
4104 : 16083 : maybe_advance_nonremovable_xid(&rdt_data, false);
4105 : :
1829 akapila@postgresql.o 4106 [ + + + + ]:CBC 16083 : if (!in_remote_transaction && !in_streamed_transaction)
4107 : : {
4108 : : /*
4109 : : * If we didn't get any transactions for a while, there might be
4110 : : * unconsumed invalidation messages in the queue; consume them
4111 : : * now.
4112 : : */
3089 peter_e@gmx.net 4113 : 3280 : AcceptInvalidationMessages();
3017 4114 : 3280 : maybe_reread_subscription();
4115 : :
4116 : : /* Process any table synchronization changes. */
3089 4117 : 3240 : process_syncing_tables(last_received);
4118 : : }
4119 : :
4120 : : /* Cleanup the memory. */
661 nathan@postgresql.or 4121 : 15851 : MemoryContextReset(ApplyMessageContext);
3152 peter_e@gmx.net 4122 : 15851 : MemoryContextSwitchTo(TopMemoryContext);
4123 : :
4124 : : /* Check if we need to exit the streaming loop. */
4125 [ + + ]: 15851 : if (endofstream)
4126 : 9 : break;
4127 : :
4128 : : /*
4129 : : * Wait for more data or latch. If we have unflushed transactions,
4130 : : * wake up after WalWriterDelay to see if they've been flushed yet (in
4131 : : * which case we should send a feedback message). Otherwise, there's
4132 : : * no particular urgency about waking up unless we get data or a
4133 : : * signal.
4134 : : */
2989 tgl@sss.pgh.pa.us 4135 [ + + ]: 15842 : if (!dlist_is_empty(&lsn_mapping))
4136 : 2972 : wait_time = WalWriterDelay;
4137 : : else
4138 : 12870 : wait_time = NAPTIME_PER_CYCLE;
4139 : :
4140 : : /*
4141 : : * Ensure to wake up when it's possible to advance the non-removable
4142 : : * transaction ID, or when the retention duration may have exceeded
4143 : : * max_retention_duration.
4144 : : */
4 akapila@postgresql.o 4145 [ + + ]:GNC 15842 : if (MySubscription->retentionactive)
4146 : : {
4147 [ + + ]: 217 : if (rdt_data.phase == RDT_GET_CANDIDATE_XID &&
4148 [ - + ]: 38 : rdt_data.xid_advance_interval)
4 akapila@postgresql.o 4149 :UNC 0 : wait_time = Min(wait_time, rdt_data.xid_advance_interval);
4 akapila@postgresql.o 4150 [ + + ]:GNC 217 : else if (MySubscription->maxretention > 0)
4151 : 1 : wait_time = Min(wait_time, MySubscription->maxretention);
4152 : : }
4153 : :
3014 andres@anarazel.de 4154 :CBC 15842 : rc = WaitLatchOrSocket(MyLatch,
4155 : : WL_SOCKET_READABLE | WL_LATCH_SET |
4156 : : WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
4157 : : fd, wait_time,
4158 : : WAIT_EVENT_LOGICAL_APPLY_MAIN);
4159 : :
4160 [ + + ]: 15842 : if (rc & WL_LATCH_SET)
4161 : : {
4162 : 694 : ResetLatch(MyLatch);
4163 [ + + ]: 694 : CHECK_FOR_INTERRUPTS();
4164 : : }
4165 : :
2090 rhaas@postgresql.org 4166 [ + + ]: 15752 : if (ConfigReloadPending)
4167 : : {
4168 : 10 : ConfigReloadPending = false;
3071 peter_e@gmx.net 4169 : 10 : ProcessConfigFile(PGC_SIGHUP);
4170 : : }
4171 : :
3152 4172 [ + + ]: 15752 : if (rc & WL_TIMEOUT)
4173 : : {
4174 : : /*
4175 : : * We didn't receive anything new. If we haven't heard anything
4176 : : * from the server for more than wal_receiver_timeout / 2, ping
4177 : : * the server. Also, if it's been longer than
4178 : : * wal_receiver_status_interval since the last update we sent,
4179 : : * send a status update to the primary anyway, to report any
4180 : : * progress in applying WAL.
4181 : : */
4182 : 213 : bool requestReply = false;
4183 : :
4184 : : /*
4185 : : * Check if time since last receive from primary has reached the
4186 : : * configured limit.
4187 : : */
4188 [ + - ]: 213 : if (wal_receiver_timeout > 0)
4189 : : {
4190 : 213 : TimestampTz now = GetCurrentTimestamp();
4191 : : TimestampTz timeout;
4192 : :
4193 : 213 : timeout =
4194 : 213 : TimestampTzPlusMilliseconds(last_recv_timestamp,
4195 : : wal_receiver_timeout);
4196 : :
4197 [ - + ]: 213 : if (now >= timeout)
3152 peter_e@gmx.net 4198 [ # # ]:UBC 0 : ereport(ERROR,
4199 : : (errcode(ERRCODE_CONNECTION_FAILURE),
4200 : : errmsg("terminating logical replication worker due to timeout")));
4201 : :
4202 : : /* Check to see if it's time for a ping. */
3152 peter_e@gmx.net 4203 [ + - ]:CBC 213 : if (!ping_sent)
4204 : : {
4205 : 213 : timeout = TimestampTzPlusMilliseconds(last_recv_timestamp,
4206 : : (wal_receiver_timeout / 2));
4207 [ - + ]: 213 : if (now >= timeout)
4208 : : {
3152 peter_e@gmx.net 4209 :UBC 0 : requestReply = true;
4210 : 0 : ping_sent = true;
4211 : : }
4212 : : }
4213 : : }
4214 : :
3152 peter_e@gmx.net 4215 :CBC 213 : send_feedback(last_received, requestReply, requestReply);
4216 : :
45 akapila@postgresql.o 4217 :GNC 213 : maybe_advance_nonremovable_xid(&rdt_data, false);
4218 : :
4219 : : /*
4220 : : * Force reporting to ensure long idle periods don't lead to
4221 : : * arbitrarily delayed stats. Stats can only be reported outside
4222 : : * of (implicit or explicit) transactions. That shouldn't lead to
4223 : : * stats being delayed for long, because transactions are either
4224 : : * sent as a whole on commit or streamed. Streamed transactions
4225 : : * are spilled to disk and applied on commit.
4226 : : */
1213 andres@anarazel.de 4227 [ + - ]:CBC 213 : if (!IsTransactionState())
4228 : 213 : pgstat_report_stat(true);
4229 : : }
4230 : : }
4231 : :
4232 : : /* Pop the error context stack */
1471 akapila@postgresql.o 4233 : 9 : error_context_stack = errcallback.previous;
971 4234 : 9 : apply_error_context_stack = error_context_stack;
4235 : :
4236 : : /* All done */
1578 alvherre@alvh.no-ip. 4237 : 9 : walrcv_endstreaming(LogRepWorkerWalRcvConn, &tli);
3152 peter_e@gmx.net 4238 :UBC 0 : }
4239 : :
4240 : : /*
4241 : : * Send a Standby Status Update message to server.
4242 : : *
4243 : : * 'recvpos' is the latest LSN we've received data to, force is set if we need
4244 : : * to send a response to avoid timeouts.
4245 : : */
4246 : : static void
3152 peter_e@gmx.net 4247 :CBC 18644 : send_feedback(XLogRecPtr recvpos, bool force, bool requestReply)
4248 : : {
4249 : : static StringInfo reply_message = NULL;
4250 : : static TimestampTz send_time = 0;
4251 : :
4252 : : static XLogRecPtr last_recvpos = InvalidXLogRecPtr;
4253 : : static XLogRecPtr last_writepos = InvalidXLogRecPtr;
4254 : :
4255 : : XLogRecPtr writepos;
4256 : : XLogRecPtr flushpos;
4257 : : TimestampTz now;
4258 : : bool have_pending_txes;
4259 : :
4260 : : /*
4261 : : * If the user doesn't want status to be reported to the publisher, be
4262 : : * sure to exit before doing anything at all.
4263 : : */
4264 [ + + - + ]: 18644 : if (!force && wal_receiver_status_interval <= 0)
4265 : 7283 : return;
4266 : :
4267 : : /* It's legal to not pass a recvpos */
4268 [ - + ]: 18644 : if (recvpos < last_recvpos)
3152 peter_e@gmx.net 4269 :UBC 0 : recvpos = last_recvpos;
4270 : :
3152 peter_e@gmx.net 4271 :CBC 18644 : get_flush_position(&writepos, &flushpos, &have_pending_txes);
4272 : :
4273 : : /*
4274 : : * No outstanding transactions to flush, we can report the latest received
4275 : : * position. This is important for synchronous replication.
4276 : : */
4277 [ + + ]: 18644 : if (!have_pending_txes)
4278 : 14633 : flushpos = writepos = recvpos;
4279 : :
4280 [ - + ]: 18644 : if (writepos < last_writepos)
3152 peter_e@gmx.net 4281 :UBC 0 : writepos = last_writepos;
4282 : :
3152 peter_e@gmx.net 4283 [ + + ]:CBC 18644 : if (flushpos < last_flushpos)
4284 : 3971 : flushpos = last_flushpos;
4285 : :
4286 : 18644 : now = GetCurrentTimestamp();
4287 : :
4288 : : /* if we've already reported everything we're good */
4289 [ + + ]: 18644 : if (!force &&
4290 [ + + ]: 18083 : writepos == last_writepos &&
4291 [ + + ]: 7530 : flushpos == last_flushpos &&
4292 [ + + ]: 7351 : !TimestampDifferenceExceeds(send_time, now,
4293 : : wal_receiver_status_interval * 1000))
4294 : 7283 : return;
4295 : 11361 : send_time = now;
4296 : :
4297 [ + + ]: 11361 : if (!reply_message)
4298 : : {
3034 bruce@momjian.us 4299 : 402 : MemoryContext oldctx = MemoryContextSwitchTo(ApplyContext);
4300 : :
3152 peter_e@gmx.net 4301 : 402 : reply_message = makeStringInfo();
4302 : 402 : MemoryContextSwitchTo(oldctx);
4303 : : }
4304 : : else
4305 : 10959 : resetStringInfo(reply_message);
4306 : :
31 nathan@postgresql.or 4307 :GNC 11361 : pq_sendbyte(reply_message, PqReplMsg_StandbyStatusUpdate);
2999 tgl@sss.pgh.pa.us 4308 :CBC 11361 : pq_sendint64(reply_message, recvpos); /* write */
4309 : 11361 : pq_sendint64(reply_message, flushpos); /* flush */
4310 : 11361 : pq_sendint64(reply_message, writepos); /* apply */
3034 bruce@momjian.us 4311 : 11361 : pq_sendint64(reply_message, now); /* sendTime */
3152 peter_e@gmx.net 4312 : 11361 : pq_sendbyte(reply_message, requestReply); /* replyRequested */
4313 : :
61 alvherre@kurilemu.de 4314 [ + + ]:GNC 11361 : elog(DEBUG2, "sending feedback (force %d) to recv %X/%08X, write %X/%08X, flush %X/%08X",
4315 : : force,
4316 : : LSN_FORMAT_ARGS(recvpos),
4317 : : LSN_FORMAT_ARGS(writepos),
4318 : : LSN_FORMAT_ARGS(flushpos));
4319 : :
1578 alvherre@alvh.no-ip. 4320 :CBC 11361 : walrcv_send(LogRepWorkerWalRcvConn,
4321 : : reply_message->data, reply_message->len);
4322 : :
3152 peter_e@gmx.net 4323 [ + + ]: 11361 : if (recvpos > last_recvpos)
4324 : 10556 : last_recvpos = recvpos;
4325 [ + + ]: 11361 : if (writepos > last_writepos)
4326 : 10557 : last_writepos = writepos;
4327 [ + + ]: 11361 : if (flushpos > last_flushpos)
4328 : 10351 : last_flushpos = flushpos;
4329 : : }
4330 : :
4331 : : /*
4332 : : * Attempt to advance the non-removable transaction ID.
4333 : : *
4334 : : * See comments atop worker.c for details.
4335 : : */
4336 : : static void
45 akapila@postgresql.o 4337 :GNC 203677 : maybe_advance_nonremovable_xid(RetainDeadTuplesData *rdt_data,
4338 : : bool status_received)
4339 : : {
4340 [ + + ]: 203677 : if (!can_advance_nonremovable_xid(rdt_data))
4341 : 203111 : return;
4342 : :
4343 : 566 : process_rdt_phase_transition(rdt_data, status_received);
4344 : : }
4345 : :
4346 : : /*
4347 : : * Preliminary check to determine if advancing the non-removable transaction ID
4348 : : * is allowed.
4349 : : */
4350 : : static bool
4351 : 203677 : can_advance_nonremovable_xid(RetainDeadTuplesData *rdt_data)
4352 : : {
4353 : : /*
4354 : : * It is sufficient to manage non-removable transaction ID for a
4355 : : * subscription by the main apply worker to detect update_deleted reliably
4356 : : * even for table sync or parallel apply workers.
4357 : : */
4358 [ + + ]: 203677 : if (!am_leader_apply_worker())
4359 : 382 : return false;
4360 : :
4361 : : /* No need to advance if retaining dead tuples is not required */
4362 [ + + ]: 203295 : if (!MySubscription->retaindeadtuples)
4363 : 202712 : return false;
4364 : :
4365 : : /* No need to advance if we have already stopped retaining */
4 4366 [ + + ]: 583 : if (!MySubscription->retentionactive)
4367 : 17 : return false;
4368 : :
45 4369 : 566 : return true;
4370 : : }
4371 : :
4372 : : /*
4373 : : * Process phase transitions during the non-removable transaction ID
4374 : : * advancement. See comments atop worker.c for details of the transition.
4375 : : */
4376 : : static void
4377 : 832 : process_rdt_phase_transition(RetainDeadTuplesData *rdt_data,
4378 : : bool status_received)
4379 : : {
4380 [ + + + + : 832 : switch (rdt_data->phase)
+ - ]
4381 : : {
4382 : 111 : case RDT_GET_CANDIDATE_XID:
4383 : 111 : get_candidate_xid(rdt_data);
4384 : 111 : break;
4385 : 215 : case RDT_REQUEST_PUBLISHER_STATUS:
4386 : 215 : request_publisher_status(rdt_data);
4387 : 215 : break;
4388 : 448 : case RDT_WAIT_FOR_PUBLISHER_STATUS:
4389 : 448 : wait_for_publisher_status(rdt_data, status_received);
4390 : 448 : break;
4391 : 57 : case RDT_WAIT_FOR_LOCAL_FLUSH:
4392 : 57 : wait_for_local_flush(rdt_data);
4393 : 57 : break;
4 4394 : 1 : case RDT_STOP_CONFLICT_INFO_RETENTION:
4395 : 1 : stop_conflict_info_retention(rdt_data);
4396 : 1 : break;
4397 : : }
45 4398 : 832 : }
4399 : :
4400 : : /*
4401 : : * Workhorse for the RDT_GET_CANDIDATE_XID phase.
4402 : : */
4403 : : static void
4404 : 111 : get_candidate_xid(RetainDeadTuplesData *rdt_data)
4405 : : {
4406 : : TransactionId oldest_running_xid;
4407 : : TimestampTz now;
4408 : :
4409 : : /*
4410 : : * Use last_recv_time when applying changes in the loop to avoid
4411 : : * unnecessary system time retrieval. If last_recv_time is not available,
4412 : : * obtain the current timestamp.
4413 : : */
4414 [ + + ]: 111 : now = rdt_data->last_recv_time ? rdt_data->last_recv_time : GetCurrentTimestamp();
4415 : :
4416 : : /*
4417 : : * Compute the candidate_xid and request the publisher status at most once
4418 : : * per xid_advance_interval. Refer to adjust_xid_advance_interval() for
4419 : : * details on how this value is dynamically adjusted. This is to avoid
4420 : : * using CPU and network resources without making much progress.
4421 : : */
4422 [ - + ]: 111 : if (!TimestampDifferenceExceeds(rdt_data->candidate_xid_time, now,
4423 : : rdt_data->xid_advance_interval))
45 akapila@postgresql.o 4424 :UNC 0 : return;
4425 : :
4426 : : /*
4427 : : * Immediately update the timer, even if the function returns later
4428 : : * without setting candidate_xid due to inactivity on the subscriber. This
4429 : : * avoids frequent calls to GetOldestActiveTransactionId.
4430 : : */
45 akapila@postgresql.o 4431 :GNC 111 : rdt_data->candidate_xid_time = now;
4432 : :
4433 : : /*
4434 : : * Consider transactions in the current database, as only dead tuples from
4435 : : * this database are required for conflict detection.
4436 : : */
4437 : 111 : oldest_running_xid = GetOldestActiveTransactionId(false, false);
4438 : :
4439 : : /*
4440 : : * Oldest active transaction ID (oldest_running_xid) can't be behind any
4441 : : * of its previously computed value.
4442 : : */
4443 [ - + ]: 111 : Assert(TransactionIdPrecedesOrEquals(MyLogicalRepWorker->oldest_nonremovable_xid,
4444 : : oldest_running_xid));
4445 : :
4446 : : /* Return if the oldest_nonremovable_xid cannot be advanced */
4447 [ + + ]: 111 : if (TransactionIdEquals(MyLogicalRepWorker->oldest_nonremovable_xid,
4448 : : oldest_running_xid))
4449 : : {
4450 : 79 : adjust_xid_advance_interval(rdt_data, false);
4451 : 79 : return;
4452 : : }
4453 : :
4454 : 32 : adjust_xid_advance_interval(rdt_data, true);
4455 : :
4456 : 32 : rdt_data->candidate_xid = oldest_running_xid;
4457 : 32 : rdt_data->phase = RDT_REQUEST_PUBLISHER_STATUS;
4458 : :
4459 : : /* process the next phase */
4460 : 32 : process_rdt_phase_transition(rdt_data, false);
4461 : : }
4462 : :
4463 : : /*
4464 : : * Workhorse for the RDT_REQUEST_PUBLISHER_STATUS phase.
4465 : : */
4466 : : static void
4467 : 215 : request_publisher_status(RetainDeadTuplesData *rdt_data)
4468 : : {
4469 : : static StringInfo request_message = NULL;
4470 : :
4471 [ + + ]: 215 : if (!request_message)
4472 : : {
4473 : 9 : MemoryContext oldctx = MemoryContextSwitchTo(ApplyContext);
4474 : :
4475 : 9 : request_message = makeStringInfo();
4476 : 9 : MemoryContextSwitchTo(oldctx);
4477 : : }
4478 : : else
4479 : 206 : resetStringInfo(request_message);
4480 : :
4481 : : /*
4482 : : * Send the current time to update the remote walsender's latest reply
4483 : : * message received time.
4484 : : */
31 nathan@postgresql.or 4485 : 215 : pq_sendbyte(request_message, PqReplMsg_PrimaryStatusRequest);
45 akapila@postgresql.o 4486 : 215 : pq_sendint64(request_message, GetCurrentTimestamp());
4487 : :
4488 [ + + ]: 215 : elog(DEBUG2, "sending publisher status request message");
4489 : :
4490 : : /* Send a request for the publisher status */
4491 : 215 : walrcv_send(LogRepWorkerWalRcvConn,
4492 : : request_message->data, request_message->len);
4493 : :
4494 : 215 : rdt_data->phase = RDT_WAIT_FOR_PUBLISHER_STATUS;
4495 : :
4496 : : /*
4497 : : * Skip calling maybe_advance_nonremovable_xid() since further transition
4498 : : * is possible only once we receive the publisher status message.
4499 : : */
4500 : 215 : }
4501 : :
4502 : : /*
4503 : : * Workhorse for the RDT_WAIT_FOR_PUBLISHER_STATUS phase.
4504 : : */
4505 : : static void
4506 : 448 : wait_for_publisher_status(RetainDeadTuplesData *rdt_data,
4507 : : bool status_received)
4508 : : {
4509 : : /*
4510 : : * Return if we have requested but not yet received the publisher status.
4511 : : */
4512 [ + + ]: 448 : if (!status_received)
4513 : 239 : return;
4514 : :
4515 : : /*
4516 : : * We don't need to maintain oldest_nonremovable_xid if we decide to stop
4517 : : * retaining conflict information for this worker.
4518 : : */
4 4519 [ - + ]: 209 : if (should_stop_conflict_info_retention(rdt_data))
4 akapila@postgresql.o 4520 :UNC 0 : return;
4521 : :
45 akapila@postgresql.o 4522 [ + + ]:GNC 209 : if (!FullTransactionIdIsValid(rdt_data->remote_wait_for))
4523 : 26 : rdt_data->remote_wait_for = rdt_data->remote_nextxid;
4524 : :
4525 : : /*
4526 : : * Check if all remote concurrent transactions that were active at the
4527 : : * first status request have now completed. If completed, proceed to the
4528 : : * next phase; otherwise, continue checking the publisher status until
4529 : : * these transactions finish.
4530 : : *
4531 : : * It's possible that transactions in the commit phase during the last
4532 : : * cycle have now finished committing, but remote_oldestxid remains older
4533 : : * than remote_wait_for. This can happen if some old transaction came in
4534 : : * the commit phase when we requested status in this cycle. We do not
4535 : : * handle this case explicitly as it's rare and the benefit doesn't
4536 : : * justify the required complexity. Tracking would require either caching
4537 : : * all xids at the publisher or sending them to subscribers. The condition
4538 : : * will resolve naturally once the remaining transactions are finished.
4539 : : *
4540 : : * Directly advancing the non-removable transaction ID is possible if
4541 : : * there are no activities on the publisher since the last advancement
4542 : : * cycle. However, it requires maintaining two fields, last_remote_nextxid
4543 : : * and last_remote_lsn, within the structure for comparison with the
4544 : : * current cycle's values. Considering the minimal cost of continuing in
4545 : : * RDT_WAIT_FOR_LOCAL_FLUSH without awaiting changes, we opted not to
4546 : : * advance the transaction ID here.
4547 : : */
4548 [ + + ]: 209 : if (FullTransactionIdPrecedesOrEquals(rdt_data->remote_wait_for,
4549 : : rdt_data->remote_oldestxid))
4550 : 26 : rdt_data->phase = RDT_WAIT_FOR_LOCAL_FLUSH;
4551 : : else
4552 : 183 : rdt_data->phase = RDT_REQUEST_PUBLISHER_STATUS;
4553 : :
4554 : : /* process the next phase */
4555 : 209 : process_rdt_phase_transition(rdt_data, false);
4556 : : }
4557 : :
4558 : : /*
4559 : : * Workhorse for the RDT_WAIT_FOR_LOCAL_FLUSH phase.
4560 : : */
4561 : : static void
4562 : 57 : wait_for_local_flush(RetainDeadTuplesData *rdt_data)
4563 : : {
4564 [ + - - + ]: 57 : Assert(!XLogRecPtrIsInvalid(rdt_data->remote_lsn) &&
4565 : : TransactionIdIsValid(rdt_data->candidate_xid));
4566 : :
4567 : : /*
4568 : : * We expect the publisher and subscriber clocks to be in sync using time
4569 : : * sync service like NTP. Otherwise, we will advance this worker's
4570 : : * oldest_nonremovable_xid prematurely, leading to the removal of rows
4571 : : * required to detect update_deleted reliably. This check primarily
4572 : : * addresses scenarios where the publisher's clock falls behind; if the
4573 : : * publisher's clock is ahead, subsequent transactions will naturally bear
4574 : : * later commit timestamps, conforming to the design outlined atop
4575 : : * worker.c.
4576 : : *
4577 : : * XXX Consider waiting for the publisher's clock to catch up with the
4578 : : * subscriber's before proceeding to the next phase.
4579 : : */
4580 [ - + ]: 57 : if (TimestampDifferenceExceeds(rdt_data->reply_time,
4581 : : rdt_data->candidate_xid_time, 0))
45 akapila@postgresql.o 4582 [ # # ]:UNC 0 : ereport(ERROR,
4583 : : errmsg_internal("oldest_nonremovable_xid transaction ID could be advanced prematurely"),
4584 : : errdetail_internal("The clock on the publisher is behind that of the subscriber."));
4585 : :
4586 : : /*
4587 : : * Do not attempt to advance the non-removable transaction ID when table
4588 : : * sync is in progress. During this time, changes from a single
4589 : : * transaction may be applied by multiple table sync workers corresponding
4590 : : * to the target tables. So, it's necessary for all table sync workers to
4591 : : * apply and flush the corresponding changes before advancing the
4592 : : * transaction ID, otherwise, dead tuples that are still needed for
4593 : : * conflict detection in table sync workers could be removed prematurely.
4594 : : * However, confirming the apply and flush progress across all table sync
4595 : : * workers is complex and not worth the effort, so we simply return if not
4596 : : * all tables are in the READY state.
4597 : : *
4598 : : * It is safe to add new tables with initial states to the subscription
4599 : : * after this check because any changes applied to these tables should
4600 : : * have a WAL position greater than the rdt_data->remote_lsn.
4601 : : */
45 akapila@postgresql.o 4602 [ + + ]:GNC 57 : if (!AllTablesyncsReady())
4603 : : {
4604 : : TimestampTz now;
4605 : :
4 4606 : 32 : now = rdt_data->last_recv_time
4607 [ + + ]: 16 : ? rdt_data->last_recv_time : GetCurrentTimestamp();
4608 : :
4609 : : /*
4610 : : * Record the time spent waiting for table sync; it is needed for the
4611 : : * timeout check in should_stop_conflict_info_retention().
4612 : : */
4613 : 16 : rdt_data->table_sync_wait_time =
4614 : 16 : TimestampDifferenceMilliseconds(rdt_data->candidate_xid_time, now);
4615 : :
4616 : 16 : return;
4617 : : }
4618 : :
4619 : : /*
4620 : : * We don't need to maintain oldest_nonremovable_xid if we decide to stop
4621 : : * retaining conflict information for this worker.
4622 : : */
4623 [ + + ]: 41 : if (should_stop_conflict_info_retention(rdt_data))
45 4624 : 1 : return;
4625 : :
4626 : : /*
4627 : : * Update and check the remote flush position if we are applying changes
4628 : : * in a loop. This is done at most once per WalWriterDelay to avoid
4629 : : * performing costly operations in get_flush_position() too frequently
4630 : : * during change application.
4631 : : */
4632 [ + + + + : 47 : if (last_flushpos < rdt_data->remote_lsn && rdt_data->last_recv_time &&
+ - ]
4633 : 7 : TimestampDifferenceExceeds(rdt_data->flushpos_update_time,
4634 : : rdt_data->last_recv_time, WalWriterDelay))
4635 : : {
4636 : : XLogRecPtr writepos;
4637 : : XLogRecPtr flushpos;
4638 : : bool have_pending_txes;
4639 : :
4640 : : /* Fetch the latest remote flush position */
4641 : 7 : get_flush_position(&writepos, &flushpos, &have_pending_txes);
4642 : :
4643 [ + + ]: 7 : if (flushpos > last_flushpos)
4644 : 1 : last_flushpos = flushpos;
4645 : :
4646 : 7 : rdt_data->flushpos_update_time = rdt_data->last_recv_time;
4647 : : }
4648 : :
4649 : : /* Return to wait for the changes to be applied */
4650 [ + + ]: 40 : if (last_flushpos < rdt_data->remote_lsn)
4651 : 16 : return;
4652 : :
4653 : : /*
4654 : : * Reaching here means the remote WAL position has been received, and all
4655 : : * transactions up to that position on the publisher have been applied and
4656 : : * flushed locally. So, we can advance the non-removable transaction ID.
4657 : : */
4658 [ - + ]: 24 : SpinLockAcquire(&MyLogicalRepWorker->relmutex);
4659 : 24 : MyLogicalRepWorker->oldest_nonremovable_xid = rdt_data->candidate_xid;
4660 : 24 : SpinLockRelease(&MyLogicalRepWorker->relmutex);
4661 : :
23 heikki.linnakangas@i 4662 [ + + ]: 24 : elog(DEBUG2, "confirmed flush up to remote lsn %X/%08X: new oldest_nonremovable_xid %u",
4663 : : LSN_FORMAT_ARGS(rdt_data->remote_lsn),
4664 : : rdt_data->candidate_xid);
4665 : :
4666 : : /* Notify launcher to update the xmin of the conflict slot */
45 akapila@postgresql.o 4667 : 24 : ApplyLauncherWakeup();
4668 : :
4 4669 : 24 : reset_retention_data_fields(rdt_data);
4670 : :
4671 : : /* process the next phase */
4672 : 24 : process_rdt_phase_transition(rdt_data, false);
4673 : : }
4674 : :
4675 : : /*
4676 : : * Check whether conflict information retention should be stopped due to
4677 : : * exceeding the maximum wait time (max_retention_duration).
4678 : : *
4679 : : * If retention should be stopped, transition to the
4680 : : * RDT_STOP_CONFLICT_INFO_RETENTION phase and return true. Otherwise, return
4681 : : * false.
4682 : : *
4683 : : * Note: Retention won't be resumed automatically. The user must manually
4684 : : * disable retain_dead_tuples and re-enable it after confirming that the
4685 : : * replication slot maintained by the launcher has been dropped.
4686 : : */
4687 : : static bool
4688 : 250 : should_stop_conflict_info_retention(RetainDeadTuplesData *rdt_data)
4689 : : {
4690 : : TimestampTz now;
4691 : :
4692 [ - + ]: 250 : Assert(TransactionIdIsValid(rdt_data->candidate_xid));
4693 [ + + - + ]: 250 : Assert(rdt_data->phase == RDT_WAIT_FOR_PUBLISHER_STATUS ||
4694 : : rdt_data->phase == RDT_WAIT_FOR_LOCAL_FLUSH);
4695 : :
4696 [ + + ]: 250 : if (!MySubscription->maxretention)
4697 : 249 : return false;
4698 : :
4699 : : /*
4700 : : * Use last_recv_time when applying changes in the loop to avoid
4701 : : * unnecessary system time retrieval. If last_recv_time is not available,
4702 : : * obtain the current timestamp.
4703 : : */
4704 [ - + ]: 1 : now = rdt_data->last_recv_time ? rdt_data->last_recv_time : GetCurrentTimestamp();
4705 : :
4706 : : /*
4707 : : * Return early if the wait time has not exceeded the configured maximum
4708 : : * (max_retention_duration). Time spent waiting for table synchronization
4709 : : * is excluded from this calculation, as it occurs infrequently.
4710 : : */
4711 [ - + ]: 1 : if (!TimestampDifferenceExceeds(rdt_data->candidate_xid_time, now,
4712 : 1 : MySubscription->maxretention +
4713 : 1 : rdt_data->table_sync_wait_time))
4 akapila@postgresql.o 4714 :UNC 0 : return false;
4715 : :
4 akapila@postgresql.o 4716 :GNC 1 : rdt_data->phase = RDT_STOP_CONFLICT_INFO_RETENTION;
4717 : :
4718 : : /* process the next phase */
4719 : 1 : process_rdt_phase_transition(rdt_data, false);
4720 : :
4721 : 1 : return true;
4722 : : }
4723 : :
4724 : : /*
4725 : : * Workhorse for the RDT_STOP_CONFLICT_INFO_RETENTION phase.
4726 : : */
4727 : : static void
4728 : 1 : stop_conflict_info_retention(RetainDeadTuplesData *rdt_data)
4729 : : {
4730 : : /*
4731 : : * Do not update the catalog during an active transaction. The transaction
4732 : : * may be started during change application, leading to a possible
4733 : : * rollback of catalog updates if the application fails subsequently.
4734 : : */
4735 [ - + ]: 1 : if (IsTransactionState())
4 akapila@postgresql.o 4736 :UNC 0 : return;
4737 : :
4 akapila@postgresql.o 4738 :GNC 1 : StartTransactionCommand();
4739 : :
4740 : : /*
4741 : : * Updating pg_subscription might involve TOAST table access, so ensure we
4742 : : * have a valid snapshot.
4743 : : */
4744 : 1 : PushActiveSnapshot(GetTransactionSnapshot());
4745 : :
4746 : : /* Set pg_subscription.subretentionactive to false */
4747 : 1 : UpdateDeadTupleRetentionStatus(MySubscription->oid, false);
4748 : :
4749 : 1 : PopActiveSnapshot();
4750 : 1 : CommitTransactionCommand();
4751 : :
4752 [ - + ]: 1 : SpinLockAcquire(&MyLogicalRepWorker->relmutex);
4753 : 1 : MyLogicalRepWorker->oldest_nonremovable_xid = InvalidTransactionId;
4754 : 1 : SpinLockRelease(&MyLogicalRepWorker->relmutex);
4755 : :
4756 [ + - ]: 1 : ereport(LOG,
4757 : : errmsg("logical replication worker for subscription \"%s\" has stopped retaining the information for detecting conflicts",
4758 : : MySubscription->name),
4759 : : errdetail("Retention of information used for conflict detection has exceeded max_retention_duration of %u ms.",
4760 : : MySubscription->maxretention));
4761 : :
4762 : : /* Notify launcher to update the conflict slot */
4763 : 1 : ApplyLauncherWakeup();
4764 : :
4765 : 1 : reset_retention_data_fields(rdt_data);
4766 : : }
4767 : :
4768 : : /*
4769 : : * Reset all data fields of RetainDeadTuplesData except those used to
4770 : : * determine the timing for the next round of transaction ID advancement. We
4771 : : * can even use flushpos_update_time in the next round to decide whether to get
4772 : : * the latest flush position.
4773 : : */
4774 : : static void
4775 : 25 : reset_retention_data_fields(RetainDeadTuplesData *rdt_data)
4776 : : {
45 4777 : 25 : rdt_data->phase = RDT_GET_CANDIDATE_XID;
4778 : 25 : rdt_data->remote_lsn = InvalidXLogRecPtr;
4779 : 25 : rdt_data->remote_oldestxid = InvalidFullTransactionId;
4780 : 25 : rdt_data->remote_nextxid = InvalidFullTransactionId;
4781 : 25 : rdt_data->reply_time = 0;
4782 : 25 : rdt_data->remote_wait_for = InvalidFullTransactionId;
4783 : 25 : rdt_data->candidate_xid = InvalidTransactionId;
4 4784 : 25 : rdt_data->table_sync_wait_time = 0;
45 4785 : 25 : }
4786 : :
4787 : : /*
4788 : : * Adjust the interval for advancing non-removable transaction IDs.
4789 : : *
4790 : : * If there is no activity on the node, we progressively double the interval
4791 : : * used to advance the non-removable transaction ID. This helps conserve CPU
4792 : : * and network resources when there's little benefit to frequent updates.
4793 : : *
4794 : : * The interval is capped by the lowest of the following:
4795 : : * - wal_receiver_status_interval (if set),
4796 : : * - a default maximum of 3 minutes,
4797 : : * - max_retention_duration.
4798 : : *
4799 : : * This ensures the interval never exceeds the retention boundary, even if
4800 : : * other limits are higher. Once activity resumes on the node, the interval
4801 : : * is reset to the lesser of 100ms and max_retention_duration, allowing timely
4802 : : * advancement of the non-removable transaction ID.
4803 : : *
4804 : : * XXX The use of wal_receiver_status_interval is a bit arbitrary so we can
4805 : : * consider another interval or a separate GUC if the need arises.
4806 : : */
4807 : : static void
4808 : 111 : adjust_xid_advance_interval(RetainDeadTuplesData *rdt_data, bool new_xid_found)
4809 : : {
4810 [ + + - + ]: 111 : if (!new_xid_found && rdt_data->xid_advance_interval)
45 akapila@postgresql.o 4811 :UNC 0 : {
4812 : 0 : int max_interval = wal_receiver_status_interval
4813 : 0 : ? wal_receiver_status_interval * 1000
4814 [ # # ]: 0 : : MAX_XID_ADVANCE_INTERVAL;
4815 : :
4816 : : /*
4817 : : * No new transaction ID has been assigned since the last check, so
4818 : : * double the interval, but not beyond the maximum allowable value.
4819 : : */
4820 : 0 : rdt_data->xid_advance_interval = Min(rdt_data->xid_advance_interval * 2,
4821 : : max_interval);
4822 : : }
4823 : : else
4824 : : {
4825 : : /*
4826 : : * A new transaction ID was found or the interval is not yet
4827 : : * initialized, so set the interval to the minimum value.
4828 : : */
45 akapila@postgresql.o 4829 :GNC 111 : rdt_data->xid_advance_interval = MIN_XID_ADVANCE_INTERVAL;
4830 : : }
4831 : :
4832 : : /* Ensure the wait time remains within the maximum limit */
4 4833 : 111 : rdt_data->xid_advance_interval = Min(rdt_data->xid_advance_interval,
4834 : : MySubscription->maxretention);
45 4835 : 111 : }
4836 : :
4837 : : /*
4838 : : * Exit routine for apply workers due to subscription parameter changes.
4839 : : */
4840 : : static void
971 akapila@postgresql.o 4841 :CBC 43 : apply_worker_exit(void)
4842 : : {
4843 [ - + ]: 43 : if (am_parallel_apply_worker())
4844 : : {
4845 : : /*
4846 : : * Don't stop the parallel apply worker as the leader will detect the
4847 : : * subscription parameter change and restart logical replication later
4848 : : * anyway. This also prevents the leader from reporting errors when
4849 : : * trying to communicate with a stopped parallel apply worker, which
4850 : : * would accidentally disable subscriptions if disable_on_error was
4851 : : * set.
4852 : : */
971 akapila@postgresql.o 4853 :UBC 0 : return;
4854 : : }
4855 : :
4856 : : /*
4857 : : * Reset the last-start time for this apply worker so that the launcher
4858 : : * will restart it without waiting for wal_retrieve_retry_interval if the
4859 : : * subscription is still active, and so that we won't leak that hash table
4860 : : * entry if it isn't.
4861 : : */
764 akapila@postgresql.o 4862 [ + - ]:CBC 43 : if (am_leader_apply_worker())
958 tgl@sss.pgh.pa.us 4863 : 43 : ApplyLauncherForgetWorkerStartTime(MyLogicalRepWorker->subid);
4864 : :
971 akapila@postgresql.o 4865 : 43 : proc_exit(0);
4866 : : }
4867 : :
4868 : : /*
4869 : : * Reread subscription info if needed.
4870 : : *
4871 : : * For significant changes, we react by exiting the current process; a new
4872 : : * one will be launched afterwards if needed.
4873 : : */
4874 : : void
3017 peter_e@gmx.net 4875 : 4290 : maybe_reread_subscription(void)
4876 : : {
4877 : : MemoryContext oldctx;
4878 : : Subscription *newsub;
3034 bruce@momjian.us 4879 : 4290 : bool started_tx = false;
4880 : :
4881 : : /* When cache state is valid there is nothing to do here. */
3017 peter_e@gmx.net 4882 [ + + ]: 4290 : if (MySubscriptionValid)
4883 : 4202 : return;
4884 : :
4885 : : /* This function might be called inside or outside of a transaction. */
3089 4886 [ + + ]: 88 : if (!IsTransactionState())
4887 : : {
4888 : 82 : StartTransactionCommand();
4889 : 82 : started_tx = true;
4890 : : }
4891 : :
4892 : : /* Ensure allocations in permanent context. */
3042 4893 : 88 : oldctx = MemoryContextSwitchTo(ApplyContext);
4894 : :
3152 4895 : 88 : newsub = GetSubscription(MyLogicalRepWorker->subid, true);
4896 : :
4897 : : /*
4898 : : * Exit if the subscription was removed. This normally should not happen
4899 : : * as the worker gets killed during DROP SUBSCRIPTION.
4900 : : */
3148 4901 [ - + ]: 88 : if (!newsub)
4902 : : {
3152 peter_e@gmx.net 4903 [ # # ]:UBC 0 : ereport(LOG,
4904 : : (errmsg("logical replication worker for subscription \"%s\" will stop because the subscription was removed",
4905 : : MySubscription->name)));
4906 : :
4907 : : /* Ensure we remove no-longer-useful entry for worker's start time */
764 akapila@postgresql.o 4908 [ # # ]: 0 : if (am_leader_apply_worker())
958 tgl@sss.pgh.pa.us 4909 : 0 : ApplyLauncherForgetWorkerStartTime(MyLogicalRepWorker->subid);
4910 : :
3152 peter_e@gmx.net 4911 : 0 : proc_exit(0);
4912 : : }
4913 : :
4914 : : /* Exit if the subscription was disabled. */
3042 peter_e@gmx.net 4915 [ + + ]:CBC 88 : if (!newsub->enabled)
4916 : : {
4917 [ + - ]: 13 : ereport(LOG,
4918 : : (errmsg("logical replication worker for subscription \"%s\" will stop because the subscription was disabled",
4919 : : MySubscription->name)));
4920 : :
971 akapila@postgresql.o 4921 : 13 : apply_worker_exit();
4922 : : }
4923 : :
4924 : : /* !slotname should never happen when enabled is true. */
3042 peter_e@gmx.net 4925 [ - + ]: 75 : Assert(newsub->slotname);
4926 : :
4927 : : /* two-phase cannot be altered while the worker is running */
1515 akapila@postgresql.o 4928 [ - + ]: 75 : Assert(newsub->twophasestate == MySubscription->twophasestate);
4929 : :
4930 : : /*
4931 : : * Exit if any parameter that affects the remote connection was changed.
4932 : : * The launcher will start a new worker but note that the parallel apply
4933 : : * worker won't restart if the streaming option's value is changed from
4934 : : * 'parallel' to any other value or the server decides not to stream the
4935 : : * in-progress transaction.
4936 : : */
1876 tgl@sss.pgh.pa.us 4937 [ + + ]: 75 : if (strcmp(newsub->conninfo, MySubscription->conninfo) != 0 ||
4938 [ + + ]: 73 : strcmp(newsub->name, MySubscription->name) != 0 ||
4939 [ + - ]: 72 : strcmp(newsub->slotname, MySubscription->slotname) != 0 ||
4940 [ + + ]: 72 : newsub->binary != MySubscription->binary ||
1829 akapila@postgresql.o 4941 [ + + ]: 66 : newsub->stream != MySubscription->stream ||
870 4942 [ + - ]: 61 : newsub->passwordrequired != MySubscription->passwordrequired ||
1143 4943 [ + + ]: 61 : strcmp(newsub->origin, MySubscription->origin) != 0 ||
1338 jdavis@postgresql.or 4944 [ + + ]: 59 : newsub->owner != MySubscription->owner ||
1876 tgl@sss.pgh.pa.us 4945 [ + + ]: 58 : !equal(newsub->publications, MySubscription->publications))
4946 : : {
971 akapila@postgresql.o 4947 [ - + ]: 26 : if (am_parallel_apply_worker())
971 akapila@postgresql.o 4948 [ # # ]:UBC 0 : ereport(LOG,
4949 : : (errmsg("logical replication parallel apply worker for subscription \"%s\" will stop because of a parameter change",
4950 : : MySubscription->name)));
4951 : : else
971 akapila@postgresql.o 4952 [ + - ]:CBC 26 : ereport(LOG,
4953 : : (errmsg("logical replication worker for subscription \"%s\" will restart because of a parameter change",
4954 : : MySubscription->name)));
4955 : :
4956 : 26 : apply_worker_exit();
4957 : : }
4958 : :
4959 : : /*
4960 : : * Exit if the subscription owner's superuser privileges have been
4961 : : * revoked.
4962 : : */
690 4963 [ + + + + ]: 49 : if (!newsub->ownersuperuser && MySubscription->ownersuperuser)
4964 : : {
4965 [ - + ]: 4 : if (am_parallel_apply_worker())
690 akapila@postgresql.o 4966 [ # # ]:UBC 0 : ereport(LOG,
4967 : : errmsg("logical replication parallel apply worker for subscription \"%s\" will stop because the subscription owner's superuser privileges have been revoked",
4968 : : MySubscription->name));
4969 : : else
690 akapila@postgresql.o 4970 [ + - ]:CBC 4 : ereport(LOG,
4971 : : errmsg("logical replication worker for subscription \"%s\" will restart because the subscription owner's superuser privileges have been revoked",
4972 : : MySubscription->name));
4973 : :
4974 : 4 : apply_worker_exit();
4975 : : }
4976 : :
4977 : : /* Check for other changes that should never happen too. */
3078 peter_e@gmx.net 4978 [ - + ]: 45 : if (newsub->dbid != MySubscription->dbid)
4979 : : {
3152 peter_e@gmx.net 4980 [ # # ]:UBC 0 : elog(ERROR, "subscription %u changed unexpectedly",
4981 : : MyLogicalRepWorker->subid);
4982 : : }
4983 : :
4984 : : /* Clean old subscription info and switch to new one. */
3152 peter_e@gmx.net 4985 :CBC 45 : FreeSubscription(MySubscription);
4986 : 45 : MySubscription = newsub;
4987 : :
4988 : 45 : MemoryContextSwitchTo(oldctx);
4989 : :
4990 : : /* Change synchronous commit according to the user's wishes */
3067 4991 : 45 : SetConfigOption("synchronous_commit", MySubscription->synccommit,
4992 : : PGC_BACKEND, PGC_S_OVERRIDE);
4993 : :
3089 4994 [ + + ]: 45 : if (started_tx)
4995 : 42 : CommitTransactionCommand();
4996 : :
3152 4997 : 45 : MySubscriptionValid = true;
4998 : : }
4999 : :
5000 : : /*
5001 : : * Callback from subscription syscache invalidation.
5002 : : */
5003 : : static void
5004 : 92 : subscription_change_cb(Datum arg, int cacheid, uint32 hashvalue)
5005 : : {
5006 : 92 : MySubscriptionValid = false;
5007 : 92 : }
5008 : :
5009 : : /*
5010 : : * subxact_info_write
5011 : : * Store information about subxacts for a toplevel transaction.
5012 : : *
5013 : : * For each subxact we store the offset of its first change in the main file.
5014 : : * The file is always overwritten as a whole.
5015 : : *
5016 : : * XXX We should only store subxacts that were not aborted yet.
5017 : : */
5018 : : static void
1829 akapila@postgresql.o 5019 : 372 : subxact_info_write(Oid subid, TransactionId xid)
5020 : : {
5021 : : char path[MAXPGPATH];
5022 : : Size len;
5023 : : BufFile *fd;
5024 : :
5025 [ - + ]: 372 : Assert(TransactionIdIsValid(xid));
5026 : :
5027 : : /* construct the subxact filename */
1465 5028 : 372 : subxact_filename(path, subid, xid);
5029 : :
5030 : : /* Delete the subxacts file, if exists. */
1829 5031 [ + + ]: 372 : if (subxact_data.nsubxacts == 0)
5032 : : {
1465 5033 : 290 : cleanup_subxact_info();
5034 : 290 : BufFileDeleteFileSet(MyLogicalRepWorker->stream_fileset, path, true);
5035 : :
1829 5036 : 290 : return;
5037 : : }
5038 : :
5039 : : /*
5040 : : * Create the subxact file if it is not already created; otherwise open the
5041 : : * existing file.
5042 : : */
1465 5043 : 82 : fd = BufFileOpenFileSet(MyLogicalRepWorker->stream_fileset, path, O_RDWR,
5044 : : true);
5045 [ + + ]: 82 : if (fd == NULL)
5046 : 8 : fd = BufFileCreateFileSet(MyLogicalRepWorker->stream_fileset, path);
5047 : :
1829 5048 : 82 : len = sizeof(SubXactInfo) * subxact_data.nsubxacts;
5049 : :
5050 : : /* Write the subxact count and subxact info */
5051 : 82 : BufFileWrite(fd, &subxact_data.nsubxacts, sizeof(subxact_data.nsubxacts));
5052 : 82 : BufFileWrite(fd, subxact_data.subxacts, len);
5053 : :
5054 : 82 : BufFileClose(fd);
5055 : :
5056 : : /* free the memory allocated for subxact info */
5057 : 82 : cleanup_subxact_info();
5058 : : }
5059 : :
5060 : : /*
5061 : : * subxact_info_read
5062 : : * Restore information about subxacts of a streamed transaction.
5063 : : *
5064 : : * Read information about subxacts into the structure subxact_data that can be
5065 : : * used later.
5066 : : */
5067 : : static void
5068 : 344 : subxact_info_read(Oid subid, TransactionId xid)
5069 : : {
5070 : : char path[MAXPGPATH];
5071 : : Size len;
5072 : : BufFile *fd;
5073 : : MemoryContext oldctx;
5074 : :
5075 [ - + ]: 344 : Assert(!subxact_data.subxacts);
5076 [ - + ]: 344 : Assert(subxact_data.nsubxacts == 0);
5077 [ - + ]: 344 : Assert(subxact_data.nsubxacts_max == 0);
5078 : :
5079 : : /*
5080 : : * If the subxact file doesn't exist, that means we don't have any subxact
5081 : : * info.
5082 : : */
5083 : 344 : subxact_filename(path, subid, xid);
1465 5084 : 344 : fd = BufFileOpenFileSet(MyLogicalRepWorker->stream_fileset, path, O_RDONLY,
5085 : : true);
5086 [ + + ]: 344 : if (fd == NULL)
5087 : 265 : return;
5088 : :
5089 : : /* read number of subxact items */
964 peter@eisentraut.org 5090 : 79 : BufFileReadExact(fd, &subxact_data.nsubxacts, sizeof(subxact_data.nsubxacts));
5091 : :
1829 akapila@postgresql.o 5092 : 79 : len = sizeof(SubXactInfo) * subxact_data.nsubxacts;
5093 : :
5094 : : /* we keep the maximum as a power of 2 */
5095 : 79 : subxact_data.nsubxacts_max = 1 << my_log2(subxact_data.nsubxacts);
5096 : :
5097 : : /*
5098 : : * Allocate subxact information in the logical streaming context. We need
5099 : : * this information for the duration of the stream so that we can add
5100 : : * subtransaction info to it. On stream stop we will flush this information
5101 : : * to the subxact file and reset the logical streaming context.
5102 : : */
5103 : 79 : oldctx = MemoryContextSwitchTo(LogicalStreamingContext);
5104 : 79 : subxact_data.subxacts = palloc(subxact_data.nsubxacts_max *
5105 : : sizeof(SubXactInfo));
5106 : 79 : MemoryContextSwitchTo(oldctx);
5107 : :
964 peter@eisentraut.org 5108 [ + - ]: 79 : if (len > 0)
5109 : 79 : BufFileReadExact(fd, subxact_data.subxacts, len);
5110 : :
1829 akapila@postgresql.o 5111 : 79 : BufFileClose(fd);
5112 : : }
5113 : :
5114 : : /*
5115 : : * subxact_info_add
5116 : : * Add information about a subxact (offset in the main file).
5117 : : */
5118 : : static void
5119 : 102513 : subxact_info_add(TransactionId xid)
5120 : : {
5121 : 102513 : SubXactInfo *subxacts = subxact_data.subxacts;
5122 : : int64 i;
5123 : :
5124 : : /* We must have a valid top level stream xid and a stream fd. */
5125 [ - + ]: 102513 : Assert(TransactionIdIsValid(stream_xid));
5126 [ - + ]: 102513 : Assert(stream_fd != NULL);
5127 : :
5128 : : /*
5129 : : * If the XID matches the toplevel transaction, we don't want to add it.
5130 : : */
5131 [ + + ]: 102513 : if (stream_xid == xid)
5132 : 92389 : return;
5133 : :
5134 : : /*
5135 : : * In most cases we're checking the same subxact as we've already seen in
5136 : : * the last call, so make sure to ignore it (this change comes later).
5137 : : */
5138 [ + + ]: 10124 : if (subxact_data.subxact_last == xid)
5139 : 10048 : return;
5140 : :
5141 : : /* OK, remember we're processing this XID. */
5142 : 76 : subxact_data.subxact_last = xid;
5143 : :
5144 : : /*
5145 : : * Check if the transaction is already present in the array of subxact. We
5146 : : * intentionally scan the array from the tail, because we're likely adding
5147 : : * a change for the most recent subtransactions.
5148 : : *
5149 : : * XXX Can we rely on the subxact XIDs arriving in sorted order? That
5150 : : * would allow us to use binary search here.
5151 : : */
5152 [ + + ]: 95 : for (i = subxact_data.nsubxacts; i > 0; i--)
5153 : : {
5154 : : /* found, so we're done */
5155 [ + + ]: 76 : if (subxacts[i - 1].xid == xid)
5156 : 57 : return;
5157 : : }
5158 : :
5159 : : /* This is a new subxact, so we need to add it to the array. */
5160 [ + + ]: 19 : if (subxact_data.nsubxacts == 0)
5161 : : {
5162 : : MemoryContext oldctx;
5163 : :
5164 : 8 : subxact_data.nsubxacts_max = 128;
5165 : :
5166 : : /*
5167 : : * Allocate this memory for subxacts in per-stream context, see
5168 : : * subxact_info_read.
5169 : : */
5170 : 8 : oldctx = MemoryContextSwitchTo(LogicalStreamingContext);
5171 : 8 : subxacts = palloc(subxact_data.nsubxacts_max * sizeof(SubXactInfo));
5172 : 8 : MemoryContextSwitchTo(oldctx);
5173 : : }
5174 [ + + ]: 11 : else if (subxact_data.nsubxacts == subxact_data.nsubxacts_max)
5175 : : {
5176 : 10 : subxact_data.nsubxacts_max *= 2;
5177 : 10 : subxacts = repalloc(subxacts,
5178 : 10 : subxact_data.nsubxacts_max * sizeof(SubXactInfo));
5179 : : }
5180 : :
5181 : 19 : subxacts[subxact_data.nsubxacts].xid = xid;
5182 : :
5183 : : /*
5184 : : * Get the current offset of the stream file and store it as offset of
5185 : : * this subxact.
5186 : : */
5187 : 19 : BufFileTell(stream_fd,
5188 : 19 : &subxacts[subxact_data.nsubxacts].fileno,
5189 : 19 : &subxacts[subxact_data.nsubxacts].offset);
5190 : :
5191 : 19 : subxact_data.nsubxacts++;
5192 : 19 : subxact_data.subxacts = subxacts;
5193 : : }
5194 : :
5195 : : /* format filename for file containing the info about subxacts */
5196 : : static inline void
5197 : 747 : subxact_filename(char *path, Oid subid, TransactionId xid)
5198 : : {
5199 : 747 : snprintf(path, MAXPGPATH, "%u-%u.subxacts", subid, xid);
5200 : 747 : }
5201 : :
5202 : : /* format filename for file containing serialized changes */
5203 : : static inline void
5204 : 438 : changes_filename(char *path, Oid subid, TransactionId xid)
5205 : : {
5206 : 438 : snprintf(path, MAXPGPATH, "%u-%u.changes", subid, xid);
5207 : 438 : }
5208 : :
5209 : : /*
5210 : : * stream_cleanup_files
5211 : : * Cleanup files for a subscription / toplevel transaction.
5212 : : *
5213 : : * Remove files with serialized changes and subxact info for a particular
5214 : : * toplevel transaction. Each subscription has a separate set of files
5215 : : * for any toplevel transaction.
5216 : : */
5217 : : void
5218 : 31 : stream_cleanup_files(Oid subid, TransactionId xid)
5219 : : {
5220 : : char path[MAXPGPATH];
5221 : :
5222 : : /* Delete the changes file. */
5223 : 31 : changes_filename(path, subid, xid);
1465 5224 : 31 : BufFileDeleteFileSet(MyLogicalRepWorker->stream_fileset, path, false);
5225 : :
5226 : : /* Delete the subxact file, if it exists. */
5227 : 31 : subxact_filename(path, subid, xid);
5228 : 31 : BufFileDeleteFileSet(MyLogicalRepWorker->stream_fileset, path, true);
1829 5229 : 31 : }
5230 : :
5231 : : /*
5232 : : * stream_open_file
5233 : : * Open a file that we'll use to serialize changes for a toplevel
5234 : : * transaction.
5235 : : *
5236 : : * Open a file for streamed changes from a toplevel transaction identified
5237 : : * by stream_xid (global variable). If it's the first chunk of streamed
5238 : : * changes for this transaction, create the buffile, otherwise open the
5239 : : * previously created file.
5240 : : */
5241 : : static void
5242 : 363 : stream_open_file(Oid subid, TransactionId xid, bool first_segment)
5243 : : {
5244 : : char path[MAXPGPATH];
5245 : : MemoryContext oldcxt;
5246 : :
5247 [ - + ]: 363 : Assert(OidIsValid(subid));
5248 [ - + ]: 363 : Assert(TransactionIdIsValid(xid));
5249 [ - + ]: 363 : Assert(stream_fd == NULL);
5250 : :
5251 : :
5252 : 363 : changes_filename(path, subid, xid);
5253 [ - + ]: 363 : elog(DEBUG1, "opening file \"%s\" for streamed changes", path);
5254 : :
5255 : : /*
5256 : : * Create/open the buffiles under the logical streaming context so that we
5257 : : * have those files until stream stop.
5258 : : */
5259 : 363 : oldcxt = MemoryContextSwitchTo(LogicalStreamingContext);
5260 : :
5261 : : /*
5262 : : * If this is the first streamed segment, create the changes file.
5263 : : * Otherwise, just open the file for writing, in append mode.
5264 : : */
5265 [ + + ]: 363 : if (first_segment)
1465 5266 : 32 : stream_fd = BufFileCreateFileSet(MyLogicalRepWorker->stream_fileset,
5267 : : path);
5268 : : else
5269 : : {
5270 : : /*
5271 : : * Open the file and seek to the end of the file because we always
5272 : : * append the changes file.
5273 : : */
5274 : 331 : stream_fd = BufFileOpenFileSet(MyLogicalRepWorker->stream_fileset,
5275 : : path, O_RDWR, false);
1829 5276 : 331 : BufFileSeek(stream_fd, 0, 0, SEEK_END);
5277 : : }
5278 : :
5279 : 363 : MemoryContextSwitchTo(oldcxt);
5280 : 363 : }
5281 : :
5282 : : /*
5283 : : * stream_close_file
5284 : : * Close the currently open file with streamed changes.
5285 : : */
5286 : : static void
5287 : 393 : stream_close_file(void)
5288 : : {
5289 [ - + ]: 393 : Assert(stream_fd != NULL);
5290 : :
5291 : 393 : BufFileClose(stream_fd);
5292 : :
5293 : 393 : stream_fd = NULL;
5294 : 393 : }
5295 : :
5296 : : /*
5297 : : * stream_write_change
5298 : : * Serialize a change to a file for the current toplevel transaction.
5299 : : *
5300 : : * The change is serialized in a simple format: the length (not including
5301 : : * the length field itself), the action code (identifying the message type) and the message
5302 : : * contents (without the subxact TransactionId value).
5303 : : */
5304 : : static void
5305 : 107554 : stream_write_change(char action, StringInfo s)
5306 : : {
5307 : : int len;
5308 : :
5309 [ - + ]: 107554 : Assert(stream_fd != NULL);
5310 : :
5311 : : /* total on-disk size, including the action type character */
5312 : 107554 : len = (s->len - s->cursor) + sizeof(char);
5313 : :
5314 : : /* first write the size */
5315 : 107554 : BufFileWrite(stream_fd, &len, sizeof(len));
5316 : :
5317 : : /* then the action */
5318 : 107554 : BufFileWrite(stream_fd, &action, sizeof(action));
5319 : :
5320 : : /* and finally the remaining part of the buffer (after the XID) */
5321 : 107554 : len = (s->len - s->cursor);
5322 : :
5323 : 107554 : BufFileWrite(stream_fd, &s->data[s->cursor], len);
5324 : 107554 : }
5325 : :
5326 : : /*
5327 : : * stream_open_and_write_change
5328 : : * Serialize a message to a file for the given transaction.
5329 : : *
5330 : : * This function is similar to stream_write_change except that it will open the
5331 : : * target file if it is not already open before writing the message, and close the file at
5332 : : * the end.
5333 : : */
5334 : : static void
971 5335 : 5 : stream_open_and_write_change(TransactionId xid, char action, StringInfo s)
5336 : : {
5337 [ - + ]: 5 : Assert(!in_streamed_transaction);
5338 : :
5339 [ + - ]: 5 : if (!stream_fd)
5340 : 5 : stream_start_internal(xid, false);
5341 : :
5342 : 5 : stream_write_change(action, s);
5343 : 5 : stream_stop_internal(xid);
5344 : 5 : }
5345 : :
5346 : : /*
5347 : : * Sets streaming options including replication slot name and origin start
5348 : : * position. Workers need these options for logical replication.
5349 : : */
5350 : : void
765 5351 : 402 : set_stream_options(WalRcvStreamOptions *options,
5352 : : char *slotname,
5353 : : XLogRecPtr *origin_startpos)
5354 : : {
5355 : : int server_version;
5356 : :
5357 : 402 : options->logical = true;
5358 : 402 : options->startpoint = *origin_startpos;
5359 : 402 : options->slotname = slotname;
5360 : :
5361 : 402 : server_version = walrcv_server_version(LogRepWorkerWalRcvConn);
5362 : 402 : options->proto.logical.proto_version =
5363 [ - + - - : 402 : server_version >= 160000 ? LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM :
- - ]
5364 : : server_version >= 150000 ? LOGICALREP_PROTO_TWOPHASE_VERSION_NUM :
5365 : : server_version >= 140000 ? LOGICALREP_PROTO_STREAM_VERSION_NUM :
5366 : : LOGICALREP_PROTO_VERSION_NUM;
5367 : :
5368 : 402 : options->proto.logical.publication_names = MySubscription->publications;
5369 : 402 : options->proto.logical.binary = MySubscription->binary;
5370 : :
5371 : : /*
5372 : : * Assign the appropriate value for the streaming option according to
5373 : : * the 'streaming' mode and the publisher's ability to support that mode.
5374 : : */
5375 [ + - ]: 402 : if (server_version >= 160000 &&
5376 [ + + ]: 402 : MySubscription->stream == LOGICALREP_STREAM_PARALLEL)
5377 : : {
5378 : 368 : options->proto.logical.streaming_str = "parallel";
5379 : 368 : MyLogicalRepWorker->parallel_apply = true;
5380 : : }
5381 [ + - ]: 34 : else if (server_version >= 140000 &&
5382 [ + + ]: 34 : MySubscription->stream != LOGICALREP_STREAM_OFF)
5383 : : {
5384 : 26 : options->proto.logical.streaming_str = "on";
5385 : 26 : MyLogicalRepWorker->parallel_apply = false;
5386 : : }
5387 : : else
5388 : : {
5389 : 8 : options->proto.logical.streaming_str = NULL;
5390 : 8 : MyLogicalRepWorker->parallel_apply = false;
5391 : : }
5392 : :
5393 : 402 : options->proto.logical.twophase = false;
5394 : 402 : options->proto.logical.origin = pstrdup(MySubscription->origin);
5395 : 402 : }
5396 : :
5397 : : /*
5398 : : * Clean up the memory for subxacts and reset the related variables.
5399 : : */
5400 : : static inline void
1829 5401 : 376 : cleanup_subxact_info()
5402 : : {
5403 [ + + ]: 376 : if (subxact_data.subxacts)
5404 : 87 : pfree(subxact_data.subxacts);
5405 : :
5406 : 376 : subxact_data.subxacts = NULL;
5407 : 376 : subxact_data.subxact_last = InvalidTransactionId;
5408 : 376 : subxact_data.nsubxacts = 0;
5409 : 376 : subxact_data.nsubxacts_max = 0;
5410 : 376 : }
5411 : :
5412 : : /*
5413 : : * Common function to run the apply loop with error handling. Disables the
5414 : : * subscription if necessary.
5415 : : *
5416 : : * Note that we don't handle FATAL errors, which are probably caused
5417 : : * by a system resource error and are not repeatable.
5418 : : */
5419 : : void
765 5420 : 402 : start_apply(XLogRecPtr origin_startpos)
5421 : : {
1272 5422 [ + + ]: 402 : PG_TRY();
5423 : : {
765 5424 : 402 : LogicalRepApplyLoop(origin_startpos);
5425 : : }
1272 5426 : 77 : PG_CATCH();
5427 : : {
5428 : : /*
5429 : : * Reset the origin state to prevent the advancement of origin
5430 : : * progress if we fail to apply. Otherwise, this will result in
5431 : : * transaction loss as that transaction won't be sent again by the
5432 : : * server.
5433 : : */
136 5434 : 77 : replorigin_reset(0, (Datum) 0);
5435 : :
1272 5436 [ + + ]: 77 : if (MySubscription->disableonerr)
5437 : 3 : DisableSubscriptionAndExit();
5438 : : else
5439 : : {
5440 : : /*
5441 : : * Report the worker failed while applying changes. Abort the
5442 : : * current transaction so that the stats message is sent in an
5443 : : * idle state.
5444 : : */
5445 : 74 : AbortOutOfAnyTransaction();
765 5446 : 74 : pgstat_report_subscription_error(MySubscription->oid, !am_tablesync_worker());
5447 : :
1272 5448 : 74 : PG_RE_THROW();
5449 : : }
5450 : : }
1272 akapila@postgresql.o 5451 [ # # ]:UBC 0 : PG_END_TRY();
5452 : 0 : }
5453 : :
5454 : : /*
5455 : : * Runs the leader apply worker.
5456 : : *
5457 : : * It sets up replication origin, streaming options and then starts streaming.
5458 : : */
5459 : : static void
765 akapila@postgresql.o 5460 :CBC 257 : run_apply_worker()
5461 : : {
5462 : : char originname[NAMEDATALEN];
5463 : 257 : XLogRecPtr origin_startpos = InvalidXLogRecPtr;
5464 : 257 : char *slotname = NULL;
5465 : : WalRcvStreamOptions options;
5466 : : RepOriginId originid;
5467 : : TimeLineID startpointTLI;
5468 : : char *err;
5469 : : bool must_use_password;
5470 : :
5471 : 257 : slotname = MySubscription->slotname;
5472 : :
5473 : : /*
5474 : : * This shouldn't happen if the subscription is enabled, but guard against
5475 : : * DDL bugs or manual catalog changes. (libpqwalreceiver will crash if the
5476 : : * slot is NULL.)
5477 : : */
5478 [ - + ]: 257 : if (!slotname)
765 akapila@postgresql.o 5479 [ # # ]:UBC 0 : ereport(ERROR,
5480 : : (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
5481 : : errmsg("subscription has no replication slot set")));
5482 : :
5483 : : /* Setup replication origin tracking. */
765 akapila@postgresql.o 5484 :CBC 257 : ReplicationOriginNameForLogicalRep(MySubscription->oid, InvalidOid,
5485 : : originname, sizeof(originname));
5486 : 257 : StartTransactionCommand();
5487 : 257 : originid = replorigin_by_name(originname, true);
5488 [ - + ]: 257 : if (!OidIsValid(originid))
765 akapila@postgresql.o 5489 :UBC 0 : originid = replorigin_create(originname);
765 akapila@postgresql.o 5490 :CBC 257 : replorigin_session_setup(originid, 0);
5491 : 257 : replorigin_session_origin = originid;
5492 : 257 : origin_startpos = replorigin_session_get_progress(false);
690 5493 : 257 : CommitTransactionCommand();
5494 : :
5495 : : /* Is the use of a password mandatory? */
765 5496 [ + + ]: 492 : must_use_password = MySubscription->passwordrequired &&
690 5497 [ + + ]: 235 : !MySubscription->ownersuperuser;
5498 : :
765 5499 : 257 : LogRepWorkerWalRcvConn = walrcv_connect(MySubscription->conninfo, true,
5500 : : true, must_use_password,
5501 : : MySubscription->name, &err);
5502 : :
5503 [ + + ]: 250 : if (LogRepWorkerWalRcvConn == NULL)
5504 [ + - ]: 31 : ereport(ERROR,
5505 : : (errcode(ERRCODE_CONNECTION_FAILURE),
5506 : : errmsg("apply worker for subscription \"%s\" could not connect to the publisher: %s",
5507 : : MySubscription->name, err)));
5508 : :
5509 : : /*
5510 : : * We don't really use the output of identify_system for anything, but it
5511 : : * does some initialization on the upstream, so call it anyway.
5512 : : */
5513 : 219 : (void) walrcv_identify_system(LogRepWorkerWalRcvConn, &startpointTLI);
5514 : :
5515 : 219 : set_apply_error_context_origin(originname);
5516 : :
5517 : 219 : set_stream_options(&options, slotname, &origin_startpos);
5518 : :
5519 : : /*
5520 : : * Even when the two_phase mode is requested by the user, it remains as
5521 : : * the tri-state PENDING until all tablesyncs have reached READY state.
5522 : : * Only then can it become ENABLED.
5523 : : *
5524 : : * Note: If the subscription has no tables then leave the state as
5525 : : * PENDING, which allows ALTER SUBSCRIPTION ... REFRESH PUBLICATION to
5526 : : * work.
5527 : : */
5528 [ + + + + ]: 235 : if (MySubscription->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING &&
5529 : 16 : AllTablesyncsReady())
5530 : : {
5531 : : /* Start streaming with two_phase enabled */
5532 : 9 : options.proto.logical.twophase = true;
5533 : 9 : walrcv_startstreaming(LogRepWorkerWalRcvConn, &options);
5534 : :
5535 : 9 : StartTransactionCommand();
5536 : :
5537 : : /*
5538 : : * Updating pg_subscription might involve TOAST table access, so
5539 : : * ensure we have a valid snapshot.
5540 : : */
99 nathan@postgresql.or 5541 : 9 : PushActiveSnapshot(GetTransactionSnapshot());
5542 : :
765 akapila@postgresql.o 5543 : 9 : UpdateTwoPhaseState(MySubscription->oid, LOGICALREP_TWOPHASE_STATE_ENABLED);
5544 : 9 : MySubscription->twophasestate = LOGICALREP_TWOPHASE_STATE_ENABLED;
99 nathan@postgresql.or 5545 : 9 : PopActiveSnapshot();
765 akapila@postgresql.o 5546 : 9 : CommitTransactionCommand();
5547 : : }
5548 : : else
5549 : : {
5550 : 210 : walrcv_startstreaming(LogRepWorkerWalRcvConn, &options);
5551 : : }
5552 : :
5553 [ + + + + : 219 : ereport(DEBUG1,
+ - + - ]
5554 : : (errmsg_internal("logical replication apply worker for subscription \"%s\" two_phase is %s",
5555 : : MySubscription->name,
5556 : : MySubscription->twophasestate == LOGICALREP_TWOPHASE_STATE_DISABLED ? "DISABLED" :
5557 : : MySubscription->twophasestate == LOGICALREP_TWOPHASE_STATE_PENDING ? "PENDING" :
5558 : : MySubscription->twophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED ? "ENABLED" :
5559 : : "?")));
5560 : :
5561 : : /* Run the main loop. */
5562 : 219 : start_apply(origin_startpos);
1272 akapila@postgresql.o 5563 :UBC 0 : }
5564 : :
5565 : : /*
5566 : : * Common initialization for leader apply worker, parallel apply worker and
5567 : : * tablesync worker.
5568 : : *
5569 : : * Initialize the database connection, in-memory subscription and necessary
5570 : : * config options.
5571 : : */
5572 : : void
765 akapila@postgresql.o 5573 :CBC 528 : InitializeLogRepWorker(void)
5574 : : {
5575 : : MemoryContext oldctx;
5576 : :
5577 : : /* Run as replica session replication role. */
3152 peter_e@gmx.net 5578 : 528 : SetConfigOption("session_replication_role", "replica",
5579 : : PGC_SUSET, PGC_S_OVERRIDE);
5580 : :
5581 : : /* Connect to our database. */
5582 : 528 : BackgroundWorkerInitializeConnectionByOid(MyLogicalRepWorker->dbid,
2711 magnus@hagander.net 5583 : 528 : MyLogicalRepWorker->userid,
5584 : : 0);
5585 : :
5586 : : /*
5587 : : * Set always-secure search path, so malicious users can't redirect user
5588 : : * code (e.g. pg_index.indexprs).
5589 : : */
1853 noah@leadboat.com 5590 : 524 : SetConfigOption("search_path", "", PGC_SUSET, PGC_S_OVERRIDE);
5591 : :
5592 : : /* Load the subscription into persistent memory context. */
3042 peter_e@gmx.net 5593 : 524 : ApplyContext = AllocSetContextCreate(TopMemoryContext,
5594 : : "ApplyContext",
5595 : : ALLOCSET_DEFAULT_SIZES);
3152 5596 : 524 : StartTransactionCommand();
3042 5597 : 524 : oldctx = MemoryContextSwitchTo(ApplyContext);
5598 : :
5599 : : /*
5600 : : * Lock the subscription to prevent it from being concurrently dropped,
5601 : : * then re-verify its existence. After the initialization, the worker will
5602 : : * be terminated gracefully if the subscription is dropped.
5603 : : */
18 akapila@postgresql.o 5604 : 524 : LockSharedObject(SubscriptionRelationId, MyLogicalRepWorker->subid, 0,
5605 : : AccessShareLock);
2710 peter_e@gmx.net 5606 : 522 : MySubscription = GetSubscription(MyLogicalRepWorker->subid, true);
5607 [ + + ]: 522 : if (!MySubscription)
5608 : : {
5609 [ + - ]: 60 : ereport(LOG,
5610 : : (errmsg("logical replication worker for subscription %u will not start because the subscription was removed during startup",
5611 : : MyLogicalRepWorker->subid)));
5612 : :
5613 : : /* Ensure we remove no-longer-useful entry for worker's start time */
764 akapila@postgresql.o 5614 [ + - ]: 60 : if (am_leader_apply_worker())
958 tgl@sss.pgh.pa.us 5615 : 60 : ApplyLauncherForgetWorkerStartTime(MyLogicalRepWorker->subid);
5616 : :
2710 peter_e@gmx.net 5617 : 60 : proc_exit(0);
5618 : : }
5619 : :
3152 5620 : 462 : MySubscriptionValid = true;
5621 : 462 : MemoryContextSwitchTo(oldctx);
5622 : :
5623 [ - + ]: 462 : if (!MySubscription->enabled)
5624 : : {
3152 peter_e@gmx.net 5625 [ # # ]:UBC 0 : ereport(LOG,
5626 : : (errmsg("logical replication worker for subscription \"%s\" will not start because the subscription was disabled during startup",
5627 : : MySubscription->name)));
5628 : :
971 akapila@postgresql.o 5629 : 0 : apply_worker_exit();
5630 : : }
5631 : :
5632 : : /*
5633 : : * Restart the worker if retain_dead_tuples was enabled during startup.
5634 : : *
5635 : : * At this point, the replication slot used for conflict detection might
5636 : : * not exist yet, or could be dropped soon if the launcher perceives
5637 : : * retain_dead_tuples as disabled. To avoid unnecessary tracking of
5638 : : * oldest_nonremovable_xid when the slot is absent or at risk of being
5639 : : * dropped, a restart is initiated.
5640 : : *
5641 : : * The oldest_nonremovable_xid should be initialized only when the
5642 : : * subscription's retention is active before launching the worker. See
5643 : : * logicalrep_worker_launch.
5644 : : */
45 akapila@postgresql.o 5645 [ + + ]:GNC 462 : if (am_leader_apply_worker() &&
5646 [ + + ]: 257 : MySubscription->retaindeadtuples &&
4 5647 [ + - ]: 10 : MySubscription->retentionactive &&
45 5648 [ - + ]: 10 : !TransactionIdIsValid(MyLogicalRepWorker->oldest_nonremovable_xid))
5649 : : {
45 akapila@postgresql.o 5650 [ # # ]:UNC 0 : ereport(LOG,
5651 : : errmsg("logical replication worker for subscription \"%s\" will restart because the option %s was enabled during startup",
5652 : : MySubscription->name, "retain_dead_tuples"));
5653 : :
5654 : 0 : apply_worker_exit();
5655 : : }
5656 : :
5657 : : /* Setup synchronous commit according to the user's wishes */
2710 peter_e@gmx.net 5658 :CBC 462 : SetConfigOption("synchronous_commit", MySubscription->synccommit,
5659 : : PGC_BACKEND, PGC_S_OVERRIDE);
5660 : :
5661 : : /*
5662 : : * Keep us informed about subscription or role changes. Note that the
5663 : : * role's superuser privilege can be revoked.
5664 : : */
3152 5665 : 462 : CacheRegisterSyscacheCallback(SUBSCRIPTIONOID,
5666 : : subscription_change_cb,
5667 : : (Datum) 0);
5668 : :
690 akapila@postgresql.o 5669 : 462 : CacheRegisterSyscacheCallback(AUTHOID,
5670 : : subscription_change_cb,
5671 : : (Datum) 0);
5672 : :
3089 peter_e@gmx.net 5673 [ + + ]: 462 : if (am_tablesync_worker())
3027 5674 [ + - ]: 195 : ereport(LOG,
5675 : : (errmsg("logical replication table synchronization worker for subscription \"%s\", table \"%s\" has started",
5676 : : MySubscription->name,
5677 : : get_rel_name(MyLogicalRepWorker->relid))));
5678 : : else
5679 [ + - ]: 267 : ereport(LOG,
5680 : : (errmsg("logical replication apply worker for subscription \"%s\" has started",
5681 : : MySubscription->name)));
5682 : :
3152 5683 : 462 : CommitTransactionCommand();
971 akapila@postgresql.o 5684 : 462 : }
5685 : :
5686 : : /*
5687 : : * Reset the origin state.
5688 : : */
5689 : : static void
381 5690 : 529 : replorigin_reset(int code, Datum arg)
5691 : : {
5692 : 529 : replorigin_session_origin = InvalidRepOriginId;
5693 : 529 : replorigin_session_origin_lsn = InvalidXLogRecPtr;
5694 : 529 : replorigin_session_origin_timestamp = 0;
5695 : 529 : }
5696 : :
5697 : : /* Common function to setup the leader apply or tablesync worker. */
5698 : : void
765 5699 : 518 : SetupApplyOrSyncWorker(int worker_slot)
5700 : : {
5701 : : /* Attach to slot */
971 5702 : 518 : logicalrep_worker_attach(worker_slot);
5703 : :
765 5704 [ + + - + ]: 518 : Assert(am_tablesync_worker() || am_leader_apply_worker());
5705 : :
5706 : : /* Setup signal handling */
971 5707 : 518 : pqsignal(SIGHUP, SignalHandlerForConfigReload);
5708 : 518 : pqsignal(SIGTERM, die);
5709 : 518 : BackgroundWorkerUnblockSignals();
5710 : :
5711 : : /*
5712 : : * We don't currently need any ResourceOwner in a walreceiver process, but
5713 : : * if we did, we could call CreateAuxProcessResourceOwner here.
5714 : : */
5715 : :
5716 : : /* Initialise stats to a sane-ish value */
5717 : 518 : MyLogicalRepWorker->last_send_time = MyLogicalRepWorker->last_recv_time =
5718 : 518 : MyLogicalRepWorker->reply_time = GetCurrentTimestamp();
5719 : :
5720 : : /* Load the libpq-specific functions */
5721 : 518 : load_file("libpqwalreceiver", false);
5722 : :
765 5723 : 518 : InitializeLogRepWorker();
5724 : :
5725 : : /*
5726 : : * Register a callback to reset the origin state before aborting any
5727 : : * pending transaction during shutdown (see ShutdownPostgres()). This will
5728 : : * avoid origin advancement for an incomplete transaction, which could
5729 : : * otherwise lead to its loss, as such a transaction won't be sent by the
5730 : : * server again.
5731 : : *
5732 : : * Note that even a LOG or DEBUG statement placed after setting the origin
5733 : : * state may process a shutdown signal before committing the current apply
5734 : : * operation. So, it is important to register such a callback here.
5735 : : */
381 5736 : 452 : before_shmem_exit(replorigin_reset, (Datum) 0);
5737 : :
5738 : : /* Connect to the origin and start the replication. */
3152 peter_e@gmx.net 5739 [ + + ]: 452 : elog(DEBUG1, "connecting to publisher using connection string \"%s\"",
5740 : : MySubscription->conninfo);
5741 : :
5742 : : /*
5743 : : * Setup callback for syscache so that we know when something changes in
5744 : : * the subscription relation state.
5745 : : */
3089 5746 : 452 : CacheRegisterSyscacheCallback(SUBSCRIPTIONRELMAP,
5747 : : invalidate_syncing_table_states,
5748 : : (Datum) 0);
765 akapila@postgresql.o 5749 : 452 : }
5750 : :
5751 : : /* Logical Replication Apply worker entry point */
5752 : : void
5753 : 320 : ApplyWorkerMain(Datum main_arg)
5754 : : {
5755 : 320 : int worker_slot = DatumGetInt32(main_arg);
5756 : :
5757 : 320 : InitializingApplyWorker = true;
5758 : :
5759 : 320 : SetupApplyOrSyncWorker(worker_slot);
5760 : :
5761 : 257 : InitializingApplyWorker = false;
5762 : :
5763 : 257 : run_apply_worker();
5764 : :
1272 akapila@postgresql.o 5765 :UBC 0 : proc_exit(0);
5766 : : }
5767 : :
5768 : : /*
5769 : : * After error recovery, disable the subscription in a new transaction
5770 : : * and exit cleanly.
5771 : : */
5772 : : void
1272 akapila@postgresql.o 5773 :CBC 4 : DisableSubscriptionAndExit(void)
5774 : : {
5775 : : /*
5776 : : * Emit the error message, and recover from the error state to an idle
5777 : : * state
5778 : : */
5779 : 4 : HOLD_INTERRUPTS();
5780 : :
5781 : 4 : EmitErrorReport();
5782 : 4 : AbortOutOfAnyTransaction();
5783 : 4 : FlushErrorState();
5784 : :
5785 [ - + ]: 4 : RESUME_INTERRUPTS();
5786 : :
5787 : : /* Report the worker failed during either table synchronization or apply */
5788 : 4 : pgstat_report_subscription_error(MyLogicalRepWorker->subid,
5789 : 4 : !am_tablesync_worker());
5790 : :
5791 : : /* Disable the subscription */
5792 : 4 : StartTransactionCommand();
5793 : :
5794 : : /*
5795 : : * Updating pg_subscription might involve TOAST table access, so ensure we
5796 : : * have a valid snapshot.
5797 : : */
99 nathan@postgresql.or 5798 : 4 : PushActiveSnapshot(GetTransactionSnapshot());
5799 : :
1272 akapila@postgresql.o 5800 : 4 : DisableSubscription(MySubscription->oid);
99 nathan@postgresql.or 5801 : 4 : PopActiveSnapshot();
1272 akapila@postgresql.o 5802 : 4 : CommitTransactionCommand();
5803 : :
5804 : : /* Ensure we remove no-longer-useful entry for worker's start time */
764 5805 [ + + ]: 4 : if (am_leader_apply_worker())
958 tgl@sss.pgh.pa.us 5806 : 3 : ApplyLauncherForgetWorkerStartTime(MyLogicalRepWorker->subid);
5807 : :
5808 : : /* Notify that the subscription has been disabled and exit */
1272 akapila@postgresql.o 5809 [ + - ]: 4 : ereport(LOG,
5810 : : errmsg("subscription \"%s\" has been disabled because of an error",
5811 : : MySubscription->name));
5812 : :
5813 : : /*
5814 : : * Skip the track_commit_timestamp check when disabling the worker due to
5815 : : * an error, as verifying commit timestamps is unnecessary in this
5816 : : * context.
5817 : : */
4 akapila@postgresql.o 5818 :GNC 4 : CheckSubDeadTupleRetention(false, true, WARNING,
5819 : 4 : MySubscription->retaindeadtuples,
5820 : 4 : MySubscription->retentionactive, false);
5821 : :
3152 peter_e@gmx.net 5822 :CBC 4 : proc_exit(0);
5823 : : }
5824 : :
5825 : : /*
5826 : : * Is current process a logical replication worker?
5827 : : */
5828 : : bool
3018 5829 : 1976 : IsLogicalWorker(void)
5830 : : {
5831 : 1976 : return MyLogicalRepWorker != NULL;
5832 : : }
5833 : :
5834 : : /*
5835 : : * Is current process a logical replication parallel apply worker?
5836 : : */
5837 : : bool
971 akapila@postgresql.o 5838 : 1387 : IsLogicalParallelApplyWorker(void)
5839 : : {
5840 [ + + + - ]: 1387 : return IsLogicalWorker() && am_parallel_apply_worker();
5841 : : }
5842 : :
5843 : : /*
5844 : : * Start skipping changes of the transaction if the given LSN matches the
5845 : : * LSN specified by subscription's skiplsn.
5846 : : */
5847 : : static void
1264 5848 : 524 : maybe_start_skipping_changes(XLogRecPtr finish_lsn)
5849 : : {
5850 [ - + ]: 524 : Assert(!is_skipping_changes());
5851 [ - + ]: 524 : Assert(!in_remote_transaction);
5852 [ - + ]: 524 : Assert(!in_streamed_transaction);
5853 : :
5854 : : /*
5855 : : * Quick return if it's not requested to skip this transaction. This
5856 : : * function is called for every remote transaction and we assume that
5857 : : * skipping the transaction is not used often.
5858 : : */
5859 [ + + - + : 524 : if (likely(XLogRecPtrIsInvalid(MySubscription->skiplsn) ||
+ + ]
5860 : : MySubscription->skiplsn != finish_lsn))
5861 : 521 : return;
5862 : :
5863 : : /* Start skipping all changes of this transaction */
5864 : 3 : skip_xact_finish_lsn = finish_lsn;
5865 : :
5866 [ + - ]: 3 : ereport(LOG,
5867 : : errmsg("logical replication starts skipping transaction at LSN %X/%08X",
5868 : : LSN_FORMAT_ARGS(skip_xact_finish_lsn)));
5869 : : }
5870 : :
5871 : : /*
5872 : : * Stop skipping changes by resetting skip_xact_finish_lsn if enabled.
5873 : : */
5874 : : static void
5875 : 27 : stop_skipping_changes(void)
5876 : : {
5877 [ + + ]: 27 : if (!is_skipping_changes())
5878 : 24 : return;
5879 : :
5880 [ + - ]: 3 : ereport(LOG,
5881 : : errmsg("logical replication completed skipping transaction at LSN %X/%08X",
5882 : : LSN_FORMAT_ARGS(skip_xact_finish_lsn)));
5883 : :
5884 : : /* Stop skipping changes */
5885 : 3 : skip_xact_finish_lsn = InvalidXLogRecPtr;
5886 : : }
5887 : :
5888 : : /*
5889 : : * Clear subskiplsn of pg_subscription catalog.
5890 : : *
5891 : : * finish_lsn is the transaction's finish LSN that is used to check if the
5892 : : * subskiplsn matches it. If not matched, we raise a warning when clearing the
5893 : : * subskiplsn in order to inform users for cases e.g., where the user mistakenly
5894 : : * specified the wrong subskiplsn.
5895 : : */
5896 : : static void
5897 : 522 : clear_subscription_skip_lsn(XLogRecPtr finish_lsn)
5898 : : {
5899 : : Relation rel;
5900 : : Form_pg_subscription subform;
5901 : : HeapTuple tup;
5902 : 522 : XLogRecPtr myskiplsn = MySubscription->skiplsn;
5903 : 522 : bool started_tx = false;
5904 : :
971 5905 [ + + - + ]: 522 : if (likely(XLogRecPtrIsInvalid(myskiplsn)) || am_parallel_apply_worker())
1264 5906 : 519 : return;
5907 : :
5908 [ + + ]: 3 : if (!IsTransactionState())
5909 : : {
5910 : 1 : StartTransactionCommand();
5911 : 1 : started_tx = true;
5912 : : }
5913 : :
5914 : : /*
5915 : : * Updating pg_subscription might involve TOAST table access, so ensure we
5916 : : * have a valid snapshot.
5917 : : */
99 nathan@postgresql.or 5918 : 3 : PushActiveSnapshot(GetTransactionSnapshot());
5919 : :
5920 : : /*
5921 : : * Protect subskiplsn of pg_subscription from being concurrently updated
5922 : : * while clearing it.
5923 : : */
1264 akapila@postgresql.o 5924 : 3 : LockSharedObject(SubscriptionRelationId, MySubscription->oid, 0,
5925 : : AccessShareLock);
5926 : :
5927 : 3 : rel = table_open(SubscriptionRelationId, RowExclusiveLock);
5928 : :
5929 : : /* Fetch the existing tuple. */
5930 : 3 : tup = SearchSysCacheCopy1(SUBSCRIPTIONOID,
5931 : : ObjectIdGetDatum(MySubscription->oid));
5932 : :
5933 [ - + ]: 3 : if (!HeapTupleIsValid(tup))
1264 akapila@postgresql.o 5934 [ # # ]:UBC 0 : elog(ERROR, "subscription \"%s\" does not exist", MySubscription->name);
5935 : :
1264 akapila@postgresql.o 5936 :CBC 3 : subform = (Form_pg_subscription) GETSTRUCT(tup);
5937 : :
5938 : : /*
5939 : : * Clear the subskiplsn. If the user has already changed subskiplsn before
5940 : : * clearing it, we don't update the catalog, and the replication origin
5941 : : * state won't get advanced. So in the worst case, if the server crashes
5942 : : * before sending an acknowledgment of the flush position, the transaction
5943 : : * will be sent again and the user needs to set subskiplsn again. We can
5944 : : * reduce the possibility by logging a replication origin WAL record to
5945 : : * advance the origin LSN instead but there is no way to advance the
5946 : : * origin timestamp and it doesn't seem to be worth doing anything about
5947 : : * it since it's a very rare case.
5948 : : */
5949 [ + - ]: 3 : if (subform->subskiplsn == myskiplsn)
5950 : : {
5951 : : bool nulls[Natts_pg_subscription];
5952 : : bool replaces[Natts_pg_subscription];
5953 : : Datum values[Natts_pg_subscription];
5954 : :
5955 : 3 : memset(values, 0, sizeof(values));
5956 : 3 : memset(nulls, false, sizeof(nulls));
5957 : 3 : memset(replaces, false, sizeof(replaces));
5958 : :
5959 : : /* reset subskiplsn */
5960 : 3 : values[Anum_pg_subscription_subskiplsn - 1] = LSNGetDatum(InvalidXLogRecPtr);
5961 : 3 : replaces[Anum_pg_subscription_subskiplsn - 1] = true;
5962 : :
5963 : 3 : tup = heap_modify_tuple(tup, RelationGetDescr(rel), values, nulls,
5964 : : replaces);
5965 : 3 : CatalogTupleUpdate(rel, &tup->t_self, tup);
5966 : :
5967 [ - + ]: 3 : if (myskiplsn != finish_lsn)
1264 akapila@postgresql.o 5968 [ # # ]:UBC 0 : ereport(WARNING,
5969 : : errmsg("skip-LSN of subscription \"%s\" cleared", MySubscription->name),
5970 : : errdetail("Remote transaction's finish WAL location (LSN) %X/%08X did not match skip-LSN %X/%08X.",
5971 : : LSN_FORMAT_ARGS(finish_lsn),
5972 : : LSN_FORMAT_ARGS(myskiplsn)));
5973 : : }
5974 : :
1264 akapila@postgresql.o 5975 :CBC 3 : heap_freetuple(tup);
5976 : 3 : table_close(rel, NoLock);
5977 : :
99 nathan@postgresql.or 5978 : 3 : PopActiveSnapshot();
5979 : :
1264 akapila@postgresql.o 5980 [ + + ]: 3 : if (started_tx)
5981 : 1 : CommitTransactionCommand();
5982 : : }
5983 : :
5984 : : /* Error callback to give more context info about the change being applied */
5985 : : void
1471 5986 : 856 : apply_error_callback(void *arg)
5987 : : {
5988 : 856 : ApplyErrorCallbackArg *errarg = &apply_error_callback_arg;
5989 : :
5990 [ + + ]: 856 : if (apply_error_callback_arg.command == 0)
5991 : 464 : return;
5992 : :
1278 5993 [ - + ]: 392 : Assert(errarg->origin_name);
5994 : :
1279 5995 [ + + ]: 392 : if (errarg->rel == NULL)
5996 : : {
5997 [ - + ]: 325 : if (!TransactionIdIsValid(errarg->remote_xid))
1078 peter@eisentraut.org 5998 :UBC 0 : errcontext("processing remote data for replication origin \"%s\" during message type \"%s\"",
5999 : : errarg->origin_name,
6000 : : logicalrep_message_type(errarg->command));
1278 akapila@postgresql.o 6001 [ + + ]:CBC 325 : else if (XLogRecPtrIsInvalid(errarg->finish_lsn))
1078 peter@eisentraut.org 6002 : 267 : errcontext("processing remote data for replication origin \"%s\" during message type \"%s\" in transaction %u",
6003 : : errarg->origin_name,
6004 : : logicalrep_message_type(errarg->command),
6005 : : errarg->remote_xid);
6006 : : else
61 alvherre@kurilemu.de 6007 :GNC 116 : errcontext("processing remote data for replication origin \"%s\" during message type \"%s\" in transaction %u, finished at %X/%08X",
6008 : : errarg->origin_name,
6009 : : logicalrep_message_type(errarg->command),
6010 : : errarg->remote_xid,
1278 akapila@postgresql.o 6011 :CBC 58 : LSN_FORMAT_ARGS(errarg->finish_lsn));
6012 : : }
6013 : : else
6014 : : {
971 6015 [ + - ]: 67 : if (errarg->remote_attnum < 0)
6016 : : {
6017 [ - + ]: 67 : if (XLogRecPtrIsInvalid(errarg->finish_lsn))
971 akapila@postgresql.o 6018 :UBC 0 : errcontext("processing remote data for replication origin \"%s\" during message type \"%s\" for replication target relation \"%s.%s\" in transaction %u",
6019 : : errarg->origin_name,
6020 : : logicalrep_message_type(errarg->command),
6021 : 0 : errarg->rel->remoterel.nspname,
6022 : 0 : errarg->rel->remoterel.relname,
6023 : : errarg->remote_xid);
6024 : : else
61 alvherre@kurilemu.de 6025 :GNC 134 : errcontext("processing remote data for replication origin \"%s\" during message type \"%s\" for replication target relation \"%s.%s\" in transaction %u, finished at %X/%08X",
6026 : : errarg->origin_name,
6027 : : logicalrep_message_type(errarg->command),
971 akapila@postgresql.o 6028 :CBC 67 : errarg->rel->remoterel.nspname,
6029 : 67 : errarg->rel->remoterel.relname,
6030 : : errarg->remote_xid,
6031 : 67 : LSN_FORMAT_ARGS(errarg->finish_lsn));
6032 : : }
6033 : : else
6034 : : {
971 akapila@postgresql.o 6035 [ # # ]:UBC 0 : if (XLogRecPtrIsInvalid(errarg->finish_lsn))
6036 : 0 : errcontext("processing remote data for replication origin \"%s\" during message type \"%s\" for replication target relation \"%s.%s\" column \"%s\" in transaction %u",
6037 : : errarg->origin_name,
6038 : : logicalrep_message_type(errarg->command),
6039 : 0 : errarg->rel->remoterel.nspname,
6040 : 0 : errarg->rel->remoterel.relname,
6041 : 0 : errarg->rel->remoterel.attnames[errarg->remote_attnum],
6042 : : errarg->remote_xid);
6043 : : else
61 alvherre@kurilemu.de 6044 :UNC 0 : errcontext("processing remote data for replication origin \"%s\" during message type \"%s\" for replication target relation \"%s.%s\" column \"%s\" in transaction %u, finished at %X/%08X",
6045 : : errarg->origin_name,
6046 : : logicalrep_message_type(errarg->command),
971 akapila@postgresql.o 6047 :UBC 0 : errarg->rel->remoterel.nspname,
6048 : 0 : errarg->rel->remoterel.relname,
6049 : 0 : errarg->rel->remoterel.attnames[errarg->remote_attnum],
6050 : : errarg->remote_xid,
6051 : 0 : LSN_FORMAT_ARGS(errarg->finish_lsn));
6052 : : }
6053 : : }
6054 : : }
6055 : :
6056 : : /* Set transaction information of apply error callback */
6057 : : static inline void
1278 akapila@postgresql.o 6058 :CBC 2922 : set_apply_error_context_xact(TransactionId xid, XLogRecPtr lsn)
6059 : : {
1471 6060 : 2922 : apply_error_callback_arg.remote_xid = xid;
1278 6061 : 2922 : apply_error_callback_arg.finish_lsn = lsn;
1471 6062 : 2922 : }
6063 : :
6064 : : /* Reset all information of apply error callback */
6065 : : static inline void
6066 : 1433 : reset_apply_error_context_info(void)
6067 : : {
6068 : 1433 : apply_error_callback_arg.command = 0;
6069 : 1433 : apply_error_callback_arg.rel = NULL;
6070 : 1433 : apply_error_callback_arg.remote_attnum = -1;
1278 6071 : 1433 : set_apply_error_context_xact(InvalidTransactionId, InvalidXLogRecPtr);
1471 6072 : 1433 : }
6073 : :
6074 : : /*
6075 : : * Request wakeup of the workers for the given subscription OID
6076 : : * at commit of the current transaction.
6077 : : *
6078 : : * This is used to ensure that the workers process assorted changes
6079 : : * as soon as possible.
6080 : : */
6081 : : void
974 tgl@sss.pgh.pa.us 6082 : 215 : LogicalRepWorkersWakeupAtCommit(Oid subid)
6083 : : {
6084 : : MemoryContext oldcxt;
6085 : :
6086 : 215 : oldcxt = MemoryContextSwitchTo(TopTransactionContext);
6087 : 215 : on_commit_wakeup_workers_subids =
6088 : 215 : list_append_unique_oid(on_commit_wakeup_workers_subids, subid);
6089 : 215 : MemoryContextSwitchTo(oldcxt);
6090 : 215 : }
6091 : :
6092 : : /*
6093 : : * Wake up the workers of any subscriptions that were changed in this xact.
6094 : : */
6095 : : void
6096 : 318583 : AtEOXact_LogicalRepWorkers(bool isCommit)
6097 : : {
6098 [ + + + + ]: 318583 : if (isCommit && on_commit_wakeup_workers_subids != NIL)
6099 : : {
6100 : : ListCell *lc;
6101 : :
6102 : 210 : LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
6103 [ + - + + : 420 : foreach(lc, on_commit_wakeup_workers_subids)
+ + ]
6104 : : {
6105 : 210 : Oid subid = lfirst_oid(lc);
6106 : : List *workers;
6107 : : ListCell *lc2;
6108 : :
409 akapila@postgresql.o 6109 : 210 : workers = logicalrep_workers_find(subid, true, false);
974 tgl@sss.pgh.pa.us 6110 [ + + + + : 273 : foreach(lc2, workers)
+ + ]
6111 : : {
6112 : 63 : LogicalRepWorker *worker = (LogicalRepWorker *) lfirst(lc2);
6113 : :
6114 : 63 : logicalrep_worker_wakeup_ptr(worker);
6115 : : }
6116 : : }
6117 : 210 : LWLockRelease(LogicalRepWorkerLock);
6118 : : }
6119 : :
6120 : : /* The List storage will be reclaimed automatically in xact cleanup. */
6121 : 318583 : on_commit_wakeup_workers_subids = NIL;
6122 : 318583 : }
6123 : :
6124 : : /*
6125 : : * Allocate the origin name in long-lived context for error context message.
6126 : : */
6127 : : void
971 akapila@postgresql.o 6128 : 412 : set_apply_error_context_origin(char *originname)
6129 : : {
6130 : 412 : apply_error_callback_arg.origin_name = MemoryContextStrdup(ApplyContext,
6131 : : originname);
6132 : 412 : }
6133 : :
6134 : : /*
6135 : : * Return the action to be taken for the given transaction. See
6136 : : * TransApplyAction for information on each of the actions.
6137 : : *
6138 : : * *winfo is assigned to the destination parallel worker info when the leader
6139 : : * apply worker has to pass all the transaction's changes to the parallel
6140 : : * apply worker.
6141 : : */
6142 : : static TransApplyAction
6143 : 326315 : get_transaction_apply_action(TransactionId xid, ParallelApplyWorkerInfo **winfo)
6144 : : {
6145 : 326315 : *winfo = NULL;
6146 : :
6147 [ + + ]: 326315 : if (am_parallel_apply_worker())
6148 : : {
6149 : 68985 : return TRANS_PARALLEL_APPLY;
6150 : : }
6151 : :
6152 : : /*
6153 : : * If we are processing this transaction using a parallel apply worker,
6154 : : * then we either send the changes to the parallel worker or, if the
6155 : : * worker is busy, serialize the changes to a file that will later be
6156 : : * processed by the parallel worker.
6157 : : */
6158 : 257330 : *winfo = pa_find_worker(xid);
6159 : :
963 6160 [ + + + + ]: 257330 : if (*winfo && (*winfo)->serialize_changes)
6161 : : {
6162 : 5037 : return TRANS_LEADER_PARTIAL_SERIALIZE;
6163 : : }
6164 [ + + ]: 252293 : else if (*winfo)
6165 : : {
6166 : 68914 : return TRANS_LEADER_SEND_TO_PARALLEL;
6167 : : }
6168 : :
6169 : : /*
6170 : : * If there is no parallel worker involved in processing this transaction,
6171 : : * then we either directly apply the change or serialize it to a file
6172 : : * that will later be applied when the transaction finish message is
6173 : : * processed.
6174 : : */
6175 [ + + ]: 183379 : else if (in_streamed_transaction)
6176 : : {
6177 : 103199 : return TRANS_LEADER_SERIALIZE;
6178 : : }
6179 : : else
6180 : : {
6181 : 80180 : return TRANS_LEADER_APPLY;
6182 : : }
6183 : : }
|