Under the optimistic transaction model, modification conflicts are detected as part of the transaction commit. A TiDB cluster uses the optimistic transaction model by default before version 3.0.8, and the pessimistic transaction model by default since version 3.0.8. The system variable tidb_txn_mode controls whether TiDB uses optimistic or pessimistic transaction mode.
This document describes the implementation of optimistic transactions in TiDB. It is recommended that you first learn the principles of optimistic transactions.
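Before diving into the TiDB code, the core idea of the optimistic model can be sketched with a toy in-memory store. This is an illustration only, not TiDB code: every name here (`store`, `txn`, `begin`, `commit`) is invented. A transaction records its start version, buffers writes locally, and at commit time fails if any written key was modified after the transaction started.

```go
package main

import "fmt"

// store is a toy versioned key-value store: version[k] records the
// commit version at which key k was last written.
type store struct {
	data    map[string]string
	version map[string]uint64
	ts      uint64
}

// txn buffers writes locally; nothing is visible until commit.
type txn struct {
	s       *store
	startTS uint64
	buf     map[string]string
}

func (s *store) begin() *txn {
	s.ts++
	return &txn{s: s, startTS: s.ts, buf: map[string]string{}}
}

func (t *txn) set(k, v string) { t.buf[k] = v }

// commit checks for write conflicts before making buffered writes visible:
// a conflict exists if any written key was committed after this txn began.
func (t *txn) commit() error {
	for k := range t.buf {
		if t.s.version[k] > t.startTS {
			return fmt.Errorf("write conflict on key %q", k)
		}
	}
	t.s.ts++
	for k, v := range t.buf {
		t.s.data[k] = v
		t.s.version[k] = t.s.ts
	}
	return nil
}

func main() {
	s := &store{data: map[string]string{}, version: map[string]uint64{}}
	t1, t2 := s.begin(), s.begin()
	t1.set("id", "1")
	t2.set("id", "2")
	fmt.Println(t1.commit()) // <nil>: no conflicting write yet
	fmt.Println(t2.commit()) // conflict: t1 committed "id" after t2 started
}
```

The conflict is only discovered at commit, which is exactly why TiDB defers conflict handling to the two-phase commit described later in this document.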
The main function call stack to start an optimistic transaction is as follows.
```
(a *ExecStmt) Exec
  (a *ExecStmt) handleNoDelay
    (a *ExecStmt) handleNoDelayExecutor
      Next
        (e *SimpleExec) Next
          (e *SimpleExec) executeBegin
```
The function (e *SimpleExec) executeBegin does the main work for a "BEGIN" statement. The important comments and simplified code are as follows. The complete code is here.
```go
/*
Check that the syntax "start transaction read only" and "as of timestamp" are used correctly.
If a stale read timestamp was set, create a new stale read transaction, set the "in transaction" state, and return.
Otherwise, create a new transaction and set properties such as the snapshot and startTS.
*/
func (e *SimpleExec) executeBegin(ctx context.Context, s *ast.BeginStmt) error {
	if s.ReadOnly {
		// The statement "start transaction read only" can only be used when tidb_enable_noop_functions is true.
		// The statement "start transaction read only as of timestamp" can be used whether
		// tidb_enable_noop_functions is true or false, but tx_read_ts must not be set.
		// The statement "start transaction read only as of timestamp" must ensure the timestamp
		// is within the legal safe point range.
		enableNoopFuncs := e.ctx.GetSessionVars().EnableNoopFuncs
		if !enableNoopFuncs && s.AsOf == nil {
			return expression.ErrFunctionsNoopImpl.GenWithStackByArgs("READ ONLY")
		}
		if s.AsOf != nil {
			if e.ctx.GetSessionVars().TxnReadTS.PeakTxnReadTS() > 0 {
				return errors.New("start transaction read only as of is forbidden after set transaction read only as of")
			}
		}
	}
	// process a stale read transaction
	if e.staleTxnStartTS > 0 {
		// check that the timestamp of the stale read is valid
		if err := e.ctx.NewStaleTxnWithStartTS(ctx, e.staleTxnStartTS); err != nil {
			return err
		}
		// ignore the tidb_snapshot configuration if in a stale read transaction
		vars := e.ctx.GetSessionVars()
		vars.SetSystemVar(variable.TiDBSnapshot, "")
		// set the "in transaction" state and return
		vars.SetInTxn(true)
		return nil
	}
	// If BEGIN is the first statement in TxnCtx, we can reuse the existing transaction, without
	// the need to call NewTxn, which commits the existing transaction and begins a new one.
	// If the last un-committed/un-rollback transaction is a time-bounded read-only transaction,
	// we should always create a new transaction.
	txnCtx := e.ctx.GetSessionVars().TxnCtx
	if txnCtx.History != nil || txnCtx.IsStaleness {
		err := e.ctx.NewTxn(ctx)
	}
	// set the "in transaction" state
	e.ctx.GetSessionVars().SetInTxn(true)
	// create a new transaction and set properties such as the snapshot and startTS
	txn, err := e.ctx.Txn(true)
	// set the linearizability option
	if s.CausalConsistencyOnly {
		txn.SetOption(kv.GuaranteeLinearizability, false)
	}
	return nil
}
```
DML Executed In Optimistic Transaction
There are three kinds of DML operations: update, delete, and insert. For simplicity, this article only describes the execution process of an update statement. DML mutations are not sent to TiKV directly; they are buffered in TiDB temporarily, and the commit operation finally makes the mutations take effect.
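The buffering behavior can be sketched with a toy staging buffer. This is not TiDB's actual membuffer; the names `memBuffer`, `Set`, `Get`, and `Flush` are invented for illustration. Reads consult the transaction's own pending writes first, and only a final flush (playing the role of commit) applies the mutations to the underlying store.

```go
package main

import "fmt"

// memBuffer stages mutations locally in front of a shared store.
type memBuffer struct {
	staged map[string]string
	store  map[string]string
}

func newMemBuffer(store map[string]string) *memBuffer {
	return &memBuffer{staged: map[string]string{}, store: store}
}

func (m *memBuffer) Set(k, v string) { m.staged[k] = v }

// Get reads the transaction's own pending write first, then the store.
func (m *memBuffer) Get(k string) (string, bool) {
	if v, ok := m.staged[k]; ok {
		return v, true
	}
	v, ok := m.store[k]
	return v, ok
}

// Flush makes all buffered mutations effective, as commit does.
func (m *memBuffer) Flush() {
	for k, v := range m.staged {
		m.store[k] = v
	}
	m.staged = map[string]string{}
}

func main() {
	store := map[string]string{"pk=1": "id2=1"}
	buf := newMemBuffer(store)
	buf.Set("pk=1", "id2=2") // like "update t1 set id2 = 2 where pk = 1"
	fmt.Println(store["pk=1"]) // id2=1: the store is unchanged before commit
	v, _ := buf.Get("pk=1")
	fmt.Println(v) // id2=2: the transaction sees its own write
	buf.Flush()
	fmt.Println(store["pk=1"]) // id2=2: the mutation takes effect
}
```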
The main function call stack to execute an update statement such as "update t1 set id2 = 2 where pk = 1" is as follows.
```
(a *ExecStmt) Exec
  (a *ExecStmt) handleNoDelay
    (a *ExecStmt) handleNoDelayExecutor
      (e *UpdateExec) updateRows
        Next
          (e *PointGetExecutor) Next
```
(e *UpdateExec) updateRows
The function (e *UpdateExec) updateRows does the main work for an update statement. The important comments and simplified code are as follows. The complete code is here.
```go
/*
Take a batch of rows that need to be updated each time.
For every row in the batch, make a handle that uniquely identifies the row, and generate a new row.
Write the new row to the table.
*/
func (e *UpdateExec) updateRows(ctx context.Context) (int, error) {
	globalRowIdx := 0
	chk := newFirstChunk(e.children[0])
	// composeFunc generates a new row
	composeFunc = e.composeNewRow
	totalNumRows := 0
	for {
		// call the "Next" method recursively until every child executor is drained;
		// every "Next" call returns a batch of rows
		err := Next(ctx, e.children[0], chk)
		// if all rows are updated, return
		if chk.NumRows() == 0 {
			break
		}
		for rowIdx := 0; rowIdx < chk.NumRows(); rowIdx++ {
			// take one row from the batch above
			chunkRow := chk.GetRow(rowIdx)
			// convert the data from chunk.Row to types.DatumRow, stored by fields
			datumRow := chunkRow.GetDatumRow(fields)
			// generate the handle, which is the unique ID of every row
			e.prepare(datumRow)
			// compose non-generated columns
			newRow, err := composeFunc(globalRowIdx, datumRow, colsInfo)
			// merge non-generated columns
			e.merge(datumRow, newRow, false)
			// compose and merge generated columns
			if e.virtualAssignmentsOffset < len(e.OrderedList) {
				newRow = e.composeGeneratedColumns(globalRowIdx, newRow, colsInfo)
				e.merge(datumRow, newRow, true)
			}
			// write to the table
			e.exec(ctx, e.children[0].Schema(), datumRow, newRow)
		}
	}
}
```
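The volcano-style batch loop above, pulling a chunk of rows at a time until the child executor is drained, can be sketched in isolation. This is a toy example, not TiDB code; `chunkSource`, `Next`, and `updateRows` here are invented stand-ins for the executor interfaces.

```go
package main

import "fmt"

// chunkSource plays the role of a child executor producing rows in batches.
type chunkSource struct {
	rows      []int
	batchSize int
}

// Next returns the next batch, or an empty slice when all rows are consumed.
func (c *chunkSource) Next() []int {
	if len(c.rows) == 0 {
		return nil
	}
	n := c.batchSize
	if n > len(c.rows) {
		n = len(c.rows)
	}
	batch := c.rows[:n]
	c.rows = c.rows[n:]
	return batch
}

// updateRows pulls batches until the source is drained and applies an
// update function to every row, mirroring the loop structure above.
func updateRows(src *chunkSource, update func(int) int) []int {
	var out []int
	for {
		chk := src.Next()
		if len(chk) == 0 { // all rows are updated
			break
		}
		for _, row := range chk {
			out = append(out, update(row))
		}
	}
	return out
}

func main() {
	src := &chunkSource{rows: []int{1, 2, 3, 4, 5}, batchSize: 2}
	fmt.Println(updateRows(src, func(r int) int { return r * 10 }))
	// [10 20 30 40 50]
}
```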
Commit Optimistic Transaction
Committing a transaction includes two phases, "prewrite" and "commit", which are explained separately below. The function (c *twoPhaseCommitter) execute does the main work for committing the transaction. The important comments and simplified code are as follows. The complete code is here.
```go
/*
Do the "prewrite" operation first.
If it is a one-phase-commit (OnePC) transaction, return immediately.
If it is an async commit transaction, commit the transaction asynchronously and return.
Otherwise, commit the transaction synchronously.
*/
func (c *twoPhaseCommitter) execute(ctx context.Context) (err error) {
	// do the "prewrite" operation
	c.prewriteStarted = true
	start := time.Now()
	err = c.prewriteMutations(bo, c.mutations)
	if c.isOnePC() {
		// if it is a OnePC transaction, return immediately
		return nil
	}
	if c.isAsyncCommit() {
		// commit the transaction asynchronously and return for async commit
		go func() {
			err := c.commitMutations(commitBo, c.mutations)
		}()
		return nil
	} else {
		// do the "commit" phase
		return c.commitTxn(ctx, commitDetail)
	}
}
```
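The control flow above reduces to a simple invariant: the commit phase runs only if every prewrite succeeds, so a failed prewrite leaves nothing visible. A minimal sketch of that skeleton (not TiDB code; `batch`, `prewrite`, and `execute` are invented names):

```go
package main

import (
	"errors"
	"fmt"
)

// batch stands in for one region's mutations; failPrewrite simulates a
// conflict detected during the prewrite phase.
type batch struct {
	keys         []string
	failPrewrite bool
}

func prewrite(b batch) error {
	if b.failPrewrite {
		return errors.New("prewrite conflict")
	}
	return nil
}

// execute mirrors the two-phase structure: prewrite everything first,
// and only then decide the transaction by running the commit phase.
func execute(batches []batch) (committed bool, err error) {
	for _, b := range batches {
		if err := prewrite(b); err != nil {
			return false, err // abort: nothing becomes visible
		}
	}
	// commit phase: all locks were acquired, so the commit cannot conflict
	return true, nil
}

func main() {
	fmt.Println(execute([]batch{{keys: []string{"a"}}, {keys: []string{"b"}}}))
	// true <nil>
	fmt.Println(execute([]batch{{keys: []string{"a"}}, {failPrewrite: true}}))
	// false prewrite conflict
}
```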
prewrite
The entry function to prewrite a transaction is (c *twoPhaseCommitter) prewriteMutations, which calls the function (batchExe *batchExecutor) process to do the work. The function (batchExe *batchExecutor) process calls (batchExe *batchExecutor) startWorker to prewrite every batch in parallel. The function (batchExe *batchExecutor) startWorker calls (action actionPrewrite) handleSingleBatch to prewrite a single batch.
(batchExe *batchExecutor) process
The important comments and simplified code are as follows. The complete code is here.
```go
// start a worker goroutine to prewrite every batch in parallel and collect the results
func (batchExe *batchExecutor) process(batches []batchMutations) error {
	var err error
	err = batchExe.initUtils()
	// for prewrite, stop sending other requests after receiving the first error
	var cancel context.CancelFunc
	if _, ok := batchExe.action.(actionPrewrite); ok {
		batchExe.backoffer, cancel = batchExe.backoffer.Fork()
		defer cancel()
	}
	// concurrently do the work for each batch
	ch := make(chan error, len(batches))
	exitCh := make(chan struct{})
	go batchExe.startWorker(exitCh, ch, batches)
	// check the result of every batch prewrite synchronously; if one batch fails,
	// stop every prewrite worker goroutine immediately
	for i := 0; i < len(batches); i++ {
		if e := <-ch; e != nil {
			// cancel the other requests and record the first error
			if cancel != nil {
				cancel()
			}
			if err == nil {
				err = e
			}
		}
	}
	close(exitCh) // break the loop in the function startWorker
	return err
}
```
(batchExe *batchExecutor) startWorker
The important comments and simplified code are as follows. The complete code is here.
```go
// startWorker concurrently does the work for each batch, subject to the rate limit
func (batchExe *batchExecutor) startWorker(exitCh chan struct{}, ch chan error, batches []batchMutations) {
	for idx, batch1 := range batches {
		waitStart := time.Now()
		// limit the number of goroutines by the buffer size of the channel
		if exit := batchExe.rateLimiter.GetToken(exitCh); !exit {
			batchExe.tokenWaitDuration += time.Since(waitStart)
			batch := batch1
			// call the function "handleSingleBatch" to prewrite the keys of every batch
			go func() {
				defer batchExe.rateLimiter.PutToken() // release the channel buffer slot
				ch <- batchExe.action.handleSingleBatch(batchExe.committer, singleBatchBackoffer, batch)
			}()
		} else {
			break
		}
	}
}
```
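The rate limiter that startWorker relies on is essentially a semaphore built from a buffered channel: GetToken blocks once the channel is full, so the channel capacity bounds the number of in-flight batches. A minimal sketch of that idea (not the actual client-go rateLimiter; this toy version omits the exitCh shortcut):

```go
package main

import (
	"fmt"
	"sync"
)

// rateLimiter bounds concurrency with a buffered channel: the channel
// capacity is the maximum number of tokens that can be held at once.
type rateLimiter struct{ tokens chan struct{} }

func newRateLimiter(n int) *rateLimiter {
	return &rateLimiter{tokens: make(chan struct{}, n)}
}

func (r *rateLimiter) GetToken() { r.tokens <- struct{}{} } // blocks at capacity
func (r *rateLimiter) PutToken() { <-r.tokens }

func main() {
	limiter := newRateLimiter(2) // at most 2 batches in flight
	var wg sync.WaitGroup
	results := make(chan int, 5)
	for i := 0; i < 5; i++ {
		limiter.GetToken()
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			defer limiter.PutToken() // release the slot for the next batch
			results <- i
		}(i)
	}
	wg.Wait()
	close(results)
	sum := 0
	for v := range results {
		sum += v
	}
	fmt.Println(sum) // 0+1+2+3+4 = 10
}
```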
(action actionPrewrite) handleSingleBatch
The important comments and simplified code are as follows. The complete code is here.
```go
/*
Create a prewrite request and a region request sender that sends the prewrite request to TiKV.
(1) Get the prewrite response from TiKV.
(2) If no error happened and it is a OnePC transaction, update onePCCommitTS from the response and return.
(3) If no error happened and it is an async commit transaction, update minCommitTS if needed and return.
(4) If errors happened because of lock conflicts, extract the locks from the error response and resolve the expired locks.
(5) Do the backoff before retrying the prewrite.
*/
func (action actionPrewrite) handleSingleBatch(c *twoPhaseCommitter, bo *retry.Backoffer, batch batchMutations) (err error) {
	// create a prewrite request and a region request sender that sends the prewrite request to TiKV
	txnSize := uint64(c.regionTxnSize[batch.region.GetID()])
	req := c.buildPrewriteRequest(batch, txnSize)
	sender := locate.NewRegionRequestSender(c.store.GetRegionCache(), c.store.GetTiKVClient())
	for {
		resp, err := sender.SendReq(bo, req, batch.region, client.ReadTimeoutShort)
		regionErr, err := resp.GetRegionError()
		// get the prewrite response from TiKV
		prewriteResp := resp.Resp.(*kvrpcpb.PrewriteResponse)
		keyErrs := prewriteResp.GetErrors()
		if len(keyErrs) == 0 {
			// if no error happened and it is a OnePC transaction, update onePCCommitTS from the response and return
			if c.isOnePC() {
				c.onePCCommitTS = prewriteResp.OnePcCommitTs
				return nil
			}
			// if no error happened and it is an async commit transaction, update minCommitTS if needed and return
			if c.isAsyncCommit() {
				if prewriteResp.MinCommitTs > c.minCommitTS {
					c.minCommitTS = prewriteResp.MinCommitTs
				}
			}
			return nil
		} // if len(keyErrs) == 0
		// if errors happened because of lock conflicts, extract the locks from the error response
		var locks []*txnlock.Lock
		for _, keyErr := range keyErrs {
			// extract the lock from the key error
			lock, err1 := txnlock.ExtractLockFromKeyErr(keyErr)
			if err1 != nil {
				return errors.Trace(err1)
			}
			locks = append(locks, lock)
		} // for _, keyErr := range keyErrs
		// resolve the expired conflicting locks, then do the backoff for prewrite
		start := time.Now()
		msBeforeExpired, err := c.store.GetLockResolver().ResolveLocksForWrite(bo, c.startTS, c.forUpdateTS, locks)
		if msBeforeExpired > 0 {
			err = bo.BackoffWithCfgAndMaxSleep(retry.BoTxnLock, int(msBeforeExpired),
				errors.Errorf("2PC prewrite lockedKeys: %d", len(locks)))
			if err != nil {
				return errors.Trace(err) // the backoff exceeded its max time, return the error
			}
		}
	} // for loop
}
```
commit
The entry function for committing a transaction is (c *twoPhaseCommitter) commitMutations, which calls the function (c *twoPhaseCommitter) doActionOnGroupMutations to do the work. The batch containing the primary key is committed first, then the function (batchExe *batchExecutor) process calls (batchExe *batchExecutor) startWorker to commit the remaining batches in parallel and asynchronously. The function (batchExe *batchExecutor) startWorker calls (actionCommit) handleSingleBatch to commit a single batch.
(c *twoPhaseCommitter) doActionOnGroupMutations
The important comments and simplified code are as follows. The complete code is here.
```go
/*
If the groups contain the primary, commit the primary batch synchronously.
If this is the first attempt to commit, spawn a goroutine to commit the secondary batches asynchronously.
If this is a commit retry, commit the secondary batches synchronously, because it is already running in the asynchronous goroutine.
*/
func (c *twoPhaseCommitter) doActionOnGroupMutations(bo *retry.Backoffer, action twoPhaseCommitAction, groups []groupedMutations) error {
	batchBuilder := newBatched(c.primary())
	// whether the groups being operated on contain the primary
	firstIsPrimary := batchBuilder.setPrimary()
	actionCommit, actionIsCommit := action.(actionCommit)
	c.checkOnePCFallBack(action, len(batchBuilder.allBatches()))
	// if the groups contain the primary, commit the primary batch synchronously
	if firstIsPrimary && (actionIsCommit && !c.isAsyncCommit()) {
		// the primary should be committed (non-async commit)/cleaned up/pessimistically locked first
		err = c.doActionOnBatches(bo, action, batchBuilder.primaryBatch())
		batchBuilder.forgetPrimary()
	}
	// if this is the first attempt to commit, spawn a goroutine to commit the secondary batches asynchronously;
	// if this is a commit retry, commit the secondary batches synchronously, because it is already
	// running in the asynchronous goroutine
	if actionIsCommit && !actionCommit.retry && !c.isAsyncCommit() {
		secondaryBo := retry.NewBackofferWithVars(c.store.Ctx(), CommitSecondaryMaxBackoff, c.txn.vars)
		c.store.WaitGroup().Add(1)
		go func() {
			defer c.store.WaitGroup().Done()
			e := c.doActionOnBatches(secondaryBo, action, batchBuilder.allBatches())
		}()
	} else {
		err = c.doActionOnBatches(bo, action, batchBuilder.allBatches())
	}
	return errors.Trace(err)
}
```
(batchExe *batchExecutor) process
The function (c *twoPhaseCommitter) doActionOnGroupMutations calls (c *twoPhaseCommitter) doActionOnBatches to do the second phase of commit. The function (c *twoPhaseCommitter) doActionOnBatches calls (batchExe *batchExecutor) process to do the main work.
The important comments and simplified code of the function (batchExe *batchExecutor) process are the same as in the prewrite part above. The complete code is here.
(actionCommit) handleSingleBatch
The function (batchExe *batchExecutor) process calls the function (actionCommit) handleSingleBatch to send commit requests to the TiKV nodes.
The important comments and simplified code are as follows. The complete code is here.
```go
/*
Create a commit request and a commit sender.
If a region error happened, back off and retry the commit operation.
If the error is not a region error but the request was rejected by TiKV because the commit ts had expired, retry with a newer commit ts.
If any other error happened, return the error immediately.
If no error happened, exit the loop and return success.
*/
func (actionCommit) handleSingleBatch(c *twoPhaseCommitter, bo *retry.Backoffer, batch batchMutations) error {
	// create a commit request and a commit sender
	keys := batch.mutations.GetKeys()
	req := tikvrpc.NewRequest(tikvrpc.CmdCommit, &kvrpcpb.CommitRequest{
		StartVersion:  c.startTS,
		Keys:          keys,
		CommitVersion: c.commitTS,
	}, kvrpcpb.Context{Priority: c.priority, SyncLog: c.syncLog,
		ResourceGroupTag: c.resourceGroupTag, DiskFullOpt: c.diskFullOpt})
	tBegin := time.Now()
	attempts := 0
	sender := locate.NewRegionRequestSender(c.store.GetRegionCache(), c.store.GetTiKVClient())
	for {
		attempts++
		resp, err := sender.SendReq(bo, req, batch.region, client.ReadTimeoutShort)
		regionErr, err := resp.GetRegionError()
		// if a region error happened, back off and retry the commit operation
		if regionErr != nil {
			// for other region errors and the fake region error, back off because something is wrong;
			// for the real EpochNotMatch error, don't back off
			if regionErr.GetEpochNotMatch() == nil || locate.IsFakeRegionError(regionErr) {
				err = bo.Backoff(retry.BoRegionMiss, errors.New(regionErr.String()))
				if err != nil {
					return errors.Trace(err)
				}
			}
			same, err := batch.relocate(bo, c.store.GetRegionCache())
			if err != nil {
				return errors.Trace(err)
			}
			if same {
				continue
			}
			err = c.doActionOnMutations(bo, actionCommit{true}, batch.mutations)
			return errors.Trace(err)
		} // if regionErr != nil
		// if the error is not a region error but the request was rejected by TiKV because
		// the commit ts had expired, retry with a newer commit ts; for other errors, return immediately
		commitResp := resp.Resp.(*kvrpcpb.CommitResponse)
		if keyErr := commitResp.GetError(); keyErr != nil {
			if rejected := keyErr.GetCommitTsExpired(); rejected != nil {
				// the 2PC commitTS was rejected by TiKV: fetch a newer commit ts and retry
				commitTS, err := c.store.GetTimestampWithRetry(bo, c.txn.GetScope())
				c.mu.Lock()
				c.commitTS = commitTS
				c.mu.Unlock()
				// update the commitTS of the request and retry
				req.Commit().CommitVersion = commitTS
				continue
			}
			if c.mu.committed {
				// 2PC failed to commit a key after the primary key was committed;
				// no secondary key can be rolled back after its primary key is committed
				return errors.Trace(err)
			}
			return err
		}
		// no error happened, exit the loop
		break
	} // for loop
}
```
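The CommitTsExpired path above is the only commit error that is retried rather than surfaced: the committer fetches a fresh timestamp and re-sends the same request. A toy sketch of that refresh-and-retry loop (not TiDB code; `tsOracle`, `commitBatch`, and `commitWithRetry` are invented names, and `commitBatch` imitates TiKV's commit-ts-expired check with a simple threshold):

```go
package main

import (
	"errors"
	"fmt"
)

// tsOracle hands out monotonically increasing timestamps, like PD does.
type tsOracle struct{ now uint64 }

func (o *tsOracle) getTimestamp() uint64 { o.now++; return o.now }

// commitBatch rejects any commitTS that is not newer than minAccepted,
// imitating TiKV's commit-ts-expired check.
func commitBatch(commitTS, minAccepted uint64) error {
	if commitTS <= minAccepted {
		return errors.New("commit ts expired")
	}
	return nil
}

// commitWithRetry retries a rejected commit with a freshly fetched
// timestamp instead of failing the transaction.
func commitWithRetry(o *tsOracle, commitTS, minAccepted uint64) (uint64, error) {
	for {
		if err := commitBatch(commitTS, minAccepted); err == nil {
			return commitTS, nil
		}
		// rejected: update the commitTS of the request and retry
		commitTS = o.getTimestamp()
	}
}

func main() {
	o := &tsOracle{now: 10}
	finalTS, err := commitWithRetry(o, 5, 10) // 5 is stale; 11 will succeed
	fmt.Println(finalTS, err)                 // 11 <nil>
}
```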