DDL - Data Definition Language / Schema change handling

A short introduction to TiDB's DDL

The design behind TiDB's DDL implementation is described in the Online DDL design doc.

TiDB is a distributed database and needs a consistent view of all schemas across the whole cluster. To achieve this asynchronously, the system moves schema changes through a series of internal states, where each single-state transition keeps the old/previous state compatible with the new/current state. This allows different TiDB nodes to have different versions of the schema definition. All TiDB servers in a cluster share at most two schema versions/states at the same time, so before moving to the next state change, all currently available TiDB servers need to be synchronized with the current state.

The states used are defined in tidb/parser/model/model.go:

        const (
            // StateNone means this schema element is absent and can't be used.
            StateNone SchemaState = iota
            // StateDeleteOnly means we can only delete items for this schema element.
            StateDeleteOnly
            // StateWriteOnly means we can use any write operation on this schema element,
            // but outer can't read the changed data.
            StateWriteOnly
            // StateWriteReorganization means we are re-organizing whole data after write only state.
            StateWriteReorganization
            // StateDeleteReorganization means we are re-organizing whole data after delete only state.
            StateDeleteReorganization
            // StatePublic means this schema element is ok for all write and read operations.
            StatePublic
        )

Note: this is a very different implementation from MySQL, which uses Meta Data Locks (MDL) to keep a single version of the schemas in an instance at any time. As a result, MySQL replicas can have different versions of the schemas than the primary, due to lag in asynchronous replication. TiDB always has a consistent view of all schemas in the cluster.
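
To make the progression concrete, here is a small, self-contained sketch (not TiDB code; the paths shown follow the online DDL design referenced above) of how a schema element such as an index typically walks through these states when it is added, and in which order it walks back when it is dropped:

        package main

        import "fmt"

        // SchemaState mirrors the states listed above (illustration only).
        type SchemaState int

        const (
            StateNone SchemaState = iota
            StateDeleteOnly
            StateWriteOnly
            StateWriteReorganization
            StateDeleteReorganization
            StatePublic
        )

        func (s SchemaState) String() string {
            return [...]string{"none", "delete only", "write only",
                "write reorganization", "delete reorganization", "public"}[s]
        }

        func main() {
            // Adding an element (e.g. an index) walks the states forward;
            // every node is always within one state of its neighbours.
            addPath := []SchemaState{StateNone, StateDeleteOnly, StateWriteOnly,
                StateWriteReorganization, StatePublic}
            // Dropping an element walks the states in the other direction.
            dropPath := []SchemaState{StatePublic, StateWriteOnly, StateDeleteOnly,
                StateDeleteReorganization, StateNone}

            fmt.Println("add:", addPath)
            fmt.Println("drop:", dropPath)
        }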

Implementation

All DDL jobs go through two cluster-wide DDL queues:

  • The general (default) job queue, for DDL operations that only need metadata changes.

  • The add index job queue, for DDL operations that also need to reorganize/backfill data (such as ADD INDEX).

The two base operations for these queues are:

  • enqueue, which adds a DDL job to the end of a queue.

  • dequeue, which pops the first DDL job from a queue (removing and returning it).

When a DDL job is completed, it will be moved to the DDL history.
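
As a purely conceptual sketch of the queue/history handover (the type and function names below are invented; TiDB's real queues and history are persisted in the storage layer, not in memory), the life of a job can be pictured like this:

        package main

        import "fmt"

        // Job is a deliberately simplified stand-in for TiDB's DDL job.
        type Job struct {
            ID    int64
            Query string
            Done  bool
        }

        // jobQueue is an in-memory FIFO; TiDB persists its queues and history
        // in the storage layer rather than in memory.
        type jobQueue struct {
            jobs    []*Job
            history map[int64]*Job
        }

        func (q *jobQueue) enqueue(j *Job) { q.jobs = append(q.jobs, j) }

        func (q *jobQueue) dequeue() *Job {
            if len(q.jobs) == 0 {
                return nil
            }
            j := q.jobs[0]
            q.jobs = q.jobs[1:]
            return j
        }

        // finish moves a completed job out of the queue's view into the history.
        func (q *jobQueue) finish(j *Job) {
            j.Done = true
            q.history[j.ID] = j
        }

        func main() {
            q := &jobQueue{history: map[int64]*Job{}}
            q.enqueue(&Job{ID: 1, Query: "CREATE TABLE t (id INT)"})
            j := q.dequeue()
            q.finish(j)
            fmt.Printf("history entry: %+v\n", q.history[1])
        }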

There are two main execution parts handling DDLs:

  • TiDB session, which executes your statements. It parses and validates the SQL DDL statement, creates a DDL job, enqueues it in the corresponding queue, and then monitors the DDL history until the operation is complete (succeeded or failed), returning the result back to the MySQL client (see the sketch after this list).

  • DDL background goroutines:

    • limitDDLJobs, which takes jobs from the sessions and adds them to the DDL queues in batches.

    • workers for processing DDLs:

      • General worker, handling the default DDL queue where only metadata changes are needed.

      • Add Index worker, which updates/backfills data requested in the AddIndexJob queue.

    • DDL owner manager, which ensures that one and only one TiDB node in the cluster acts as the DDL owner.
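
Returning to the session side of the list above: the session essentially enqueues the job and then waits for it to appear in the DDL history. A stripped-down sketch of that polling pattern (with invented names; this is not TiDB's actual doDDLJob code) might look like this:

        package main

        import (
            "errors"
            "fmt"
            "sync"
            "time"
        )

        // historyStore is a stand-in for the DDL history lookup.
        type historyStore struct {
            mu   sync.Mutex
            done map[int64]error // job ID -> final result (nil means success)
        }

        func (h *historyStore) result(id int64) (error, bool) {
            h.mu.Lock()
            defer h.mu.Unlock()
            err, ok := h.done[id]
            return err, ok
        }

        // waitJobDone polls the history until the job shows up or the timeout
        // hits, which is roughly what the session does after enqueueing a job.
        func waitJobDone(h *historyStore, jobID int64, timeout time.Duration) error {
            deadline := time.Now().Add(timeout)
            ticker := time.NewTicker(10 * time.Millisecond)
            defer ticker.Stop()
            for range ticker.C {
                if err, ok := h.result(jobID); ok {
                    return err
                }
                if time.Now().After(deadline) {
                    return errors.New("timed out waiting for DDL job")
                }
            }
            return nil
        }

        func main() {
            h := &historyStore{done: map[int64]error{}}
            go func() { // pretend a DDL worker finishes job 42 a bit later
                time.Sleep(50 * time.Millisecond)
                h.mu.Lock()
                h.done[42] = nil
                h.mu.Unlock()
            }()
            fmt.Println("result:", waitJobDone(h, 42, time.Second))
        }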

Execution in the TiDB session

The execution of the DDL statement is started through the Next function of the DDLExec executor (just like normal query execution).

There the different DDL operations are dispatched to their own functions, one per statement type, as sketched below.
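
As a rough, self-contained illustration of that dispatch (not TiDB's actual executor code; the statement and handler types are simplified stand-ins for the real AST nodes and functions), the pattern is a type switch over the parsed statement:

        package main

        import (
            "errors"
            "fmt"
        )

        // Minimal stand-ins for parsed DDL AST nodes.
        type CreateTableStmt struct{ Table string }
        type DropTableStmt struct{ Table string }

        // executeDDL dispatches each DDL statement type to its own handler,
        // mirroring the per-statement functions in the real executor.
        func executeDDL(stmt interface{}) error {
            switch x := stmt.(type) {
            case *CreateTableStmt:
                return executeCreateTable(x)
            case *DropTableStmt:
                return executeDropTable(x)
            default:
                return errors.New("unsupported DDL statement")
            }
        }

        func executeCreateTable(s *CreateTableStmt) error {
            fmt.Println("would build a TableInfo and enqueue a DDL job for", s.Table)
            return nil
        }

        func executeDropTable(s *DropTableStmt) error {
            fmt.Println("would enqueue a drop-table DDL job for", s.Table)
            return nil
        }

        func main() {
            _ = executeDDL(&CreateTableStmt{Table: "t"})
        }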

Let us use a simple CREATE TABLE statement as an example (CREATE TABLE does not need any of the WriteOnly or DeleteOnly states).

This statement has the CreateTableStmt Abstract Syntax Tree type and will be handled by the executeCreateTable/CreateTable functions.

This fills in a TableInfo struct according to the table definition in the statement, creates a DDL job, and calls doDDLJob, which goes through the limitDDLJobs goroutine and adds one or more jobs to the DDL job queue in addBatchDDLJobs.

The DDL job is encoded as JSON when it is stored in the job queue.
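
To give a feeling for what that looks like, the following standalone sketch marshals a heavily simplified job struct to JSON. The field set is invented for illustration and is much smaller than TiDB's real model.Job:

        package main

        import (
            "encoding/json"
            "fmt"
        )

        // simpleJob is an illustrative subset of what a DDL job carries; the
        // real model.Job has many more fields.
        type simpleJob struct {
            ID          int64  `json:"id"`
            Type        string `json:"type"`
            SchemaID    int64  `json:"schema_id"`
            TableID     int64  `json:"table_id"`
            SchemaState string `json:"schema_state"`
            Query       string `json:"query"`
        }

        func main() {
            job := simpleJob{
                ID:          81,
                Type:        "create table",
                SchemaID:    1,
                TableID:     80,
                SchemaState: "none",
                Query:       "CREATE TABLE t (id INT PRIMARY KEY)",
            }
            out, err := json.MarshalIndent(job, "", "  ")
            if err != nil {
                panic(err)
            }
            fmt.Println(string(out))
        }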

Execution in the TiDB DDL Owner

When the tidb-server starts, it initializes a domain where it creates a DDL object and calls ddl.Start(), which starts the limitDDLJobs goroutine and the two DDL workers. It also starts CampaignOwner/campaignLoop, which monitors the owner election and makes sure a new owner is elected when needed.
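
TiDB's owner election goes through PD/etcd. Purely as an illustration of the campaign pattern (this is not TiDB's ownerManager code, and the endpoint and key prefix below are made up), an etcd-based election could look like this:

        package main

        import (
            "context"
            "log"

            clientv3 "go.etcd.io/etcd/client/v3"
            "go.etcd.io/etcd/client/v3/concurrency"
        )

        func main() {
            // Connect to an etcd endpoint (hypothetical address).
            cli, err := clientv3.New(clientv3.Config{Endpoints: []string{"127.0.0.1:2379"}})
            if err != nil {
                log.Fatal(err)
            }
            defer cli.Close()

            // A session keeps a lease alive; if this process dies, the lease
            // expires and another candidate can win the election.
            sess, err := concurrency.NewSession(cli)
            if err != nil {
                log.Fatal(err)
            }
            defer sess.Close()

            election := concurrency.NewElection(sess, "/illustrative/ddl-owner")

            // Campaign blocks until this node becomes the owner.
            if err := election.Campaign(context.Background(), "node-1"); err != nil {
                log.Fatal(err)
            }
            log.Println("this node is now the DDL owner; only it may process the job queues")

            // Keep ownership until the session ends.
            <-sess.Done()
        }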

A DDL worker goes through this workflow in a loop (it may handle one job state per loop iteration, leaving the remaining work to a later iteration); a condensed sketch of the loop follows after the list:

  1. Wait for either a signal from local sessions, a global change through PD, or a ticker (2 * lease time, at most 1 second), and then call handleDDLJobQueue.

  2. Start a transaction.

  3. Check whether it is the owner (if not, simply return).

  4. Pick the first job from its DDL queue.

  5. Wait for dependent jobs to finish first (for example, a reorganization/backfill job needs to wait for the metadata jobs it depends on).

  6. Wait, if needed, for the current/old schema version to be synchronized across the cluster, either until the lease time has passed or until all TiDB nodes have updated their schema version.

  7. If the job is done (completed or rolled back):

    1. Clean up old physical tables or indexes that are no longer part of the new table definition.

    2. Remove the job from the DDL queue.

    3. Add the job to the DDL history.

    4. Return from handleDDLJobQueue; we are finished!

  8. Otherwise, execute the actual DDL job in runDDLJob. See more about this below!

  9. Update the DDL Job in the queue, for the next loop/transaction.

  10. Write to the binlog.
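
The condensed sketch below shows the overall shape of that loop. It is a standalone illustration with invented types and function names, not the real handleDDLJobQueue, and it leaves out transactions, schema version synchronization, and binlog writes:

        package main

        import "fmt"

        type job struct {
            ID    int64
            Query string
            Done  bool
        }

        type worker struct {
            isOwner func() bool
            queue   []*job
            history []*job
        }

        // handleJobQueue is a condensed, illustrative version of the worker loop:
        // only the owner makes progress, one step is handled per call, and
        // finished jobs are moved from the queue to the history.
        func (w *worker) handleJobQueue() {
            if !w.isOwner() || len(w.queue) == 0 {
                return
            }
            j := w.queue[0]
            if j.Done {
                // The job finished in an earlier iteration: move it to the history.
                w.queue = w.queue[1:]
                w.history = append(w.history, j)
                fmt.Println("finished:", j.Query)
                return
            }
            // Run one step of the job; a real job may need several iterations
            // (one per schema state change) before it is done.
            j.Done = runOneStep(j)
        }

        func runOneStep(j *job) bool {
            fmt.Println("running one state change for:", j.Query)
            return true // pretend a single step was enough
        }

        func main() {
            w := &worker{
                isOwner: func() bool { return true },
                queue:   []*job{{ID: 1, Query: "CREATE TABLE t (id INT)"}},
            }
            w.handleJobQueue() // runs the job's state change
            w.handleJobQueue() // moves the finished job to the history
            fmt.Println("history size:", len(w.history))
        }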

The execution of the job's DDL changes happens in runDDLJob, where each operation type is dispatched to its own function.

Since each operation is handled separately, this is also one of the reasons TiDB has the limitation of only one DDL operation at a time (i.e. it is not possible to add one column and drop another column in the same DDL statement).

Following the example of CREATE TABLE, we see that it is handled by onCreateTable, which, after some checks, creates a new schema version and, if the table is not yet in the StatePublic state, creates the table in CreateTableOrView, which simply writes the TableInfo as JSON into the meta database.

Notice that it takes another iteration of the handleDDLJobQueue loop above to finish the DDL job, by updating the schema version and synchronizing it with the other TiDB nodes.

Graphs of the DDL life cycle

An overview of the DDL execution flow in the TiDB cluster can be seen here: DDL work flow in TiDB Cluster

And more specifically for the DDL worker: DDL owner worker detailed flow chart
