DQL
This chapter describes the execution process of a data query statement in TiDB. Starting from the SQL processing flow, it describes how a SQL statement is sent to TiDB, how TiDB processes it after receiving the SQL statement, and how the execution result is returned.
Briefly, the execution process of a SQL statement can be divided into three stages:
Protocol Layer
The protocol layer is responsible for parsing the network protocol. Its code is located in the server package and consists mainly of two parts: one establishes and manages connections, where every connection corresponds to exactly one session; the other handles the packets read from the connection.
SQL Layer
The SQL layer is the most complex part of TiDB. It handles SQL statement parsing and execution. SQL is a complex language with various data types and operators and numerous syntax combinations. In addition, TiDB runs on top of a distributed storage engine, so it encounters many problems that standalone storage engines do not.
KV API Layer
The KV API layer routes requests to the right KV server and passes the results back to the SQL layer. It also handles the errors that occur at this stage.
A SQL statement goes through these three stages sequentially: it is parsed and transformed, then handled by the SQL layer. In the SQL layer, query plans are generated and executed, retrieving data from the underlying storage engine. The rest of this chapter gives a detailed introduction to the SQL layer.
The entry point of TiDB's SQL layer is in server/conn.go. After a connection is established between the client and TiDB, TiDB spawns a goroutine to listen and poll on the port. In that goroutine, a loop keeps reading network packets and hands each one off to be handled:
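The read-and-handle loop can be sketched roughly as follows. This is a minimal illustration, not TiDB's actual code: `readPacket`, `serveConn`, `encodePacket` and `replay` are hypothetical names, and the only protocol detail assumed is the standard MySQL packet header (a 3-byte little-endian payload length followed by a 1-byte sequence number).

```go
package main

import (
	"bytes"
	"errors"
	"io"
)

// readPacket reads one MySQL-style packet: a 4-byte header (3-byte
// little-endian payload length, 1-byte sequence number), then the payload.
func readPacket(r io.Reader) ([]byte, error) {
	var header [4]byte
	if _, err := io.ReadFull(r, header[:]); err != nil {
		return nil, err // io.EOF here means the peer closed cleanly
	}
	length := int(header[0]) | int(header[1])<<8 | int(header[2])<<16
	payload := make([]byte, length)
	if _, err := io.ReadFull(r, payload); err != nil {
		if errors.Is(err, io.EOF) {
			err = io.ErrUnexpectedEOF // header arrived but the payload did not
		}
		return nil, err
	}
	return payload, nil
}

// serveConn (hypothetical) loops over the connection: read a packet,
// pass its payload to handle, and stop on a clean close or the first error.
func serveConn(conn io.Reader, handle func(payload []byte) error) error {
	for {
		payload, err := readPacket(conn)
		if errors.Is(err, io.EOF) {
			return nil
		}
		if err != nil {
			return err
		}
		if err := handle(payload); err != nil {
			return err
		}
	}
}

// encodePacket builds a packet for the demo below.
func encodePacket(seq byte, payload []byte) []byte {
	n := len(payload)
	header := []byte{byte(n), byte(n >> 8), byte(n >> 16), seq}
	return append(header, payload...)
}

// replay feeds an in-memory byte stream through serveConn and records
// every payload the handler saw.
func replay(stream []byte) ([]string, error) {
	var seen []string
	err := serveConn(bytes.NewReader(stream), func(p []byte) error {
		seen = append(seen, string(p))
		return nil
	})
	return seen, err
}
```

Calling `replay` on two concatenated packets built with `encodePacket` returns both payloads in order. In a real server the reader would be the network connection, and the loop would run in the goroutine spawned for that connection.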
In the SQL layer, there are several concepts and interfaces that deserve close attention:
The most important function in Session is ExecuteStmt. It wraps calls to other modules. SQL execution respects the environment variables in Session, such as AutoCommit and the time zone.
FROM t is parsed into the From field, WHERE c > 1 is parsed into the Where field, and * is parsed into the Fields field. Most data structures in the ast package implement the ast.Node interface. This interface has an Accept method, which implements the classic visitor pattern and is used by later procedures to traverse the tree.
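To illustrate how an Accept method drives a visitor over a tree, here is a small self-contained sketch. The `Node` and `Visitor` interfaces below are simplified stand-ins loosely modelled on the Enter/Leave style of visitor; the node types (`ColumnName`, `Value`, `BinaryExpr`) and the `collectColumns` helper are hypothetical and are not the real ast package definitions.

```go
package main

// Node is a simplified stand-in for an AST node interface.
type Node interface {
	Accept(v Visitor) (node Node, ok bool)
}

// Visitor is called before (Enter) and after (Leave) a node's children.
type Visitor interface {
	Enter(n Node) (node Node, skipChildren bool)
	Leave(n Node) (node Node, ok bool)
}

// ColumnName and Value are leaf nodes; BinaryExpr has two children.
type ColumnName struct{ Name string }
type Value struct{ V int }
type BinaryExpr struct {
	Op   string
	L, R Node
}

func (c *ColumnName) Accept(v Visitor) (Node, bool) {
	n, _ := v.Enter(c) // a leaf has no children to skip
	return v.Leave(n)
}

func (x *Value) Accept(v Visitor) (Node, bool) {
	n, _ := v.Enter(x)
	return v.Leave(n)
}

func (b *BinaryExpr) Accept(v Visitor) (Node, bool) {
	n, skip := v.Enter(b)
	if skip {
		return v.Leave(n)
	}
	b = n.(*BinaryExpr) // this sketch assumes Enter keeps the node type
	l, ok := b.L.Accept(v)
	if !ok {
		return b, false
	}
	b.L = l
	r, ok := b.R.Accept(v)
	if !ok {
		return b, false
	}
	b.R = r
	return v.Leave(b)
}

// columnCollector is a visitor that records every column name it enters.
type columnCollector struct{ names []string }

func (c *columnCollector) Enter(n Node) (Node, bool) {
	if col, ok := n.(*ColumnName); ok {
		c.names = append(c.names, col.Name)
	}
	return n, false
}

func (c *columnCollector) Leave(n Node) (Node, bool) { return n, true }

// collectColumns walks the tree rooted at root and returns its column names.
func collectColumns(root Node) []string {
	c := &columnCollector{}
	root.Accept(c)
	return c.names
}
```

For a tree representing `c > 1`, built as `&BinaryExpr{Op: ">", L: &ColumnName{Name: "c"}, R: &Value{V: 1}}`, `collectColumns` returns a slice containing only `"c"`. The traversal logic lives in the nodes, so each analysis pass only needs to supply its own Enter and Leave.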
There are three steps:
plan.Preprocess: performs validation and name binding.
plan.Optimize: builds and optimizes query plans; this is the core part.
The functionality of each method is described in the comments. In short, Fields() retrieves the type of each column, Next() returns a batch of the result, and Close() closes the result set.
TiDB's execution engine follows the Volcano model. All executors form an executor tree, and every upper layer gathers results from the layer below it by calling that layer's Next() method. Suppose we have the SQL statement SELECT c1 FROM t WHERE c2 > 1; and the query plan is a full table scan plus a filter. The executor tree then looks like:
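As a rough illustration of the Volcano model for this statement, the sketch below chains three toy executors: a table scan, a selection for `c2 > 1`, and a projection onto `c1`. All types and names here (`Row`, `Executor`, `TableScan`, `Selection`, `Projection`, `buildPlan`, `drain`) are hypothetical, and for brevity each Next() call returns a single row rather than a batch.

```go
package main

// Row holds the column values of one record; here index 0 is c1 and index 1 is c2.
type Row []int

// Executor is the pull-based iterator at the heart of the Volcano model.
type Executor interface {
	// Next returns the next row, or ok=false once the executor is exhausted.
	Next() (row Row, ok bool)
}

// TableScan yields every row of an in-memory table.
type TableScan struct {
	rows []Row
	pos  int
}

func (s *TableScan) Next() (Row, bool) {
	if s.pos >= len(s.rows) {
		return nil, false
	}
	r := s.rows[s.pos]
	s.pos++
	return r, true
}

// Selection pulls rows from its child until one satisfies the predicate.
type Selection struct {
	child Executor
	pred  func(Row) bool
}

func (s *Selection) Next() (Row, bool) {
	for {
		r, ok := s.child.Next()
		if !ok {
			return nil, false
		}
		if s.pred(r) {
			return r, true
		}
	}
}

// Projection keeps only the listed column indexes of each child row.
type Projection struct {
	child Executor
	cols  []int
}

func (p *Projection) Next() (Row, bool) {
	r, ok := p.child.Next()
	if !ok {
		return nil, false
	}
	out := make(Row, len(p.cols))
	for i, c := range p.cols {
		out[i] = r[c]
	}
	return out, true
}

// buildPlan assembles Projection(c1) <- Selection(c2 > 1) <- TableScan(t).
func buildPlan(table []Row) Executor {
	scan := &TableScan{rows: table}
	sel := &Selection{child: scan, pred: func(r Row) bool { return r[1] > 1 }}
	return &Projection{child: sel, cols: []int{0}}
}

// drain keeps calling Next on the root until the tree is exhausted.
func drain(root Executor) []Row {
	var out []Row
	for {
		r, ok := root.Next()
		if !ok {
			return out
		}
		out = append(out, r)
	}
}
```

With a table of `{10, 1}`, `{20, 2}` and `{30, 3}`, `drain(buildPlan(table))` yields the rows `[20]` and `[30]`. Only the root is called directly; each call to its Next() pulls rows through the layers below on demand.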
rs is a RecordSet instance. Keep calling its Next method to get more results to return to the client.
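The pattern of repeatedly calling Next until the result set is drained can be sketched as below. The `RecordSet` interface here is a deliberately simplified assumption with the same three method names mentioned in this chapter; its signatures, the `sliceRecordSet` type and the `sendAll` helper are hypothetical and do not match TiDB's real definitions.

```go
package main

import "errors"

// RecordSet is a simplified, hypothetical result-set abstraction.
type RecordSet interface {
	Fields() []string                     // column names of the result
	Next(maxRows int) ([][]string, error) // next batch; empty when drained
	Close() error
}

// sliceRecordSet serves rows from memory in batches.
type sliceRecordSet struct {
	fields []string
	rows   [][]string
	pos    int
	closed bool
}

func (s *sliceRecordSet) Fields() []string { return s.fields }

func (s *sliceRecordSet) Next(maxRows int) ([][]string, error) {
	if s.closed {
		return nil, errors.New("record set is closed")
	}
	if maxRows < 1 {
		maxRows = 1 // always make progress
	}
	end := s.pos + maxRows
	if end > len(s.rows) {
		end = len(s.rows)
	}
	batch := s.rows[s.pos:end]
	s.pos = end
	return batch, nil
}

func (s *sliceRecordSet) Close() error {
	s.closed = true
	return nil
}

// sendAll pulls batches until an empty one signals the end, passing each
// row to send, and closes the record set when it returns.
func sendAll(rs RecordSet, batchSize int, send func(row []string)) error {
	defer rs.Close()
	for {
		batch, err := rs.Next(batchSize)
		if err != nil {
			return err
		}
		if len(batch) == 0 {
			return nil
		}
		for _, row := range batch {
			send(row)
		}
	}
}
```

In this sketch, an empty batch marks the end of the results, so the caller never needs to know the total row count in advance.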
In general, the execution process of the SQL query statement above can be described by the following picture:
dispatch handles the raw data array. The first byte of the array represents the command type; among the types, COM_QUERY represents a data query statement. You can refer to the MySQL protocol documentation for more information about the data array. For COM_QUERY, the content is the SQL statement. A dedicated handler processes the SQL statement and calls into server/driver_tidb.go:
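Dispatching on the first byte can be sketched as follows. This is a simplified illustration rather than TiDB's real dispatch function: the function signature and the `runSQL` callback are assumptions made for the example, while the command byte values are the ones defined by the MySQL client/server protocol.

```go
package main

import (
	"errors"
	"fmt"
)

// Command bytes as defined by the MySQL client/server protocol.
const (
	comQuit  byte = 0x01
	comQuery byte = 0x03
	comPing  byte = 0x0e
)

// dispatch (simplified, hypothetical signature) inspects the first byte of
// the packet payload and routes the remaining bytes accordingly.
// runSQL stands in for whatever executes a SQL string in the SQL layer.
func dispatch(data []byte, runSQL func(sql string) (string, error)) (string, error) {
	if len(data) == 0 {
		return "", errors.New("empty packet")
	}
	cmd, body := data[0], data[1:]
	switch cmd {
	case comQuery:
		// For COM_QUERY, the rest of the payload is the SQL text.
		return runSQL(string(body))
	case comPing:
		return "OK", nil
	case comQuit:
		return "", nil
	default:
		return "", fmt.Errorf("unsupported command 0x%02x", cmd)
	}
}
```

A payload of `0x03` followed by `SELECT 1` reaches `runSQL` with the string `SELECT 1`, while an unknown command byte produces an error instead of being executed.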
The function invoked at this point is the entry point of the SQL layer kernel and returns the result of the SQL execution.
After the series of operations described above, the execution results are returned to the client.
The parser consists of a lexer and Yacc. It turns the SQL text into an AST:
In the parsing process, the lexer first transforms the SQL text into tokens, and then the parser accepts the tokens as input and generates the appropriate AST nodes. For example, the statement SELECT * FROM t WHERE c > 1; is finally turned into the structure below:
After the AST is generated, it is validated, transformed, and optimized:
Construct the executor.ExecStmt structure: it holds the query plans and is the foundation for the subsequent execution.
While the executor is being constructed, query plans are turned into executors. The execution engine can then carry out the query plans through the executors. The generated executor is encapsulated in a recordSet structure:
This structure implements an interface that abstracts the query results and has the following methods:
The picture above shows the data flow between executors. The first Next() call marks the starting point of a SQL statement's execution.