Issue Triage
TiDB uses an issue-centric workflow for development. Every problem, enhancement and feature starts with an issue. For bug issues, you need to perform some more triage operations on the issues.
Diagnose issue severity
The severity of a bug reflects the level of impact that the bug has on users when they use TiDB. The greater the impact, the higher severity the bug is. For higher severity bugs, we need to fix them faster. Although the impact of bugs can not be exhausted, they can be divided into four levels.
Critical
The bug affects critical functionality or critical data. It might cause huge losses to users and does not have a workaround. Some typical critical bugs are as follows:
Invalid query result (correctness issues)
TiDB returns incorrect results or results that are in the wrong order for a typical user-written query.
Bugs caused by type casts.
The parameters are not boundary value or invalid value, but the query result is not correct(except for overflow scenes).
Incorrect DDL and DML result
The data is not written to the disk, or wrong data is written.
Data and index are inconsistent.
Invalid features
Due to a regression, the feature can not work in its main workflow
Follower can not read follower.
SQL hint does not work.
SQL Plan
Cannot choose the best index. The difference between best plan and chosen plan is bigger than 200%.
DDL design
DDL process causes data accuracy issue.
Experimental feature
If the issue leads to another stable feature’s main workflow not work, and may occur on released version, the severity is critical.
If the issue leads to data loss, the severity is critical.
Exceptions
If the feature is clearly labeled as experimental, when it doesn’t work but doesn’t impact another stable feature’s main workflow or only impacts stable feature’s main workflow on master, the issue severity is major.
The feature has been deprecated and a viable workaround is available(at most major).
System stability
The system is unavailable for more than 5 minutes(if there are some system errors, the timing starts from failure recovery).
Tools cannot perform replication between upstream and downstream for more than 1 minute if there are no system errors.
TiDB cannot perform the upgrade operation.
TPS/QPS dropped 25% without system errors or rolling upgrades.
Unexpected TiKV core dump or TiDB panic(process crashed).
System resource leak, include but not limit to memory leak and goroutine leak.
System fails to recover from crash.
Security and compliance issues
CVSS score >= 9.0.
TiDB leaks secure information to log files, or prints customer data when set to be desensitized.
Backup or Recovery Issues
Failure to either backup or restore is always considered critical.
Incompatible Issues
Syntax/compatibility issue affecting default install of tier 1 application(i.e. Wordpress).
The DML result is incompatible with MySQL.
CI test case fail
Test cases which lead to CI failure and could always be reproduced.
Bug location information
Key information is missing in ERROR level log.
No data is reported in monitor.
Major
The bug affects major functionality. Some typical critical bugs are as follow:
Invalid query result
The query gets the wrong result caused by overflow.
The query gets the wrong result in the corner case.
For boundary value, the processing logic in TiDB is inconsistent with MySQL.
Inconsistent data precision.
Incorrect DML or DDL result
Extra or wrong data is written to TiDB with a DML in a corner case.
Invalid features
The corner case of the main workflow of the feature does not work.
The feature is experimental, but a main workflow does not work.
Incompatible issue of view functionality.
SQL Plan
Choose sub-optimal plan. The difference between best plan and chosen plan is bigger than 100% and less than 200%
System stability
TiDB panics but process does not exit.
Less important security and compliance issues
CVSS score >= 7.0
Issues that affects critical functionality or critical data but rare to reproduce(can’t be reproduced in one week, and have no clear reproduce steps)
CI test cases fail
Test case is not stable.
Bug location information
Key information is missing in WARN level log.
Data is not accurate in monitor.
Moderate
SQL Plan
Cannot get the best plan due to invalid statistics.
Documentation issues
The bugs were caused by invalid parameters which rarely occurred in the product environment.
Security issues
CVSS score >= 4.0
Incompatible issues occurred on boundary value
Bug location information
Key information is missing in DEBUG/INFO level log.
Minor
The bug does not affect functionality or data. It does not even need a workaround. It does not impact productivity or efficiency. It is merely an inconvenience. For example:
Invalid notification
Minor compatibility issues
Error message or error code does not match MySQL.
Issues caused by invalid parameters or abnormal cases.
Not a bug
The following issues look like bugs but actually not. They should not be labeled type/bug
and instead be only labeled type/compatibility
:
Behavior is different from MySQL, but could be argued as correct.
Behavior is different from MySQL, but MySQL behavior differs between 5.7 and 8.0.
Identify issue affected releases
For type/bug
issues, when they are created and identified as severity/critical
or severity/major
, the ti-chi-bot will assign a list of may-affects-x.y
labels to the issue. For example, currently if we have version 5.0, 5.1, 5.2, 5.3, 4.0 and the in-sprint 5.4, when a type/bug
issue is created and added label severity/critical
or severity/major
, the ti-chi-bot will add label may-affects-4.0
, may-affects-5.0
, may-affects-5.1
, may-affects-5.2
, and may-affects-5.3
. These labels mean that whether the bug affects these release versions are not yet determined, and is awaiting being triaged. You could check currently maintained releases list for all releases.
When a version is triaged, the triager needs to remove the corresponding may-affects-x.y
label. If the version is affected, the triager needs to add a corresponding affects-x.y
label to the issue and in the meanwhile the may-affects-x.y
label can be automatically removed by the ti-chi-bot, otherwise the triager can simply remove the may-affects-x.y
label. So when a issue has a label may-affects-x.y
, this means the issue has not been diagnosed on version x.y. When a issue has a label affects-x.y
, this means the issue has been diagnosed on version x.y and identified affected. When both the two labels are missing, this means the issue has been diagnosed on version x.y but the version is not affected.
The status of the affection of a certain issue can be then determined by the combination of the existence of the corresponding may-affects-x.y
and affects-x.y
labels on the issue, see the table below for a clearer illustration.
YES
NO
version x.y has not been diagnosed
NO
NO
version x.y has been diagnosed and identified as not affected
NO
YES
version x.y has been diagnosed and identified as affected
YES
YES
invalid status
Last updated