A new clinical treatment (i.e. intervention) is often studied by comparing it to a placebo or an existing intervention. Two or more patient groups with different interventions are created and the outcomes are studied. When you compare these outcomes, it is important to make sure you avoid confounding. This will increase the likelihood of capturing the true effect the intervention has on the outcome (i.e. the treatment effect).
As mentioned before, a few methods to minimize confounding are restriction, randomization, and matching. These methods all aim to reduce the baseline differences between intervention groups. When patient characteristics are balanced between groups, they are less likely to confound the treatment effect.
Can I use p-values to assess balance in baseline between groups?
P-values can tell you whether differences between groups are plausibly explained by chance (i.e. random variation) or not. A p-value is the probability of observing a difference at least as large as the one in your data if there were truly no difference between groups. When a p-value is less than 0.05 (the conventional cut-off), the difference is usually called statistically significant. P-values are an indispensable part of hypothesis testing; however, they are overused in clinical research. For more details check out this Wikipedia page on p-values.
One example of inappropriate use of p-values is assessing balance in baseline characteristics between intervention groups after an attempt to avoid confounding. Here are two reasons why: (1) After proper randomization, any observed baseline difference is by definition caused by random variation, so testing whether it is due to chance answers a question whose answer is already known. (2) After matching, the p-value is uninformative because there is no direct relationship between the size of group differences and the p-value. The p-value also depends on sample size, so it can change when matching discards patients even if balance does not improve. Therefore, the p-value is inadequate to assess and optimize balance at baseline.
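To make the sample-size point concrete, here is a minimal sketch that keeps the group difference fixed at 0.1 standard deviations and only varies the number of patients per group. It assumes two equal-sized groups and a large-sample normal (z-test) approximation; the function name `approx_two_sided_p` and the chosen sample sizes are made up for illustration.

```python
import math

def approx_two_sided_p(diff_in_sd, n_per_group):
    """Approximate two-sided p-value for a difference in means.

    Assumption: two groups of equal size and a large-sample normal
    approximation, so the test statistic is z = d * sqrt(n / 2),
    where d is the difference expressed in standard deviations.
    """
    z = abs(diff_in_sd) * math.sqrt(n_per_group / 2)
    # erfc(z / sqrt(2)) equals 2 * (1 - Phi(z)) for the standard normal
    return math.erfc(z / math.sqrt(2))

# Same imbalance (0.1 standard deviations), different sample sizes
for n in (50, 200, 1000, 5000):
    p = approx_two_sided_p(0.1, n)
    print(f"n per group = {n:>4}: p = {p:.4f}")
```

The imbalance is identical in every row, yet the p-value moves from clearly "non-significant" in the small samples to "significant" in the large ones. This is why a p-value on its own says little about how well balanced two groups are.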
The current best practice for baseline assessment is to use the standardized mean difference (SMD). This ratio is calculated by dividing the difference in means between groups by the standard deviation of the variable among study participants (in practice often the pooled standard deviation of the groups). An SMD of 0 indicates perfect balance, whereas an SMD of 1 means the group means differ by one full standard deviation; the SMD has no upper limit. A typical rule of thumb for adequate balance is an absolute SMD <0.1 (or 10%). For more details check out this Cochrane page on the SMD.
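As an illustration, the SMD for a continuous baseline variable could be computed along the following lines. This is a sketch under stated assumptions: it uses the pooled standard deviation of the two groups as the denominator (one common choice, not the only one), and the function name and the age values are hypothetical.

```python
import math
import statistics

def standardized_mean_difference(group_a, group_b):
    """SMD for a continuous variable in two groups.

    Assumption: the denominator is the pooled standard deviation,
    sqrt((var_a + var_b) / 2), using sample variances.
    """
    mean_a = statistics.fmean(group_a)
    mean_b = statistics.fmean(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt((var_a + var_b) / 2)
    return (mean_a - mean_b) / pooled_sd

# Hypothetical baseline ages (years) in two intervention groups
treated = [54, 61, 58, 65, 70, 59, 63, 68]
control = [52, 60, 57, 66, 69, 58, 61, 67]

smd = standardized_mean_difference(treated, control)
print(f"SMD for age: {smd:.2f}")
print("adequate balance" if abs(smd) < 0.1 else "imbalance above the 0.1 rule of thumb")
```

In this made-up example the mean ages differ by only one year, but the SMD still exceeds 0.1, so by the rule of thumb age would be flagged as imbalanced. Binary and categorical variables need a different denominator, which this sketch does not cover.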
Using SMDs instead of p-values for baseline comparison will improve the quality of your study. However, keep in mind that some (late-adopting) journals still require p-values in baseline tables.