Identifying the binding between target proteins and candidate molecules is essential in drug discovery. Multi-task learning has been introduced to facilitate knowledge sharing among tasks when the amount of information available for each task is small. However, multi-task learning sometimes worsens overall performance or creates a trade-off among the performances of individual tasks. In this study, we propose a general multi-task learning scheme that not only increases average performance but also minimizes individual performance degradation, through group selection and knowledge distillation. Groups are selected on the basis of chemical similarity between the ligand sets of targets, and similar targets in the same group are trained together. During training, we apply knowledge distillation with teacher annealing, in which the multi-task learning models are guided by the predictions of the single-task learning models. This method yields higher average performance than both single-task learning and classic multi-task learning. Further analysis reveals that multi-task learning is particularly effective for low-performing tasks, and that knowledge distillation helps the model avoid degradation of individual task performance in multi-task learning.
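As a rough illustration of the distillation scheme described above, the following sketch combines a gold-label loss with a teacher-guided loss, with the gold-label weight annealed upward over training so the student relies on the single-task teacher early and on the true labels late. This is a hypothetical minimal implementation, not the authors' code; the function name, the linear annealing schedule, and the two-class logits are illustrative assumptions.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def annealed_distillation_loss(student_logits, teacher_probs, gold_label,
                               step, total_steps):
    """Teacher-annealed distillation loss (illustrative sketch).

    lam rises linearly from 0 to 1, shifting the objective from
    matching the single-task teacher's predictions to fitting the
    gold labels as training progresses.
    """
    lam = min(step / total_steps, 1.0)
    student_probs = softmax(student_logits)
    # Cross-entropy against the gold label.
    gold_loss = -math.log(student_probs[gold_label])
    # Cross-entropy against the teacher's predictive distribution.
    distill_loss = -sum(t * math.log(q)
                        for t, q in zip(teacher_probs, student_probs))
    return lam * gold_loss + (1.0 - lam) * distill_loss
```

At step 0 the loss is purely the distillation term; at the final step it reduces to the ordinary cross-entropy on gold labels, matching the "teacher annealing" behavior the abstract refers to.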