Here are the key steps in ID3's approach to selecting the "best" attribute at each node:
1. Calculate the entropy (impurity/uncertainty) of the target attribute over the examples reaching that node: Entropy(S) = -Σ p_i log2(p_i), where p_i is the proportion of examples in class i.
2. For each candidate attribute, calculate the information gain: the entropy before the split minus the weighted average entropy of the partitions the split produces.
3. Select the attribute with the highest information gain. This attribute best separates the examples according to the target class.
So in this example, ID3 would compute the information gain from splitting on attributes A1 and A2 and select whichever yields the higher gain. The goal at each step is to pick the attribute that produces the "purest" partitions.