We talk a lot these days about data science, and how it will pave our paths with beautiful insights and unexpected new relations and connections in our given datasets, and even across datasets.
But how do we maintain the "Science" part in "Data Science"? After some time working in this field, I appreciate more and more the critical thinking that has characterized progress in science.
Hypothesis, facts, then proving and/or disproving the thesis. This is how science has progressed over the past centuries. Popper formalized this method and categorized as non-science all disciplines whose statements cannot be falsified. In other words, if a statement cannot be disproved, we cannot talk of science, since there is no mechanism left to verify the solution or to prove it wrong.
When that happens, the argument can still be accepted, but not scientifically accepted. Ways of accepting or refuting a non-falsifiable statement are based, for instance, on aesthetics, authority, or pragmatic or philosophical considerations. All are valid, but not scientific. This applies, for instance, to statements in the disciplines of politics, theology, ethics, etc.
Science has definitely progressed since then. For instance, Bayesian networks and statistical induction are now part of the (data) scientist's arsenal. But no matter how the baseline is set, critical thinking and a rigorous method are definitely helpful in understanding the results produced by science, in particular when the work is based on large amounts of data and is computational in nature rather than formula- or model-driven.
Data science currently has many different connotations. On one side, it praises the "artistry", the genius of laying out connections between disciplines and concepts. This is a truly great aspect of scientists, and creativity is definitely very welcome in all data science profiles.
On the other side, along with the fun of creating new insights and new data golden eggs, a data scientist has to put up with those annoying criteria of reproducibility, falsifiability and peer review. Sometimes these elements are postponed or left behind in the name of artistry. Granted, it is hard to find metrics and baselines for comparing models and data science solutions. But the scientific method has proven solid over the centuries: it allows factual discussion between scientists, and it allows selection between models based on objective, agreed criteria.
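To make this a bit more concrete, here is a minimal sketch of what reproducibility and a falsifiable claim might look like in practice. Everything in it is an assumption made for illustration: the synthetic data, the function names (`make_data`, `fit_line`, `evaluate`), the mean-predictor baseline and the mean squared error metric are hypothetical choices, not a prescribed method. The claim under test is "the fitted line predicts held-out data better than the baseline", and the held-out comparison could, in principle, prove it wrong.

```python
import random
import statistics


def make_data(n, seed):
    """Generate synthetic (x, y) pairs from a seeded generator (illustrative data only)."""
    rng = random.Random(seed)  # fixed seed: anyone can regenerate the same data
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0, 1.0) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares fit for a single feature; returns (slope, intercept)."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(predictions, targets):
    """Mean squared error: the agreed metric for this comparison."""
    return statistics.fmean([(p - t) ** 2 for p, t in zip(predictions, targets)])


def evaluate(seed=42, n=200, n_train=150):
    """Return (baseline_mse, model_mse) measured on held-out data."""
    xs, ys = make_data(n, seed)
    train_x, test_x = xs[:n_train], xs[n_train:]
    train_y, test_y = ys[:n_train], ys[n_train:]

    # Baseline: always predict the mean of the training targets.
    baseline_prediction = statistics.fmean(train_y)
    baseline_mse = mse([baseline_prediction] * len(test_y), test_y)

    # Model: a straight line fitted on the training split only.
    slope, intercept = fit_line(train_x, train_y)
    model_mse = mse([slope * x + intercept for x in test_x], test_y)
    return baseline_mse, model_mse


if __name__ == "__main__":
    baseline_mse, model_mse = evaluate()
    print(f"baseline MSE: {baseline_mse:.3f}")
    print(f"model MSE:    {model_mse:.3f}")
    print("claim survives" if model_mse < baseline_mse else "claim falsified")
```

Because the seed, the split and the metric are all written down, a reviewer can rerun the comparison, obtain the same numbers, and argue about the result rather than about how it was produced.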