“From the Demo Day onward, how much time should I wait before calling myself a data scientist?”
I have put this question to almost every professor we had in the Bootcamp, as well as the guest speakers, the alumni, and pretty much any data scientist I ran into inside or outside IE.
I keep receiving different answers. The longest, to my surprise, was about six months.
Long before starting the Bootcamp, I was aware that data science, as a professional field, is rather new. The role of a data scientist is therefore still not fully defined, unlike that of a doctor, for example.
However, the main reason why I kept asking this question so often was that I wanted to be extra careful in dealing with such a powerful title.
Just put the word “data” in anything and you’ll sound professional. Now imagine putting it right before another word that people such as Albert Einstein and Stephen Hawking would have as their job titles.
Now, as we approach the end of the Bootcamp, I can say that it makes total sense that we should be called data scientists. Not only because we are applying science to data and that should be enough, but also because all the fancy words we have in data science are in fact quite easy to understand and conceptualize.
Think of machine learning. That’s another fancy term that politicians and CEOs like to use to sound cool and modern. Once you study it, you will realize that it’s actually a lot easier than most of the media makes it out to be.
What would probably surprise you most is not the machine learning itself, or the models running. That’s simple. It’s what’s before and after that. The world for the machine is simple and organized. But the world for us is messy and chaotic.
Before we tell the machine to run a model, we ourselves have to work on this data that we obtain. We have to clean it, organize it, and most importantly, make sense of it. And in order to do so, you need good old human thinking.
Should I use this variable or not? Should I put a threshold in these values and split them into categories, or not? Should I group by? Should I merge? Should I append? Should I ask for more data? The machine won’t help you with any of that. The machine is sitting there waiting for you to give it something to do.
Your job doesn’t end there, either. You have a lot to do afterwards. How about the results? How do you interpret them? What defines an “accurate” model? What can you change? Should you change it?
Then, just when you think the hard part is over, you will have to go to your clients and explain to them all the work you have done, but in simple business terms.
After learning all that, I got the answer to my question. Only when I truly master this art of explaining the world to the model and explaining the model to the world will I call myself a data scientist.
And this part is what this Bootcamp is all about.