The Biggest Mistake I Made When Learning Data Science
Which took me a long time to realize
I would like to start by stating a ground truth, just
in case you have not realized it by now: data science is an extremely broad field.
Data science can be applied to any business or industry
where we can collect data. Besides, the advancements in data-related technology
have made it easier than ever to collect, process, store, and transfer data.
Thus, it is safe to say that data science applications will cover a broader
scope in the future.
Although data science is ubiquitous, its
applications differ greatly in different domains. It would be an uphill battle
to learn about all data science applications. I now consider it a battle
that is impossible to win.
It is in your hands to turn this battle into an
achievable target. You should set your goal towards being a specialist rather
than a generalist. This is how you make a great impact as a data scientist.
Being a jack of all trades but master of none will get
you only to a certain point. What you can achieve at that point will not
impress potential employers. You should try to become a master of at least one
area to stand out.
The learning path mostly starts from the same point for
all domains. The core principles are the same. You need to have a comprehensive
understanding of statistical concepts. A certain level of programming skills is
needed to turn your ideas into action.
Once you are through with the basics, pick an area to
specialize in. You do not have to work in the first area you pick. You can always
change it. In fact, while obtaining in-depth knowledge in a particular field,
you also improve your data science skills and knowledge in general.
Let’s say you want to work as a data scientist in
finance. Then, time series analysis should be your area of expertise. You need
to be able to clean, process, and analyze time series data as well as extract
insights from it.
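To give a rough idea of what cleaning, processing, and analyzing time series data can look like, here is a minimal sketch with Pandas. The prices, dates, and variable names are made up for illustration, and forward-filling missing values is just one possible choice; real financial data needs far more care.

```python
import pandas as pd

# Hypothetical daily closing prices: one missing value and one missing day.
prices = pd.Series(
    [100.0, 102.0, None, 101.0, 105.0],
    index=pd.to_datetime(
        ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-05", "2024-01-06"]
    ),
    name="close",
)

# Clean: put the series on a regular daily grid and carry the last known
# price forward into the gaps (an assumption, not the only sensible choice).
daily = prices.asfreq("D").ffill()

# Process: convert prices to day-over-day returns.
returns = daily.pct_change().dropna()

# Analyze: smooth the prices with a 3-day rolling mean.
rolling_mean = daily.rolling(window=3).mean()

print(daily)
print(returns)
print(rolling_mean)
```

Each of these steps hides decisions, such as how to fill gaps or which window to use, and those decisions are exactly where domain expertise starts to matter.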
How about tools?
Being a jack of all trades but master of none is also a
serious issue when it comes to software tools and packages. Thanks to the data
science community, we have a rich selection of tools that ease and expedite our
jobs.
The advantage of having so many tools may turn into
trouble if used unwisely. There is almost always more than one option to
perform a task.
Consider a very simple case of cleaning and analyzing
tabular data. The first two options that come to my mind are Pandas for Python
and data.table for R. SQL is another strong candidate, especially if the
data is stored in a relational database.
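As a small illustration, the sketch below does the same cleaning and aggregation twice: once with Pandas and once with SQL, using Python's built-in sqlite3 module as a stand-in for a relational database. The sales table, its columns, and its values are hypothetical.

```python
import sqlite3

import pandas as pd

# Hypothetical sales records with a duplicate row and a missing amount.
rows = [
    ("north", 120.0),
    ("north", 120.0),  # exact duplicate
    ("south", 80.0),
    ("south", None),   # missing amount
    ("north", 60.0),
]

# Option 1: Pandas. Drop duplicates and missing amounts, then total per region.
df = pd.DataFrame(rows, columns=["region", "amount"])
pandas_totals = (
    df.drop_duplicates()
    .dropna(subset=["amount"])
    .groupby("region")["amount"]
    .sum()
    .to_dict()
)

# Option 2: SQL. The same steps against an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
sql_totals = dict(
    conn.execute(
        """
        SELECT region, SUM(amount)
        FROM (SELECT DISTINCT region, amount FROM sales)
        WHERE amount IS NOT NULL
        GROUP BY region
        """
    ).fetchall()
)
conn.close()

print(pandas_totals)
print(sql_totals)
```

Both options arrive at the same totals, so knowing one of them well is usually enough for a task like this.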
Similarly, there are many candidates to help you with
data visualization tasks. Matplotlib, Seaborn, and Altair are just three of the
options in Python.
In most cases, one is enough to get the job done. You
won’t be at a disadvantage because of using Seaborn instead of Matplotlib, and
vice versa.
The hardest decision here is to pick a subfield of data
science to specialize in. Unfortunately, there is not a strict set of rules to
help you make this decision. It depends on many parameters such as your
background, interests, and job opportunities.
Whatever area you pick, it will be better than trying to learn
about all of them. You are highly likely to fail with the latter. What I mean by fail
is not that you cannot learn anything. However, you will fail to impress
recruiters with your general knowledge or skills.
Thank you for reading. Please let me know if you have
any feedback.