[Advice needed] Self-taught DS junior with too many ways to improve | paradox of choice
DISCLAIMER: This will be a rather long post, but I would really appreciate it if you read it and offered any advice or suggestions.
Backstory
I'm a currently unemployed data analyst/scientist who is lost in the flood of information on the Internet and can't choose what to study next. I started as a technical worker at an IT company, but over the course of my work I began building tools for myself to better understand the feedback from our users and how our actions affected them. My boss noticed this, and soon I was working as a data analyst. After a while I started getting tasks not only to analyze data, but also to try applying ML algorithms and get "some results", if possible. I had become quite good at cleaning and processing data and had taught myself a lot of Python/pandas, so growing further into data science felt like a natural path. I was given tasks like reading a paper, applying "this algo" to our data and measuring its efficiency. While it was interesting, I was learning on the go and never had a chance to build systematic knowledge of data science: one task done, here's another one. Other tasks involved trying different approaches to the data: "let's not do regressions, let's do knowledge graphs", so I set up the backend and tested it with our data.
I ended up with the following skills (some stronger than others):
- Python, pandas
- data cleaning
- data visualization (matplotlib, d3.js)
- some algorithms (linear/logistic regression, naive Bayes, DBSCAN)
- some metrics (HVDM along with the "standard" ones)
- running jobs on Hadoop clusters
That being said, not everyone knows HVDM, but I do; on the other hand, I lack a lot of basic knowledge about statistical distributions, their properties, etc. After I left my job, I started applying for positions and going to interviews, and realized how f*cked I am: I do have some valuable knowledge and experience, but it's all so chaotic...
I decided to improve. And here it comes...
Brain starts to f*ck me over
- "Hey, al, we need a foundation first, so let's go over this set of articles on math to understand algorithms better"
- "You know, al, I've reconsidered, let's start with those awesome tutorials: we will implement algorithms in Python and learn how they work"
- "Damn, maybe we should start this specialization on Coursera, looks good!"
- "Wait, I just realized something, al! You don't know how to choose metrics! Let's find info on that!"
- "The interviewer mentioned Bloom filters and some other sick data structures, learn those!"
- "Goddammit, al, you've wanted to learn Clojure for so long, use those books"
- "NO! Let's start with some books that give an overview of the whole field of data science!"
- "Hm... I read on some blog that they have tidyverse, maybe it's time to learn some R?"
- "F*ck math, let's practice: go to DataCamp and Dataquest, then head to Kaggle and own them all!"
- "ALARM! Everyone wants someone with Spark experience, go learn Spark!"
Result? There is none.
In the end, I became paralyzed by the paradox of choice.
To sum it up
I still don't know which metrics to apply in which case. I don't know which algorithms are preferable for a given task. Different distributions and their properties: are they even important? I'm not a jack of all trades, more like a king of none. And I'm lost. But I want to understand.
My questions
- Where do I start? All the things I mentioned are important.
- How can I better combine theory and practice?
- What path did you take? Would you recommend it?
- What books/courses/whatever would you recommend?
- What's the natural progression for growing as a data scientist?
Thanks for reading this and offering your opinions!