How to Win a Data Science Competition
Learn from Top Kagglers

Below are the top discussions from Reddit that mention this online Coursera course from HSE University.

Offered by HSE University. If you want to break into competitive data science, then this course is for you! Participating in predictive ... Enroll for free.

Taught by
Dmitry Ulyanov
Visiting lecturer
and 4 more instructors

Offered by
HSE University

Reddit Posts and Comments

1 posts • 14 mentions • top 15 shown below

r/learnmachinelearning • post
125 points • sercosan
How to Win a Data Science Competition: Learn from Top Kagglers
r/datascience • comment
3 points • nckmiz

This Coursera course has some examples of ensemble methods like blending and stacking.
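As a rough illustration of one common meaning of blending (a weighted average of two models' predictions, with the weight chosen on held-out data), here is a minimal sketch. It is not taken from the course; the function names and the toy numbers are hypothetical, and stacking (fitting a second-level model on the predictions) is not shown.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def best_blend_weight(y_true, pred_a, pred_b, steps=101):
    """Grid-search w in [0, 1] that minimises the holdout MSE of w*a + (1-w)*b."""
    best_w, best_err = 0.0, float("inf")
    for i in range(steps):
        w = i / (steps - 1)
        blended = [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]
        err = mse(y_true, blended)
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err


# Hypothetical holdout targets and predictions from two models:
# model A over-predicts by 0.5, model B under-predicts by 0.5.
holdout_y = [1.0, 2.0, 3.0, 4.0]
model_a = [1.5, 2.5, 3.5, 4.5]
model_b = [0.5, 1.5, 2.5, 3.5]

blend_w, blend_err = best_blend_weight(holdout_y, model_a, model_b)
print(blend_w, blend_err)
```

In this toy case the two models' errors cancel, so the blend does better than either model alone. On real data the gain is usually much smaller, and the weight should be tuned on data that neither model was trained on.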

https://www.coursera.org/learn/competitive-data-science

r/reinforcementlearning • comment
2 points • allliam

If you already have the necessary ML background, this Coursera course (and these 3 videos on tuning in particular) gives some good practical advice:

https://www.coursera.org/learn/competitive-data-science/lecture/giBKx/hyperparameter-tuning-i

r/learnmachinelearning • comment
1 points • jarandaf

I guess this might be of interest to you.

r/datascience • comment
1 points • dtrillaa

This course does a good job on feature engineering and selection. The only caveat is that some of these methods aren’t always applicable in practice at an actual company, but it's still good for learning the thought process.

r/berkeley • comment
1 points • Ken_Obiwan

Honestly I only interviewed people for data science, not data analytics, so I can't say for sure. For data science, we were looking for 61B level coding knowledge and DS 100 level data analysis knowledge, but also knowledge of how machine learning algorithms work / how to use them (both theoretical and practical), and some knowledge of how to build web systems. I imagine you could get an analyst job with just the DS 100 stuff, but honestly I can't say for sure.

I would say CS 189 is probably more theory than we looked for. If you know how to implement most of the major algorithms from scratch, that should be enough. (Not as hard as it might sound, many are less than a page of code.)

In terms of practical stuff... at least when I was at Cal, there wasn't a good course on how to apply ML in practice. I haven't taken this class but it looks like it's probably more than you need.

https://www.coursera.org/learn/competitive-data-science

r/dataengineering • comment
1 points • trapatsas

Maybe take a look at this course.

r/learnmachinelearning • comment
1 points • ExilePrime

The National Research University Higher School of Economics offers an Advanced Machine Learning Specialization through Coursera. If you pay $125 per month then you can receive a certificate for each course you complete. There's a total of 7 courses and they take 50-60 hours each to complete. Aside from the certifications, you could just audit the courses for free. One of their courses has a competition on Kaggle as part of it.

r/mit • comment
1 points • cipher7d3

https://www.coursera.org/learn/competitive-data-science?

Assuming you already know the basics of ML.

r/learnmachinelearning • comment
1 points • graden_dissent

https://www.coursera.org/learn/competitive-data-science

This course has some interesting stuff (lectures on YouTube, you can pay for the notebooks) and is quite instructive if you don't take the obviously misleading title seriously.

r/WGU_CompSci • comment
1 points • lynda_

There's also this course that shows you how to solve predictive and modeling tasks.

https://www.coursera.org/learn/competitive-data-science/home/welcome

r/learnmachinelearning • comment
1 points • AurelianTactics

Coursera has a course on data science competitions. There are a few sections on features and feature engineering that I found interesting and that have a lot of ideas. There's a video by Kazanova toward the end of the course and he has like 40 feature engineering ideas in one slide.

For feature selection and feature extraction, sklearn has some documentation pages covering the basic tools that are included with sklearn. Maybe worth skimming through and reading the sections that interest you.

I'm not sure I agree with your best practices. For some problems those are worth following, but you can come up with counterexamples where those best practices aren't worth doing.

Personally, I approach it this way:

  • Consider the specific problem and how the given features relate to the target variable. Consider doing a quick and dirty random forest and seeing how the random forest rates the feature importance.

  • For each feature, consider the basic transformations for that type of feature. For dates, break them up into year/month/day of week etc. and see if it's helpful. For text, consider word2vec, tf-idf etc.

  • Try to come up with specific features for the specific problem. Either through brainstorming, EDA, or looking at how similar types of problems are solved.

  • If you have more time, keep trying to build more features and see what adds value.
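The date transformation in the second bullet could be sketched as follows. This is only an illustration using the standard library; the function name and the extra `is_weekend` flag are hypothetical additions, not something from the comment or the course.

```python
from datetime import date


def date_features(d: date) -> dict:
    """Split a date into simple calendar features."""
    return {
        "year": d.year,
        "month": d.month,
        "day": d.day,
        "day_of_week": d.weekday(),      # Monday = 0, Sunday = 6
        "is_weekend": d.weekday() >= 5,  # Saturday or Sunday
    }


features = date_features(date(2021, 3, 14))
print(features)
```

Whether any of these columns helps depends on the problem, so each one is worth checking against a validation score rather than adding by default.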

r/datascience • comment
2 points • andrewdoss_bitdotio

I agree with the comment that feature engineering is often domain-dependent and good features can be developed by understanding the underlying process/domain.

That being said, there are also generic techniques, like target encoding of categorical features or embeddings, that are helpful to be aware of so that you can represent that domain knowledge in the most effective ways. This is not "instead of" domain knowledge; think of it as a complement. The general goal is to present the right information in the best representation for your algorithm to learn from.
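To make target encoding concrete, here is a minimal sketch of one variant: each category is replaced by its mean target value, shrunk toward the global mean. The function names, the `smoothing` parameter, and the toy data are all assumptions for illustration, not a specific library's API.

```python
from collections import defaultdict


def fit_target_encoding(categories, targets, smoothing=10.0):
    """Return (mapping, global_mean) of smoothed per-category target means."""
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, y in zip(categories, targets):
        sums[c] += y
        counts[c] += 1
    # Rare categories are pulled toward the global mean by the smoothing term.
    mapping = {
        c: (sums[c] + smoothing * global_mean) / (counts[c] + smoothing)
        for c in counts
    }
    return mapping, global_mean


def apply_target_encoding(categories, mapping, global_mean):
    """Encode categories; unseen categories fall back to the global mean."""
    return [mapping.get(c, global_mean) for c in categories]


# Hypothetical training data: a categorical column and a binary target.
train_cats = ["a", "a", "b", "b", "b", "c"]
train_y = [1, 0, 1, 1, 1, 0]

mapping, prior = fit_target_encoding(train_cats, train_y, smoothing=2.0)
encoded = apply_target_encoding(["a", "b", "z"], mapping, prior)
print(encoded)
```

Because the encoding is computed from the target, fitting it on the same rows the model trains on leaks label information. A common remedy is to compute it out-of-fold.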

Some specific resources starting from more concrete/tactical to more conceptual:

https://www.coursera.org/learn/competitive-data-science

This is a free online course that will give you a good summary of some feature engineering techniques. I don't recommend everything in it, as some is Kaggle-specific and not necessarily useful in industry (e.g. purposefully pursuing data leaks, massive stacks of model ensembles).

https://fast.ai

You may also find a course like fast.ai helpful as deep learning is really a lot about using various techniques to learn features that cannot be constructed manually.

https://sites.astro.caltech.edu/~george/ay122/cacm12.pdf

Finally, here is a classic and fundamental paper that you may find insightful about ML and feature engineering/data representation in general.

r/learnprogramming • comment
1 points • GrayWare_Developer

It is never too early. If you start now, you will know from your ML studies how much math to learn in school.

Anyway, you will need some Computer Science basics as well as working knowledge of Python and SQL - you can get that from Kaggle https://www.kaggle.com/learn/overview - they have tutorials and courses. You can participate in their competitions and read what models other participants develop. There is a course on Coursera on how to win Kaggle competitions - https://www.coursera.org/learn/competitive-data-science?ranMID=40328&ranEAID=SAyYsTvLiGQ&ranSiteID=SAyYsTvLiGQ-3xo4wwo7d60n82UdJjFawA&siteID=SAyYsTvLiGQ-3xo4wwo7d60n82UdJjFawA&utm_content=10&utm_medium=partners&utm_source=linkshare&utm_campaign=SAyYsTvLiGQ - start participating in competitions immediately after you finish this course or even earlier. Do not expect to win; expect to gain experience, pick up jargon, and improve your skills.

There are great courses on Coursera by Andrew Ng about machine learning and deep learning - you can audit them without getting a certificate, which is worthless anyway. The knowledge is what counts.

Remember to practice more than you study - write code for two hours for each hour you read a book or watch a video. Put all your projects on GitHub, create videos about what you learn and put those on YouTube.

r/datascience • comment
1 points • doct0r_d

I asked a lot of different things, and it can be kind of daunting sometimes. I believe I picked up these things from various online courses/books mostly. As an example, I read https://otexts.com/fpp2/ which goes over forecasting time series data which led to https://robjhyndman.com/hyndsight/tscv/ on "cross validation" with time series. I came across various encodings for categorical variables when looking into the "vtreat" package with R (http://www.win-vector.com/blog/2017/09/custom-level-coding-in-vtreat/#more-5231).
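For readers unfamiliar with time series "cross validation", the idea is to evaluate on a rolling forecasting origin: the model is only ever trained on observations that come before the ones it is tested on. A minimal index-splitting sketch is below; the function name and parameters are hypothetical, and real projects would typically use an existing library utility instead.

```python
def rolling_origin_splits(n_samples, initial_train_size, horizon=1, step=1):
    """Yield (train_indices, test_indices) with an expanding training window.

    Every test index is strictly later than every train index, so no
    future information is used to predict the past.
    """
    origin = initial_train_size
    while origin + horizon <= n_samples:
        train_idx = list(range(origin))
        test_idx = list(range(origin, origin + horizon))
        yield train_idx, test_idx
        origin += step


splits = list(rolling_origin_splits(n_samples=6, initial_train_size=3))
for train_idx, test_idx in splits:
    print(train_idx, test_idx)
```

Each split trains on everything up to the origin and tests on the next `horizon` points, which is why ordinary shuffled k-fold is usually the wrong choice for forecasting problems.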

I also like to take all sorts of MOOCs and read math books in my free time. For example, https://www.coursera.org/learn/competitive-data-science is a fun one that goes over many of the things I talked about (but you should already have some background in machine learning).

For modular code, I would take some courses in programming.

I could probably come up with a good list of resources if I spent some time and thought about which ones really influenced me. I may do that in the future. Are there any specific things you want to improve on?