How to Win a Data Science Competition
Learn from Top Kagglers


Below are the top discussions from Reddit that mention this online Coursera course from the National Research University Higher School of Economics.

If you want to break into competitive data science, then this course is for you.

Data Analysis, Feature Extraction, Feature Engineering, XGBoost


Taught by
Dmitry Ulyanov
Visiting lecturer
and 4 more instructors

Offered by
National Research University Higher School of Economics

Reddit Posts and Comments

1 post • 11 mentions • top 12 shown below

r/learnmachinelearning • post
125 points • sercosan
How to Win a Data Science Competition: Learn from Top Kagglers

r/datascience • comment
3 points • nckmiz

This Coursera course has some examples of ensemble methods like blending and stacking.

https://www.coursera.org/learn/competitive-data-science
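To make the stacking idea concrete, here is a minimal Python sketch using scikit-learn on synthetic data. It is not taken from the course; the choice of base models and meta-model is an illustrative assumption. The base models produce out-of-fold predictions on the training set, and a logistic regression learns how to combine them.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data stands in for a competition dataset (assumption).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Illustrative base models; any diverse set of models would do.
base_models = [
    RandomForestClassifier(n_estimators=200, random_state=0),
    KNeighborsClassifier(n_neighbors=15),
]

# Level 1: out-of-fold predictions on the training set, so the meta-model
# never sees a prediction made by a model that was trained on that same row.
oof_train = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=5, method="predict_proba")[:, 1]
    for m in base_models
])

# Refit each base model on the full training set to build test-set features.
test_features = np.column_stack([
    m.fit(X_train, y_train).predict_proba(X_test)[:, 1] for m in base_models
])

# Level 2: a simple meta-model learns how to weight the base predictions.
meta_model = LogisticRegression()
meta_model.fit(oof_train, y_train)
stacked_accuracy = meta_model.score(test_features, y_test)
print(f"stacked accuracy: {stacked_accuracy:.3f}")
```

Blending follows the same pattern but trains the meta-model on predictions for a single held-out split instead of out-of-fold predictions. Recent scikit-learn versions also provide `StackingClassifier`, which automates these steps.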

r/reinforcementlearning • comment
2 points • allliam

If you already have the necessary ML background, this Coursera course (and these three videos on tuning in particular) gives some good practical advice:

https://www.coursera.org/learn/competitive-data-science/lecture/giBKx/hyperparameter-tuning-i
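For readers who want to try tuning alongside the videos, the following is a minimal sketch of randomized hyperparameter search with scikit-learn's `RandomizedSearchCV`. The model, the parameter ranges, and the budget (`n_iter`) are illustrative assumptions, not settings recommended in the lectures.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data stands in for a real problem (assumption).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Distributions to sample hyperparameters from (illustrative ranges).
param_distributions = {
    "n_estimators": randint(50, 150),    # integers 50..149
    "max_depth": randint(2, 6),          # integers 2..5
    "learning_rate": uniform(0.01, 0.2), # floats from 0.01 to 0.21
    "subsample": uniform(0.6, 0.4),      # floats from 0.6 to 1.0
}

search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=10,          # number of sampled configurations: the tuning budget
    cv=3,
    scoring="roc_auc",
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Random search evaluates a fixed number of sampled configurations, so its cost is set by `n_iter` rather than by the size of a full grid.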

r/learnmachinelearning • comment
1 points • jarandaf

I think this might interest you.

r/learnmachinelearning • comment
1 points • ExilePrime

The National Research University Higher School of Economics offers an Advanced Machine Learning Specialization through Coursera. If you pay $125 per month, you can receive a certificate for each course you complete. There are seven courses in total, and each takes 50 to 60 hours to complete. If you don't need the certificates, you can audit the courses for free. One of their courses includes a competition on Kaggle.

r/berkeley • comment
1 points • Ken_Obiwan

Honestly, I only interviewed people for data science, not data analytics, so I can't say for sure. For data science, we were looking for 61B-level coding knowledge and DS 100-level data analysis knowledge, but also knowledge of how machine learning algorithms work and how to use them (both theoretical and practical), and some knowledge of how to build web systems. I imagine you could get an analyst job with just the DS 100 material, but honestly I can't say for sure.

I would say CS 189 is probably more theory than we looked for. If you know how to implement most of the major algorithms from scratch, that should be enough. (It's not as hard as it might sound; many of them take less than a page of code.)
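As an illustration of how short a from-scratch implementation can be, here is a k-nearest-neighbors classifier in Python with NumPy. The class name `KNNClassifier` and the toy data are hypothetical; this is a teaching sketch, not an optimized implementation.

```python
import numpy as np


class KNNClassifier:
    """Minimal k-nearest-neighbors classifier (Euclidean distance, majority vote)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # k-NN is "lazy": training just memorizes the data.
        self.X_train = np.asarray(X, dtype=float)
        self.y_train = np.asarray(y)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Pairwise squared Euclidean distances, shape (n_query, n_train).
        dists = ((X[:, None, :] - self.X_train[None, :, :]) ** 2).sum(axis=2)
        # Indices of the k closest training points for each query row.
        nearest = np.argsort(dists, axis=1)[:, : self.k]
        preds = []
        for row in self.y_train[nearest]:
            labels, counts = np.unique(row, return_counts=True)
            # np.unique sorts labels, so ties go to the smallest label.
            preds.append(labels[np.argmax(counts)])
        return np.array(preds)


X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]
model = KNNClassifier(k=3).fit(X, y)
print(model.predict([[0.5, 0.5], [5.5, 5.5]]))  # [0 1]
```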

In terms of practical material, at least when I was at Cal, there wasn't a good course on how to apply ML in practice. I haven't taken this class, but it looks like it probably covers more than you need.

https://www.coursera.org/learn/competitive-data-science

r/dataengineering • comment
1 points • trapatsas

Maybe take a look at this course.

r/learnmachinelearning • comment
1 points • graden_dissent

https://www.coursera.org/learn/competitive-data-science

This course has some interesting material (the lectures are on YouTube, and you can pay for the notebooks) and is quite instructive if you don't take its obviously misleading title seriously.

r/WGU_CompSci • comment
1 points • lynda_

There's also this course that shows you how to solve predictive and modeling tasks.

https://www.coursera.org/learn/competitive-data-science/home/welcome

r/learnmachinelearning • comment
1 points • AurelianTactics

Coursera has a course on data science competitions. It has a few sections on features and feature engineering that I found interesting, with a lot of ideas. There's a video by Kazanova toward the end of the course in which he lists about 40 feature engineering ideas on one slide.

For feature selection and feature extraction, the sklearn documentation has pages covering the basic tools included with sklearn. They may be worth skimming, then reading the sections that interest you in more detail.
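As a small example of those basics, the sketch below uses `SelectKBest` from `sklearn.feature_selection` with the ANOVA F-test (`f_classif`) on synthetic data. The data and the choice of `k=3` are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# 10 features, of which only 3 carry signal; shuffle=False puts them first.
X, y = make_classification(
    n_samples=500,
    n_features=10,
    n_informative=3,
    n_redundant=0,
    shuffle=False,
    random_state=0,
)

# Keep the 3 features with the highest ANOVA F-statistic against the target.
selector = SelectKBest(score_func=f_classif, k=3)
X_selected = selector.fit_transform(X, y)

print(X_selected.shape)                    # (500, 3)
print(selector.get_support(indices=True))  # indices of the kept columns
```

Univariate scores like these ignore interactions between features, so the result is best treated as a first filter rather than a final answer.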

I'm not sure I agree with your best practices. They are worth following for some problems, but you can come up with counterexamples where they aren't worth doing.

Personally, I approach it this way:

  • Consider the specific problem and how the given features relate to the target variable. Consider fitting a quick-and-dirty random forest and checking how it rates feature importance.

  • For each feature, consider the basic transformations for that type of feature. For dates, break them up into year, month, day of week, etc., and see if that helps. For text, consider word2vec, TF-IDF, etc.

  • Try to come up with features for the specific problem, either through brainstorming, EDA, or looking at how similar problems are solved.

  • If you have more time, keep trying to build more features and see what adds value.
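The first two steps of this approach (a quick random forest to rank features, then splitting a date into parts) can be sketched in Python with pandas and scikit-learn. The toy sales data, its built-in weekend effect, and the column names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical toy data: daily sales with a weekend bump and a price effect.
rng = np.random.default_rng(0)
df = pd.DataFrame({"date": pd.date_range("2021-01-01", periods=365, freq="D")})
df["price"] = rng.uniform(5, 15, size=len(df))
is_weekend = df["date"].dt.dayofweek >= 5
df["sales"] = 100 + 30 * is_weekend - 2 * df["price"] + rng.normal(0, 5, size=len(df))

# Basic transformations for a date column: split it into parts.
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["day_of_week"] = df["date"].dt.dayofweek  # Monday=0 ... Sunday=6

features = ["price", "year", "month", "day_of_week"]
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(df[features], df["sales"])

# Impurity-based importances: a quick, rough ranking, not a definitive one.
importances = pd.Series(model.feature_importances_, index=features)
importances = importances.sort_values(ascending=False)
print(importances)
```

Here `year` is constant (every date falls in 2021), so the trees never split on it and its importance is zero. Impurity-based importances can also favor high-cardinality features; `sklearn.inspection.permutation_importance` is a common cross-check.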

r/learnprogramming • comment
1 points • GrayWare_Developer

It is never too early. If you start now, you will know from your ML studies how much math to learn in school.

Anyway, you will need some computer science basics as well as a working knowledge of Python and SQL. You can get that from Kaggle (https://www.kaggle.com/learn/overview), which has tutorials and courses. You can participate in their competitions and read about the models other participants develop. There is a course on Coursera on how to win Kaggle competitions: https://www.coursera.org/learn/competitive-data-science. Start participating in competitions immediately after you finish this course, or even earlier. Do not expect to win; expect to gain experience, pick up the jargon, and improve your skills.

There are great courses on Coursera by Andrew Ng about machine learning and deep learning. You can audit them without a certificate, which is worthless anyway; the knowledge is what counts.

Remember to practice more than you study: write code for two hours for each hour you spend reading a book or watching a video. Put all your projects on GitHub, create videos about what you learn, and put those on YouTube.

r/datascience • comment
1 points • doct0r_d

I asked a lot of different things, and it can be kind of daunting sometimes. I believe I picked up these things mostly from various online courses and books. For example, I read https://otexts.com/fpp2/, which covers time series forecasting and led me to https://robjhyndman.com/hyndsight/tscv/ on "cross-validation" with time series. I came across various encodings for categorical variables when looking into the "vtreat" package for R (http://www.win-vector.com/blog/2017/09/custom-level-coding-in-vtreat/#more-5231).
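Two of the techniques mentioned here can be sketched in Python. scikit-learn's `TimeSeriesSplit` produces expanding-window splits, in the spirit of the rolling-origin evaluation described in the linked post. The function `target_encode_oof` is a hypothetical helper illustrating one common categorical encoding (smoothed target-mean encoding), computed out-of-fold to limit target leakage; it is only loosely similar to vtreat's impact coding and is not a reimplementation of vtreat. The toy data and the `smoothing` value are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold, TimeSeriesSplit

# Time series cross-validation with an expanding training window:
# each fold trains only on observations that come before its test fold,
# unlike shuffled k-fold, which would leak future information.
y_series = np.arange(12)  # stand-in for 12 time-ordered observations
folds = list(TimeSeriesSplit(n_splits=3).split(y_series))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)


def target_encode_oof(cat, target, n_splits=5, smoothing=10.0, seed=0):
    """Hypothetical helper: encode each category by a smoothed target mean
    computed only on the other folds."""
    cat = pd.Series(cat).reset_index(drop=True)
    target = pd.Series(target, dtype=float).reset_index(drop=True)
    encoded = np.zeros(len(cat))
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fit_idx, enc_idx in kf.split(cat):
        prior = target.iloc[fit_idx].mean()
        stats = target.iloc[fit_idx].groupby(cat.iloc[fit_idx]).agg(["sum", "count"])
        # Shrink rare categories toward the global mean of the fitting folds.
        smoothed = (stats["sum"] + smoothing * prior) / (stats["count"] + smoothing)
        # Categories unseen in the fitting folds fall back to the prior.
        encoded[enc_idx] = cat.iloc[enc_idx].map(smoothed).fillna(prior).to_numpy()
    return encoded


cities = ["a", "a", "b", "b", "c", "a", "b", "c", "a", "c"]
clicked = [1, 0, 1, 1, 0, 1, 1, 0, 1, 0]
encoded = target_encode_oof(cities, clicked, n_splits=5)
print(encoded)
```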

I also like to take all sorts of MOOCs and read math books in my free time. For example, https://www.coursera.org/learn/competitive-data-science is a fun one that covers many of the things I talked about (though you should already have some background in machine learning).

For modular code, I would take some courses in programming.

I could probably come up with a good list of resources if I spent some time and thought about which ones really influenced me. I may do that in the future. Are there any specific things you want to improve on?