How to Win a Data Science Competition
Learn from Top Kagglers
If you want to break into competitive data science, then this course is for you! Participating in predictive modelling competitions can help you gain practical experience and improve and hone your data modelling skills in various domains such as credit, insurance, marketing, and natural language processing.
Data Analysis · Feature Extraction · Feature Engineering · XGBoost
Next cohort starts July 13. Accessible for free. Completion certificates are offered.
and 4 more instructors
National Research University Higher School of Economics
Reddit Posts and Comments
1 post • 12 mentions • top 13 shown below
125 points • sercosan
How to Win a Data Science Competition: Learn from Top Kagglers
3 points • nckmiz
This Coursera course has some examples of ensemble methods like blending and stacking.
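For readers unfamiliar with the idea, here is a minimal sketch of stacking using scikit-learn; the dataset and choice of base models are illustrative, not taken from the course.

```python
# Stacking sketch: base learners make out-of-fold predictions, and a
# meta-learner (logistic regression) is trained on those predictions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    final_estimator=LogisticRegression(),
    cv=5,  # out-of-fold predictions come from 5-fold CV on the base models
)
stack.fit(X_train, y_train)
print(stack.score(X_test, y_test))
```

Blending is the simpler cousin of this: instead of cross-validated out-of-fold predictions, base models predict on a single held-out split and the meta-learner trains on those.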
2 points • allliam
If you already have the necessary ML background, this Coursera course (and these 3 videos on tuning in particular) gives some good practical advice.
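As a taste of what practical tuning looks like in code, here is a hedged sketch of cross-validated grid search with scikit-learn; the model and parameter grid are illustrative, not taken from the videos.

```python
# Grid search: try every combination in param_grid, score each with
# 3-fold cross-validation, and keep the best-scoring settings.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, random_state=0)

search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"max_depth": [2, 3], "learning_rate": [0.05, 0.1]},
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)
print(search.best_params_)
```

For larger grids, `RandomizedSearchCV` with a sampling budget is usually a better use of compute than exhaustive search.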
1 point • jarandaf
I guess this might be of interest to you.
1 point • ExilePrime
The National Research University Higher School of Economics offers an Advanced Machine Learning Specialization through Coursera. If you pay $125 per month then you can receive a certificate for each course you complete. There's a total of 7 courses and they take 50-60 hours each to complete. Aside from the certifications, you could just audit the courses for free. One of their courses has a competition on Kaggle as part of it.
1 point • trapatsas
Maybe take a look at this course.
1 point • Ken_Obiwan
Honestly I only interviewed people for data science, not data analytics, so I can't say for sure. For data science, we were looking for 61B level coding knowledge and DS 100 level data analysis knowledge, but also knowledge of how machine learning algorithms work / how to use them (both theoretical and practical), and some knowledge of how to build web systems. I imagine you could get an analyst job with just the DS 100 stuff, but honestly I can't say for sure.
I would say CS 189 is probably more theory than we looked for. If you know how to implement most of the major algorithms from scratch, that should be enough. (Not as hard as it might sound, many are less than a page of code.)
In terms of practical stuff... at least when I was at Cal, there wasn't a good course on how to apply ML in practice. I haven't taken this class but it looks like it's probably more than you need.
1 point • graden_dissent
This course has some interesting stuff (lectures on YouTube, you can pay for the notebooks) and is quite instructive if you don't take the obviously misleading title seriously.
1 point • lynda_
There's also this course that shows you how to solve predictive modeling tasks.
1 point • AurelianTactics
Coursera has a course on data science competitions. There are a few sections on features and feature engineering that I found interesting and full of ideas. There's a video by Kazanova toward the end of the course, and he has something like 40 feature engineering ideas in one slide.
For feature selection and feature extraction, sklearn has some pages that cover the basics of what's included in sklearn. It's worth skimming through and reading the sections that interest you.
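To make the sklearn pointer concrete, here is a short sketch using two of the built-in `sklearn.feature_selection` utilities; the synthetic dataset and the choice of `k=5` are illustrative.

```python
# Drop near-constant features, then keep the 5 features most
# predictive of the target according to a univariate ANOVA F-test.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_classif

X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=0
)

X_var = VarianceThreshold(threshold=0.0).fit_transform(X)
selector = SelectKBest(score_func=f_classif, k=5)
X_new = selector.fit_transform(X_var, y)
print(X_new.shape)
```

Univariate tests like this are cheap but ignore feature interactions, which is why model-based importance (e.g. from a random forest) is a useful complement.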
I'm not sure I agree with your best practices. In some cases and some problems those are worth following but you can come up with counter examples where those best practices aren't worth doing.
Personally, I approach it this way:
Consider the specific problem and how the given features relate to the target variable. Consider doing a quick and dirty random forest and seeing how the random forest rates the feature importance.
For each feature, consider the basic transformations for that type of feature. For dates, break them up into year/month/day of week etc. and see if it's helpful. For text, consider word2vec, tf-idf, etc.
Try to come up with specific features for the specific problem. Either through brainstorming, EDA, or looking at how similar types of problems are solved.
If you have more time, keep trying to build more features and see what adds value.
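The first two steps above can be sketched roughly as follows; the toy DataFrame and column names are made up for illustration.

```python
# Toy example: derive date parts, then use a quick-and-dirty random
# forest to rank feature importance for the target.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

df = pd.DataFrame({
    "signup_date": pd.to_datetime(
        ["2021-01-04", "2021-02-15", "2021-03-01", "2021-03-22"]
    ),
    "amount": [10.0, 25.0, 7.5, 40.0],
    "churned": [0, 1, 0, 1],  # hypothetical target variable
})

# Basic transformations per feature type: break dates into parts.
df["year"] = df["signup_date"].dt.year
df["month"] = df["signup_date"].dt.month
df["dayofweek"] = df["signup_date"].dt.dayofweek

# Quick-and-dirty random forest to see how it rates the features.
features = ["amount", "year", "month", "dayofweek"]
rf = RandomForestClassifier(n_estimators=50, random_state=0)
rf.fit(df[features], df["churned"])
ranking = sorted(
    zip(features, rf.feature_importances_), key=lambda t: -t[1]
)
print(ranking)
```

On a real problem you would of course use far more rows and check importance on held-out data, but the shape of the workflow is the same.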
1 point • GrayWare_Developer
It is never too early. If you start now, you will know from your ML studies how much math to learn in school.
Anyway, you will need some Computer Science basics as well as working knowledge of Python and SQL - you can get that from Kaggle https://www.kaggle.com/learn/overview - they have tutorials and courses. You can participate in their competitions and read what models other participants develop. There is a course on Coursera on how to win Kaggle competitions - https://www.coursera.org/learn/competitive-data-science Start participating in competitions immediately after you finish this course or even earlier - do not expect to win, expect to gain experience, pick up jargon, and improve your skills.
There are great courses on Coursera by Andrew Ng on machine learning and deep learning - you can audit them without a certificate, which is worthless anyway; the knowledge is what matters.
Remember to practice more than you learn - write code for two hours for each hour you read a book or watch a video. Put all your projects on GitHub, create videos about what you learn, and put those on YouTube.
1 point • doct0r_d
I asked a lot of different things, and it can be kind of daunting sometimes. I believe I picked up these things from various online courses/books mostly. As an example, I read https://otexts.com/fpp2/ which goes over forecasting time series data which led to https://robjhyndman.com/hyndsight/tscv/ on "cross validation" with time series. I came across various encodings for categorical variables when looking into the "vtreat" package with R (http://www.win-vector.com/blog/2017/09/custom-level-coding-in-vtreat/#more-5231).
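The "cross validation with time series" idea from the Hyndman post can be sketched with scikit-learn's `TimeSeriesSplit`; the 12-point series here is synthetic.

```python
# Rolling-origin evaluation: each split trains on an expanding window
# of past observations and tests on the points that immediately
# follow -- never on earlier ones, so there is no look-ahead leakage.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(12).reshape(-1, 1)  # 12 ordered observations

tscv = TimeSeriesSplit(n_splits=3)
splits = [(train.tolist(), test.tolist()) for train, test in tscv.split(X)]
for train, test in splits:
    print(train, "->", test)
```

This is the key difference from ordinary k-fold CV, which shuffles observations and would let the model "see the future" on time-ordered data.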
I also like to take all sorts of MOOCs and read math books in my free time; as an example, this course (https://www.coursera.org/learn/competitive-data-science) is a fun one that goes over many of the things I talked about (but you should already have some background in machine learning).
For modular code, I would take some courses in programming.
I could probably come up with a good list of resources if I spent some time and thought about which ones really influenced me. I may do that in the future. Are there any specific things you want to improve on?
1 point • Responsible_Text9102
# 1. Probability and Statistics by Stanford Online
[See course materials](https://online.stanford.edu/courses/gse-yprobstat-probability-and-statistics)
This wonderful, self-paced course covers basic concepts in probability and statistics spanning four fundamental aspects of machine learning: exploratory data analysis, producing data, probability, and inference.
Alternatively, you might want to check out this excellent course in statistical learning: “ [An Introduction to Statistical Learning with Applications in R](https://www.r-bloggers.com/in-depth-introduction-to-machine-learning-in-15-hours-of-expert-videos/) ”.
# 2. 18.06 Linear Algebra by MIT
[See course materials](https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/)
The best linear algebra course I’ve seen, taught by the legendary professor Gilbert Strang. I’ve had students describe this as “life-changing”.
# 3. CS231N: Convolutional Neural Networks for Visual Recognition by Stanford
[See video lectures (2017)](https://www.youtube.com/playlist?list=PLzUTmXVwsnXod6WNdg57Yc3zFx_f-RYsq) [See course notes](https://cs231n.github.io/)
Whether you’re into computer vision or not, CS231N will help you become a better machine learning researcher/practitioner. CS231N balances theories with practices. The lecture notes are well written with visualizations and examples that explain difficult concepts such as backpropagation, gradient descents, losses, regularizations, dropouts, batchnorm, etc.
# 4. Practical Deep Learning for Coders by fast.ai
[See course materials](https://course.fast.ai/)
With the ex-president of Kaggle as one of its co-founders, this hands-on course focuses on getting things up and running. It has a forum with helpful discussions about the latest best practices in machine learning.
# 5. CS224N: Natural Language Processing with Deep Learning by Stanford
[See video lectures (2017)](https://www.youtube.com/playlist?list=PLU40WL8Ol94IJzQtileLTqGZuXtGlLMP_) [See course materials](http://web.stanford.edu/class/cs224n/syllabus.html)
Taught by one of the most influential (and most down-to-earth) researchers, Christopher Manning, this is a must-take course for anyone interested in natural language processing. The course is well organized, well taught, and up-to-date with the latest NLP research.
# 6. Machine Learning by Coursera
[See course materials](https://www.coursera.org/learn/machine-learning)
Originally taught at Stanford, Andrew Ng’s course is probably the most popular machine learning course in the world. Its Coursera version has enrolled more than 2.5M people as of writing. This course is theory-heavy, so students would benefit more from it if they have also taken more practical courses such as CS231N, CS224N, and Practical Deep Learning for Coders.
# 7. Probabilistic Graphical Models Specialization by Coursera
[See course materials](https://www.coursera.org/specializations/probabilistic-graphical-models)
Unlike most AI courses that introduce small concepts one by one or add one layer on top of another, this specialization tackles AI top down: it asks you to think about the relationships between different variables, how you represent those relationships, what independence assumptions you’re making, and what exactly you’re trying to learn when you say machine learning. This specialization will change the way you approach machine learning. Warning: this specialization isn’t easy. You can also consult detailed notes written by Stanford CS228’s TAs [here](https://ermongroup.github.io/cs228-notes/).
# 8. Introduction to Reinforcement Learning by DeepMind
[See lecture videos](https://www.youtube.com/watch?v=2pWv7GOvuf0&list=PLqYmG7hTraZDM-OYHWgPebj2MfCFzFObQ)
Reinforcement learning is hard. Luckily, David Silver comes to the rescue. This course provides a great introduction to RL with intuitive explanations and fun examples, taught by one of the world’s leading RL experts.
# 9. Full Stack Deep Learning Bootcamp by Berkeley
[See course materials](https://fullstackdeeplearning.com/march2019)
Most courses only teach you how to train and tune your models. This is the only one I’ve seen that shows you how to design, train, and deploy models from A to Z. This is also a great resource for those struggling with the machine learning system design questions in interviews.
# 10. How to Win a Data Science Competition: Learn from Top Kagglers by Coursera
[See course materials](https://www.coursera.org/learn/competitive-data-science/home/welcome)
With all the knowledge we’ve learned, it’s time to head over to Kaggle to build some machine learning models to gain experience and win some money. Warning: Kaggle grandmasters might not necessarily be good instructors.
# 11. Full Stack Deep Learning: Deploy ML Projects
[Lecture 1: Introduction to Deep Learning - Full Stack Deep Learning - March 2019 - YouTube](https://www.youtube.com/watch?v=5AjG5OPQuBM&list=PLbcQZcJKzjYWRD2LB8N2I8bFNWg3W_J3n&index=2&t=0s)