Production Machine Learning Systems


Below are the top discussions from Reddit that mention this online Coursera course from Google Cloud.

Offered by Google Cloud. This course covers how to implement the various flavors of production ML systems: static, dynamic, and continuous ... Enroll for free.


Taught by
Google Cloud Training

and 13 more instructors

Offered by
Google Cloud

Reddit Posts and Comments

0 posts • 3 mentions • top 3 shown below

r/MachineLearning • comment
2 points • zombiecalypse

I can't speak to the quality personally, but

They sound like a good basis.

TIL that a lot of Coursera ML courses are Google-centric, so if you find others that sound good, it may be worth diversifying. Also, don't feel pressured to take all of them just because I mentioned them; learning is part of any career ;)

As a side note: if you create projects as part of the courses, putting them on GitHub and listing them on your CV can be a great way to demonstrate your technical ability.

r/humblebundles • comment
1 points • arjunkharbanda

FYI: All the courses by Google Cloud are available on Coursera, and you can audit them for free, but you will not get a certificate.

1. https://www.coursera.org/learn/gcp-exploring-preparing-data-bigquery

2. https://www.coursera.org/learn/image-understanding-tensorflow-gcp

3. https://www.coursera.org/learn/gcp-production-ml-systems

4. https://www.coursera.org/learn/end-to-end-ml-tensorflow-gcp

r/dataengineering • comment
1 points • gato_felix_69

Before listing a few resources, I would like to recommend a few things that I've seen work very well in similar situations:

  • For sure, at some point you will need to choose between two equally good solutions for a certain problem. Go with Occam's razor: pick the simplest solution! Don't bother with over-optimization at the beginning. You will need to refactor things anyway, and even change the stack from time to time.
  • Always keep in mind: your role is to provide value to your clients! Time is an important variable here. If your customer requests a dashboard and your lead time is 6 months, forget it. Your customers will figure something out in Excel and your department will become the "forget those guys, everything there takes forever" department. For startups, that boils down to using managed services as much as possible. A few examples:
  • Complicated and expensive setup
    • For ingestion, you have a Spark cluster running Spark Streaming that reads data from the transaction database using Change Data Capture and stores it in Delta format on Databricks, on a cluster running 24x7.
    • For processing, you use Scala because it has better performance than Python (not really true in 90% of cases).
    • For visualization, you either create your own charts using matplotlib or use MicroStrategy/Qlik Sense.
  • Simpler, but effective and cheap setup
    • For ingestion, as your use cases might not require streaming, you load data once per day using plain SQL queries that are executed outside business hours directly against the production database. As a Databricks cluster might be overkill, you use Apache NiFi to read the data from the transaction database on a single server (not a cluster, because you won't need one for the time being and the job is not time-critical).
    • For processing, you use Python, as it's as effective as Scala and you can get things done faster.
    • For the visualization, you use the solution already provided by your cloud vendor.
  • You won't need to implement things like Data Mesh, but separating your use cases into domains will help you in the long run. Nothing will stop you from cherry-picking ideas from these hyped concepts and applying them to your context.
  • If your team is small (which will likely be the case in a startup), then try to use managed services as much as possible. Operations is a bitch and will kill you. For example, in GCP, go for Dataflow instead of Dataproc.
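
To make the "load data once per day using plain SQL queries" option above more concrete, here is a minimal sketch in Python. It is only an illustration under assumptions, not the commenter's actual setup: the `orders` table, its columns, and the helper names `build_demo_db` and `extract_daily_orders` are hypothetical, and an in-memory SQLite database stands in for the production transaction database.

```python
import sqlite3
from datetime import date, timedelta


def build_demo_db():
    """Create an in-memory database with a hypothetical `orders` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, created_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            (1, 10.0, "2024-01-01T09:30:00"),
            (2, 25.5, "2024-01-01T17:45:00"),
            (3, 7.25, "2024-01-02T08:00:00"),
        ],
    )
    return conn


def extract_daily_orders(conn, day):
    """Return all orders created on `day` using one plain, parameterized SQL query.

    The half-open range [day, day + 1) relies on ISO 8601 timestamps
    sorting correctly as text, which is an assumption about the schema.
    """
    start = day.isoformat()
    end = (day + timedelta(days=1)).isoformat()
    query = (
        "SELECT id, amount, created_at FROM orders "
        "WHERE created_at >= ? AND created_at < ? ORDER BY id"
    )
    return conn.execute(query, (start, end)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    # In a real setup a scheduler (cron, NiFi, ...) would run this outside business hours.
    for row in extract_daily_orders(conn, date(2024, 1, 1)):
        print(row)
```

The point of the sketch is that a single scheduled query with a date filter is often enough for a daily batch load; no cluster or streaming framework is involved.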

Now for the references:

  • System Design Interview – An Insider's Guide, Second Edition, by Alex Xu (paperback, English edition, 12 Jun. 2020) - very good book.
  • Mastering the System Design Interview (Frank Kane) on Udemy - Frank is really great and has many trainings in DE. This course is one of the best on system design. https://www.udemy.com/course/system-design-interview-prep/
  • For Data Science and ML, you need to be concerned not with the algorithms per se, but with creating proper production pipelines that will allow the data scientists to deploy and test their things in an agile way. For that, the best reference I've seen so far is the Production Machine Learning Systems course on Coursera (https://www.coursera.org/learn/gcp-production-ml-systems)
  • If you still need some ML background, check the Machine Learning Crash Course from Google - https://developers.google.com/machine-learning/crash-course

Other tips:

  • Streaming is very complicated. Don't do it unless you really need it.
  • Cloud can be expensive, but you can take advantage of a few things such as:
    • Blob storage - cheap, and you can use it for your data lake layers.
    • Visualization/dashboard solutions - most of them are free or very cheap. You won't need to host and manage your own dashboard solution, and certainly not an expensive proprietary monster like MicroStrategy.
    • Serverless solutions like Lambda or Step Functions in AWS, or Dataflow in GCP.
  • Not every problem is a big data problem! Use Spark for heavy ETL, but feel free to use Pandas as well for datasets that fit in memory.
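
The last tip can be turned into a rough sizing check. The sketch below is an assumption-laden illustration, not a rule from the comment: the function name `choose_engine` is hypothetical, and the default `overhead_factor` of 5 is only a guess at how much more memory a dataset may need once loaded into a DataFrame than it occupies on disk. Tune it for your own data.

```python
def choose_engine(dataset_bytes, available_memory_bytes, overhead_factor=5.0):
    """Suggest "pandas" when the dataset plausibly fits in memory, else "spark".

    overhead_factor is an assumed multiplier for in-memory expansion
    (parsing, object columns, intermediate copies); it is not a measured value.
    """
    if dataset_bytes < 0 or available_memory_bytes <= 0 or overhead_factor <= 0:
        raise ValueError("sizes and overhead_factor must be positive")
    estimated_need = dataset_bytes * overhead_factor
    return "pandas" if estimated_need <= available_memory_bytes else "spark"


if __name__ == "__main__":
    GB = 1024 ** 3
    # A 1 GB file on a 16 GB machine: estimated 5 GB needed, so single-node is fine.
    print(choose_engine(1 * GB, 16 * GB))
    # A 10 GB file on the same machine: estimated 50 GB needed, so distribute it.
    print(choose_engine(10 * GB, 16 * GB))
```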