ML Ops

MLOps is an ML culture and practice that unifies ML application development (Dev) with ML system deployment and operations (Ops).

Exams
  • 50% internal

    • 25% for Minor 1

    • 15% for Assignments 1

    • 10% Not Asking more questions

  • 50% Main

Lecture 1: (11/01/2025)

Machine Learning Systems Design

The process of defining the interface, algorithms, data infrastructure, and hardware for a machine learning system to satisfy specified requirements.

The system is required to be:

  • Reliable

  • Scalable

  • Maintainable

  • Adaptable

Questions to think about:

  • You have trained a model. Now what?

  • What are the different components of an ML system?

  • How is data engineering done?

  • How do you evaluate your models, both offline and online?

  • What is the difference between online prediction and batch prediction?

  • How do you serve a model on the cloud? On the edge?

  • How do you continually monitor and deploy changes to ML systems?

The Berkeley study

The Berkeley study found that both face-to-face and online lenders rejected a total of 1.3 million creditworthy black and Latino applicants between 2008 and 2015. Researchers said they believe the applicants "would have been accepted had the applicant not been in these minority groups." That's because when they used the income and credit scores of the rejected applications but deleted the race identifiers, the mortgage application was accepted.
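The check described above (re-score a rejected application with the group identifier deleted and see whether the decision flips) can be sketched as a small counterfactual test. This is a minimal illustration, not the study's actual method: the `toy_lender` rule, the field names, the threshold of 700, and the three applicants are all hypothetical and invented for the example.

```python
from typing import Any, Callable, Dict, List

Applicant = Dict[str, Any]


def toy_lender(applicant: Applicant) -> bool:
    """Hypothetical, deliberately biased approval rule (illustration only)."""
    score = applicant["credit_score"] + applicant["income"] / 1000
    # The bias: one group is penalised regardless of creditworthiness.
    if applicant.get("group") == "B":
        score -= 50
    return score >= 700


def flipped_by_removing(
    model: Callable[[Applicant], bool],
    applicants: List[Applicant],
    attribute: str,
) -> List[Applicant]:
    """Return rejected applicants who are accepted once `attribute` is deleted."""
    flipped = []
    for applicant in applicants:
        if model(applicant):
            continue  # only rejected applications are of interest
        masked = {k: v for k, v in applicant.items() if k != attribute}
        if model(masked):
            flipped.append(applicant)
    return flipped


# Made-up applicants, not real data.
applicants = [
    {"credit_score": 680, "income": 60000, "group": "B"},
    {"credit_score": 680, "income": 60000, "group": "A"},
    {"credit_score": 600, "income": 40000, "group": "B"},
]

flipped = flipped_by_removing(toy_lender, applicants, "group")
print(f"{len(flipped)} rejected applicant(s) would be accepted without the group field")
```

A non-empty result means the decision depends on the protected attribute itself. An empty result does not prove fairness, because other features can act as proxies for the attribute.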

How to

  • Validate data correctness?

  • Test features' usefulness?

  • Detect when the underlying data distribution has changed?

  • Know if the changes are bad for models without ground truth labels?

  • Detect malicious data?

    • Not all data points are equal (e.g. scans of cancerous lungs are more valuable)

    • Bad data might harm your model and/or make it susceptible to attacks
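One of the questions above, detecting a change in the underlying data distribution, can be approached without ground-truth labels by comparing a feature's values at training time with its values in production. The sketch below uses the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs), implemented with the standard library only. The `DRIFT_THRESHOLD` value, the function names, and the synthetic "age" feature are assumptions made for illustration, not something from the lecture.

```python
import bisect
import random
from typing import Sequence

# Hypothetical alert threshold; in practice it would be tuned per feature.
DRIFT_THRESHOLD = 0.2


def ks_statistic(reference: Sequence[float], current: Sequence[float]) -> float:
    """Largest gap between the empirical CDFs of the two samples.

    0 when the empirical CDFs coincide, 1 when the samples do not overlap.
    """
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    ref = sorted(reference)
    cur = sorted(current)
    max_gap = 0.0
    # The gap between two step functions peaks at one of the sample points.
    for x in ref + cur:
        cdf_ref = bisect.bisect_right(ref, x) / len(ref)
        cdf_cur = bisect.bisect_right(cur, x) / len(cur)
        max_gap = max(max_gap, abs(cdf_ref - cdf_cur))
    return max_gap


def has_drifted(
    reference: Sequence[float],
    current: Sequence[float],
    threshold: float = DRIFT_THRESHOLD,
) -> bool:
    """Flag drift when the KS statistic exceeds the (assumed) threshold."""
    return ks_statistic(reference, current) > threshold


# Synthetic feature values standing in for training and production data.
rng = random.Random(0)
training_ages = [rng.gauss(40, 10) for _ in range(1000)]
same_population = [rng.gauss(40, 10) for _ in range(1000)]
shifted_population = [rng.gauss(50, 10) for _ in range(1000)]

print("same population drifted:", has_drifted(training_ages, same_population))
print("shifted population drifted:", has_drifted(training_ages, shifted_population))
```

A check like this only says that the inputs have changed. Whether the change is bad for the model is a separate question that still needs labels or a proxy metric.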

ML Engineering is more engineering than ML

MLEs might spend most of their time on the following, instead of training ML models:

  • wrangling data

  • understanding data

  • setting up infrastructure

  • deploying models
