Leverage Core ML code generation while reducing the bundle size
Core ML models are stored in the Core ML format (with the .mlpackage extension) and can be integrated seamlessly into Xcode projects. A model is the result of applying a machine learning algorithm to a set of training data; you use a model to make predictions based on new input data.
If you are not familiar with Core ML or how to use it to perform tasks like detecting faces and dominant objects, please refer to the Apple documentation.
While a Core ML model stores the results learned from the training data, it can take up a lot of disk space, potentially 50 to 100+ MB for higher-precision models. If your app's main feature is built around machine learning, it is certainly worth bundling the whole Core ML model into your app so it can be used as soon as the app launches. But if only one sub-feature of your app uses machine learning, and users may not even discover it easily, adding 50 to 100+ MB to your app bundle should be reconsidered, since bundle size is an important metric for your app's download rate.
You can, of course, download the whole Core ML model from a remote server and then use an API like MLModel.compileModel(at:) to compile the downloaded model before using it through the Core ML API. However, remember that Core ML models can be recognized and integrated by Xcode, which generates code for them. In that case, the generated API is tailored to the model, and we can write as little code as possible to integrate the machine learning feature into our apps.
Is there any way to leverage Core ML code generation without bundling the whole model into your app, so the bundle stays small? Yes, of course there is.
Examine the Core ML package
The Core ML model downloaded from the Apple website is a zip archive; after unzipping it, you get a directory whose name ends with .mlpackage. To examine the package content, right-click the .mlpackage and choose "Show Package Contents".
Then you can find out what’s inside the whole package.
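For example, for the FastViTMA36F16 model used later in this article, the layout looks roughly like this (the exact structure can vary between models):

FastViTMA36F16.mlpackage/
├── Manifest.json
└── Data/
    └── com.apple.CoreML/
        ├── model.mlmodel
        └── weights/
            └── weight.bin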
A few key points here:
- The Manifest.json file records the necessary file entries for this ML package.
- The model.mlmodel file is recognized by Xcode, provides the model metadata, and, importantly, is used to generate code for the model in Xcode.
- The weight.bin file contains the trained weights, which account for almost all of the package size.
Leverage Core ML code generation while reducing the bundle size
The key steps to leverage Core ML code generation while reducing the bundle size are described below. Note: the following walkthrough uses the Core ML model downloaded from the Apple website, so be sure to check it out.
First, strip the weight.bin file out of the package and replace it with an arbitrary file of the exact same name, for example an empty zero-byte file renamed to weight.bin.
Then drag the modified ML package into your project in Xcode. Xcode should recognize it, and you should see the model information that Xcode extracts.
At this point, code generation works as expected, and you can write some code to verify it:
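Here is a minimal check, assuming the FastViTMA36F16 model used in this walkthrough (the configuration-based initializer is the standard one Xcode generates for every model class):

import CoreML

// Xcode generated a FastViTMA36F16 class from the stripped .mlpackage.
// The fact that this compiles shows that code generation is working.
// Loading may fail at runtime because weight.bin is only a placeholder.
let generatedModel = try? FastViTMA36F16(configuration: MLModelConfiguration())
print(generatedModel == nil ? "Weights are missing, as expected" : "Model loaded")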
Note that the entry point for using this model is a generated class with the same name as the model. For example, for the FastViTMA36F16 model, the class is named FastViTMA36F16.
Since we replaced the original weight.bin file with an arbitrary file, Core ML predictions won't work as expected yet. We need to download the original ML package from the remote server and compile it for later use.
As the .mlpackage is just a directory, you will need to zip it first and upload it to your remote server. After downloading it, you will also need to unzip it into your app's local storage.
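A sketch of the download-and-unzip step might look like this. Note that Foundation has no public unzip API, so the unzipping call below assumes a third-party library such as ZIPFoundation, and the remote URL and file names are placeholders:

import Foundation
import ZIPFoundation // third-party library, assumed here for unzipping

func downloadAndUnzipModel() async throws -> URL {
    // Placeholder URL: replace with the location of your zipped .mlpackage.
    let remoteURL = URL(string: "https://example.com/FastViTMA36F16.mlpackage.zip")!

    // Download the zip archive to a temporary location.
    let (tempZipURL, _) = try await URLSession.shared.download(from: remoteURL)

    // Unzip it into the caches directory, which restores the .mlpackage directory.
    let cachesURL = try FileManager.default.url(for: .cachesDirectory,
                                                in: .userDomainMask,
                                                appropriateFor: nil,
                                                create: true)
    try FileManager.default.unzipItem(at: tempZipURL, to: cachesURL)

    // unzipURL: the restored .mlpackage directory.
    return cachesURL.appendingPathComponent("FastViTMA36F16.mlpackage")
}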
Then you can call the MLModel.compileModel(at:) method, where unzipURL points to the unzipped directory whose name ends with .mlpackage.
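For example (a sketch using the async variant of the API; unzipURL comes from the download step above):

import CoreML

// Compile the downloaded .mlpackage into an .mlmodelc that Core ML can load.
let compiledURL = try await MLModel.compileModel(at: unzipURL)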
After compiling the model, it's best to move the compiled result into permanent local storage so you don't have to re-download and re-compile it in the future.
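A sketch of persisting the compiled model, assuming you keep it in the Application Support directory:

import Foundation

// compileModel(at:) writes the compiled .mlmodelc into a temporary directory,
// so move it somewhere permanent to reuse it across launches.
let fileManager = FileManager.default
let supportURL = try fileManager.url(for: .applicationSupportDirectory,
                                     in: .userDomainMask,
                                     appropriateFor: nil,
                                     create: true)
let permanentURL = supportURL.appendingPathComponent(compiledURL.lastPathComponent)

// Replace any previously stored copy with the freshly compiled model.
if fileManager.fileExists(atPath: permanentURL.path) {
    try fileManager.removeItem(at: permanentURL)
}
try fileManager.moveItem(at: compiledURL, to: permanentURL)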
The compiledURL can then be passed to the MLModel(contentsOf:) initializer to create the MLModel instance.
The generated entry point FastViTMA36F16 also has a generated initializer that takes the MLModel instance as an argument:
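Putting the last two steps together (init(model:) is part of the standard code Xcode generates for model classes; permanentURL is the stored location from the previous step):

import CoreML

// Load the compiled model from local storage...
let mlModel = try MLModel(contentsOf: permanentURL)

// ...and hand it to the generated class to get the strongly typed API.
let fastViT = FastViTMA36F16(model: mlModel)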
Then you can use the model to make predictions as expected.
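For an image model like this one, the generated prediction call looks roughly as follows; the exact input and output property names (image, classLabel) depend on the model's feature descriptions, so treat them as assumptions:

import CoreML

// pixelBuffer is a CVPixelBuffer prepared from the user's image,
// resized to the input dimensions the model expects.
let output = try fastViT.prediction(image: pixelBuffer)
print("Predicted label: \(output.classLabel)")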