
The purpose of any model can only be fulfilled if it is used by other people, and as soon as possible: shortening the time between model development and model deployment helps validate the model's value to the organization. However, deploying a model (i.e., putting it into production) can sometimes be a long and tedious process, especially without strong engineering support in place.

What does the client want?

Oftentimes, the end users of our models are other stakeholders within the organization, who simply would like to know what your model predicts on a new input. Think of a mortgage application: an applicant walks in for a loan, and the loan officer would like to know the applicant's probability of default, given some attributes of both the applicant and the intended loan. In this case, the officer should be able to quickly pass that information to the model and get the result back.

In the end, the loan officer only needs to pass the essential information, such as {'age': 35, 'gender': 'male', ...}, to the model, and expect something like {'prediction': 0.02} back. Everything else should be handled by the infrastructure and the model behind it. Given such a design pattern, it is hard not to reach for a RESTful API service.

A server to meet the client’s need

Of course, we won’t expect the loan officer to actually construct the JSON string; they would be facing a well-designed frontend interface instead. Behind the frontend, however, all the information is most likely nested in a JSON string anyway, and submitted as an HTTP POST request to the server. Once the server receives the request, it will parse the JSON string, invoke the appropriate model, make the prediction, and return the result to the frontend. Done!

It all sounds great, but how much effort is needed to actually build such a server? It turns out not to be so difficult, at least for a working prototype. Thanks to Flask, one can deploy a model in this fashion in minutes.
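If you don’t already have Flask, it is one pip install away:

$ pip install flask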

A hello-world server

Nothing works better than a hello-world example. First let’s make a dedicated folder (model_server) for this purpose, with the following simple structure:

── model_server
    ├── server.py
    ├── utils.py
    └── models
        ├── model1.pkl
        ...
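We have trained and pickled a model named model1.pkl and put it under the models/ folder. For concreteness, here is one way such a pickle might have been produced (a toy scikit-learn classifier on two made-up features, purely illustrative):

import pickle
from sklearn.linear_model import LogisticRegression

# a tiny, made-up training set with two features
X = [[1.0, 0.5], [0.2, 1.3], [0.9, 0.1], [0.4, 0.8]]
y = [0, 1, 0, 1]

# fit and pickle the model (run from within model_server/)
model = LogisticRegression().fit(X, y)
with open('models/model1.pkl', 'wb') as f:
    pickle.dump(model, f)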

The utils.py file contains some helper functions (sketched out below), and the main effort here is the server.py file:

from flask import Flask, jsonify, request, abort
from utils import load_pkl_model, check_payload, get_prediction

app = Flask(__name__)
MODEL_NAME = './models/model1.pkl'

@app.route('/')
def home():
    """This function just responds to the browser URL.
    """
    return "Home page"

@app.route('/api', methods=['POST', 'GET'])
def api():
    """This is the endpoint of api calls.
    """

    if request.method == 'POST':
        try:  # we want json
            payload = request.get_json(force=True)  # convert it to dict
            data = payload['data']  # `data` is a dict
        except Exception as e:
            print(e)
            abort(400)

        # let's load the model first (reloaded on every request for
        # simplicity; in practice you may want to load it once at startup)
        model = load_pkl_model(MODEL_NAME)

        # payload received, let's turn it to proper format
        # e.g., check for whether we have all the features needed
        inputs = check_payload(model, data)

        # combine the input and model
        pred = get_prediction(model, inputs)  # `pred` is a dict

        return jsonify(pred)

    else:  # GET, print some info about the model
        return jsonify({
            'model_name': MODEL_NAME,
            'feature_names': ['feature_1', 'feature_2']
        })

if __name__ == '__main__':
    app.run(debug=True, port=5050)

The above example is just a very simple case, and the code is pretty straightforward; the meat is in the api() function. We simply carry out a procedural process: 1) load the model, 2) process the received input, and 3) apply the model to the input.

The devil is in the details, and I conveniently brush them under utils.py. For completeness, here is a minimal sketch of what those helpers might look like (assuming a scikit-learn classifier with predict_proba and a hard-coded feature list; the real implementations are up to you):
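import pickle

def load_pkl_model(path):
    """Load a pickled model from disk."""
    with open(path, 'rb') as f:
        return pickle.load(f)

def check_payload(model, data):
    """Check that all expected features are present, and arrange them into
    rows in the order the model expects. The feature list is hard-coded
    here; one could instead store it alongside the model (the `model`
    argument is unused in this sketch).
    """
    feature_names = ['feature_1', 'feature_2']
    missing = [f for f in feature_names if f not in data]
    if missing:
        raise ValueError('missing features: {}'.format(missing))
    n_rows = len(data[feature_names[0]])
    return [[data[f][i] for f in feature_names] for i in range(n_rows)]

def get_prediction(model, inputs):
    """Apply the model and wrap the result in a json-serializable dict."""
    proba = model.predict_proba(inputs)[:, 1]  # probability of default
    return {'prediction': proba.tolist()}

In any case, we are ready to start the server! In the terminal, simply go into the model_server folder, and run: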

$ python server.py

Lo and behold, Flask will do its magic, and we have a server running at http://127.0.0.1:5050.
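As a quick sanity check, we can hit the GET branch of the /api endpoint with the requests library, which should echo back the model info defined in server.py:

import requests

r = requests.get('http://127.0.0.1:5050/api')
print(r.json())
# {'model_name': './models/model1.pkl', 'feature_names': ['feature_1', 'feature_2']}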

Making an API call

Now that the server is ready, let the client make the call:

import requests

url = 'http://127.0.0.1:5050/api'  # if you are running it locally

payload = {
    "data": {
        "feature_1": [1.0],
        "feature_2": [0.5],
    }
}

r = requests.post(url, json=payload)
r.json()  # it should be the `pred` variable defined in `server.py`

Here the client only has to provide the essential information, in a predetermined format accepted by the server. We choose JSON, since it allows the client to submit additional fields, such as the model version or the type of backend engine, as long as the server expects these fields and knows how to handle them. Once the server receives the request, the backend engine is called to make predictions.
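For instance, the payload could carry extra fields alongside the data (the model_version and engine fields below are hypothetical, purely to illustrate the idea; the server would need matching logic to act on them):

payload = {
    "data": {
        "feature_1": [1.0],
        "feature_2": [0.5],
    },
    "model_version": "v2",
    "engine": "sklearn",
}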

The above example runs the server on the local machine with Flask's built-in development server; however, it is just a few more steps to deploy it on the cloud, such as AWS.
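In that case, you would typically swap Flask's development server for a production WSGI server; for instance, assuming gunicorn is installed:

$ gunicorn -b 0.0.0.0:5050 server:app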

Conclusion

Once the communication format between the server and the client is established and documented, one can freely make changes to the backend models without requiring any changes on the client side.

The ability to cleanly separate development and deployment has huge benefits. Not only does it allow abstraction between the development and deployment sides of things, it also makes it possible for other data scientists on the team to play with your models quickly and easily. This is particularly valuable as models become more and more opaque, and a global or local understanding of a model can only be gained through numerical simulations.
