Model as a Service through Flask
The purpose of any model can only be fulfilled if it is used by other people, as soon as possible. Namely, if one can shorten the time between the model development and model deployment, it would help to validate the model’s value in the organization. However, deploying a model (i.e., put it to production) can sometimes be a long yet tedious process, especially without strong engineering support in place.
What does the client want
Often time, the end users of our models are other stakeholders within the organization, who simply would like to know the prediction output by your model, on a new input. Think a mortgage application: an applicant walks in for a loan, and the loan officer would like to know the applicant’s probability of default, given some attributes of both the applicant and the intended loan. In this case, the officer should be able to quickly pass that information to the model, and get the result back.
In the end, the loan officer only needs to pass the essential information, such as {'age': 35, 'gender': 'male', ...}
to the model, and expecting something like {'prediction': 0.02}
. Everything else should be handled by the infrastructure and the model behind. Given such a design pattern, it is hard not to use the RESTful API service.
A server to meet the client’s need
Of course, we won’t expect the loan officer actually constructing the json
string, but rather facing a well designed frontend interface. However, behind the frontend, all the information are most likely nested as a json
string anyway, and submitted as an HTTP POST request to the server. Once the server receives the request, it will parse the json
string, invoke the appropriate model, make the prediction, and return the result back to the frontend. Done!
It all sounds great, but how much effort is needed to actually build such a server? It turns out not to be so difficult, at least for a working prototype. Thanks to Flask
, one can deploy a model in this fashion in minutes.
A hello world server
Nothing works better than a hello-world example. First let’s make a dedicated folder (model_server
) for this purpose, with the following simple structure:
── model_server
├── server.py
├── utils.py
└── models
├── model1.pkl
...
We have trained and pickled a model named model1.pkl
and put it under the /models
folder. The utils.py
will contain some helper functions, and the main effort here will be the server.py
file, as:
from flask import Flask, jsonify, request, abort
from utils import load_pkl_model, check_payload, get_precition
app = Flask(__name__)
MODEL_NAME = './models/model1.pkl'
@app.route('/')
def home():
"""This function just responds to the browser URL.
"""
return "Home page"
@app.route('/api', methods=['POST', 'GET'])
def api():
"""This is the endpoint of api calls.
"""
if request.method == 'POST':
try: # we want json
payload = request.get_json(force=True) # convert it to dict
data = payload['data'] # `data` is a dict
except Exception as e:
print(e)
abort(400)
# let's load the model first
model = load_pkl_model(MODEL_NAME)
# payload received, let's turn it to proper format
# e.g., check for whether we have all the features needed
inputs = check_payload(model, data)
# combine the input and model
pred = get_precition(model, inputs) # `pred` is a dict
return jsonify(pred)
else: # GET, print some info about the model
return jsonify({
'model_name': MODEL_NAME,
'feature_names': ['feature_1', 'feature_2']
})
if __name__ == '__main__':
app.run(debug=True, port=5050)
Above example is just a very simple case, and the codes are pretty straightforward, where the meat is in the api()
function. We simply carry out a procedural process to: 1) load the model, 2) process the received input, 3) apply the input to the model.
The devil is in the detail, and I conveniently brush them under utils.py
. In any case, we are ready to start the server! In the terminal, simply go into model_server
folder, and run:
$ python server.py
Lo and behold, Flask
will do its magic, and we have a server running at http://127.0.0.1:5050
.
Making an API call
Now we have the server ready, let the client make the call, as
import request
url = 'http://127.0.0.1:5050/api' # if you are running it locally
payload = {
"data": {
"feature_1": [1.0],
"feature_2": [0.5],
}
}
r = requests.post(url, json=payload)
r.json() # it should be the `pred` variable defined in `server.py`
Here the client only has to provide the essential information needed, in a predetermined format that is accepted by the server. We choose the json
format, since it would allow the client to submit additional fields, such as version of the model, type of backend engine, as long as the server expects these fields and knows how to handle them. Once the server receives the request, the backend engine will be called to make predictions.
The above example deploys the server on the local machine, with Flask
itself, however, it is just a few more steps to deploy it on the cloud, such as AWS.
Conclusion
Once the communication format between the server and the client is established and documented, one can freely make changes to the backend models without the client changing anything grammatically.
The ability to clearly divide the development and depolyment has huge benefits. Not only it allows abstractions of the development and deployment side of things, but also makes it possible for other data scientists on the team to play with your models quickly and easily. This is particularly valuable as models become more and more opaque, and a global or local understandings of the models can only be done through numerical simulations.
Leave a Comment