The following is a quickstart for running Flask on Spark.
Most of the example tutorials I have found cover submitting a batch of Spark jobs to a cluster and returning a result. I was interested in long-running tasks, and in whether I could build a web app that ran on Spark. I thought it would be possible, but I didn't expect it to be this easy. Please note that this same procedure will work for lots of Python scripts, and I am interested to see what else I can load into Spark.
- Java runtime
- Python 3 (it will probably run in Python 2 with minor changes)
- Apache Spark (instructions below)
If you haven’t installed Spark then grab the latest build from
https://spark.apache.org/downloads.html and untar it in a directory somewhere.
This doesn’t need to be anywhere special and I just used
Change into the root of the extracted Spark directory:
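As a sketch of the untar-and-cd steps above (the tarball name depends on the version you downloaded; the one below is just an illustrative example):

```shell
# Extract the downloaded release and change into it
# (replace the name with whatever version you actually downloaded)
tar -xzf spark-3.5.1-bin-hadoop3.tgz
cd spark-3.5.1-bin-hadoop3
```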
Create the following file, start_standalone.sh, to start Spark. This is optional but it helps me. Set JAVA_HOME correctly for your system; the trick below works for me.
```shell
#! /bin/sh
#
# start_standalone.sh
#
export JAVA_HOME=$(dirname $(dirname $(readlink -f $(which java))))
./sbin/start-master.sh
./sbin/start-slave.sh spark://$(hostname -s):7077
```
You should also make sure you are running the same version of Python when you start Spark as when you run spark-submit. The easiest way to do that is to quickly create a virtualenv and activate it. Install Flask in there while we are at it.
```shell
python3 -m venv env
source env/bin/activate
pip install flask
```
Now we can start our cluster (of one).
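Using the start_standalone.sh helper created above:

```shell
# Starts a master plus one worker attached to it
sh start_standalone.sh
```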
You should now be able to access the Spark UI at http://localhost:8080/ and see that it has 1 worker attached.
Next we write our Python Flask script. I have created two routes, one of which
calculates Pi using the example from the source code at
examples/src/main/python/pi.py, with a few tweaks to remove the command-line
arguments and read GET parameters instead.
```python
#! /usr/bin/env python
# -*- coding: utf-8 -*-
# vim:fenc=utf-8
#
"""
Flask on Spark example.

Run with:
    ./bin/spark-submit --master spark://$(hostname -s):7077 exampleweb.py
"""
import sys
sys.path.append('./python')

from random import random
from operator import add

from flask import Flask, request
from pyspark.sql import SparkSession

app = Flask(__name__)

spark = SparkSession\
    .builder\
    .appName("Flark - Flask on Spark")\
    .getOrCreate()


@app.route("/")
def hello():
    return "Hello World! There is a spark example at <a href=\"/pi?partitions=1\">/pi</a>"


@app.route("/pi")
def pi():
    try:
        partitions = int(request.args.get('partitions', '1'))
    except ValueError:
        return "partitions must be an integer", 400

    n = 1000000 * partitions

    def f(_):
        # Sample a point in the 2x2 square; count it if it lands in the unit circle
        x = random() * 2 - 1
        y = random() * 2 - 1
        return 1 if x ** 2 + y ** 2 <= 1 else 0

    count = spark.sparkContext.parallelize(range(1, n + 1), partitions)\
        .map(f).reduce(add)
    return "Pi is roughly %f" % (4.0 * count / n)


if __name__ == "__main__":
    app.run()
```
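The Monte Carlo estimate behind the /pi route can be sanity-checked without a cluster at all; here is a minimal pure-Python sketch of the same sampling logic (the `inside` name is just illustrative, and `reduce`/`map` stand in for Spark's distributed versions):

```python
from random import random
from operator import add
from functools import reduce

def inside(_):
    # Sample a point in the 2x2 square and test whether
    # it falls inside the unit circle
    x = random() * 2 - 1
    y = random() * 2 - 1
    return 1 if x ** 2 + y ** 2 <= 1 else 0

n = 1000000
count = reduce(add, map(inside, range(n)))
print("Pi is roughly %f" % (4.0 * count / n))
```

The ratio of hits to samples approximates the circle's area over the square's (pi/4), which is why the final result is multiplied by 4.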
Now we can start our Flask application by submitting it to Spark:

```shell
./bin/spark-submit --master spark://$(hostname -s):7077 exampleweb.py
```
And then access it at http://localhost:5000/ and http://localhost:5000/pi?partitions=1.
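A quick command-line check, assuming the app is running with the defaults above:

```shell
# Hit the Pi route with a custom partition count
curl "http://localhost:5000/pi?partitions=2"
```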