Commit 900a90d

Move fpga demo from fpga branch to tsm_fpga. Update fpga README. Disable FPGA kinetics demo. Update online-TSM src/main.cpp with video-friendly option. Simple code restructure to move mobilenet_v2_tfslim.py out of the submodule (the shift operation is still implemented in the tensorflow-slim submodule); we simply separate our code. (#167)
1 parent 7192eaa commit 900a90d

File tree

22 files changed

+19684
-0
lines changed

.gitmodules

+3
@@ -0,0 +1,3 @@
[submodule "tsm_fpga/tf_models"]
	path = tsm_fpga/tf_models
	url = https://github.com/JoshNoel/tf_models

tsm_fpga/README.md

+34
@@ -0,0 +1,34 @@
# TSM Deployed to FPGA

We deploy TSM to FPGA using the Vitis-AI framework. To do so, we generate a TensorFlow implementation of the TSM model and pipeline the network so that all shift operations are isolated. This allows the shift operation to be deployed to the CPU and the remaining operations to the Vitis-AI DPU IP.

We must take additional steps to deploy this pipelined model. First, isolating the shift operations results in a number of separate DPU kernels for the separate portions of the network (11 for MobileNetV2 TSM). These kernels must be quantized to int8 and compiled for the DPU separately.

To quantize the split model, we dump intermediate activations from the unsplit implementation at the locations of the DPU kernel inputs. These activations are then used as input to the Vitis-AI quantizer. Once quantized, the resulting splits of the model can be compiled into the final demo executable.

![split-mbv2](https://github.com/mit-han-lab/temporal-shift-module/tree/master/tsm_fpga/images/split_mobilenetv2_bottleneck.png)

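To make the CPU/DPU split more concrete, the following is a minimal sketch of an online temporal shift as it could run on the CPU between two DPU kernels. It assumes the commonly described online-TSM scheme, in which the first 1/8 of the channels of the current frame is replaced by the values cached from the previous frame. The function name `online_shift`, the `fold_div` parameter, and the flat per-pixel channel list are assumptions for illustration; the shift used by this project is implemented in the tensorflow-slim submodule.

```python
def online_shift(channels, shift_buffer, fold_div=8):
    """Online temporal shift on the channel vector of one spatial position.

    channels:     channel values of the current frame (list of floats)
    shift_buffer: the first len(channels) // fold_div channels cached
                  from the previous frame
    Returns (shifted channels, new buffer to keep for the next frame).
    """
    fold = len(channels) // fold_div
    # Cache the current frame's leading channels for the next call.
    new_buffer = list(channels[:fold])
    # Previous frame's leading channels + current frame's remaining channels.
    shifted = list(shift_buffer) + list(channels[fold:])
    return shifted, new_buffer


# Two consecutive "frames" with 16 channels each (hypothetical values).
frame0 = [0.0] * 16
frame1 = [1.0] * 16
buf = [0.0, 0.0]  # the buffer starts out as zeros

out0, buf = online_shift(frame0, buf)
out1, buf = online_shift(frame1, buf)
```

After the second call, the leading channels of `out1` come from `frame0` while the rest come from `frame1`, which is the only state that has to be carried between frames.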
## FPGA Setup

To build the FPGA project, ensure you have initialized the tensorflow-slim submodule (`git submodule update --init --recursive`).

This was tested with the ZCU104 MPSOC DPU TRD in the Vitis-AI repository and the Ultra96V2 Avnet 2020.1 beta branch (https://github.com/Avnet/vitis/tree/2020.1). See the following guide for additional build instructions: https://www.hackster.io/AlbertaBeef/vitis-ai-1-1-flow-for-avnet-vitis-platforms-part-2-f18be4

### 1) Dump Split TF Models
`mobilenet_v2_tfslim.py` is the primary script for building the online-TSM model for FPGA. To generate the split model, set `SPLIT_MODEL`, `SPLIT_EXPORT`, and `EXPORT` to True at the top of the file. After running the script, you will see the split model dumped to the `model_tf_split_*` directories.

### 2) Dump Quantization Inputs
To gather quantization information, one must run the unsplit models. First, set the quantization data paths at the TODOs at the top of the file. Then set `SPLIT_MODEL`, `SPLIT_EXPORT`, and `EXPORT` to False. Finally, set the corresponding `QUANTIZE_*` flag and the `DUMP_QUANTIZE` flag to True to enable quantization.

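As an illustration of what the dumped quantization inputs might look like, here is a small sketch that writes and reloads an `inputs.pickle` file. The layout (one dict per calibration iteration with `resid` and `shift_concat` entries) is inferred from how the calibration input function in this commit indexes the data; the helper name `dump_quantize_inputs` and the sample values are hypothetical, and nested lists stand in for real activation tensors.

```python
import os
import pickle
import tempfile


def dump_quantize_inputs(split_dir, samples):
    """Write one dict per calibration iteration to <split_dir>/inputs.pickle."""
    os.makedirs(split_dir, exist_ok=True)
    path = os.path.join(split_dir, "inputs.pickle")
    with open(path, "wb") as f:
        pickle.dump(samples, f)
    return path


# Hypothetical activations for three calibration iterations.
samples = [
    {"resid": [[0.1 * i, 0.2 * i]], "shift_concat": [[0.3 * i, 0.4 * i]]}
    for i in range(3)
]

# Write into a throwaway directory named like one of the split outputs.
out_dir = os.path.join(tempfile.mkdtemp(), "model_tf_split_0")
path = dump_quantize_inputs(out_dir, samples)

# Reload the file the same way a calibration input function would.
with open(path, "rb") as f:
    loaded = pickle.load(f)
```

Indexing `loaded[i]["resid"]` or `loaded[i]["shift_concat"]` then yields the activation to feed for calibration iteration `i`.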
### 3) Quantize & Compile DPU Kernels
Once quantization data is generated (see `inputs.pickle` and `quantize_info.txt` under the `model_tf_split_export/*` directories), one can move to the `fpga_build` directory to quantize and compile each split of the model.

Update `compile_split.sh` to use the correct target architecture variable. Use `quantize_split.sh` and `compile_split.sh` to launch `vai_q_tensorflow` and `vai_c_tensorflow`, respectively (from within the Docker container).

### 4) Compile demo executable
Once model quantization is complete, run `make ultra96v2.tsm_online` or `make zcu104.tsm_online` in the `fpga_build/model_tf_split` directory to generate the demo executable for the given target from the src files and generated DPU kernels.

## Ultra96V2 Online-TSM Jester Demo

On Ultra96V2 we achieve an inference throughput of 37 FPS with a power consumption of 10.6 W.

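The two reported figures can be combined into per-frame numbers. The short calculation below is only arithmetic on the values quoted above; it assumes the 10.6 W figure is the average power drawn while running at 37 FPS, which the README does not state explicitly.

```python
fps = 37.0        # reported inference throughput on Ultra96V2
power_w = 10.6    # reported power consumption in watts

# Average wall-clock time available per frame at this throughput.
ms_per_frame = 1000.0 / fps

# Energy per processed frame, assuming 10.6 W is the average draw.
joules_per_frame = power_w / fps

print(f"{ms_per_frame:.1f} ms/frame, {joules_per_frame:.2f} J/frame")
```

This works out to roughly 27 ms and 0.29 J per frame.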
@@ -0,0 +1,38 @@

# Disable built-in implicit rules to avoid a Makefile.o match
MAKEFLAGS += -r

CXX ?= aarch64-xilinx-linux-g++
CFLAGS += -O3 -g -Wall -Wpointer-arith -std=c++14 -ffast-math -mcpu=cortex-a53
LDFLAGS += -L./ -ln2cube -lpthread -lopencv_core -lopencv_imgproc -lopencv_videoio -lopencv_imgcodecs -lopencv_highgui
CFLAGS += -fdiagnostics-color=always

SRC = ./src
COMPILE_RESULTS = compile_results
BUILD = build

VPATH = $(SRC)
CPP_FILES = $(wildcard $(SRC)/*.cpp)
OBJ = $(patsubst $(SRC)/%.cpp, $(BUILD)/%.o, $(CPP_FILES))

TARGETS = zcu104.tsm_online ultra96v2.tsm_online
.PHONY: all clean copy $(TARGETS)

all : $(TARGETS)

# Each target links the shared objects with the DPU kernels (*.elf) compiled for that board
$(TARGETS) : SUBDIR = $(patsubst %.tsm_online,%,$@)
$(TARGETS) : ELF = $(shell find $(SUBDIR)/$(COMPILE_RESULTS) -name '*.elf')
$(TARGETS) : %.tsm_online : $(OBJ)
	mkdir -p $(SUBDIR)/$(BUILD)
	$(CXX) $(CFLAGS) $^ $(ELF) -o $*/tsm_online $(LDFLAGS)

# The committed recipe used `%(dir %@)`, which is not valid make syntax.
# Assumption: the stem of the `%.copy` target names the scp destination.
%.copy : tsm_online
	scp ./tsm_online $*:~/tsm_online/


$(BUILD)/%.o : %.cpp
	$(CXX) -c $(CFLAGS) $< -o $@

clean :
	$(RM) -rf $(BUILD)
	$(RM) tsm_online
@@ -0,0 +1,63 @@
import os
from PIL import Image
import numpy as np
import random
import pickle

IMAGENET_PATH = "/MEng/Data/ILSVRC2012_img_val/"
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, .225]

CALIB_BASE_PATH=os.getenv("CALIB_BASE_PATH")
if CALIB_BASE_PATH is None:
    raise ValueError("Environment variable CALIB_BASE_PATH not set")

CALIB_MODEL_SPLIT=os.getenv("CALIB_MODEL_SPLIT")
if CALIB_MODEL_SPLIT is None:
    raise ValueError("Environment variable CALIB_MODEL_SPLIT not set")

quantize_info_path = os.path.join(CALIB_BASE_PATH, f"model_tf_split_{CALIB_MODEL_SPLIT}/quantize_info.txt")
input_info_path = os.path.join(CALIB_BASE_PATH, f"model_tf_split_{CALIB_MODEL_SPLIT}/inputs.pickle")

input_shapes = {}
with open(quantize_info_path) as f:
    lines = f.readlines()
    raw_input_names = []
    raw_input_shapes = []
    for i in range(len(lines)):
        if "--input_nodes" in lines[i]:
            raw_input_names = lines[i+1].rstrip()
        if "--input_shapes" in lines[i]:
            raw_input_shapes = lines[i+1].rstrip()

    raw_input_names = raw_input_names.split(",")
    raw_input_shapes = raw_input_shapes.split(":")
    raw_input_shapes = [[int(x) for x in shape.split(',')] for shape in raw_input_shapes]
    input_shapes = dict(zip(raw_input_names, raw_input_shapes))


input_data = {}
# shift_concat, resid
with open(input_info_path, 'rb') as f:
    input_data = pickle.load(f)

def input_fn(iter):
    #files = sorted(os.listdir(IMAGENET_PATH))
    #img = Image.open(os.path.join(IMAGENET_PATH,files[iter])).resize((224, 224))
    #img = np.array(img) / 255.0
    ##img = (img - MEAN) / STD
    #img = np.transpose(img, axes=[2, 0, 1])
    #img = np.expand_dims(img, axis=0)
    #return {"input_node": img}
    inputs = {}
    for name,shape in input_shapes.items():
        if "/input" in name:
            inputs[name] = np.array(input_data[iter]["resid"])
            #inputs[name] = np.array(input_data["0"]["resid"])
        else:
            inputs[name] = np.array(input_data[iter]["shift_concat"])
            #inputs[name] = np.array(input_data["0"]["shift_concat"])

    #inputs = {name: np.random.rand(*shape) for name,shape in input_shapes.items()}

    return inputs
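
The parsing step in the script above can be exercised on its own. The sketch below assumes, as the script does, that `quantize_info.txt` puts each flag on one line and its value on the next, with input names separated by commas and shapes separated by colons. The function name `parse_quantize_info` and the node names in the example text are hypothetical.

```python
def parse_quantize_info(text):
    """Map each input node name to its shape (a list of ints)."""
    lines = text.splitlines()
    names, shapes = [], []
    # Stop one line early so lines[i + 1] is always valid.
    for i, line in enumerate(lines[:-1]):
        value = lines[i + 1].strip()
        if "--input_nodes" in line:
            names = value.split(",")
        if "--input_shapes" in line:
            shapes = [[int(d) for d in s.split(",")] for s in value.split(":")]
    return dict(zip(names, shapes))


# Hypothetical file contents with two inputs for one split.
example = """--input_nodes
split_1/input,split_1/shift_concat
--input_shapes
1,56,56,24:1,56,56,24
"""

input_shapes = parse_quantize_info(example)
```

Names containing `/input` would then be fed the `resid` activation and all other names the `shift_concat` activation, following the branch in `input_fn`.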
@@ -0,0 +1,28 @@
#!/bin/bash

num_splits=$(ls quantize_results | grep '^quantize_results_.*' | wc -l)

#ZCU104_arch="/opt/vitis_ai/compiler/arch/DPUCZDX8G/ZCU104/arch.json"
ZCU104_arch="../zcu104_arch/arch.json"
ULTRA96V2_arch="../ultra96v2_arch/arch.json"

ZCU104_out="zcu104/compile_results"
ULTRA96V2_out="ultra96v2/compile_results"

echo "Compiling $num_splits splits..."

for ((i=0;i<num_splits;i++)); do
    printf "\n================ Compiling split # $i ====================\n"

    # Truncate the log on the first split, append afterwards
    tee_append=""
    if [[ $i -ne 0 ]]; then
        tee_append="-a"
    fi

    vai_c_tensorflow --arch "$ZCU104_arch" \
        --frozen_pb "quantize_results/quantize_results_$i/deploy_model.pb" \
        --output_dir "$ZCU104_out/compile_results_$i" \
        --net_name "tsm_mobilenet_v2_$i" \
        --options "{'save_kernel':'','dump':'graph','split_io_mem':'','mode':'normal'}" \
        2>&1 | tee $tee_append compile_log.txt
done
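
The script above hard-codes the ZCU104 architecture and output directory, which is why the README asks you to update it for your target. As one possible alternative, the sketch below builds the same `vai_c_tensorflow` argument list for a chosen target. The helper `compile_command` and the `ARCH_JSON` table are hypothetical; the flags, paths, and option string are copied from the script, and nothing is executed here.

```python
# arch.json locations as used in the script, keyed by target name.
ARCH_JSON = {
    "zcu104": "../zcu104_arch/arch.json",
    "ultra96v2": "../ultra96v2_arch/arch.json",
}

OPTIONS = "{'save_kernel':'','dump':'graph','split_io_mem':'','mode':'normal'}"


def compile_command(target, split):
    """Return the vai_c_tensorflow argument list for one split and target."""
    return [
        "vai_c_tensorflow",
        "--arch", ARCH_JSON[target],
        "--frozen_pb", f"quantize_results/quantize_results_{split}/deploy_model.pb",
        "--output_dir", f"{target}/compile_results/compile_results_{split}",
        "--net_name", f"tsm_mobilenet_v2_{split}",
        "--options", OPTIONS,
    ]


cmd = compile_command("ultra96v2", 3)
```

Inside the Vitis-AI container, a list like `cmd` could be passed to `subprocess.run`; keeping the target as a parameter avoids editing the script when switching boards.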

tsm_fpga/fpga_build/model_tf_split/quantize_results/.gitkeep

Whitespace-only changes.
@@ -0,0 +1,42 @@
#!/bin/bash
set -e

export DECENT_DEBUG=3

# Constants
num_splits=11
calib_iter=50

# Check the argument before it is used below
if [[ $# -eq 0 ]]; then
    echo "Missing arg: Provide path to base split model dir"
    exit 1
fi

# Path to output directory of TF
base_path="$1"
base_options="--calib_iter $calib_iter --input_fn calib_input_split.input_fn"

num_split_dirs=$(ls "$base_path" | wc -l)

# Warn (without aborting) when the directory count does not match num_splits
if [[ $num_splits -ne $num_split_dirs ]]; then
    printf "Number of output split directories from:\n'%s' (%s)\nnot equal to coded num_splits (%s)\n" \
        "$base_path" "$num_split_dirs" "$num_splits"
fi

export CALIB_BASE_PATH="$base_path"

for ((i=0;i<num_splits;i++)); do
    printf "\n================ Quantizing split # $i ====================\n"
    model_dir="$base_path/model_tf_split_$i"
    config=$(<"$model_dir/quantize_info.txt")
    export CALIB_MODEL_SPLIT=$i

    # Truncate the log on the first split, append afterwards
    tee_append=""
    if [[ $i -ne 0 ]]; then
        tee_append="-a"
    fi

    vai_q_tensorflow quantize --output_dir "quantize_results/quantize_results_$i" $base_options --input_frozen_graph "$model_dir/model_tf_split_$i.pb" \
        $(echo $config) 2>&1 | tee $tee_append quantize_log.txt
done

@@ -0,0 +1,23 @@
#include <dirent.h>
#include <algorithm>  // std::sort
#include <vector>
#include <string>

// Return the sorted paths of all non-hidden entries directly under `path`.
std::vector<std::string> listDir(const std::string& path) {
    std::vector<std::string> res;
    // Guard against an empty path before calling back()
    std::string prepend = (!path.empty() && path.back() == '/') ? path : path + "/";

    DIR *df;
    struct dirent *file;
    df = opendir(path.c_str());
    if (df) {
        while ((file = readdir(df))) {
            // Skip ".", "..", and hidden files
            if (file->d_name[0] == '.')
                continue;
            res.push_back(prepend + file->d_name);
        }
        closedir(df);
    }

    std::sort(res.begin(), res.end());
    return res;
}
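
For host-side prototyping, the same listing behaviour can be mirrored in a few lines of Python. This is only an illustrative equivalent of the C++ helper above, not part of the commit; the name `list_dir` and the temporary files are hypothetical.

```python
import os
import tempfile


def list_dir(path):
    """Return sorted paths of the non-hidden entries directly under path."""
    if not os.path.isdir(path):
        return []  # like the C++ helper, yield nothing if the directory cannot be opened
    return sorted(
        os.path.join(path, name)
        for name in os.listdir(path)
        if not name.startswith(".")
    )


# Populate a throwaway directory with two visible files and one hidden file.
demo_dir = tempfile.mkdtemp()
for name in ("b.txt", "a.txt", ".hidden"):
    open(os.path.join(demo_dir, name), "w").close()

entries = list_dir(demo_dir)
```

Sorting the result gives a stable order, which matters when the entries are frames that must be processed in sequence.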
