This package provides a Python module to convert the election results of Harris County, Texas, into CSV format. It accepts as input the PDF canvass report for the entire county, e.g., the General and Special Elections of November 2016. The PDF canvass reports are available online from the Harris County Clerk.
It also generates a separate CSV file for each office or proposition being tallied. The module corrects common OCR errors, but the final output is not guaranteed to be accurate and requires manual review. For instance, the raw output for the November elections of 2012, 2014, and 2016 will look similar to this.
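The README does not document the module's actual correction rules or CSV layout. The sketch below only illustrates the general idea under stated assumptions: a hypothetical substitution table (`DIGIT_FIXES`) maps letters that OCR often confuses with digits back to digits, and a hypothetical in-memory structure maps each office name to `(precinct, candidate, raw_count)` rows, which `write_office_csvs` writes to one file per office. All names here are illustrative, not the package's API.

```python
import csv
import re
from pathlib import Path

# Hypothetical OCR substitutions: letters that are easily misread in place of digits.
DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1"})


def clean_count(raw: str) -> int:
    """Turn an OCR'd vote count such as '1,2O4' into an int (1204)."""
    fixed = raw.strip().translate(DIGIT_FIXES).replace(",", "")
    # Accept ASCII digits only; anything else still needs manual review.
    if not re.fullmatch(r"[0-9]+", fixed):
        raise ValueError(f"not a vote count: {raw!r}")
    return int(fixed)


def safe_filename(office: str) -> str:
    """Make an office or proposition name usable as a CSV file name."""
    slug = re.sub(r"[^A-Za-z0-9]+", "_", office).strip("_").lower()
    return f"{slug or 'unnamed'}.csv"


def write_office_csvs(results: dict, out_dir: str) -> list:
    """Write one CSV per office; results maps office -> [(precinct, candidate, raw_count)]."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for office, rows in results.items():
        path = out / safe_filename(office)
        with path.open("w", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow(["precinct", "candidate", "votes"])
            for precinct, candidate, raw_count in rows:
                writer.writerow([precinct, candidate, clean_count(raw_count)])
        written.append(path)
    return written
```

Raising on anything that still is not a plain number, rather than guessing, keeps doubtful cells visible for the manual review step.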
Please note that the module does not process the cumulative reports.
Requirements:
- pytesseract
- PyPDF2
- Pillow
- cv2 (OpenCV 3.3+). See, e.g., the opencv-python package, or compile OpenCV with the Python bindings.
- GPL Ghostscript 9.18+
  - This script converts each page of the PDF into a TIFF file using Ghostscript.
  - Check your version:
        gs -v
  - Ubuntu:
        sudo apt-get install ghostscript
  - macOS:
        brew install ghostscript
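The exact Ghostscript flags the script passes are not stated here. As a minimal sketch, assuming a bilevel G4 TIFF device at 300 dpi, the page-to-TIFF step could be driven like this; `ghostscript_tiff_command` and `render_pages` are hypothetical helpers, while the `gs` switches themselves (`-sDEVICE`, `-r`, `-dFirstPage`, `-dLastPage`, `-sOutputFile`) are standard Ghostscript options.

```python
import shutil
import subprocess
from typing import List, Optional


def ghostscript_tiff_command(pdf_path: str, out_pattern: str,
                             first_page: Optional[int] = None,
                             last_page: Optional[int] = None,
                             dpi: int = 300, device: str = "tiffg4") -> List[str]:
    """Build a gs command line that renders PDF pages to TIFF files (assumed settings)."""
    cmd = ["gs", "-q", "-dNOPAUSE", "-dBATCH", "-dSAFER",
           f"-sDEVICE={device}", f"-r{dpi}"]
    if first_page is not None:
        cmd.append(f"-dFirstPage={first_page}")
    if last_page is not None:
        cmd.append(f"-dLastPage={last_page}")
    # gs substitutes a running output-page counter for %03d (or %d) in the pattern.
    cmd += [f"-sOutputFile={out_pattern}", pdf_path]
    return cmd


def render_pages(pdf_path: str, out_pattern: str, **kwargs) -> None:
    """Run Ghostscript; raises CalledProcessError if gs exits non-zero."""
    if shutil.which("gs") is None:
        raise RuntimeError("Ghostscript (gs) not found on PATH")
    subprocess.run(ghostscript_tiff_command(pdf_path, out_pattern, **kwargs), check=True)
```

For example, `render_pages("canvass.pdf", "page-%03d.tif", first_page=2, last_page=3)` would write `page-001.tif` and `page-002.tif`, since the counter numbers output pages rather than PDF pages.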
usage: convert-election-results.py [-h] [-p PDF] [-i IMAGE_FILE]
                                   [--first-page FIRST_PAGE]
                                   [--last-page LAST_PAGE] [-o OUTPUT_PATH]
                                   [-v] [-d]

Convert election results to computer readable format, e.g., csv, json, xml

optional arguments:
  -h, --help            show this help message and exit
  -p PDF, --pdf PDF     PDF file to process
  -i IMAGE_FILE, --image-file IMAGE_FILE
                        image file to process
  --first-page FIRST_PAGE
                        page to begin processing
  --last-page LAST_PAGE
                        page to end processing
  -o OUTPUT_PATH, --output-path OUTPUT_PATH
                        path to write csv files
  -v, --version         show program's version number and exit
  -d, --debug           print debug messages
This product uses the election results available from the Harris County Clerk but is not endorsed or certified by the Harris County Clerk.