McAPI - HTML to PDF Converter API with Python

Python sample code to convert HTML and website content to PDF with the McAPI HTML to PDF Converter REST API service. All sample code was written in Python 3, but the HTML to PDF converter can also be used with older versions. The samples require the requests module; install it with pip if necessary:

$ python3 -m pip install requests

Requirements: A free RapidAPI account. Replace YOUR_API_KEY in the code below with your RapidAPI key.

All samples below work with the free tier of the API; see the McAPI HTML to PDF API Listing for available plans.

See the overview page for a reference that lists all available parameters and error codes.

Convert an HTML Invoice to PDF in Python

In the first code snippet we'll convert this invoice to PDF, a typical use case of HTML to PDF conversion. We set the page format to "A4" and the storeExternal option to "true"; with these settings the PDF will be returned as a downloadable URL.

The source code:

# Python 3
  
import requests

url = 'https://mcapi-html-2-pdf.p.rapidapi.com/'

payload = '{\
  "url": "https://mcapi.io/html2pdf/templates/invoice.html",\
  "format": "A4",\
  "storeExternal": "true"\
}'
headers = {
  'content-type': 'application/json',
  'x-rapidapi-key': 'YOUR_API_KEY',
  'x-rapidapi-host': 'mcapi-html-2-pdf.p.rapidapi.com'
}

response = requests.request('POST', url, data=payload, headers=headers)

The PDF's URL will be delivered as a JSON object in response.text, e.g.:

{
  "service": "McAPI HTML 2 PDF, https://mcapi.io",
  "version": "V1",
  "pdf": "https://...pdf"
}
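To use the returned link in a program, the JSON body has to be parsed first. The following is a minimal sketch, assuming the response body has the shape shown above; the helper name extract_pdf_url and the sample URL are illustrative and not part of the API:

```python
import json


def extract_pdf_url(response_text):
    """Return the value of the "pdf" field from the API's JSON response body."""
    try:
        body = json.loads(response_text)
    except json.JSONDecodeError as exc:
        raise ValueError("response is not valid JSON") from exc
    if not isinstance(body, dict) or "pdf" not in body:
        raise ValueError('response has no "pdf" field')
    return body["pdf"]


# Sample body in the shape shown above; the URL is a placeholder.
sample = (
    '{"service": "McAPI HTML 2 PDF, https://mcapi.io", '
    '"version": "V1", "pdf": "https://example.com/invoice.pdf"}'
)
print(extract_pdf_url(sample))
```

With the snippet above, the helper would be called as extract_pdf_url(response.text).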

McAPI HTML to PDF API - Converted HTML Invoice as PDF with Python

The returned PDF from the Python snippet, seen here in the macOS Preview app. The PDF is fully indexable and searchable:

Image of Converted HTML Invoice to PDF Python

Specifying page formats

The HTML to PDF Converter API contains a built-in list of common paper formats, e.g. "A4" or "Letter". To get a list of all formats use the listFormats option, like so:

# Python
  
...

payload = '{\
  "listFormats": "true"\
}'
...

The API will now return a list of all predefined formats, shown here as JSON source:

{
  "formats": ["Letter", "Legal", "Tabloid", "Ledger", "A0", "A2", "A3", "A4", "A5", "A6"]
}

Specifying a predefined format is simple: use its name in the call. Format names, like all parameters and options, are case sensitive:

# Python
  
...

payload = '{\
  "url": "https://mcapi.io/html2pdf/templates/invoice.html",\
  "format": "Letter",\
  "storeExternal": "true"\
}'

...

Posting HTML code for conversion to PDF with Python

All previous examples sent URLs for conversion to the API. You can also post HTML code directly to the API. Use the html parameter for this. We first load the HTML code from a file, then escape it and encode it as utf-8 so that it can be put into a JSON block:

# Python 3

import json

...

with open('invoice.html', 'r') as f:
  data = f.read()
  html = json.dumps(data).encode('utf-8')

... 

Then we set the parameters like so:

# Python 3

...

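# html is the escaped, utf-8 encoded JSON string literal created above,
# so it is spliced into the payload as bytes rather than written as text
payload = b'{\
  "html": ' + html + b',\
  "format": "Letter",\
  "storeExternal": "true"\
}'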

...

The rest of the call remains unchanged. The API will now render the provided HTML into a PDF and return it to the caller.
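An alternative way to assemble the request body is to let json.dumps serialize the whole payload in one step, which takes care of escaping quotes and newlines in the HTML. This is a minimal sketch; build_html_payload is an illustrative helper and the inline HTML stands in for the content of a file:

```python
import json


def build_html_payload(html_text, page_format="Letter"):
    """Serialize the request body; json.dumps escapes the HTML for us."""
    body = {
        "html": html_text,
        "format": page_format,
        "storeExternal": "true",
    }
    # Encode to bytes so the result can be passed as the request data.
    return json.dumps(body).encode("utf-8")


payload = build_html_payload('<h1 class="title">Invoice</h1>\n<p>Total: 42 €</p>')
# Round trip: the HTML comes back unchanged after parsing the payload.
print(json.loads(payload)["html"])
```

The resulting bytes can be passed as the data argument of the requests call used in the earlier snippets.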

Note that relative links or references in the HTML will not work. Example for a link that won't resolve:

<img src="../templates/logo.png"/>

All references and links in your HTML code must be absolute and point to valid web locations, example:

<img src="https://mcapi.io/html2pdf/templates/logo.png"/>

Make sure to see the discussion in the API overview on this.
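If the HTML was written for a known web location, relative references can be rewritten before posting. The sketch below uses urljoin from the standard library together with a simple regular expression; the base URL is an assumption, the helper name absolutize_links is hypothetical, and the regex only covers double-quoted src and href attributes (a full HTML parser would be more robust):

```python
import re
from urllib.parse import urljoin

# Matches src="..." and href="..." attributes with double-quoted values.
_ATTR = re.compile(r'\b(src|href)="([^"]*)"')


def absolutize_links(html_text, base_url):
    """Rewrite src/href values relative to base_url into absolute URLs."""
    def fix(match):
        attr, value = match.group(1), match.group(2)
        return f'{attr}="{urljoin(base_url, value)}"'
    return _ATTR.sub(fix, html_text)


# Assumed base: the web location the HTML file was originally written for.
base = "https://mcapi.io/html2pdf/pages/"
print(absolutize_links('<img src="../templates/logo.png"/>', base))
```

References that are already absolute are left unchanged by urljoin.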

With your call you can also specify headers and footers to be put on each page of the generated PDF. The overview has more on this and provides some examples.

Cookie consent banners and ad blocking

If so desired, the API can also automatically click the "Accept" button on GDPR / DSGVO cookie consent banners. (Note that this feature is currently experimental; see the discussion.)

Consider the CNBC website which we convert to a PDF with this Python payload:

# Python 3
  
...

payload = '{\
  "url": "https://cnbc.com",\
  "format": "A4",\
  "background": "true",\
  "orientation": 1,\
  "storeExternal": "true"\
}'

...

The site displays a very large cookie consent banner. Screenshot from the captured PDF (link to PDF):

Python Website HTML to PDF Conversion with Cookie Banner

Set the cookie option in the payload to "true" like so:

# Python 3
  
...

payload = '{\
  "url": "https://cnbc.com",\
  "format": "A4",\
  "background": "true",\
  "orientation": 1,\
  "cookie": "true",\
  "storeExternal": "true"\
}'

...

The site without the banner but with ads instead (link to PDF):

Python Website HTML to PDF Conversion with Ad

Blocking website ads before conversion to a PDF

While it can be useful to convert websites with all ads (for example to document ad placement or to check ad rotation), in many cases you want the PDFs without any ads. The API comes with a built-in ad blocker, activate it like so:

# Python 3
  
...

payload = '{\
  "url": "https://cnbc.com",\
  "format": "A4",\
  "background": "true",\
  "orientation": 1,\
  "cookie": "true",\
  "adblock": "true",\
  "storeExternal": "true"\
}'

...

The site without cookie banner and without ads (link to PDF):

Python Website HTML to PDF Conversion no Cookies no Ads

For the PDF of the CNBC website we had set the orientation to "1" (for landscape) and the background option to "true". This is a sensible option for converting websites to PDF because they often have inverted text and similar styling. Here's the same site without background elements (the default). The site has a lot of white text on blue background which is now no longer visible:

Python Website HTML to PDF Conversion Transparent Background

As a rule of thumb, set the background option to "false" when converting documents like invoices, package lists, or time sheets, and set it to "true" for web pages or sites.
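This rule of thumb can be captured in a small helper that picks the options per content type. The sketch below only uses option names that appear in the examples above; the function name build_url_payload and its is_website flag are illustrative, not part of the API:

```python
import json


def build_url_payload(url, is_website):
    """Build the JSON body for a URL conversion using the rule of thumb above."""
    options = {
        "url": url,
        "format": "A4",
        "storeExternal": "true",
        # Keep background graphics for web pages, drop them for plain documents.
        "background": "true" if is_website else "false",
    }
    if is_website:
        options["orientation"] = 1   # landscape, as in the CNBC example
        options["cookie"] = "true"   # click cookie consent banners (experimental)
        options["adblock"] = "true"  # enable the built-in ad blocker
    return json.dumps(options)


print(build_url_payload("https://mcapi.io/html2pdf/templates/invoice.html", False))
```

Whether landscape orientation suits a given site is a judgment call; it is set here only because the CNBC example uses it.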

The header-parameter - writing PDFs to a file (Python)

With the storeExternal option set to "false", the PDF is returned immediately as a base64 encoded string. By default, this string is preceded by a header that describes the media type (MIME type) of the string.

Sample payload block:

# Python 3

...

payload = '{\
  "url": "https://mcapi.io/html2pdf/templates/invoice.html",\
  "format": "A4",\
  "storeExternal": "false"\
}'

...

The result will look like this:

{
  "service": "McAPI HTML 2 PDF, https://mcapi.io",
  "version": "V1",
  "pdf": "data:application/pdf;base64,JVBERi0 ... JUVPRg=="
}

You can now directly set the "pdf"-string as the data property of an HTML object element like in this example:

# Python 3

...

import json

...

if response.status_code == 200:
  # In real life you would put the JSON parser in a try/except block 
  pdf = json.loads(response.text)['pdf']
  print('<object data="' + pdf + '" type="application/pdf"></object>')
else:
  print("Error")

...

The MIME header will make sure that the PDF data is interpreted correctly by the browser. (Note that not all browsers support the embedding of PDF files with the object tag; see this discussion on Stack Overflow.)

However, when writing the PDF data to a file, including the header would result in an invalid PDF document. To create a PDF without the header, set the header-parameter to "false", like so:

# Python 3

...

payload = '{\
  "url": "https://mcapi.io/html2pdf/templates/invoice.html",\
  "format": "A4",\
  "header": "false",\
  "storeExternal": "false"\
}'

...

The returned PDF data without header:

{
  "service": "McAPI HTML 2 PDF, https://mcapi.io",
  "version": "V1",
  "pdf": "JVBERi0 ... JUVPRg=="
}

Now, all we have to do is decode the base64 string with the PDF data and then write the binary PDF to a file. Shown here as a complete Python 3 program:

# Python 3

import requests
import json
import base64

url = 'https://mcapi-html-2-pdf.p.rapidapi.com/'

payload = '{\
  "url": "https://mcapi.io/html2pdf/templates/invoice.html",\
  "format": "A4",\
  "header": "false",\
  "storeExternal": "false"\
}'
headers = {
  'content-type': 'application/json',
  'x-rapidapi-key': 'YOUR_API_KEY',
  'x-rapidapi-host': 'mcapi-html-2-pdf.p.rapidapi.com'
}

response = requests.request('POST', url, data=payload.encode('utf-8'), headers=headers)

if response.status_code == 200:
  # In real life you would put the JSON parser in a try/except block 
  pdf = json.loads(response.text)['pdf']
  pdfData = base64.b64decode(pdf)
  with open('invoice.pdf', 'wb') as f:
    f.write(pdfData)
else:
  print("Error")
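The comment in the program mentions wrapping the JSON parser in a try/except block. One way this could look is sketched below, assuming the response shapes shown earlier; decode_pdf_field is a hypothetical helper that accepts the "pdf" field with or without the MIME header and checks that the decoded bytes start with the PDF signature:

```python
import base64
import json

DATA_URI_PREFIX = "data:application/pdf;base64,"


def decode_pdf_field(response_text):
    """Return the raw PDF bytes from a response body, with or without header."""
    try:
        pdf = json.loads(response_text)["pdf"]
    except (ValueError, KeyError, TypeError) as exc:
        raise ValueError("unexpected response body") from exc
    if not isinstance(pdf, str):
        raise ValueError('the "pdf" field is not a string')
    # Drop the MIME header if the header option was left at its default.
    if pdf.startswith(DATA_URI_PREFIX):
        pdf = pdf[len(DATA_URI_PREFIX):]
    data = base64.b64decode(pdf)
    # Every PDF file starts with the bytes "%PDF".
    if not data.startswith(b"%PDF"):
        raise ValueError("decoded data does not look like a PDF")
    return data


# Stand-in for a real response: a tiny fake PDF body, base64 encoded.
fake_pdf = b"%PDF-1.4\n%%EOF\n"
body = json.dumps({"pdf": DATA_URI_PREFIX + base64.b64encode(fake_pdf).decode("ascii")})
print(decode_pdf_field(body) == fake_pdf)
```

In the program above, the result of decode_pdf_field(response.text) would be written to the file in place of pdfData.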
