McAPI - HTML to PDF Converter API with Ruby

Ruby sample code to convert websites and plain HTML to a PDF document with the McAPI HTML to PDF Converter REST API service. The samples were written for Ruby 2.6 and also work with Ruby 3. They require the libraries uri, net/http, json and openssl, which should all be available with a standard Ruby installation. If one is missing, install it with the gem command.

Requirements: A free RapidAPI account. Replace YOUR_API_KEY in the code below with your RapidAPI key.

All samples below work with the free tier of the API; see the McAPI HTML to PDF API Listing for available plans.

See the overview page for a reference that lists all available parameters and error codes.

Convert business documents like invoices, package lists, delivery notes from HTML to PDF with Ruby

In the first code snippet we'll convert this invoice from HTML to PDF, a common application of HTML to PDF conversion. We set the format parameter to "A4" and the storeExternal parameter to "true"; the PDF is then stored in the cloud and the API returns a downloadable URL. Note that PDFs in the cloud are deleted after 30 days.

The Ruby source code:

# Ruby

require 'uri'
require 'net/http'
require 'openssl'

url = URI("https://mcapi-html-2-pdf.p.rapidapi.com/")

http = Net::HTTP.new(url.host, url.port)
http.use_ssl = true
# Verify the server certificate; VERIFY_NONE would disable this check
http.verify_mode = OpenSSL::SSL::VERIFY_PEER

request = Net::HTTP::Post.new(url)
request["content-type"] = 'application/json'
request["x-rapidapi-key"] = 'YOUR_API_KEY'
request["x-rapidapi-host"] = 'mcapi-html-2-pdf.p.rapidapi.com'
request.body = "{
  \"url\": \"https://mcapi.io/html2pdf/templates/invoice.html\",
  \"format\": \"A4\",
  \"storeExternal\": \"true\"
}"

response = http.request(request)
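
The hand-escaped JSON string above is easy to get wrong once values contain quotes. As a minimal sketch, the request can also be assembled from a Ruby Hash and serialized with to_json. The helper name build_request is hypothetical and not part of the API; the endpoint and header names are taken from the sample above.

```ruby
require 'uri'
require 'net/http'
require 'json'

# Endpoint taken from the sample above.
API_URL = URI("https://mcapi-html-2-pdf.p.rapidapi.com/")

# Hypothetical helper (not part of the API): builds the POST request from a
# Ruby Hash so that Hash#to_json handles all quoting and escaping.
def build_request(params, api_key = "YOUR_API_KEY")
  request = Net::HTTP::Post.new(API_URL)
  request["content-type"] = "application/json"
  request["x-rapidapi-key"] = api_key
  request["x-rapidapi-host"] = API_URL.host
  request.body = params.to_json
  request
end

request = build_request({
  "url" => "https://mcapi.io/html2pdf/templates/invoice.html",
  "format" => "A4",
  "storeExternal" => "true"
})
puts request.body
```

The returned object can be passed to http.request exactly like the request built by hand.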

The API returns the PDF's URL in a JSON object, which you can read with response.read_body. Example:

{
  "service": "McAPI HTML 2 PDF, https://mcapi.io",
  "version": "V1",
  "pdf": "https://...pdf"
}
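
To pick the URL out of such a response, something along these lines should work. This is a sketch only: pdf_url_from is a hypothetical helper name, and the check assumes that a stored PDF is always reported as an http or https URL in the "pdf" field, as in the example above.

```ruby
require 'json'
require 'uri'

# Hypothetical helper: returns the "pdf" value if it is an http(s) URL,
# otherwise nil (also for malformed JSON).
def pdf_url_from(body)
  data = JSON.parse(body)
  return nil unless data.is_a?(Hash) && data["pdf"].is_a?(String)

  # URI::HTTPS is a subclass of URI::HTTP, so this accepts both schemes.
  URI.parse(data["pdf"]).is_a?(URI::HTTP) ? data["pdf"] : nil
rescue JSON::ParserError, URI::InvalidURIError
  nil
end

# Made-up response body for illustration; the URL is a placeholder.
sample = '{"service": "McAPI HTML 2 PDF, https://mcapi.io", "version": "V1", "pdf": "https://example.com/invoice.pdf"}'
puts pdf_url_from(sample)
```

In the samples on this page you would call it as pdf_url_from(response.read_body).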

McAPI HTML to PDF API - Converted HTML Invoice as PDF with Ruby

Here's the created PDF from the Ruby call, viewed in the Preview app on a Mac. The PDF is fully indexable and searchable (note the selected and highlighted text):

Image of Converted HTML Invoice to PDF in Ruby

Specifying page formats

The HTML to PDF Converter API comes with a built-in list of standard paper formats, e.g. "Letter" or "Legal". To retrieve a list of all formats use the listFormats option, like so:

# Ruby
  
...

request.body = "{
  \"listFormats\": \"true\"
}"

...

The API will now return a list of all predefined page sizes, shown here as JSON source:

{
  "formats": ["Letter", "Legal", "Tabloid", "Ledger", "A0", "A2", "A3", "A4", "A5", "A6"]
}

Selecting one of the predefined formats is simple: use its name in the call. Paper names, like all parameters and options, are case sensitive.

# Ruby
  
...

request.body = "{
  \"url\": \"https://mcapi.io/html2pdf/templates/invoice.html\",
  \"format\": \"Letter\",
  \"storeExternal\": \"true\"
}"

...
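
Because format names are case sensitive, a small client-side check can catch typos such as "letter" before a request is sent. The sketch below is an illustration, not part of the API: checked_format is a hypothetical helper, and the list is copied from the sample listFormats response above, so the live API may return different names.

```ruby
# Format names copied from the sample listFormats response above.
KNOWN_FORMATS = ["Letter", "Legal", "Tabloid", "Ledger",
                 "A0", "A2", "A3", "A4", "A5", "A6"].freeze

# Hypothetical helper: returns the name on an exact, case-sensitive match
# and raises ArgumentError otherwise, with a hint if only the case differs.
def checked_format(name, formats = KNOWN_FORMATS)
  return name if formats.include?(name)

  hint = formats.find { |f| f.casecmp?(name) }
  message = "unknown format #{name.inspect}"
  message += ", did you mean #{hint.inspect}?" if hint
  raise ArgumentError, message
end

puts checked_format("Letter")
```

In practice the second argument could be filled with the list returned by a listFormats call instead of the hard-coded constant.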

Convert an HTML string to PDF with Ruby

All previous examples sent URLs for conversion to the API. You can also add HTML code directly to your call to the API with the html option. We first load the HTML code from a file and then escape the data with to_json so that it can be used safely in a JSON block:

# Ruby

...

require 'json'

# Read the file and encode its content as a JSON string (quotes included)
html = File.read('invoice.html').to_json

... 

Then we set the parameters like so:

# Ruby

...

# html is already a quoted JSON string, so interpolate it without extra quotes
request.body = "{
  \"html\": #{html},
  \"format\": \"Letter\",
  \"storeExternal\": \"true\"
}"

...

The rest of the call to the API remains unchanged. The API will now render the HTML string into a PDF and return it to the caller.
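
As an alternative to splicing a pre-encoded string into the body, the whole parameter set can be serialized in one step. The following is a minimal sketch with a hypothetical helper name (html_body); it uses only the html, format and storeExternal parameters shown above and checks that the HTML survives the JSON round trip unchanged.

```ruby
require 'json'

# Hypothetical helper: wraps raw HTML into a JSON request body.
# Hash#to_json escapes quotes, backslashes and newlines in the HTML.
def html_body(html, format = "Letter")
  { "html" => html, "format" => format, "storeExternal" => "true" }.to_json
end

# Sample markup containing double quotes and a newline.
sample = %(<p class="total">Total: 10 &euro;</p>\n)
body = html_body(sample)
puts body

# Parsing the body again yields the original HTML.
puts JSON.parse(body)["html"] == sample
```

With a file, the call would be html_body(File.read('invoice.html')).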

Note that relative references or links in the HTML string will not resolve. Example for an image that will not be loaded:

<img src="../templates/logo.png"/>

All references and links in your HTML must be absolute and point to valid web locations. Here's an example:

<img src="https://mcapi.io/html2pdf/templates/logo.png"/>

Make sure to see the section in the API overview for more on this.
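
If your HTML was written with relative paths, one option is to rewrite them before sending the string. The sketch below is an assumption-laden illustration, not a feature of the API: absolutize_links is a hypothetical helper that resolves quoted src and href attribute values against a base URL you supply. It relies on a simple regular expression, so it will not handle srcset, CSS url() references or unquoted attributes.

```ruby
require 'uri'

# Hypothetical helper: resolves quoted src/href values against base_url.
# Values that are already absolute are returned unchanged by URI.join.
def absolutize_links(html, base_url)
  html.gsub(/\b(src|href)=(["'])(.*?)\2/) do
    match = Regexp.last_match
    attr, quote, value = match.captures
    begin
      "#{attr}=#{quote}#{URI.join(base_url, value)}#{quote}"
    rescue URI::Error
      match[0] # leave values that are not valid URIs untouched
    end
  end
end

base = "https://mcapi.io/html2pdf/templates/invoice.html"
puts absolutize_links('<img src="../templates/logo.png"/>', base)
```

The base URL should be the address the HTML file would have if it were served from the web.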

With your call you can also add some HTML to define header and footer sections for the PDF. Those sections can display page number, print date and other static and dynamic data. The API overview has more on this and provides some sample HTML.

Cookie consent banners and ad blocking with Ruby

If so desired, the API can also automatically click the "Accept" button on GDPR / DSGVO cookie consent banners. Note that this feature is currently experimental; see the discussion.

Consider the CNBC website which we convert to a PDF with this Ruby snippet:

# Ruby
  
...

request.body = "{
  \"url\": \"https://cnbc.com\",
  \"format\": \"A4\",
  \"background\": \"true\",
  \"orientation\": 1,
  \"storeExternal\": \"true\"
}"

...

Screenshot from the captured PDF (link to PDF), note the large cookie consent banner:

Ruby Website HTML to PDF Conversion with Cookie Banner

Set the cookie option in the request to "true" to get a PDF without the banner:

# Ruby
  
...

request.body = "{
  \"url\": \"https://cnbc.com\",
  \"format\": \"A4\",
  \"background\": \"true\",
  \"orientation\": 1,
  \"cookie\": \"true\",
  \"storeExternal\": \"true\"
}"

...

The CNBC site without the cookie message but now with ads (link to PDF):

Ruby Website HTML to PDF Conversion with Ads

Blocking ads on a website before conversion to a PDF with Ruby

While there are some use cases for including ads when converting a website (for example to document ad placement or to check ad rotation), in many cases you'll prefer PDFs without any ads. The API comes with a built-in ad blocker; activate it with the adblock option:

# Ruby
  
...

request.body = "{
  \"url\": \"https://cnbc.com\",
  \"format\": \"A4\",
  \"background\": \"true\",
  \"orientation\": 1,
  \"cookie\": \"true\",
  \"adblock\": \"true\",  
  \"storeExternal\": \"true\"
}"

...

The site without cookie banner and without ads (link to PDF):

Ruby Website HTML to PDF Conversion no Cookies no Ads

For the conversion of the CNBC site we had set the orientation to "1" for landscape (default is "0", portrait) and the background param to "true". This is a sensible option for converting websites to PDF because they often have inverted text and similar styling.

Here's the CNBC site with background set to "false". The site's design features lots of white text on a blue or black background, and that text is no longer visible:

Ruby Website HTML to PDF Conversion Transparent Background

As a rule of thumb, set the background option to "false" when converting documents like invoices, package lists or time sheets, and set it to "true" when converting web pages or sites to a PDF.
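
This rule of thumb can be captured in two option presets. The sketch below is only an illustration: the constant and helper names are hypothetical, and the web page preset simply mirrors the options used for the CNBC samples above (landscape, cookie handling, ad blocking), which may not suit every site.

```ruby
require 'json'

# Hypothetical presets based on the rule of thumb above.
DOCUMENT_OPTIONS = { "background" => "false" }.freeze
# Mirrors the CNBC samples: backgrounds on, landscape, no banner, no ads.
WEB_PAGE_OPTIONS = { "background" => "true", "orientation" => 1,
                     "cookie" => "true", "adblock" => "true" }.freeze

# Hypothetical helper: builds the JSON body for kind :document or :web_page.
def conversion_body(url, kind, format = "A4")
  preset = kind == :web_page ? WEB_PAGE_OPTIONS : DOCUMENT_OPTIONS
  preset.merge("url" => url, "format" => format, "storeExternal" => "true").to_json
end

puts conversion_body("https://cnbc.com", :web_page)
puts conversion_body("https://mcapi.io/html2pdf/templates/invoice.html", :document)
```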

The header parameter - writing PDFs to a file with Ruby

With the storeExternal option set to "false", the PDF is returned immediately as a base64 encoded string. By default, this string has a MIME header prefix that describes the media type of the encoded content.

Sample request block:

# Ruby

...

request.body = "{
  \"url\": \"https://mcapi.io/html2pdf/templates/invoice.html\",
  \"format\": \"A4\",
  \"storeExternal\": \"false\"
}"

...

The returned response with the MIME prefix at the beginning of the PDF string:

{
  "service": "McAPI HTML 2 PDF, https://mcapi.io",
  "version": "V1",
  "pdf": "data:application/pdf;base64,JVBERi0 ... JUVPRg=="
}

You can now directly set the "pdf"-string as the data property of an object tag like in this example:

# Ruby

require 'json'

...

if response.code == "200"
  # In production code you would wrap the parser in begin/rescue/end
  pdf = JSON.parse(response.read_body)["pdf"]
  puts('<object data="' + pdf + '"/>')
else
  puts("Error")
end

...

The MIME header makes sure that the browser interprets the PDF data correctly. Note that not all browsers support embedding PDF files with the object tag; see this thread on Stack Overflow.

However, when writing the PDF data to a file, including the MIME header would result in an invalid PDF document, as the header is not part of the PDF file format specification. To have the API convert your HTML content to a PDF without the header, set the header parameter to "false", like so:

# Ruby

...

request.body = "{
  \"url\": \"https://mcapi.io/html2pdf/templates/invoice.html\",
  \"format\": \"A4\",
  \"storeExternal\": \"false\",
  \"header\": \"false\"
}"

...

The API will now return the PDF data without the MIME header:

{
  "service": "McAPI HTML 2 PDF, https://mcapi.io",
  "version": "V1",
  "pdf": "JVBERi0 ... JUVPRg=="
}
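
If a response with the MIME prefix has already been received, the prefix can also be removed on the client. The sketch below uses a hypothetical helper, pdf_bytes, that accepts the "pdf" string in either form. It relies on two facts visible in the samples: the prefix has the form data:...;base64, and the payload starts with JVBERi0, which is the base64 encoding of the PDF signature %PDF-.

```ruby
require 'base64'

# Matches a leading "data:<media type>;base64," prefix.
DATA_URI_PREFIX = /\Adata:[^;,]+;base64,/

# Hypothetical helper: strips an optional MIME prefix, decodes the base64
# payload and checks that the result starts with the PDF signature.
def pdf_bytes(pdf_string)
  payload = pdf_string.sub(DATA_URI_PREFIX, "")
  bytes = Base64.decode64(payload)
  unless bytes.start_with?("%PDF-")
    raise ArgumentError, "decoded data does not start with %PDF-"
  end
  bytes
end

# Tiny stand-in payload for illustration; not a complete PDF document.
encoded = Base64.strict_encode64("%PDF-1.4\n")
puts pdf_bytes("data:application/pdf;base64," + encoded).bytesize
puts pdf_bytes(encoded).bytesize
```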

All that is left now is decoding the base64 string to get the binary PDF data and then writing this data to a file. Shown here as a complete Ruby program:

# Ruby

require 'uri'
require 'net/http'
require 'openssl'
require 'json'
require 'base64'

url = URI("https://mcapi-html-2-pdf.p.rapidapi.com/")

http = Net::HTTP.new(url.host, url.port)
http.use_ssl = true
# Verify the server certificate; VERIFY_NONE would disable this check
http.verify_mode = OpenSSL::SSL::VERIFY_PEER

request = Net::HTTP::Post.new(url)
request["content-type"] = 'application/json'
request["x-rapidapi-key"] = 'YOUR_API_KEY'
request["x-rapidapi-host"] = 'mcapi-html-2-pdf.p.rapidapi.com'
request.body = "{
  \"url\": \"https://mcapi.io/html2pdf/templates/invoice.html\",
  \"format\": \"A4\",
  \"storeExternal\": \"false\",
  \"header\": \"false\"
}"

response = http.request(request)
if response.code == "200"
  # In production code you would wrap the parser in begin/rescue/end
  pdf = JSON.parse(response.read_body)["pdf"]
  pdf_data = Base64.decode64(pdf)
  File.open("invoice.pdf", "wb") do |f|
    f.write(pdf_data)
  end
else
  puts("Error")
end
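
The comment in the program mentions wrapping the parser in begin/rescue/end. One possible shape for that is sketched below. save_pdf is a hypothetical helper, and the use of Base64.strict_decode64 assumes that the API returns base64 without line breaks; if that assumption does not hold, Base64.decode64 is the more lenient choice.

```ruby
require 'json'
require 'base64'

# Hypothetical helper: parses the response body, decodes the "pdf" field and
# writes it to path. Returns true on success and false on any expected error.
def save_pdf(body, path)
  pdf = JSON.parse(body).fetch("pdf")
  raise TypeError, "pdf field is not a string" unless pdf.is_a?(String)

  # strict_decode64 raises ArgumentError on invalid base64 input.
  File.binwrite(path, Base64.strict_decode64(pdf))
  true
rescue JSON::ParserError, KeyError, TypeError, ArgumentError => e
  warn "Could not save PDF: #{e.class}: #{e.message}"
  false
end
```

In the program above, the body of the if branch would then shrink to save_pdf(response.read_body, "invoice.pdf").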
