DEV Community

ryanrosello-og

Validate complicated, graphics-rich PDF documents using Playwright

This article will take your pdf verification skills to the next level using Playwright.

The pdf document we will use for this example is the "Tesla Powerwall 2 Datasheet". The file is hosted at the following location: https://oedtrngbj.wpengine.com/wp-content/uploads/Powerwall_2_AC_Datasheet_EN_NA.pdf

Observe that this pdf document contains multiple pages, some illustrations and a table.

complex pdf

Caveats

Let me be upfront about the issues I have encountered with this approach:

  • This solution only seems to work when the test is run using chromium in headed mode
  • The elements inside the pdf viewer component are not accessible to Playwright. This means you will not be able to mask/hide dynamic elements in the pdf, for example a customer id or dynamic date/time stamps
  • We are using Playwright's built-in visual comparison library. It is advisable to get familiar with the maintenance required to keep the baseline images up to date. See the Visual comparisons page in the Playwright documentation
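
The first and third caveats can be pinned down in configuration. Below is a minimal sketch of a playwright.config.ts, assuming a recent @playwright/test version that exports defineConfig. The project name and the 1% tolerance are illustrative assumptions, not values taken from this article.

```typescript
// playwright.config.ts (sketch; names and thresholds are assumptions)
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  projects: [
    {
      // Hypothetical project name: chromium, forced into headed mode
      name: 'chromium-headed',
      use: { ...devices['Desktop Chrome'], headless: false },
    },
  ],
  expect: {
    // Illustrative tolerance: allow up to 1% of pixels to differ
    // before toMatchSnapshot() reports a mismatch
    toMatchSnapshot: { maxDiffPixelRatio: 0.01 },
  },
});
```

A small tolerance like this may reduce noise from font rendering differences, at the cost of missing very small changes in the pdf.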

If you are happy with these compromises, read on!

page.setContent() + toMatchSnapshot() = 🤩

Using page.setContent(), load the pdf into an iframe like so:

  const pdfResource =
    'https://oedtrngbj.wpengine.com/wp-content/uploads/Powerwall_2_AC_Datasheet_EN_NA.pdf';
  let iframe = `<iframe src="${pdfResource}#zoom=60" style="width: 100%;height:100%;border: none;"></iframe>`;
  await page.setContent(iframe);
  await page.waitForTimeout(5000);

NOTE: You may need to experiment with the zoom level and the width and height styles to suit your needs

ANOTHER NOTE: The waitForTimeout call pauses for a fixed 5 seconds to give the pdf contents time to load into the iframe
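
To make experimenting with those knobs easier, the iframe markup can be built by a small function. This is only a sketch: buildPdfIframe and PdfIframeOptions are hypothetical names, not part of Playwright, and the defaults simply mirror the values used in this article.

```typescript
// Hypothetical helper (not a Playwright API): builds the markup that is
// handed to page.setContent().
interface PdfIframeOptions {
  zoom?: number;   // viewer zoom in percent, e.g. 60
  page?: number;   // 1-based page number to open
  width?: string;  // CSS width of the iframe
  height?: string; // CSS height of the iframe
}

function buildPdfIframe(pdfUrl: string, options: PdfIframeOptions = {}): string {
  const { zoom = 60, page, width = '100%', height = '100%' } = options;

  // Viewer parameters go into the URL fragment, joined with "&"
  const params = [`zoom=${zoom}`];
  if (page !== undefined) {
    params.push(`page=${page}`);
  }

  const src = `${pdfUrl}#${params.join('&')}`;
  return `<iframe src="${src}" style="width: ${width};height:${height};border: none;"></iframe>`;
}
```

With this in place, the test would call await page.setContent(buildPdfIframe(pdfResource, { zoom: 60 })) and only the options object changes between experiments.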

We will use Playwright's snapshot assertion expect(screenshot).toMatchSnapshot(name[, options]) (https://playwright.dev/docs/test-assertions#screenshot-assertions-to-match-snapshot-1) to compare a screenshot against a stored baseline image. The screenshot itself comes from locator.screenshot(), which captures a single element matching a locator. In our case, that element is the iframe above, with the PDF file fully loaded at a particular page.

Our solution will make use of this function:

  expect(await page.locator('iframe').screenshot()).toMatchSnapshot();

The completed test will look like this ...

import { test, expect } from '@playwright/test';

test('validate a complex pdf', async ({ page }) => {
  const pdfResource =
    'https://oedtrngbj.wpengine.com/wp-content/uploads/Powerwall_2_AC_Datasheet_EN_NA.pdf';
  // Render the pdf inside an iframe using the browser's built-in viewer
  const iframe = `<iframe src="${pdfResource}#zoom=60" style="width: 100%;height:100%;border: none;"></iframe>`;
  await page.setContent(iframe);
  // Fixed pause to give the viewer time to load the pdf
  await page.waitForTimeout(5000);
  // Compare a screenshot of the iframe against the stored baseline
  expect(await page.locator('iframe').screenshot()).toMatchSnapshot();
});

Run the test. It should fail with the following error.

Initial attempt should fail

This is fine

Playwright did not find a golden snapshot for the element, so on the very first run it fails the test and generates the baseline file for you. You will need to commit these files to your repo.

Golden files

Rerun the test. This time it should pass, since it now has a baseline image to compare against.

Ok, that's nice. We managed to validate the first page of the pdf.

But most pdfs you will encounter out in the wild contain multiple pages. Let us amend our test to cater for multiple pages.

test('validate a complex pdf II, all pages', async ({ page }) => {
  const pdfResource =
    'https://oedtrngbj.wpengine.com/wp-content/uploads/Powerwall_2_AC_Datasheet_EN_NA.pdf';
  const numberOfPages = 2;
  for (let i = 1; i <= numberOfPages; i += 1) {
    // The page=N fragment tells the viewer which page to open
    const iframe = `<iframe src="${pdfResource}#zoom=60&page=${i}" style="width: 100%;height:100%;border: none;"></iframe>`;
    await page.setContent(iframe);
    await page.waitForTimeout(5000);
    // One named baseline image per page
    expect(await page.locator('iframe').screenshot()).toMatchSnapshot({
      name: `pdf_validation_page_${i}.png`,
    });
  }
});

Multiple pages

Job done.

Other improvements for you to consider

(all totally optional, and it is entirely up to you to implement them)

  • Dynamically determine the number of pages; the example above uses a predefined value for the expected number of pages.
  • Remove the hard-coded waitForTimeout and implement a better way of waiting for the contents to be loaded.
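
As a starting point for both ideas, here is a rough sketch of two helpers. Neither is a Playwright API and both names are hypothetical. countPdfPages is a heuristic that counts "/Type /Page" objects in the raw file text. It assumes the page objects are not stored in compressed object streams, so it can under-count on some pdfs, where a proper PDF parsing library would be the safer choice. waitUntilStable polls a capture function until two consecutive results are byte-identical. It assumes the viewer does not sit on a stable but still blank frame, which would fool it.

```typescript
// Heuristic page count (assumption: page objects are stored uncompressed).
// Matches "/Type /Page" and "/Type/Page" but not the "/Type /Pages" tree node.
function countPdfPages(rawPdf: string): number {
  const matches = rawPdf.match(/\/Type\s*\/Page\b/g);
  return matches ? matches.length : 0;
}

// Byte-wise comparison of two captures
function sameBytes(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i += 1) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

// Calls `capture` every `intervalMs` until two consecutive results are
// identical, then returns the stable result. Throws after `timeoutMs`.
async function waitUntilStable(
  capture: () => Promise<Uint8Array>,
  timeoutMs = 10000,
  intervalMs = 250,
): Promise<Uint8Array> {
  const deadline = Date.now() + timeoutMs;
  let previous = await capture();
  while (Date.now() < deadline) {
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
    const current = await capture();
    if (sameBytes(previous, current)) {
      return current;
    }
    previous = current;
  }
  throw new Error(`content did not stabilise within ${timeoutMs} ms`);
}
```

Inside the test, the raw text could come from (await (await page.request.get(pdfResource)).body()).toString('latin1'), and the stable screenshot from await waitUntilStable(() => page.locator('iframe').screenshot()), which can then be passed to expect(...).toMatchSnapshot() in place of the fixed 5 second pause.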

