How to create an AWS Lambda/API Gateway Python function that takes a PDF file as input using multipart/form-data?

I have been struggling with this for a while now. I need to create a resource in API Gateway linked to a Lambda function that takes a PDF file as input, sent as a multipart/form-data POST request. To keep it simple, I am just returning the file for now.

When I call the API with the curl command below, I get an Internal server error from AWS. Has anyone ever succeeded in sending a PDF file to Lambda without resorting to the S3 trick (uploading to S3 first)?

Thank you all in advance for any hint.

Commands/Files:

curl

curl -vvv -X POST -H "Content-Type: multipart/form-data" -F "[email protected]" https://...MYAPIHERE.../pdf

I am currently using serverless and python3.

Below are my files:

serverless.yaml

functions:
  pdf:
    handler: handler.pdf
    events:
      - http:
          path: /pdf
          method: post 
          integration: lambda
          request:
            template:
              application/json: "$input.json('$')"
          response:
            headers:
              Content-Type: "'application/json'"

handler.py

import json


def pdf(event, context):
    # For now, just echo back whatever arrives under 'content'.
    pdf = event.get('content')
    out = {'statusCode': 200,
           'isBase64Encoded': False,
           'headers': {"content-type": "application/json"},
           'body': json.dumps({
               'input':  pdf,
               'inputType': 'url',
               #'tags': list(tags.keys()),
               'error': None})}
    return out

Answers 1

  • I finally managed to solve this after a lot of googling and with the help of the AWS support team.

    It turns out that API Gateway checks the "Content-Type" or "Accept" header of the incoming request and matches it against the Binary Media Types setting to decide which payloads are treated as binary. That means we need to register two content types (multipart/form-data and application/pdf) as binary media types.

    It is possible to do this with Serverless by using the serverless-apigw-binary plugin and adding the following to serverless.yaml:

    plugins:
      - serverless-apigw-binary 
    
    custom:
      apigwBinary:
        types:           #list of mime-types
          - 'multipart/form-data'
          - 'application/pdf'
    

    But since Lambda expects the payload from API Gateway in application/json format, the binary data cannot be passed directly. Therefore the contentHandling setting should be set to “CONVERT_TO_TEXT”, which makes API Gateway base64-encode the binary payload. In the yaml file this translates into:

    contentHandling: CONVERT_TO_TEXT
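
On the Lambda side, the consequence of CONVERT_TO_TEXT is that the handler sees a base64 string rather than raw bytes. A minimal sketch of decoding it, assuming the request template places that string under a 'body' key (as in the full configuration at the end of this answer). The returned keys `size` and `containsPdfMarker` are made up for illustration:

```python
import base64


def pdf(event, context):
    # With CONVERT_TO_TEXT, the binary request body arrives base64-encoded.
    # Assumption: the mapping template stores it under the 'body' key.
    raw = base64.b64decode(event.get("body") or "")
    return {
        "size": len(raw),                    # hypothetical field: number of decoded bytes
        "containsPdfMarker": b"%PDF" in raw,  # hypothetical field: crude sanity check
    }
```

For a multipart/form-data upload the decoded bytes are the whole multipart envelope, not just the PDF, so the file still has to be extracted from it.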
    

    The final catch was solved by Kris Gohlson at serverless-thumbnail. Thank you for that, Kris. I just wonder how you came up with that...


    serverless.yaml

    plugins:
      - serverless-apigw-binary 
    
    custom:
      apigwBinary:
        types:           #list of mime-types
          - 'multipart/form-data'
          - 'application/pdf'
    
    functions:
      pdf:
        handler: handler.pdf
        events:
          - http:
              path: /pdf
              method: post 
              integration: lambda
              request:
                contentHandling: CONVERT_TO_TEXT
                passThrough: WHEN_NO_TEMPLATES
                template:
                  application/pdf: "{'body': $input.json('$')}"
                  multipart/form-data: "{'body': $input.json('$')}"
              response:
                contentHandling: CONVERT_TO_BINARY
                headers:
                  Content-Type: "'application/json'"
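
With a configuration like this, the handler receives the multipart envelope as base64 under 'body', so the PDF still has to be pulled out of it. Below is a rough sketch using only the standard library. It assumes the mapping template does not forward the Content-Type header, so the boundary is read from the first line of the body instead. The helper name `extract_parts` is hypothetical, and this has not been checked against every kind of binary payload:

```python
import base64
from email.parser import BytesParser
from email.policy import default


def extract_parts(body_b64):
    """Return {form field name: raw bytes} from a base64-encoded multipart body."""
    raw = base64.b64decode(body_b64)
    # A multipart body starts with "--<boundary>" followed by CRLF.
    boundary = raw.split(b"\r\n", 1)[0][2:].decode("ascii")
    # Rebuild the Content-Type header so the email parser can split the parts.
    header = (
        f'Content-Type: multipart/form-data; boundary="{boundary}"\r\n\r\n'
    ).encode("ascii")
    message = BytesParser(policy=default).parsebytes(header + raw)
    parts = {}
    for part in message.iter_parts():
        name = part.get_param("name", header="content-disposition")
        parts[name] = part.get_payload(decode=True)
    return parts
```

Inside the handler this could be called as `extract_parts(event["body"])`, with the uploaded file found under whatever field name the client used in its `-F` option.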
    
