Operations of Multipart Upload in S3

Posted on Dec 1, 2023

MULTIPART UPLOAD – INTRODUCTION

Multipart upload is one of the most important operations in the S3 protocol. It lets a client upload the parts of a file separately and have S3 combine those parts into a single object.

Many S3-compatible tools, such as S3Browser, handle this automatically when a large file needs to be uploaded, typically one larger than 10 MB. In that case the software splits the file into many parts and uploads them according to the configured number of simultaneous uploads.

So far we have only talked about uploading parts. But why not upload the file in one go instead of going through the more complex multipart upload procedure in S3?

MULTIPART UPLOAD – LOGIC

The logic is simple: reduce the risk of the upload process. A long-running upload is exposed to timeouts. If you upload, for instance, an ISO file of around 4 GB, a timeout may occur and break the upload. This has two consequences. The most annoying one is the lost time, which depends on your network bandwidth. The other is that you are usually charged for the amount of data you upload, so a retry adds cost. Now consider the same file broken into 512 parts of 8 MB each. The upload can still hit timeouts, but only the affected part has to be sent again, so far less data and time are lost. This is what makes multipart upload more efficient than a direct upload for large files.
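The part count can be checked with a line of shell arithmetic. This is a minimal sketch that assumes binary units (4 GB taken as 4096 MiB, 8 MB as 8 MiB); the variable names are only illustrative.

```shell
# Sizes in MiB: the 4 GB file and the 8 MB part size from the example above.
file_mib=$((4 * 1024))
part_mib=8

# Round up so that a trailing partial part would still be counted.
parts=$(( (file_mib + part_mib - 1) / part_mib ))

echo "$parts parts"
echo "worst-case resend after one failure: $part_mib MiB instead of $file_mib MiB"
```

The first line printed is `512 parts`, which matches the figure used in the text.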

OPERATIONS OF MULTIPART UPLOAD – STEPS

OK, from here on, let's focus on how a multipart upload is carried out in S3. First we need to understand its phases.

1. Create a multipart upload request. This defines the object key under which the object will be stored once the multipart upload completes. You will need to keep the upload id for the next steps.
2. Upload every part using the upload id from the first step. After each part upload, keep the returned ETag and the Part Number. Once all parts are uploaded, put every ETag and Part Number into a file in JSON form.
3. Complete the multipart upload. The completion request is sent using the file prepared in step 2.
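The three phases above can be tied together in one helper. This is only a sketch: `mpu_upload` is a hypothetical function name, the bucket, key and part files are supplied by the caller, and it assumes the service accepts ETags without surrounding quotes in the completion request (as the mpustruct file later in this article does).

```shell
# Usage: mpu_upload BUCKET KEY PART_FILE...
# The body runs in a subshell ( ... ), so variables stay local and
# "exit 1" only leaves the function, not the calling shell.
mpu_upload() (
  bucket=$1
  key=$2
  shift 2
  n=0
  entries=""

  # Step 1: create the upload and keep the UploadId.
  upload_id=$(aws s3api create-multipart-upload \
      --bucket "$bucket" --key "$key" \
      --query UploadId --output text) || exit 1

  # Step 2: upload each part and record its ETag with its part number.
  for part in "$@"; do
    n=$((n + 1))
    etag=$(aws s3api upload-part \
        --bucket "$bucket" --key "$key" \
        --part-number "$n" --body "$part" \
        --upload-id "$upload_id" \
        --query ETag --output text) || exit 1
    # Some services return the ETag wrapped in quotes; drop them.
    etag=$(printf '%s' "$etag" | tr -d '"')
    entries="${entries}${entries:+,}{\"ETag\":\"$etag\",\"PartNumber\":$n}"
  done
  printf '{"Parts":[%s]}\n' "$entries" > mpustruct

  # Step 3: complete the upload with the collected part list.
  aws s3api complete-multipart-upload \
      --bucket "$bucket" --key "$key" \
      --upload-id "$upload_id" \
      --multipart-upload file://mpustruct
)
```

A call such as `mpu_upload emre-de mp/01 file_1 file_2 file_3 file_4 file_5` would then perform the same sequence that the examples below walk through by hand.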

OPERATIONS OF MULTIPART UPLOAD – EXAMPLES

OPERATIONS OF MULTIPART UPLOAD – PREPARE PARTS

$ for x in `seq 5`; do echo $x > file_$x; done
$ ls file_*
file_1 file_2 file_3 file_4 file_5
$ cat file_1
1
$ cat file_2
2
$ cat file_3
3
$ cat file_4
4
$ cat file_5
5
$ cat file*
1
2
3
4
5

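The parts will be uploaded in increasing order from 1 to 5 and will form a single object whose content is the numbers 1 to 5, one per line. Note that AWS S3 itself requires every part except the last to be at least 5 MiB, so parts this small only work against an S3-compatible service that does not enforce that limit.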
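For a real file, the parts are usually cut from one source file rather than written by hand. The sketch below uses `split` on a generated stand-in file; the file names and the 256 KiB part size are arbitrary choices for the demo, not values from this article.

```shell
# Work in a scratch directory so the demo leaves nothing in the current one.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in for a large file: roughly 575 KiB of text.
seq 1 100000 > demo.txt

# Cut it into 256 KiB pieces named demo.part_aa, demo.part_ab, ...
split -b 262144 demo.txt demo.part_

# The shell expands demo.part_* in sorted order, which is also part order,
# so concatenating the pieces should give back the original bytes.
if cat demo.part_* | cmp -s - demo.txt; then
  echo "parts reassemble to the original"
fi
```

Checking that the pieces concatenate back to the source before uploading is a cheap way to catch an ordering mistake, since the completed object is assembled in part-number order.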

OPERATIONS OF MULTIPART UPLOAD – STEP-1

$ aws s3api create-multipart-upload --bucket emre-de --key 'mp/01'
{
    "Bucket": "emre-de",
    "Key": "mp/01",
    "UploadId": "NjcyZjg4MDg4MTcwMTM1Nzg2ODg1MA"
}

!!! Note that the UploadId will be used in the next steps, and the uploaded data will be stored in the object mp/01.

OPERATIONS OF MULTIPART UPLOAD – STEP-2

$ aws s3api upload-part --bucket emre-de --key 'mp/01' --part-number 1 --body file_1 --upload-id "NjcyZjg4MDg4MTcwMTM1Nzg2ODg1MA"
{
    "ETag": "b026324c6904b2a9cb4b88d6d61c81d1"
}
$ aws s3api upload-part --bucket emre-de --key 'mp/01' --part-number 2 --body file_2 --upload-id "NjcyZjg4MDg4MTcwMTM1Nzg2ODg1MA"
{
    "ETag": "26ab0db90d72e28ad0ba1e22ee510510"
}
$ aws s3api upload-part --bucket emre-de --key 'mp/01' --part-number 3 --body file_3 --upload-id "NjcyZjg4MDg4MTcwMTM1Nzg2ODg1MA"
{
    "ETag": "6d7fce9fee471194aa8b5b6e47267f03"
}
$ aws s3api upload-part --bucket emre-de --key 'mp/01' --part-number 4 --body file_4 --upload-id "NjcyZjg4MDg4MTcwMTM1Nzg2ODg1MA"
{
    "ETag": "48a24b70a0b376535542b996af517398"
}
$ aws s3api upload-part --bucket emre-de --key 'mp/01' --part-number 5 --body file_5 --upload-id "NjcyZjg4MDg4MTcwMTM1Nzg2ODg1MA"
{
    "ETag": "1dcca23355272056f04fe8bf20edfce0"
}

Using the part numbers and ETags from the output above, we create a file named mpustruct in JSON notation.

$ cat mpustruct
{
    "Parts": [
        {
            "ETag": "b026324c6904b2a9cb4b88d6d61c81d1",
            "PartNumber": 1
        },
        {
            "ETag": "26ab0db90d72e28ad0ba1e22ee510510",
            "PartNumber": 2
        },
        {
            "ETag": "6d7fce9fee471194aa8b5b6e47267f03",
            "PartNumber": 3
        },
        {
            "ETag": "48a24b70a0b376535542b996af517398",
            "PartNumber": 4
        },
        {
            "ETag": "1dcca23355272056f04fe8bf20edfce0",
            "PartNumber": 5
        }
    ]
}
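Instead of copying the ETags by hand, the same structure can usually be produced from the service's own record of the uploaded parts. This sketch assumes the endpoint implements `list-parts`; `make_mpustruct` is a hypothetical helper name.

```shell
# Usage: make_mpustruct BUCKET KEY UPLOAD_ID > mpustruct
# The --query expression reshapes the list-parts response into the
# {"Parts": [{"ETag": ..., "PartNumber": ...}, ...]} form shown above.
make_mpustruct() {
  aws s3api list-parts \
      --bucket "$1" --key "$2" --upload-id "$3" \
      --output json \
      --query '{Parts: Parts[].{ETag: ETag, PartNumber: PartNumber}}'
}
```

For the upload in this article that would be `make_mpustruct emre-de mp/01 "NjcyZjg4MDg4MTcwMTM1Nzg2ODg1MA" > mpustruct`. It is still worth looking at the resulting file before completing the upload, to confirm that every expected part number is present.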

OPERATIONS OF MULTIPART UPLOAD – STEP-3

$ aws s3api complete-multipart-upload --multipart-upload file://mpustruct --bucket emre-de --key 'mp/01' --upload-id "NjcyZjg4MDg4MTcwMTM1Nzg2ODg1MA"
{
    "Location": "emre-de/mp/01",
    "Bucket": "emre-de",
    "Key": "mp/01",
    "ETag": "3bc5ebe4c3ca4f4ba95deff07def6af2-5"
}
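The ETags are worth a second look. The part ETags from step 2 look like plain MD5 digests of each part's bytes, which is common for unencrypted uploads but not guaranteed by every service or encryption setting. The ETag of the completed object is a different kind of value: its `-5` suffix corresponds to the number of parts, and the hex portion is typically not the MD5 of the whole object. The sketch below only checks the part-level assumption locally. It uses `md5sum` from GNU coreutils, so another platform may need a different digest command.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the five one-line parts from the preparation step.
for x in 1 2 3 4 5; do echo "$x" > "file_$x"; done

# ETags reported by upload-part in step 2, in part order.
set -- b026324c6904b2a9cb4b88d6d61c81d1 \
       26ab0db90d72e28ad0ba1e22ee510510 \
       6d7fce9fee471194aa8b5b6e47267f03 \
       48a24b70a0b376535542b996af517398 \
       1dcca23355272056f04fe8bf20edfce0

# Compare each reported ETag with the MD5 of the local part file.
for x in 1 2 3 4 5; do
  local_md5=$(md5sum < "file_$x" | cut -d' ' -f1)
  if [ "$local_md5" = "$1" ]; then
    echo "part $x: ETag matches local MD5"
  else
    echo "part $x: ETag differs from local MD5"
  fi
  shift
done
```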

To conclude, we now understand the benefits of multipart uploads and how to perform one. Aborting a multipart upload, rather than completing it, is a separate topic that this article does not cover.