Hello,
I am running `parallelScanAsStream` to dump a whole DynamoDB table to S3. It is awfully slow though, slower than running a sequential scan. Running it locally, I get results like:
```
Config: {"concurrency":16,"chunkSize":1000,"highWaterMark":1000}
Total items: 167744, Duration: 92.73s
Throughput: 1808.91 items/second

Config: {"concurrency":32,"chunkSize":2000,"highWaterMark":2000}
Total items: 167744, Duration: 108.71s
Throughput: 1543.01 items/second

Config: {"concurrency":64,"chunkSize":5000,"highWaterMark":5000}
Total items: 167744, Duration: 109.73s
Throughput: 1528.67 items/second
```
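For reference, these numbers come from a timing loop roughly like the sketch below (`measureScan` is just an illustrative name, not part of the library):

```ts
import { parallelScanAsStream } from '@shelf/dynamodb-parallel-scan';

// Timing harness, roughly what produced the numbers above.
async function measureScan(
  tableName: string,
  config: { concurrency: number; chunkSize: number; highWaterMark: number }
) {
  const start = Date.now();
  let total = 0;

  // Consume the stream and count items; no per-item work here.
  const stream = await parallelScanAsStream({ TableName: tableName }, config);
  for await (const items of stream) {
    total += items.length;
  }

  const seconds = (Date.now() - start) / 1000;
  console.log(`Config: ${JSON.stringify(config)}`);
  console.log(`Total items: ${total}, Duration: ${seconds.toFixed(2)}s`);
  console.log(`Throughput: ${(total / seconds).toFixed(2)} items/second`);
}
```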
My code looks like the following:
```ts
import https from 'https';
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
// on older SDK v3 versions this comes from '@aws-sdk/node-http-handler'
import { NodeHttpHandler } from '@smithy/node-http-handler';
import { parallelScanAsStream } from '@shelf/dynamodb-parallel-scan';

// Keep-alive agent so the parallel scan segments can reuse sockets
const agent = new https.Agent({
  maxSockets: 100
});

const dynamodbClient = new DynamoDBClient({
  requestHandler: new NodeHttpHandler({
    httpsAgent: agent
  })
});

export const handler = async (event: { tableName: string }) => {
  const stream = await parallelScanAsStream(
    { TableName: event.tableName },
    {
      concurrency: 100,
      chunkSize: 200,
      client: dynamodbClient
    }
  );

  for await (const items of stream) {
    // Serialize each item as one NDJSON line, stringifying bigints
    const records = items.map((item: any) => {
      return {
        Data: Buffer.from(
          JSON.stringify(
            item,
            (_, v) => (typeof v === 'bigint' ? v.toString() : v)
          ) + '\n'
        )
      };
    });
    await asyncProcessingFunction(records);
  }
};
```
The `asyncProcessingFunction` itself is not the problem.
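For context, the `{ Data: Buffer }` record shape matches a batched sink such as Kinesis Data Firehose's `PutRecordBatch`, which can deliver to S3; the function does something along the lines of this sketch (Firehose and the stream name are illustrative assumptions):

```ts
import { FirehoseClient, PutRecordBatchCommand } from '@aws-sdk/client-firehose';

const firehose = new FirehoseClient({});

// Illustrative sink: batch-put the prepared records into a Firehose
// delivery stream that writes to S3. The stream name is a placeholder.
async function asyncProcessingFunction(records: { Data: Buffer }[]) {
  // PutRecordBatch accepts at most 500 records per call; the
  // chunkSize of 200 above stays under that limit.
  await firehose.send(
    new PutRecordBatchCommand({
      DeliveryStreamName: 'table-dump-stream',
      Records: records
    })
  );
}
```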
Does anyone see something obvious that I am doing wrong? Or can someone provide me with some examples?