Over the years, data has increasingly been regarded as an organization's most vital asset. It's central to everything you use Salesforce for. It's the engine that powers your customer marketing and sales programs.
The size and complexity of Salesforce platform implementations continue to increase as customers migrate business-critical operations to the platform. With data sets holding tens or hundreds of millions of records and an ever-increasing number of integration points, data management is front and center among the concerns of Salesforce administrators.
Salesforce provides the Bulk API for working with large volumes of data, and we typically use filters to split an extract into chunks when required. But at extremely high volumes (hundreds of millions of records), defining these chunks by filtering on field values may not be practical, because the number of rows returned may exceed the selectivity threshold of Salesforce's query optimizer. The result could be a full table scan and very slow performance, or even failure of the query to complete.
PK Chunking is a feature that Salesforce provides for its Bulk API. PK stands for Primary Key, the object's record ID, which is always indexed. With this method, the target table is first queried to identify many chunks of records with sequential IDs, and each chunk is then extracted by its own query. You do not have to do this yourself: you simply enter a few parameters on your Bulk API job, and the platform automatically splits the query into separate chunks, executes a query for each chunk, and returns the data.
If there are 10,000,000 records in Salesforce and PK chunking is enabled with a chunk size of 250,000, the bulk query is split into 40 queries, each with a WHERE clause on the Id field (the primary key) so that it covers a range of at most 250,000 records. If the original query is SELECT Name FROM Account and the first record ID is 001300000000000, the first three and the last of the chunked queries would be as follows:
SELECT Name FROM Account WHERE Id >= 001300000000000 AND Id < 00130000000132G
SELECT Name FROM Account WHERE Id >= 00130000000132G AND Id < 00130000000264W
SELECT Name FROM Account WHERE Id >= 00130000000264W AND Id < 00130000000396m
...
SELECT Name FROM Account WHERE Id >= 00130000000euQ4 AND Id < 00130000000fxSK
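Read as base-62 numbers, the ID boundaries in the queries above are 250,000 apart. The following Python sketch reproduces that arithmetic purely as an illustration; the platform computes the real boundaries itself. The function names, the base-62 digit order (0-9, A-Z, a-z), and the split of a 15-character ID into a fixed 5-character prefix plus a 10-character counter are assumptions made for this example, not part of any Salesforce API.

```python
import string

# Assumed digit order for the base-62 part of a Salesforce ID: 0-9, A-Z, a-z.
BASE62 = string.digits + string.ascii_uppercase + string.ascii_lowercase


def decode_base62(text):
    """Read a base-62 string as an integer."""
    value = 0
    for char in text:
        value = value * 62 + BASE62.index(char)
    return value


def encode_base62(value, width):
    """Write an integer as a zero-padded base-62 string of the given width."""
    digits = []
    for _ in range(width):
        value, remainder = divmod(value, 62)
        digits.append(BASE62[remainder])
    return "".join(reversed(digits))


def chunk_queries(base_query, first_id, id_span, chunk_size):
    """Build one query per ID range of `chunk_size`, covering `id_span` IDs."""
    # Assumption: the first 5 characters are a fixed prefix and the
    # remaining 10 characters behave like a base-62 counter.
    prefix, counter = first_id[:5], first_id[5:]
    start = decode_base62(counter)
    width = len(counter)
    queries = []
    for offset in range(0, id_span, chunk_size):
        lower = prefix + encode_base62(start + offset, width)
        upper = prefix + encode_base62(start + offset + chunk_size, width)
        queries.append(f"{base_query} WHERE Id >= {lower} AND Id < {upper}")
    return queries


queries = chunk_queries(
    "SELECT Name FROM Account", "001300000000000", 10_000_000, 250_000
)
print(len(queries))   # 40
print(queries[0])     # ... WHERE Id >= 001300000000000 AND Id < 00130000000132G
print(queries[-1])    # ... WHERE Id >= 00130000000euQ4 AND Id < 00130000000fxSK
```

Because each chunk is a range of IDs rather than a count of rows, a chunk can return fewer records than the chunk size when IDs in that range are unused or deleted.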
When to use PK Chunking
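- Salesforce recommends enabling PK chunking when querying objects with more than 10 million records or when a bulk query consistently times out. Because each chunk is selected by the indexed record ID, this also improves extraction performance.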
- You can use PK Chunking with most standard objects and all custom objects.
- To enable the feature, you specify the header 'Sforce-Enable-PKChunking' on the job request for your Bulk API query.
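
As a rough illustration of the header in use, the Python sketch below builds (but does not send) a Bulk API job-creation request with 'Sforce-Enable-PKChunking' set to a chunk size. The helper name, the instance URL, the API version, and the session ID are placeholders assumed for this example; substitute the values for your own org and check the endpoint against the Bulk API documentation for your API version.

```python
import urllib.request

# Upper limit for chunkSize (the default, when only TRUE is sent, is 100,000).
MAX_CHUNK_SIZE = 250_000

JOB_XML = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<jobInfo xmlns="http://www.force.com/2009/06/asyncapi/dataload">'
    "<operation>query</operation>"
    "<object>{object_name}</object>"
    "<contentType>CSV</contentType>"
    "</jobInfo>"
)


def build_pk_chunking_job_request(
    instance_url, api_version, session_id, object_name, chunk_size
):
    """Build (without sending) a query-job request with PK chunking enabled.

    Hypothetical helper for illustration only.
    """
    if not 1 <= chunk_size <= MAX_CHUNK_SIZE:
        raise ValueError(f"chunk_size must be between 1 and {MAX_CHUNK_SIZE}")
    headers = {
        "X-SFDC-Session": session_id,
        "Content-Type": "application/xml; charset=UTF-8",
        # The header that turns on PK chunking for this job.
        "Sforce-Enable-PKChunking": f"chunkSize={chunk_size}",
    }
    body = JOB_XML.format(object_name=object_name).encode("utf-8")
    url = f"{instance_url}/services/async/{api_version}/job"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Placeholder values: replace with your own instance, API version, and session.
request = build_pk_chunking_job_request(
    "https://yourInstance.salesforce.com", "47.0", "SESSION_ID", "Account", 250_000
)
print(request.full_url)
print(request.header_items())
```

Sending the request (for example with urllib.request.urlopen) creates the job; the SOQL query is then added to the job as a batch, and the platform creates one additional batch per chunk.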
We can also use PK chunking for data extraction with third-party ETL tools such as Informatica. The picture below shows the Informatica screen where we can enable PK chunking and specify the chunk size.