Before you start working with Sitecore Search, here are the limitations and issues I encountered while working with API crawlers, the push (ingestion) API, webhooks, and website crawlers.
Source Limitations
You can have up to 50 sources per domain. The number of documents you can index depends on your entitlement. Usually it is up to 500,000 documents.
If you work with a multisite architecture and plan to add more sites, this limit becomes a problem: you cannot create a separate website crawler for each site, so you need to combine groups of sites into a single source.
Search ID Considerations
In sandbox environments, if you have dev and stage environments and sync items between them, documents can end up with the same IDs. When you open one of these IDs in the Content collection, Sitecore Search always opens the first one that was indexed.
Multi-Locale Considerations
Locales in Sitecore Search are different from Sitecore language codes. You need to create a mapping between them.
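A minimal sketch of such a mapping is shown below. The locale codes here are illustrative assumptions; adjust the dictionary to the locales actually configured in your Sitecore Search domain.

```python
# Hypothetical mapping between Sitecore language codes and Sitecore Search
# locales -- replace with the locales configured in your Search domain.
SITECORE_TO_SEARCH_LOCALE = {
    "en": "en_us",
    "de-DE": "de_de",
    "fr-FR": "fr_fr",
}

def to_search_locale(sitecore_language: str) -> str:
    """Translate a Sitecore language code to a Sitecore Search locale."""
    try:
        return SITECORE_TO_SEARCH_LOCALE[sitecore_language]
    except KeyError:
        # Fail loudly rather than pushing a document into a wrong locale.
        raise ValueError(f"No Search locale mapped for '{sitecore_language}'")
```

Failing fast on unmapped languages is deliberate: silently defaulting to one locale is how documents end up indexed under the wrong language.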
API Push Limitations
There is no recrawl option for push sources; content can be updated only through the ingestion API. This prevents accidental, huge recrawl processes.
You need to handle all operations and languages manually. In Sitecore Search Analytics overview, you can monitor all job requests and see exactly what you sent, including the whole JSON of the ingestion API call.
Ingestion API Specification
You can download the ingestion API specification from the Sitecore documentation.
With the ingestion API, there is no batch update. For each item-level update or delete, you need to send a separate call; Sitecore Search then combines multiple ingestion calls into batches internally.
You can use the ingestion API for website crawler and API crawler only when incremental updates are enabled for the source.
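Since there is no batch update, a typical integration loops over changed documents and sends one call each. The sketch below only builds the request objects; the endpoint shape, entity name, and payload structure are assumptions you should verify against the ingestion API specification for your tenant.

```python
import json
import urllib.request

# Illustrative base URL -- confirm against the ingestion API specification.
BASE = "https://discover.sitecorecloud.io/ingestion/v1"

def build_update_request(domain, source, entity, doc_id, locale, fields, api_key):
    """Build one PUT request per document -- the ingestion API has no batch update."""
    url = (f"{BASE}/domains/{domain}/sources/{source}"
           f"/entities/{entity}/documents/{doc_id}?locale={locale}")
    body = json.dumps({"document": {"fields": fields}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Authorization": api_key, "Content-Type": "application/json"},
    )

# Usage sketch: send each document individually; Sitecore Search
# batches the calls server-side.
# for doc in changed_documents:
#     urllib.request.urlopen(build_update_request(...))
```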
Delete Operations - Ingestion API
Deleting items takes more time than update or create operations.
If you want to trigger an ingestion delete across languages, you can send the locale as "all", but the ID must be the same across all locales. If even one language is missing, the whole delete job fails, so consider language-by-language ingestions instead of using "all" for delete operations.
Push operations using the PUT method can create an item if it does not exist.
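A language-by-language delete can be sketched as below. As with the update example, the endpoint shape is an assumption to verify against the ingestion API specification; the point is one DELETE per locale so a single missing language cannot fail the whole job.

```python
import urllib.request

# Illustrative base URL -- confirm against the ingestion API specification.
BASE = "https://discover.sitecorecloud.io/ingestion/v1"

def build_delete_requests(domain, source, entity, doc_id, locales, api_key):
    """One DELETE per locale, avoiding the all-or-nothing failure of locale 'all'."""
    requests_ = []
    for locale in locales:
        url = (f"{BASE}/domains/{domain}/sources/{source}"
               f"/entities/{entity}/documents/{doc_id}?locale={locale}")
        requests_.append(urllib.request.Request(
            url, method="DELETE", headers={"Authorization": api_key}))
    return requests_
```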
Working with Local API and Ngrok
If you are setting up an API crawler and testing against localhost through ngrok, and the crawler is not triggering, check that you added the required header. Without it, ngrok blocks the crawler's requests.
Here is the header you need to add:
{
"ngrok-skip-browser-warning": ["true"]
}
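You can sanity-check the tunnel yourself with the same header before pointing the crawler at it. The sketch below only constructs the request with the header set; the ngrok URL is a placeholder for your own forwarding address.

```python
import urllib.request

# Placeholder URL -- replace with your own ngrok forwarding address.
req = urllib.request.Request(
    "https://example.ngrok-free.app/api/items",
    headers={"ngrok-skip-browser-warning": "true"},
)
# urllib normalizes header names via str.capitalize(), so the header is
# stored as "Ngrok-skip-browser-warning".
# Sending it: urllib.request.urlopen(req) should return your API response
# instead of ngrok's browser-warning page.
```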
API Crawler
You can use a request extractor to trigger pagination. I wrote a step-by-step blog about this here. Your trigger can be of type JavaScript.
In the Sitecore Search Analytics tab, you do not see any ingestion calls when using API crawlers; you see only the API URL requests. This makes debugging harder compared to the push API, where you can see all the requests.
Website Crawler
The website crawler can generate false traffic because it hits the head application. This can affect your analytics and create unnecessary load.
Your sitemap should contain only valid language links. If you have multiple locales and an item is not published in all of them, the crawler will hit a 404 error and fail. When more than 30% of requests fail, the whole index is not updated.
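Because of that 30% threshold, it can pay off to pre-validate sitemap URLs before a crawl. A minimal sketch, assuming you have already collected an HTTP status per URL (the fetching itself is omitted):

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(sitemap_xml: str) -> list:
    """Extract all <loc> entries from a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text for loc in root.iter(f"{SITEMAP_NS}loc")]

def failure_rate(statuses: dict) -> float:
    """Fraction of URLs that did not return HTTP 200."""
    failed = sum(1 for code in statuses.values() if code != 200)
    return failed / len(statuses)

# If failure_rate(...) > 0.30, expect Sitecore Search to skip
# the whole index update.
```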
The website crawler can be configured to crawl multiple sitemaps.
One user reported slow running times when the index reaches 150,000 items.
Viewing Source Run Status
In Sitecore Search, you can view the last run status of each source from the Analytics tab, then under Sources select Overview.
Job Status and Batch Failures
In the Sitecore Search dashboard, if the overall job status is "failed", then all the updates in the batch failed. The metrics and URL details are an aggregation of the different steps executed in the batch, so even when the URL details say "Successfully indexed", the update was not successful if the overall job failed. See the Sitecore knowledge base article for more details.
Content Update Delays
Sometimes during workdays, between 11:00 AM and 2:30 PM Sofia time (Eastern European Time), Sitecore Search jobs are reported as successful but the content does not show in the content collection. This might be because of maintenance, or because I am testing on a sandbox environment. After this time period, all updates are applied automatically.
Sitecore Webhooks
For publish webhooks, you can set rules, but as of this writing they are not applied: you cannot use Sitecore rules to filter which items trigger webhooks. Make sure you handle filtering in your code to determine when to fire the webhook; otherwise it will be triggered on every item publish event. See the Sitecore documentation for details.
Webhooks return item ID, not template ID. You need to make an additional call to get the template information if you need it.
Delete webhooks send more information compared to other webhook types.
You can use your unique URL from webhook.site to test the requests sent by Sitecore webhook event handler.
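The filtering-in-code approach described above can be sketched as follows. The `ItemId` field name and the payload shape are assumptions; inspect your own webhook payloads (for example via webhook.site) to confirm the actual field names. The template GUID here is the standard Sample Item template, used purely as an example.

```python
import json

# Example template to index -- the Sitecore Sample Item template GUID.
ALLOWED_TEMPLATE_IDS = {"{76036F5E-CBCE-46D1-AF0A-4143F9B557AA}"}

def should_index(payload_json: str, get_template_id) -> bool:
    """Decide in code whether a publish event should be pushed to Search.

    The webhook payload carries the item ID but not the template ID, so
    `get_template_id` stands in for an additional lookup (e.g. via the
    Item Service or GraphQL) that resolves the item's template.
    The "ItemId" field name is a hypothetical payload shape -- verify it
    against your actual webhook payloads.
    """
    payload = json.loads(payload_json)
    item_id = payload.get("ItemId")
    if not item_id:
        return False
    return get_template_id(item_id) in ALLOWED_TEMPLATE_IDS
```

Keeping the template lookup behind a callable makes the filter easy to unit-test without hitting Sitecore.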




