Streamlining File Management in Azure Blob Storage for Sitecore

By admin November 7, 2024

In our recent Sitecore project, we integrated Azure Blob Storage to handle various types of media and data files for the backend, ensuring scalability and security. While this setup provided excellent support for content storage and retrieval, we encountered a need for efficient file management to maintain an optimized storage structure. Specifically, we needed a way to:

  1. Clean up old files from storage that were no longer required.
  2. Move files from one folder to another, adjusting the structure to reflect changes in our content hierarchy.

Here’s a closer look at the challenges we faced and how we addressed them.

The Challenge: Managing Files Without Filenames

Azure Blob Storage itself places no structure on blob names, but in our container each blob name combined a file ID with the original filename (for example, a name beginning with 6245387_). Our use case only provided us with the file IDs, not the associated filenames, so we could not address a blob by its full name. This made batch deletions and moves across folders cumbersome.

To solve this, we implemented custom methods that operate on file prefixes (in this case, the file ID) rather than full filenames. These methods enabled us to manage our files in Azure Blob Storage effectively.
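To make the prefix idea concrete, here is a minimal sketch of how an ID can be turned into a name prefix and matched against blob names. It runs on plain strings, so it needs no storage account. The class and method names (BlobPrefixDemo, BuildPrefix, Match) and the sample file names are hypothetical, and the "_" delimiter is an assumption based on the example prefix used later in this post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BlobPrefixDemo
{
    // Builds the full name prefix for an ID inside a folder,
    // e.g. "video/short-videos/6245387_".
    // Assumption: IDs are followed by a "_" delimiter in the blob name.
    public static string BuildPrefix(string path, string id, char delimiter = '_')
    {
        return path.TrimEnd('/') + "/" + id + delimiter;
    }

    // Returns the names that start with the prefix, using an ordinal
    // (culture-independent, case-sensitive) comparison.
    public static List<string> Match(IEnumerable<string> blobNames, string prefix)
    {
        return blobNames
            .Where(name => name.StartsWith(prefix, StringComparison.Ordinal))
            .ToList();
    }
}
```

Including the delimiter in the prefix matters. Given the names video/short-videos/6245387_intro.mp4 and video/short-videos/62453870_other.mp4, the prefix video/short-videos/6245387 matches both, while video/short-videos/6245387_ matches only the first.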

Solution Part 1: Deleting Files by Prefix

The first task was to delete files that matched a specific prefix, which allowed us to remove old, unused files without needing the exact filenames. The DeleteBlobByPrefixAsync method below lists the blobs under a given path whose names start with the ID prefix and deletes each one.

// Requires the Azure.Storage.Blobs NuGet package.
using System;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

// connectionString and containerName are assumed to be static fields
// defined elsewhere in the containing class.
public static async Task DeleteBlobByPrefixAsync(string prefix, string path)
{
    // Create a BlobServiceClient to connect to the storage account
    BlobServiceClient blobServiceClient = new BlobServiceClient(connectionString);

    // Get a reference to the container
    BlobContainerClient containerClient = blobServiceClient.GetBlobContainerClient(containerName);

    // Build the full name prefix, e.g., "video/short-videos/6245387_"
    string pathPrefix = path.TrimEnd('/') + "/" + prefix;

    // Let the service filter by the full prefix instead of listing
    // every blob under the path and filtering on the client
    await foreach (BlobItem blobItem in containerClient.GetBlobsAsync(prefix: pathPrefix))
    {
        // Get a reference to the blob
        BlobClient blobClient = containerClient.GetBlobClient(blobItem.Name);

        // Delete the blob
        await blobClient.DeleteIfExistsAsync();
        Console.WriteLine($"Deleted blob: {blobItem.Name}");
    }
}

This approach provided a flexible way to remove files based on their ID prefix, helping us streamline storage without unnecessary accumulation of outdated files.

Solution Part 2: Moving Files by Prefix

Our second task involved moving files from one folder to another based on their prefix, reflecting updates to our content organization. The MoveBlobByPrefixAsync method locates files by their prefix, copies each one to the new path within the same container under its original filename, and then deletes the original.

// Requires the Azure.Storage.Blobs NuGet package.
using System;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

// connectionString and containerName are assumed to be static fields
// defined elsewhere in the containing class.
public static async Task MoveBlobByPrefixAsync(string prefix, string path, string destinationPath)
{
    // Create a BlobServiceClient to connect to the storage account
    BlobServiceClient blobServiceClient = new BlobServiceClient(connectionString);

    // Get a reference to the container
    BlobContainerClient containerClient = blobServiceClient.GetBlobContainerClient(containerName);

    // Build the full name prefix, e.g., "video/short-videos/6245387_"
    string pathPrefix = path.TrimEnd('/') + "/" + prefix;

    // List only the blobs whose names start with the full prefix
    await foreach (BlobItem blobItem in containerClient.GetBlobsAsync(prefix: pathPrefix))
    {
        // Get a reference to the original blob
        BlobClient sourceBlobClient = containerClient.GetBlobClient(blobItem.Name);

        // Keep the original filename (the part after the last '/') under the new path
        string fileName = blobItem.Name.Substring(blobItem.Name.LastIndexOf('/') + 1);
        string newBlobName = $"{destinationPath.TrimEnd('/')}/{fileName}";

        // Get a reference to the destination blob
        BlobClient destinationBlobClient = containerClient.GetBlobClient(newBlobName);

        // Start the copy and wait for it to finish; StartCopyFromUriAsync only
        // schedules the copy, so deleting the source right away could lose data
        CopyFromUriOperation copyOperation =
            await destinationBlobClient.StartCopyFromUriAsync(sourceBlobClient.Uri);
        await copyOperation.WaitForCompletionAsync();

        // Delete the original blob only after the copy has completed
        await sourceBlobClient.DeleteIfExistsAsync();
        Console.WriteLine($"Moved blob: {blobItem.Name} to {newBlobName}");
    }
}

This method was instrumental in helping us restructure our storage, making it easier to manage and locate files within our content hierarchy.
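The destination name is the part of the move that is easiest to get wrong, so here is the naming rule from the method above, pulled out into a small string-only sketch. The class and method names (BlobNameDemo, GetDestinationName) and the sample paths are hypothetical and used only for illustration.

```csharp
using System;

public static class BlobNameDemo
{
    // Maps a source blob name to its name under the destination path by
    // keeping only the segment after the last '/' (the filename).
    // If the name contains no '/', the whole name is treated as the filename.
    public static string GetDestinationName(string sourceName, string destinationPath)
    {
        string fileName = sourceName.Substring(sourceName.LastIndexOf('/') + 1);
        return destinationPath.TrimEnd('/') + "/" + fileName;
    }
}
```

For example, GetDestinationName("video/short-videos/6245387_intro.mp4", "video/archive") yields video/archive/6245387_intro.mp4. Because only the last segment is kept, any intermediate subfolders in the source name are dropped, and two source blobs with the same filename would map to the same destination.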

Automating the Process with a Console App

To make this process more efficient, I created a console application that reads a CSV file containing a list of IDs and statuses. For each row, the app checks the status and calls either DeleteBlobByPrefixAsync to remove the matching files or MoveBlobByPrefixAsync to relocate them. This allowed us to process file management tasks in bulk, minimizing manual work and ensuring consistency across our storage environment.
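The row-by-row dispatch could look roughly like the sketch below. It is not the actual console app: the column order (id, then status), the status values "delete" and "move", and the names CsvBatchDemo, ParseRows, and ProcessAsync are all assumptions made for illustration. The two storage operations are passed in as delegates so the logic can be run without a storage account.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class CsvBatchDemo
{
    // Parses "id,status" lines into (Id, Status) pairs.
    // Skips blank lines, rows with fewer than two columns, and a header row
    // whose first column is "id". Assumes values contain no embedded commas.
    public static List<(string Id, string Status)> ParseRows(IEnumerable<string> lines)
    {
        var rows = new List<(string Id, string Status)>();
        foreach (string line in lines)
        {
            if (string.IsNullOrWhiteSpace(line)) continue;
            string[] parts = line.Split(',');
            if (parts.Length < 2) continue;
            string id = parts[0].Trim();
            string status = parts[1].Trim().ToLowerInvariant();
            if (id.Equals("id", StringComparison.OrdinalIgnoreCase)) continue;
            rows.Add((id, status));
        }
        return rows;
    }

    // Calls deleteAsync or moveAsync for each row depending on its status
    // and returns how many rows were handled. Unknown statuses are logged
    // and skipped rather than treated as errors.
    public static async Task<int> ProcessAsync(
        IEnumerable<string> lines,
        Func<string, Task> deleteAsync,
        Func<string, Task> moveAsync)
    {
        int handled = 0;
        foreach (var (id, status) in ParseRows(lines))
        {
            switch (status)
            {
                case "delete": // hypothetical status value
                    await deleteAsync(id);
                    handled++;
                    break;
                case "move": // hypothetical status value
                    await moveAsync(id);
                    handled++;
                    break;
                default:
                    Console.WriteLine($"Skipping {id}: unknown status '{status}'");
                    break;
            }
        }
        return handled;
    }
}
```

In a real run, the lines would come from something like File.ReadLines on the CSV path, and the delegates would wrap the two methods above, for example id => DeleteBlobByPrefixAsync(id + "_", "video/short-videos"), with the folder names adjusted to the container's layout.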
