Paper Notes: Finding a needle in Haystack – Facebook’s photo storage

Photo storage is one of the key functionalities of any social media platform, & Facebook is no exception. But at the scale Facebook operates, retrieving a photo from storage can be like finding a needle in a haystack. At the time of this paper, Facebook was storing 260 billion images, with around 1 billion new photos uploaded each week. Whenever a user asks for a specific photo, it has to be retrieved with minimal latency & at a reasonable cost, and this requirement led to the creation of Haystack.

Existing storage system

Facebook generates & stores four images of different sizes for each photo uploaded to the platform. To store & retrieve these images, Facebook built an object store known as Haystack. But before jumping into the internals of Haystack, let us try to understand what led to its creation & why the existing solution failed at scale.

Initially, Facebook stored photos on network attached storage (NAS) mounted over NFS. With that setup, reading a single photo took around 3 disk operations: one (often more) to translate the filename into an inode number, one to read the inode from disk & a final one to read the image file itself. This degraded both throughput & latency for photo retrieval, which led to an unpleasant user experience. Putting a CDN in front didn't solve the latency problem either: CDNs serve only the hot, frequently requested photos, whereas a social media application also has to serve a long tail of older content that hasn't been requested in a long time, and those requests miss the CDN & fall through to storage.
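
To make that cost concrete, here is a toy Python model of the NFS read path. It is not real NFS code: `nfs_read_disk_ops` & its two metadata caches are hypothetical, each metadata step that misses in memory is counted as exactly one disk operation, and a directory lookup that actually needs several reads is simplified to one.

```python
from __future__ import annotations


def nfs_read_disk_ops(filename: str, dirent_cache: set[str], inode_cache: set[str]) -> int:
    """Count disk operations for one photo read in a toy NFS-style filesystem.

    Hypothetical model: a metadata step costs one disk operation only when the
    corresponding entry is not already cached in memory.
    """
    ops = 0
    if filename not in dirent_cache:
        ops += 1  # directory lookup: translate the filename into an inode number
    if filename not in inode_cache:
        ops += 1  # read the inode itself from disk
    ops += 1  # read the image data
    return ops


if __name__ == "__main__":
    # A long-tail photo whose metadata is not cached pays for all three steps.
    print(nfs_read_disk_ops("old_photo.jpg", set(), set()))  # → 3
    # A hot photo with cached metadata needs only the data read.
    print(nfs_read_disk_ops("hot_photo.jpg", {"hot_photo.jpg"}, {"hot_photo.jpg"}))  # → 1
```

Old photos are exactly the ones whose metadata is least likely to be cached, so long-tail requests tend toward the worst case.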

The replacement needed to reduce retrieval latency while remaining fault-tolerant, cost-effective & simple enough to reason about, since it backs such a core functionality of a social media application. Haystack ends up checking all of these boxes, which is why Facebook built it to replace their existing photo serving system.

Haystack Design

With Haystack, Facebook still uses a CDN to serve popular images, while Haystack serves older content with fewer disk seeks by reducing the memory spent on filesystem metadata, so that the metadata needed to locate a photo can be kept in main memory. The Haystack architecture consists of 3 major components.

  • Haystack Store: Encapsulates the persistent storage for photos & manages their filesystem-level metadata.
  • Haystack Directory: Maintains the mapping from logical volumes to physical volumes, along with other application metadata.
  • Haystack Cache: Acts as an internal CDN & avoids unnecessary calls to the Haystack Store.

Core Components

In this section we will dive deeper into the three Haystack components, starting with the Haystack Directory, which performs four main operations:

  • Providing the mapping from logical volumes to physical volumes
  • Balancing load: writes across logical volumes & reads across physical volumes
  • Deciding whether a photo request should be served by the CDN or by the Haystack Cache
  • Identifying logical volumes that are read-only
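
The mapping, load-balancing & read-only bookkeeping can be sketched as a small Python class. This is a minimal model under assumed names (`Directory`, `pick_volume_for_upload`, `pick_replica_for_read`), not Facebook's code: random choice stands in for whatever balancing policy was actually used, an upload is assumed to go to every physical replica of the chosen logical volume, and the CDN-vs-Cache decision is left out.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Directory:
    """Toy model of the Haystack Directory's volume bookkeeping (hypothetical API)."""

    # logical volume id -> ids of the physical volumes (machines) replicating it
    logical_to_physical: Dict[int, List[str]] = field(default_factory=dict)
    # logical volumes that no longer accept uploads
    read_only: Set[int] = field(default_factory=set)
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def pick_volume_for_upload(self) -> Tuple[int, List[str]]:
        """Spread writes across writable logical volumes; return every replica to write to."""
        writable = [lv for lv in self.logical_to_physical if lv not in self.read_only]
        if not writable:
            raise RuntimeError("no write-enabled logical volume available")
        lv = self.rng.choice(writable)
        return lv, list(self.logical_to_physical[lv])

    def pick_replica_for_read(self, logical_volume: int) -> str:
        """Spread reads across the physical replicas of one logical volume."""
        return self.rng.choice(self.logical_to_physical[logical_volume])

    def mark_read_only(self, logical_volume: int) -> None:
        self.read_only.add(logical_volume)


directory = Directory({1: ["store-a", "store-b"], 2: ["store-c", "store-d"]})
directory.mark_read_only(2)
print(directory.pick_volume_for_upload())  # → (1, ['store-a', 'store-b'])
print(directory.pick_replica_for_read(2))  # read-only volumes still serve reads
```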

Whenever a machine reaches its capacity, it is marked read-only & from then on serves only read traffic. Newly provisioned machines are write-enabled & are the ones that accept uploads (writes). If a machine goes down, its mappings are removed from the Directory, & new entries are added when a replacement machine is provisioned.
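
That lifecycle (write-enabled until full, then read-only; removed on failure, replaced by a fresh write-enabled machine) can be sketched as a small state tracker. `MachineRegistry` & its byte-based capacity accounting are hypothetical illustrations of the behavior described above, not the paper's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class MachineRegistry:
    """Toy tracker for write-enabled vs read-only Store machines (hypothetical API)."""

    capacity_bytes: Dict[str, int] = field(default_factory=dict)
    used_bytes: Dict[str, int] = field(default_factory=dict)
    read_only: Set[str] = field(default_factory=set)

    def provision(self, machine: str, capacity: int) -> None:
        """A newly provisioned machine starts out write-enabled."""
        self.capacity_bytes[machine] = capacity
        self.used_bytes[machine] = 0

    def record_write(self, machine: str, size: int) -> None:
        if machine in self.read_only:
            raise ValueError(f"{machine} is read-only and cannot accept uploads")
        self.used_bytes[machine] += size
        if self.used_bytes[machine] >= self.capacity_bytes[machine]:
            self.read_only.add(machine)  # full: from now on it only serves reads

    def fail(self, machine: str) -> None:
        """Drop every entry for a failed machine; a replacement arrives via provision()."""
        self.capacity_bytes.pop(machine, None)
        self.used_bytes.pop(machine, None)
        self.read_only.discard(machine)

    def write_enabled(self) -> Set[str]:
        return set(self.capacity_bytes) - self.read_only


registry = MachineRegistry()
registry.provision("store-a", capacity=100)
registry.provision("store-b", capacity=100)
registry.record_write("store-a", 100)  # store-a is now full
print(sorted(registry.write_enabled()))  # → ['store-b']
registry.fail("store-b")
registry.provision("store-c", capacity=100)
print(sorted(registry.write_enabled()))  # → ['store-c']
```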

The Haystack Cache is a distributed hash table keyed by photo id, & it serves requests coming both from the CDN & directly from users. If the Cache does not contain the key, it fetches the photo from the Haystack Store & returns it as part of the response.
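
A minimal Python sketch of such a cache: the photo id is hashed to pick one of several cache nodes, & a miss falls through to the Store. The hash-mod-N placement, the `fetch_from_store` callback & the class name are assumptions for illustration, and this version caches every miss, ignoring the admission rules described below.

```python
import zlib
from typing import Callable, Dict, List


class PhotoCache:
    """Toy distributed hash table keyed by photo id, with read-through on misses."""

    def __init__(self, node_names: List[str], fetch_from_store: Callable[[int], bytes]):
        self.node_names = list(node_names)
        # one in-memory dict per cache node
        self.nodes: Dict[str, Dict[int, bytes]] = {name: {} for name in self.node_names}
        self.fetch_from_store = fetch_from_store

    def node_for(self, photo_id: int) -> str:
        """Pick the node that owns this key (simple hash-mod-N placement)."""
        h = zlib.crc32(str(photo_id).encode())
        return self.node_names[h % len(self.node_names)]

    def get(self, photo_id: int) -> bytes:
        shard = self.nodes[self.node_for(photo_id)]
        if photo_id in shard:
            return shard[photo_id]  # hit
        data = self.fetch_from_store(photo_id)  # miss: read from the Haystack Store
        shard[photo_id] = data
        return data


store_reads: List[int] = []


def fake_store(photo_id: int) -> bytes:
    """Stand-in for the Haystack Store that records which ids were fetched."""
    store_reads.append(photo_id)
    return f"photo-{photo_id}".encode()


cache = PhotoCache(["cache-1", "cache-2", "cache-3"], fake_store)
cache.get(42)
cache.get(42)
print(store_reads)  # → [42]: the second request is a cache hit
```

A production hash table would more likely use consistent hashing, so that adding a node does not remap most keys; hash-mod-N just keeps the sketch short.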

For a fetched photo, the Cache stores the image against its key only if the request came directly from a user (not from the CDN) & the photo was fetched from a write-enabled Store. The first condition holds because, if the request came from the CDN, the CDN will store the photo itself & subsequent requests won't reach the Cache anyway. The second condition exists because we want to reduce the read traffic going to write-enabled Stores: if a photo is served by a read-only Store, later requests can keep going to that Store, since they don't interfere with any write traffic. Caching photos from write-enabled Stores also matches user behavior, where an image is read most heavily right after it is uploaded (verifying that the filter ended up working as expected :P).
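
Both conditions combine into a single admission check. A minimal Python sketch; `should_cache` & its parameter names are hypothetical:

```python
from itertools import product


def should_cache(request_from_cdn: bool, served_by_write_enabled_store: bool) -> bool:
    """Cache admission rule as described above (hypothetical function name)."""
    # Skip CDN-originated requests: the CDN keeps its own copy of the photo.
    # Skip photos from read-only Stores: reads there don't compete with uploads.
    return not request_from_cdn and served_by_write_enabled_store


# Print the full truth table; only a user request served by a write-enabled Store is cached.
for from_cdn, write_enabled in product([False, True], repeat=2):
    decision = should_cache(from_cdn, write_enabled)
    print(f"from_cdn={from_cdn!s:5} write_enabled={write_enabled!s:5} -> cache={decision}")
```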

The Haystack Store exposes a very simple interface for reading photos. A request specifies the photo id along with the logical volume & the physical machine from which the photo should be read. If the photo is not found, the Store returns an error. The metadata needed to build such a request (which logical volume holds the photo & which machines store it) is fetched beforehand from the Haystack Directory.
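
The shape of that interface can be sketched in Python as follows. `ReadRequest`, `StoreMachine` & `PhotoNotFound` are hypothetical names, and a dict stands in for the on-disk volumes described in the next paragraph.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


class PhotoNotFound(Exception):
    """Raised when the requested photo is not stored on this machine."""


@dataclass(frozen=True)
class ReadRequest:
    photo_id: int
    logical_volume: int
    machine_id: str  # physical Store machine chosen using Directory metadata


class StoreMachine:
    """Toy Store machine; photos live in a dict instead of on-disk volumes."""

    def __init__(self, machine_id: str) -> None:
        self.machine_id = machine_id
        self._photos: Dict[Tuple[int, int], bytes] = {}

    def write(self, logical_volume: int, photo_id: int, data: bytes) -> None:
        self._photos[(logical_volume, photo_id)] = data

    def read(self, request: ReadRequest) -> bytes:
        if request.machine_id != self.machine_id:
            raise ValueError(f"request for {request.machine_id} sent to {self.machine_id}")
        try:
            return self._photos[(request.logical_volume, request.photo_id)]
        except KeyError:
            raise PhotoNotFound(
                f"photo {request.photo_id} not in volume {request.logical_volume}"
            ) from None


store = StoreMachine("store-a")
store.write(logical_volume=7, photo_id=1001, data=b"...jpeg bytes...")
print(store.read(ReadRequest(photo_id=1001, logical_volume=7, machine_id="store-a")))
try:
    store.read(ReadRequest(photo_id=9999, logical_volume=7, machine_id="store-a"))
except PhotoNotFound as err:
    print("error:", err)
```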

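A Store machine manages physical volumes, & each physical volume consists of a superblock followed by a sequence of needles. Each needle represents a photo, & to locate needles the Store machine keeps an in-memory index mapping each photo's key to the needle's location (offset & size) within its volume. Because this lookup happens entirely in memory, once the requested key has been mapped to its needle, the Store can retrieve the photo in just one disk seek.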
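
To see why a single seek suffices, here is a heavily simplified Python model of a physical volume: a superblock followed by appended needles, plus an in-memory dict from key to (data offset, size). The superblock marker & the 12-byte needle header (key + size) are hypothetical; the paper's needles carry more fields, such as a cookie, an alternate key, flags & a checksum. `io.BytesIO` stands in for the volume file on disk.

```python
import io
import struct
from typing import Dict, Tuple

SUPERBLOCK = b"HAYSTACKVOL\x00"  # hypothetical 12-byte volume header
NEEDLE_HEADER = struct.Struct(">QI")  # hypothetical: 8-byte key + 4-byte data size


class PhysicalVolume:
    def __init__(self) -> None:
        self.file = io.BytesIO()  # stand-in for the on-disk volume file
        self.file.write(SUPERBLOCK)
        # In-memory index: key -> (offset of photo data, size in bytes)
        self.index: Dict[int, Tuple[int, int]] = {}

    def append(self, key: int, data: bytes) -> None:
        """Append a needle (header + photo bytes) to the end of the volume."""
        self.file.seek(0, io.SEEK_END)
        header_offset = self.file.tell()
        self.file.write(NEEDLE_HEADER.pack(key, len(data)))
        self.file.write(data)
        self.index[key] = (header_offset + NEEDLE_HEADER.size, len(data))

    def read(self, key: int) -> bytes:
        """Memory lookup for the location, then one seek + one read of the data."""
        offset, size = self.index[key]
        self.file.seek(offset)
        return self.file.read(size)

    def rebuild_index(self) -> Dict[int, Tuple[int, int]]:
        """Recover the index by scanning needles sequentially after the superblock."""
        index: Dict[int, Tuple[int, int]] = {}
        self.file.seek(len(SUPERBLOCK))
        while True:
            header = self.file.read(NEEDLE_HEADER.size)
            if len(header) < NEEDLE_HEADER.size:
                break  # reached the end of the volume
            key, size = NEEDLE_HEADER.unpack(header)
            index[key] = (self.file.tell(), size)
            self.file.seek(size, io.SEEK_CUR)  # skip over the photo bytes
        return index


volume = PhysicalVolume()
volume.append(1, b"cat.jpg bytes")
volume.append(2, b"dog.jpg bytes")
print(volume.read(2))  # → b'dog.jpg bytes'
```

The sequential scan in `rebuild_index` only works because needles are appended back to back after the superblock; the read path itself never scans, it goes straight to the offset held in memory.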

Journey of a request

Handling failures & Optimizations

Conclusion