For the last few months I have been quite busy working on the recently launched OneFrameOfFame.com. In this blog post I’ll explain the architecture that runs the site.
The site OneFrameOfFame.com might look simple from the front: it’s basically two pages, and even those are very similar. The backend, however, is quite complex: each frame is handled by at least 10 different machines across several services, likely on 3 continents, before it ends up in the clip.
When a user starts their webcam to contribute a frame, a request is sent to our server, which takes the next queued frame number from Amazon’s Simple Queue Service (SQS) and sends it to the user. When the user completes the frame, the resulting image is sent back. It’s then scaled to 3 different resolutions, stored on Amazon’s S3, and removed from the SQS queue. The image is added to the recent frames on our homepage. It’s also sent to CrowdFlower, which creates a new job on Amazon Mechanical Turk. At least 3 different workers rate the image, check it for nudity, and tell us how well the user did in mimicking the original frame.
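The scaling step is easy to sketch. Here is a minimal version in Python, assuming Pillow is available; the target widths are hypothetical, and the site’s actual resolutions and stack may well differ:

```python
from io import BytesIO
from PIL import Image

# Hypothetical target widths; the real site scales each frame to 3 resolutions.
TARGET_WIDTHS = (640, 320, 160)

def scale_frame(image_bytes):
    """Scale one submitted frame to three resolutions, keeping the aspect ratio."""
    original = Image.open(BytesIO(image_bytes))
    versions = {}
    for width in TARGET_WIDTHS:
        ratio = width / original.width
        height = max(1, round(original.height * ratio))
        resized = original.resize((width, height))
        buf = BytesIO()
        resized.save(buf, format="JPEG")
        versions[width] = buf.getvalue()  # one JPEG per resolution
    return versions
```

In production, each of the three versions would then be uploaded to S3, after which the corresponding SQS message is deleted.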
Once enough workers have rated an image, a signal is sent back to our server via a webhook. Every hour our server collects all judgements, and the render server checks whether there are new, moderated images. If so, it pulls them from S3 (and adds them to a local cache, so each image is downloaded to the render server only once). It then generates a new clip, which it uploads to Blip.TV; that video is embedded on our frontpage. Blip was the only video service we found that lets you replace an existing video via its API.
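The render server’s local cache boils down to “download each frame from S3 at most once.” A sketch of that idea, with the S3 fetch injected as a plain callable so the caching logic stands on its own (the function and file names here are made up for illustration):

```python
import os

def fetch_frame(frame_id, cache_dir, download):
    """Return the local path of a frame, downloading it at most once.

    `download` is any callable(frame_id) -> bytes; in production this would
    be an S3 GET, injected here so the cache logic is self-contained.
    """
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, f"{frame_id}.jpg")
    if not os.path.exists(path):  # cache miss: fetch from S3 exactly once
        with open(path, "wb") as f:
            f.write(download(frame_id))
    return path
```

On every hourly render run, frames already rendered into earlier clips are served straight from disk, so only the newly moderated images cost an S3 round trip.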
The main webpage is hosted at Mediatemple, which is supposed to run on grid hosting, so it can handle a lot of traffic without breaking down.
Without all these scaling services (especially the ones provided by Amazon), this whole project probably wouldn’t have been feasible with our limited budget. So go shoot your One Frame Of Fame!
I’ve considered using CloudFront to distribute the images. Since they’re already stored in S3, enabling it would only mean adding a setting (and a DNS entry). However, since the frames shown are highly dynamic (by default we only show the last 10 frames), caching isn’t that useful: the requested images change too often.
Also, because we have a lot of files that are, on average, not requested very often, the origin fetches would be significant (both in cost and in latency). So we stayed with S3, which is fast enough (we use S3 Europe, since most of our target audience is in Europe).