Timelapse Video with a Web Camera

I’m working on an ongoing project in which I am trying to use a web camera to produce timelapse video. I have usually used DSLRs for this purpose, but doing so makes that camera unavailable for other needs for up to 20 days. I have a lot of low-power PCs that I could repurpose for this. Since I use many of my computers over SSH or Remote Desktop, I could assign a computer to a project and still have it available for other needs.

There are a couple of approaches that I could use to do this. I initially tried the Windows Media Foundation APIs. Using those, one can get access to a stream from the camera and write it to a file in the format of one's choice. This worked, but I decided not to stay on that path because I sometimes ran into conflicts between the media formats in which a source could provide frames and the format in which I wanted the results saved. This could be fixed by adding some transformations between the source and the destination file, but I decided to do something simpler.

I am capturing images from the cameras and saving those images to a drive. The software I prefer to use for editing videos, DaVinci Resolve, can import sequences of images as a video without a fuss. As of now I have a minimum viable solution for capturing the photos for a timelapse. If you want to try it out, I have a signed binary available for download.

What about Multiple Cameras?

I thought about some options on what to do if a computer has multiple video sources on it. One of my home desktops is connected to multiple video capture devices (a couple of web cams, an HDMI capture card, and occasionally another device that presents a part of its functionality as a web cam). Rather than deal with the complexities of having a user identify a camera from the command line, I decided to just take photos from all of them. When the program starts, it enumerates the cameras. When the time to take a photo comes, a photo is taken from each camera. Each image file name incorporates the name of the camera and the date/time at which the capture session was started, followed by a frame number.
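The naming scheme described above could be sketched roughly as follows. This is only an illustration, not the code from the application: the function name BuildFrameFileName, the underscore separators, the six-digit zero padding, and the .jpg extension are all assumptions made for the example.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: combines the camera's filename-safe name, the
// session start time (already formatted as text by the caller), and a
// frame counter. Separator, padding width, and extension are
// illustrative choices, not taken from the application.
std::wstring BuildFrameFileName(const std::wstring& safeCameraName,
                                const std::wstring& sessionStamp,
                                unsigned int frameNumber)
{
    std::wostringstream name;
    name << safeCameraName << L'_' << sessionStamp << L'_'
         << std::setw(6) << std::setfill(L'0') << frameNumber
         << L".jpg";
    return name.str();
}
```

Zero-padding the frame number keeps alphabetical order the same as capture order, which tends to matter when a video editor imports the folder as an image sequence.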

The information that I must track on the camera is kept in the following structure.

struct Camera
{
    std::wstring     friendlyName;
    std::wstring     safeName;      // sanitized for filenames
    Microsoft::WRL::ComPtr<IMFSourceReader> reader  = nullptr;
    UINT32           width   = 0;
    UINT32           height  = 0;
    LONG             stride  = 0;   // negative = bottom-up

    Camera() = default;
    Camera(const Camera&) = delete;
    Camera& operator=(const Camera&) = delete;

    ~Camera()
    {
        reader = nullptr;
    }
};
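The safeName field holds a version of the camera's friendly name that can be used in a file name. One way that sanitizing could be done is sketched below. The function name MakeSafeName, the underscore replacement, and the "camera" fallback are assumptions for illustration; the application may do this differently.

```cpp
#include <string>

// Hypothetical sanitizer for Camera::safeName. Replaces characters that
// Windows does not allow in file names, along with control characters,
// with an underscore.
std::wstring MakeSafeName(const std::wstring& friendlyName)
{
    const std::wstring invalid = L"<>:\"/\\|?*";
    std::wstring result = friendlyName;
    for (wchar_t& ch : result)
    {
        if (ch < 32 || invalid.find(ch) != std::wstring::npos)
            ch = L'_';
    }
    // Fall back to a generic name if the device reported an empty string.
    return result.empty() ? std::wstring(L"camera") : result;
}
```

Two devices with the same friendly name would still produce the same safe name, so a real implementation would likely need to append an index or similar to keep their files apart.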

To enumerate the cameras, we must create a properties object that describes the type of device to which we seek access. Media Foundation devices could also be audio-only devices. We don’t want those, so we specify that we want a video capture device. The properties object is passed to a call to MFEnumDeviceSources along with a pointer that the function can assign and a numerical field that the call will populate with the number of devices found.

Microsoft::WRL::ComPtr<IMFAttributes> pAttrs = nullptr;
HRESULT hr = MFCreateAttributes(&pAttrs, 1);
if (FAILED(hr)) return cameras;

hr = pAttrs->SetGUID(MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE,
                    MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE_VIDCAP_GUID);

IMFActivate** ppDevices = nullptr;
UINT32 count = 0;

if (SUCCEEDED(hr))
    hr = MFEnumDeviceSources(pAttrs.Get(), &ppDevices, &count);
if (FAILED(hr) || count == 0)
{
    if (ppDevices) CoTaskMemFree(ppDevices);
    return cameras;
}

Once we have an array of cameras, we can examine more information about the cameras, capture frames from them, and perform other operations.

Why C++?

I’m making this using C++ because a C-family language gives me direct access to the APIs that I need. I love C#, but I would have to make a lot of declarations to get access to the Win32 APIs. This may be possible to make in Node.js or Electron, but once again I would need to either hunt for a library that gives me access to what I need or make my own. It is easier to just use the APIs directly.

Settings/Arguments

Two arguments are mandatory when invoking the program: --output and --delay. The --output argument specifies the folder path in which the image files will be saved. The --delay argument specifies how many seconds to wait between captured images. Optionally, a --count argument can be provided to limit the number of frames that are taken. At any point, a user can bring the capture session to an end by pressing CTRL-C.
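A minimal sketch of how these arguments might be parsed is shown below. The Settings structure, the ParseArguments function, and the decision to reject unknown arguments are assumptions for the example rather than the application's actual code. It expects the arguments as "--name value" pairs with the program name already removed.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical settings structure; the field names are illustrative.
struct Settings
{
    std::wstring outputDir;
    unsigned int delaySeconds = 0;
    std::optional<unsigned int> count;   // empty = run until CTRL-C
};

// Parses "--name value" pairs. Returns std::nullopt when a mandatory
// argument is missing, an argument is unknown, or a number cannot be parsed.
std::optional<Settings> ParseArguments(const std::vector<std::wstring>& args)
{
    if (args.size() % 2 != 0) return std::nullopt;   // dangling name or value

    Settings s;
    bool haveOutput = false, haveDelay = false;
    try
    {
        for (std::size_t i = 0; i < args.size(); i += 2)
        {
            const std::wstring& key   = args[i];
            const std::wstring& value = args[i + 1];
            if (key == L"--output")
            {
                s.outputDir = value;
                haveOutput = true;
            }
            else if (key == L"--delay")
            {
                s.delaySeconds = static_cast<unsigned int>(std::stoul(value));
                haveDelay = true;
            }
            else if (key == L"--count")
            {
                s.count = static_cast<unsigned int>(std::stoul(value));
            }
            else
            {
                return std::nullopt;   // unknown argument
            }
        }
    }
    catch (const std::exception&)
    {
        return std::nullopt;   // std::stoul rejected a value
    }

    if (!haveOutput || !haveDelay) return std::nullopt;
    return s;
}
```

Leaving count as an empty optional lets the capture loop treat "no limit" as a distinct case from a limit of zero.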

Capturing the Image

Frames are provided to us through COM pointers. The camera is accessed through an object that implements the IMFSourceReader interface. Its ReadSample function returns an object that implements the IMFSample interface. Given a sample, we use ConvertToContiguousBuffer to get the image data.

DWORD    streamIndex = 0, flags = 0;
LONGLONG timestamp   = 0;
Microsoft::WRL::ComPtr<IMFSample> pSample{};

HRESULT hr = cam.reader->ReadSample(
    (DWORD)MF_SOURCE_READER_FIRST_VIDEO_STREAM,
    0, &streamIndex, &flags, &timestamp, &pSample);

if (FAILED(hr) || !pSample)
{
    return false;
}
Microsoft::WRL::ComPtr<IMFMediaBuffer> pBuf = nullptr;
hr = pSample->ConvertToContiguousBuffer(&pBuf);

Before we can read data from the buffer, we need to lock it for reading.

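BYTE* data   = nullptr;
DWORD curLen = 0;
hr = pBuf->Lock(&data, nullptr, &curLen);

if (SUCCEEDED(hr))
{
    UINT32 absStride = static_cast<UINT32>(std::abs(cam.stride));
    bool   bottomUp  = (cam.stride < 0);

    // Normalise to top-down BGRA. A negative stride means the first row
    // in memory is the bottom row of the image, so the rows are reversed.
    std::vector<BYTE> topDown(static_cast<size_t>(absStride) * cam.height);
    for (UINT32 row = 0; row < cam.height; ++row)
    {
        UINT32 srcRow = bottomUp ? (cam.height - 1u - row) : row;
        memcpy(topDown.data() + static_cast<size_t>(row) * absStride,
               data           + static_cast<size_t>(srcRow) * absStride,
               absStride);
    }

    // Release the lock once the pixels have been copied out.
    pBuf->Unlock();
}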
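To make the effect of a negative stride concrete, here is a standalone sketch of the same row-reversal step that can be run without a camera or Media Foundation. The function name ToTopDown and the use of plain standard-library types are assumptions for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical standalone version of the row-flipping step. 'stride' is
// signed: a negative value means the first row in memory is the bottom
// row of the image.
std::vector<std::uint8_t> ToTopDown(const std::uint8_t* data,
                                    long stride,
                                    std::uint32_t height)
{
    const std::size_t rowBytes = static_cast<std::size_t>(std::labs(stride));
    const bool bottomUp = stride < 0;

    std::vector<std::uint8_t> out(rowBytes * height);
    for (std::uint32_t row = 0; row < height; ++row)
    {
        // For bottom-up input, output row 0 comes from the last source row.
        const std::uint32_t srcRow = bottomUp ? (height - 1u - row) : row;
        std::memcpy(out.data() + row * rowBytes,
                    data + srcRow * rowBytes,
                    rowBytes);
    }
    return out;
}
```

For a two-row image with one BGRA pixel per row stored as {1,2,3,4, 5,6,7,8}, a stride of -4 yields {5,6,7,8, 1,2,3,4}, while a stride of 4 returns the bytes unchanged.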

Saving the Image

After a Windows Media Foundation capture, I have an array of the pixel data in memory. This must be written to a file. As has been the case for a lot of the image processing I’ve done over the past several months, the Windows Imaging Component (WIC) has been my go-to API for converting image data to image files. My image data is in BGRA format (Blue, Green, Red, Alpha at 8 bits per channel). WIC provides its functionality through COM interfaces.

static HRESULT SaveJpeg(
    const BYTE*        pixels,  // top-down, row-major, BGRA
    UINT32             width,
    UINT32             height,
    UINT32             rowBytes,
    const std::wstring& path)
{
    Microsoft::WRL::ComPtr<IWICImagingFactory>    pFactory    = nullptr;
    Microsoft::WRL::ComPtr<IWICBitmap>            pBitmap     = nullptr;
    Microsoft::WRL::ComPtr<IWICStream>            pStream     = nullptr;
    Microsoft::WRL::ComPtr<IWICBitmapEncoder>     pEncoder    = nullptr;
    Microsoft::WRL::ComPtr<IWICBitmapFrameEncode> pFrame      = nullptr;
    Microsoft::WRL::ComPtr<IPropertyBag2>         pProps      = nullptr;

    HRESULT hr = CoCreateInstance(
        CLSID_WICImagingFactory, nullptr, CLSCTX_INPROC_SERVER,
        IID_PPV_ARGS(&pFactory));

    if (SUCCEEDED(hr))
        hr = pFactory->CreateBitmapFromMemory(
            width, height,
            GUID_WICPixelFormat32bppBGRA,
            rowBytes, rowBytes * height,
            const_cast<BYTE*>(pixels),
            &pBitmap);

    if (SUCCEEDED(hr)) hr = pFactory->CreateStream(&pStream);
    if (SUCCEEDED(hr)) hr = pStream->InitializeFromFilename(path.c_str(), GENERIC_WRITE);
    if (SUCCEEDED(hr)) hr = pFactory->CreateEncoder(GUID_ContainerFormatJpeg, nullptr, &pEncoder);
    if (SUCCEEDED(hr)) hr = pEncoder->Initialize(pStream.Get(), WICBitmapEncoderNoCache);
    if (SUCCEEDED(hr)) hr = pEncoder->CreateNewFrame(&pFrame, &pProps);

    if (SUCCEEDED(hr))
    {
        // Set JPEG quality to 92%.
        PROPBAG2 opt{};
        opt.pstrName = const_cast<LPOLESTR>(L"ImageQuality");
        VARIANT v{};
        v.vt    = VT_R4;
        v.fltVal = 0.92f;
        // Check the result instead of silently ignoring a failed write.
        hr = pProps->Write(1, &opt, &v);

        if (SUCCEEDED(hr)) hr = pFrame->Initialize(pProps.Get());
    }

    if (SUCCEEDED(hr)) hr = pFrame->SetSize(width, height);

    if (SUCCEEDED(hr))
    {
        WICPixelFormatGUID fmt = GUID_WICPixelFormat32bppBGRA;
        hr = pFrame->SetPixelFormat(&fmt);
    }

    if (SUCCEEDED(hr)) hr = pFrame->WriteSource(pBitmap.Get(), nullptr);
    if (SUCCEEDED(hr)) hr = pFrame->Commit();
    if (SUCCEEDED(hr)) hr = pEncoder->Commit();

    return hr;
}

Trying the Application Out

If you want to try the application yourself, you can download it from here. This is a signed executable. Note that this is a work in progress.



