December 2021 – j2i.net

In the interest of keeping a cleaner file system, I sometimes try to minimize the number of files that I need to keep data organized. A common scenario where this goal is expressed is when writing an application that must sync with some other data source such as a content management system. Having a copy of the files from another system isn’t always sufficient. Sometimes additional data is needed for keeping the computers in a solution in sync. For a given file, I may need to also track an etag, CRC, or information on the purpose of a file. There are some common solutions for organizing this data. One is to have one additional file that contains all of the additional meta data for the files being synced. Another is to make an additional data file for each content file that contains this information. The solution that I prefer isn’t quite either of these. I prefer to have the data within an “alternate stream” of the same file. If the file get’s moved elsewhere on the filesystem, the additional data will move with it.

This is very much a Windows-Only solution. This will only work on the NTFS file system. If you attempt to access an alternative stream on a FAT32 file system, it will fail since that file system does not support them.

I most recently used this system of organization when I inherited a project that was, in my opinion, built with the wrong technology. In all fairness, many features of the application in question were implemented through scope creep. I reimplemented the application in about a couple of days using .Net technologies (this was much easier for me to do since, unlike the original developers, I had the benefit of having a complete requirements). There were a lot of aspects of that project that will be expressed in posts in the coming weeks.

The reason that I call this feature “hidden” is because the Windows UI does not give any visual indicator that a file has an additional data stream. There is no special icon. If you check on the size of a file, it will only show you the size of the main data stream. (In theory, one could add 3 gigs of data to the secondary stream of a 5 byte file and the OS would only report the file as being 5 bytes in size).

I’ll demonstrate accessing this stream from within the .Net Framework. There’s not built-in support for the alternative datastreams, but there is built in support for Windows file handles. Using a P/Invoke you can get a Windows file handle and then pass it to a .Net FileStream object to be used with all the other .Net features.

Every file has a default data stream. This is the stream you would see as being the normal file. It contains the file data that you are usually working with. With our normal concept of files and directories, files contain data and directories contain files, but no data directly. A file can contain any number of alternative data streams. Each one of these streams has a name of your choosing. Directories can have alternative data streams too!

To start experimenting with streams, you only need the command prompt. Open the command prompt and navigate to a directory in which you will place your experimental streams. type the following.

echo This is data for my alternative stream > readme.txt:stream

If you get a directory listing, we find that the files is listed as zero bytes in size.

c:\temp\streams>echo This is data for my alternative stream > readme.txt:stream

c:\temp\streams>dir
 Volume in drive C has no label.
 Volume Serial Number is 46FF-0556

 Directory of c:\temp\streams

10/11/2021  11:22 AM    <DIR>          .
10/11/2021  11:22 AM    <DIR>          ..
10/11/2021  11:22 AM                 0 readme.txt
               1 File(s)              0 bytes
               2 Dir(s)  108,162,564,096 bytes free

c:\temp\streams>

Is the data really there? From the command line, we can view the data using the more command (the type command doesn’t accept the syntax needed to refer to the stream).

c:\temp\streams>more < readme.txt:stream
This is data for my alternative stream

Windows uses alternative data streams for various system purposes. There are a number of names that you may encounter in files that Windows manages. This is a list of some well known stream names.

$DATA – This is the default stream. This stream contains the main (regular) data for a file. If you open README.TXT, this has the same effect as opening README.TXT:$DATA.
$BITMAP – data used for managing a b-tree for a directory. This is present on every directory.
$ATTRIBUTE_LIST – A list of attributes for a file.
$FILE_NAME – Name of file in unicode characters, including short name and hard links
$INDEX_ALLOCATION – used for managing large directories

There are some other names. In general, with the exception of $DATA, I would suggest not altering these streams.

Windows does give you the ability list alternative streams through PowerShell. We will look at that in a moment. For now, let’s say you had to make your own tool for managing such resources. The utility of this example is it is giving us an opportunity to see how we might work with these resources in code. One of the first tools that I think is useful would be a command line tool that would transfer data from one stream to another. With this tool, I can read from a stream and either write it to a console or to another stream. The only thing that affects where it is written is the file name. It only took a few minutes to write such a tool using C++. It is small enough to put the entirety of the code here.

#include <Windows.h>
#include <iostream>
#include <string>
#include <vector>
#include <list>

using namespace std;

const int BUFFER_SIZE = 1048576;

int main(int argc, CHAR ** argv)
{
    wstring InputStreamName = L"";
    wstring OutputStreamName = L"con:";
    wstring InputPrefix = L"--i=";
    wstring OutputPrefix = L"--o=";

    wstring Instructions =
        L"To use this tool, provide an input and an output file for it. The syntax looks like the following.\r\n\r\n"
        L"StreamStreamer --i=inputFileName --o=OutputFileName.ext::streamName\r\n\r\n";

    HANDLE hInputFile = INVALID_HANDLE_VALUE;
    HANDLE hOutputFile = INVALID_HANDLE_VALUE;

    vector<wstring> arguments(argc);

    for (auto i = 0; i < argc; ++i)
    {
        auto arg = string(argv[i]);
        arguments[i] = (wstring(arg.begin(), arg.end()));
    }

    for (int i = 0; i < argc; ++i)
    {
        if (!arguments[i].compare(0, InputPrefix.size(), InputPrefix))
            InputStreamName = arguments[i].substr(InputPrefix.size());
        if (!arguments[i].compare(0, OutputPrefix.size(), OutputPrefix))
            OutputStreamName = arguments[i].substr(OutputPrefix.size());
    }

    if ((!InputStreamName.size()) || (!OutputStreamName.size()))
    {
        wcout << Instructions;
        return 0;
    }

    hInputFile = CreateFile(InputStreamName.c_str(), GENERIC_READ, FILE_SHARE_READ, 0, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
    if (hInputFile != INVALID_HANDLE_VALUE)
    {
        hOutputFile = CreateFile(OutputStreamName.c_str(), GENERIC_WRITE, FILE_SHARE_READ, 0, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, 0);
        if (hOutputFile != INVALID_HANDLE_VALUE)
        {
            vector<char> buffer = vector<char>(BUFFER_SIZE);
            DWORD bytes_read = 0;
            DWORD bytes_written = 0;
            do {
                bytes_read = 0;
                if (ReadFile(hInputFile, &buffer[0], BUFFER_SIZE, &bytes_read, 0))
                    WriteFile(hOutputFile, &buffer[0], bytes_read, &bytes_written, 0);
            } while (bytes_read > 0);
            CloseHandle(hOutputFile);
        }
        CloseHandle(hInputFile);
    }
}

Usage is simple. The tool takes two arguments; an input stream name and an output stream name are passed prefixed with either --i= or --o=. If no output name is specified, it defaults to an output name of con:. This name, con:, refers to a console. That had been a reserved file name for the console. I have the vague idea there may be some other console name, but could not find it. con: is a carry-over of the DOS days of 30+ years ago. It worked then, and it still works now. So I’m sticking with it. Note that there is

After compiling this, I can use it to retrieve the text that I attached to the stream earlier.

c:\temp\streams>StreamStreamer.exe --i=readme.txt:Stream
This is data for my alternative stream

c:\temp\streams>

I can also use it to take the contents of some other arbitrary file and attach it to an existing file in an alternative stream. In testing, I took a JPG I had of the moon and attached it to a file. Then I extracted it from that alternative stream and wrote it to a different regular file just to ensure that I had an unaltered data stream.

c:\temp\streams>StreamStreamer.exe --i=Moon.JPG --o=readme.txt:moon

c:\temp\streams>StreamStreamer.exe --i=readme.txt:moon --o=m.jpg

c:\temp\streams>dir *.jpg
 Volume in drive C has no label.
 Volume Serial Number is 46FF-0556

 Directory of c:\temp\streams


10/12/2021  02:54 PM         3,907,101 m.jpg
12/21/2020  07:46 PM         3,907,101 Moon.JPG
               3 File(s)      7,814,202 bytes
               0 Dir(s)  105,063,383,040 bytes free

c:\temp\streams>

You will probably want the ability to see what streams are inside of a file. You could download the Streams tool from System Internals, or you could use PowerShell. PowerShell has built in support for streams. I’ll be using that throughout out the rest of this writeup. To view streams with PowerShell, use the Get-Item command with the -stream * parameter.

PS C:\temp\streams> Get-Item .\readme.txt -stream *


PSPath        : Microsoft.PowerShell.Core\FileSystem::C:\temp\streams\readme.txt::$DATA
PSParentPath  : Microsoft.PowerShell.Core\FileSystem::C:\temp\streams
PSChildName   : readme.txt::$DATA
PSDrive       : C
PSProvider    : Microsoft.PowerShell.Core\FileSystem
PSIsContainer : False
FileName      : C:\temp\streams\readme.txt
Stream        : :$DATA
Length        : 15

PSPath        : Microsoft.PowerShell.Core\FileSystem::C:\temp\streams\readme.txt:moon.jpg
PSParentPath  : Microsoft.PowerShell.Core\FileSystem::C:\temp\streams
PSChildName   : readme.txt:moon.jpg
PSDrive       : C
PSProvider    : Microsoft.PowerShell.Core\FileSystem
PSIsContainer : False
FileName      : C:\temp\streams\readme.txt
Stream        : moon.jpg
Length        : 3907101

PSPath        : Microsoft.PowerShell.Core\FileSystem::C:\temp\streams\readme.txt:stream
PSParentPath  : Microsoft.PowerShell.Core\FileSystem::C:\temp\streams
PSChildName   : readme.txt:stream
PSDrive       : C
PSProvider    : Microsoft.PowerShell.Core\FileSystem
PSIsContainer : False
FileName      : C:\temp\s\readme.txt
Stream        : stream
Length        : 43



PS C:\temp\streams>

If you are making an application that uses alternative streams, you will want to know how to list the streams from within it also. That is also easy to do. Since the much beloved Windows Vista we’ve had a Win32 API for enumerating streams. The functions FindFirstStreamW/FindFirstStreamTransactedW and FindNextStreamW will do this for you. Take note that there only exist Unicode versions of these functions. ASCII variations are non-existent. If you have ever used FindFirstFile or FindNextStreamW the usage is similar.

Two variables are needed to search for streams. One variable is a HANDLE that is used as an identifier for the resources and state of the search request. The other is a WIN32_FIND_STREAM_DATA structure into which data on streams that were found are put. FindFirstStreamW will return a handle and populate a WIN32_FIND_STREAM_DATA with the first stream it finds. From there, each time FindNextStreamW is called with the HANDLE that had been returned earlier, it will populate a WIN32_FIND_STREAM_DATA with the information on the next stream. When no more streams are found, FindNextStreamW will have a return value of ERROR_HANDLE_EOF.

#include <Windows.h>
#include <iostream>
#include <vector>
#include <string>

using namespace std;

int main(int argc, char**argv)
{
    WIN32_FIND_STREAM_DATA fsd;
    HANDLE hFind = NULL;
    vector<wstring> arguments(argc);

    for (auto i = 0; i < argc; ++i)
    {
        auto arg = string(argv[i]);
        arguments[i] = (wstring(arg.begin(), arg.end()));
    }

    if (arguments.size() < 2)
        return 0;
    wstring fileName = arguments[1];

    try {
        hFind = FindFirstStreamW(fileName.c_str(), FindStreamInfoStandard, &fsd, 0);
        if (hFind == INVALID_HANDLE_VALUE) throw ::GetLastError();
        const int BUFFER_SIZE = 8192;
        WCHAR buffer[BUFFER_SIZE] = { 0 };
        WCHAR fileNameBuffer[BUFFER_SIZE] = { 0 };

        wcout << L"The following streams were found in the file " << fileName << endl;
        for (;;)
        {
            swprintf(fileNameBuffer, BUFFER_SIZE, L"%s%s", fileName.c_str(), fsd.cStreamName);
            swprintf_s(buffer,BUFFER_SIZE, L"%-50s %d", fileNameBuffer, fsd.StreamSize);
            wstring formattedDescription = wstring(buffer);
            wcout << formattedDescription << endl;

            if (!::FindNextStreamW(hFind, &fsd))
            {
                DWORD dr = ::GetLastError();
                if (dr != ERROR_HANDLE_EOF) throw dr;
                break;
            }
        }
    }
    catch (DWORD err)
    {
        wcout << "Oops, Error happened. Windows error number " << err;
    }
    if (hFind != NULL)
        FindClose(hFind);
}

For my actual application purposes, I don’t need to query the streams in a file. The streams of interest to me will have a predetermined name. Instead of querying for them, I attempt to open the stream. If it isn’t there, I will get a return error code indicating that the file isn’t there. Otherwise I will have a file HANDLE for reading and writing. With what I’ve written so far, you could begin using this feature in C/C++ immediately. But my target is the .Net Framework. How do we use this information there?

In Win32, you can read or write these alternative data streams as you would any other file by using the correct stream name. If you try that within the .Net Framework, it won’t work. Before even hitting the Win32 APIs, the .Net Framework will treat the stream name as an invalid file name. To work around this, you’ll need to P/Invoke the Win32 API for opening files. Thankfully, once you have a file handle, the .Net Framework will work with that file handle just fine and allow you to use all the methods that you would with any other stream.

Before adding the P/Invoke that are needed to use this functionality in .Net, let’s defined a few numerical constants.

    public partial class NativeConstants
    {
        public const uint GENERIC_WRITE = 1073741824;
        public const uint GENERIC_READ = 0x80000000;
        public const int FILE_SHARE_DELETE = 4;
        public const int FILE_SHARE_WRITE = 2;
        public const int FILE_SHARE_READ = 1;
        public const int OPEN_ALWAYS = 4;
    }

These may look familiar. These constants have the same names as constants that were used in C when calling the Win32 API. These constants, as their names suggest, are used to indicate the mode in which files should be opened. Now fot the P/Invokes to the calls to open the files.

    public partial class NativeMethods
    {
        [DllImportAttribute("kernel32.dll", EntryPoint = "CreateFileW")]
        public static extern System.IntPtr CreateFileW(
            [InAttribute()][MarshalAsAttribute(UnmanagedType.LPWStr)] string lpFileName,
            uint dwDesiredAccess,
            uint dwShareMode,
            [InAttribute()] System.IntPtr lpSecurityAttributes,
            uint dwCreationDisposition,
            uint dwFlagsAndAttributes,
            [InAttribute()] System.IntPtr hTemplateFile
        );

    }

That’s it! That is the only P/Invoke that is needed.

The data that I was writing to these files was metadata on on files for matching them up with entries in a CMS. This includes information like the last date that it was updated on the CMS, a CRC or ETAG for knowing if the version on the local computer is the same as the one on the CMS, and a title for presenting to the user (Which may be different than the file name itself). I’ve decided that the name of the stream in which I am placing this data to simply be meta. I’m using JSON for the data encoding. For your purposes, you could use any format that fits your application. Let’s open a stream for writing.

I’ll use the Win32 CreateFileW function to get a file handle. That handle is passed to the .Net FileStream constructor. From there, there is no difference in how I would read or write from this

var filePath = Path.Combine(ds.DownloadCachePath, $"{fe.ID}{extension}");
FileInfo fi = new FileInfo(filePath);
var fullPath = fi.FullName;
if (fi.Exists)
{
    var metaStream = NativeMethods.CreateFileW(
        $"{fullPath}:meta",
        NativeConstants.GENERIC_READ,
        NativeConstants.FILE_SHARE_READ,
        IntPtr.Zero,
        NativeConstants.OPEN_ALWAYS,
        0,
        IntPtr.Zero);
    using (StreamReader sr = new StreamReader(new FileStream(metaStream, FileAccess.Read)))
    {
        try
        {
            var metaData = sr.ReadToEnd();
            if (!String.IsNullOrEmpty(metaData))
            {
                var data = JsonConvert.DeserializeObject<FileEntry>(metaData);
                fe.LastModified = data.LastModified;
            }
        } catch(IOException exc)
        {
        }
    }
}

I said earlier that this is a Windows-Only solution and that it doesn’t work on the Fat32 file system. The two implications of this is that if you are using this in in a .Net environment that is running on another operating system this won’t work. It will likely fail since the P/Invokes won’t be able to bind. The other potential problem demands an active check within the code. If a a program using alternative file streams is given a FAT32 file system to work with, it should detect that it is on the wrong type of file system before trying to perform actions that will fail. Detecting the file system type only requires a few lines of code. In .Net, the following code will take the path of the currently running assembly, see what drive it is on, and retrieve the file system type.

 String assemblyPath = typeof(FileSystemDetector).Assembly.Location;
 String driveLetter = assemblyPath[0].ToString();
 DriveInfo driveInfo = new DriveInfo(driveLetter);
 string fsType = driveInfo.DriveFormat;
 return fsType;

If this code is run on a a drive using the NTFS file system, the return type will be the string value NTFS. If it is anything else, know that attempts to access alternative streams will fail. If you try to copy these file to a FAT32 drive, Windows will warn you of a loss of data. Only the default streams will be copied to the FAT32 drive.

In the next posts on this I will demonstrate a practical use. I’ll also talk about what some might see as a security concern with alternate file streams.

j2i.net

Mostly Development, but what ever interests me at the time

Month: December 2021

Alternate File Streams::Security Concerns?

Working With Alternative Data Streams::The “Hidden” Part of Your Windows File System on Windows