October 2023 – j2i.net

Recompiling the V8 JavaScript Engine on Windows

Note Added 2025 March 10 – These instructions no longer work. Google has dropped support for building V8 with MSVC. It is still possible to build on Windows using Clang, but this presents new challenges, such as linking Clang binaries with MSVC binaries. More information on this change can be found in a Google Groups discussion.

Note Added 2024 September 3 – I tried to follow my own instructions on a whim today and found that some parts of them no longer work. With some adjustments, I was able to get through them successfully.

I decided to compile the Google V8 JavaScript engine. Why? So that I could include it in another program. Google doesn’t distribute binaries for V8, but it does make the source code available. Compiling it is, in my opinion, a bit complex. This isn’t a criticism. There are a lot of options for how V8 can be built. Rather than Google publishing binaries for every permutation of these options for each version of V8, it makes more sense for developers to set the options themselves and build for their platform of interest.

But Isn’t There Already Documentation on How to Do This?

Google does provide documentation on compiling Chrome. But there are differences between those instructions and what must actually be done. I found myself searching the Internet for solutions to a number of other issues that I encountered, and I made notes on what I had to do to get around compilation problems. The documentation comes close to what’s needed, but it isn’t free of errors and deviations.

Setting Up Your Environment

Before touching the V8 source code, ensure that you have installed Microsoft Visual Studio. I am using Microsoft Visual Studio 2022 Community Edition. Some additional components must also be installed. To make this setup process as scriptable as possible, I have a batch file that has the Visual Studio Installer add the necessary components. If a component is already installed, no action is taken. The Google V8 instructions also offer a command to accomplish the same thing, but this is where I encountered my first deviation from their instructions. Their instructions assume that the Visual Studio Installer command is named setup.exe (it probably was in a previous version of Visual Studio), whereas my installer is named vs_installer.exe. I also had to pass additional parameters, possibly because I have more than one version of Visual Studio installed (Community Edition 2022, Preview Community Edition 2022, and a 2019 version).

pushd "C:\Program Files (x86)\Microsoft Visual Studio\Installer"

vs_installer.exe install --productid Microsoft.VisualStudio.Product.Community --ChannelId VisualStudio.17.Release --add Microsoft.VisualStudio.Workload.NativeDesktop  --add Microsoft.VisualStudio.Component.VC.ATLMFC  --add Microsoft.VisualStudio.Component.VC.Tools.ARM64 --add Microsoft.VisualStudio.Component.VC.MFC.ARM64 --add Microsoft.VisualStudio.Component.Windows10SDK.20348 --includeRecommended

popd

You may need to make adjustments if your installer is located in a different path.

While those components are installing, let’s get the code downloaded and put in place. I did the download and unpacking from PowerShell. All of the commands that follow were stored in a PowerShell script. Scripting the process makes it more repeatable and easier to document (since the script is also a record of what was done). You do not have to use the same file paths that I do, but if you change them, you will need to adjust the instructions wherever one of these paths is used.

I generally avoid placing folders directly in the root. The one exception is a folder I create called c:\shares. There’s a structure that I follow when placing this folder on Windows machines. Within this structure, Google’s code will be placed in subdirectories of c:\shares\projects\google. You’ll see that path used in the following script.

$depot_tools_source = "https://storage.googleapis.com/chrome-infra/depot_tools.zip"
$depot_tools_download_folder= "C:\shares\projects\google\temp\"
$depot_tools_download_path = $depot_tools_download_folder + "depot_tools.zip"
$depot_tools_path = "c:\shares\projects\google\depot_tools\"
$chromium_checkout_path = "c:\shares\projects\google\chromium"
$v8_checkout_path = "c:\shares\projects\google\"

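# -Force avoids an error when a folder already exists (the first mkdir already creates c:\shares\projects\google\)
mkdir -Force $depot_tools_download_folder
mkdir -Force $depot_tools_path
mkdir -Force $chromium_checkout_path
mkdir -Force $v8_checkout_path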

pushd "C:\Program Files (x86)\Microsoft Visual Studio\Installer\"
.\vs_installer.exe install --productID Microsoft.VisualStudio.Product.Community --ChannelId VisualStudio.17.Release --add Microsoft.VisualStudio.Workload.NativeDesktop  --add Microsoft.VisualStudio.Component.VC.ATLMFC  --add Microsoft.VisualStudio.Component.VC.Tools.ARM64 --add Microsoft.VisualStudio.Component.VC.MFC.ARM64 --add Microsoft.VisualStudio.Component.Windows10SDK.20348 --includeRecommended
popd

Invoke-WebRequest -Uri $depot_tools_source -OutFile $depot_tools_download_path
Expand-Archive -LiteralPath $depot_tools_download_path -DestinationPath $depot_tools_path

After this script completes, Visual Studio should have the necessary components, and depot_tools (Google’s Chrome/V8 development tools) will be downloaded and in place.

There are some environment variables on which the build process is dependent. These variables could be set within batch files, could be set to be part of the environment for an instance of the command terminal, or set at the system level. I chose to set them at the system level. This was not my first approach. I set them at more local levels initially. But several times when I needed to open a new command terminal, I forgot to apply them, and just found it easier to set them globally.

ENVIRONMENT VARIABLE	VALUE
DEPOT_TOOLS_WIN_TOOLCHAIN	0
vs2022_install	C:\Program Files\Microsoft Visual Studio\2022\Community
PATH	c:\shares\projects\google\depot_tools\;%PATH%

Environment Variables that must be set
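Since a missing or mistyped variable tends to surface only later as a confusing build failure, it can help to check them before starting. Here is a small C++ sketch of such a check. The function names are hypothetical, and the PATH test only looks for the substring depot_tools (a case-sensitive match), which assumes the folder layout used in this article.

```cpp
#include <cassert>
#include <cstdlib>
#include <initializer_list>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: returns a list of problems with the build-related
// environment variables. An empty list means everything looks right.
// The environment is passed in as a map so the check is easy to test.
std::vector<std::string> checkBuildEnvironment(const std::map<std::string, std::string>& env) {
    std::vector<std::string> problems;

    // 0 tells depot_tools to use the locally installed Visual Studio
    // rather than Google's internal toolchain.
    auto toolchain = env.find("DEPOT_TOOLS_WIN_TOOLCHAIN");
    if (toolchain == env.end() || toolchain->second != "0")
        problems.push_back("DEPOT_TOOLS_WIN_TOOLCHAIN must be set to 0");

    auto vs = env.find("vs2022_install");
    if (vs == env.end() || vs->second.empty())
        problems.push_back("vs2022_install is not set");

    // Assumption: the depot_tools folder name appears in PATH (case-sensitive match).
    auto path = env.find("PATH");
    if (path == env.end() || path->second.find("depot_tools") == std::string::npos)
        problems.push_back("depot_tools is not on PATH");

    return problems;
}

// Copies the three variables from the current process environment.
// (With the MSVC runtime, std::getenv matches names case-insensitively.)
std::map<std::string, std::string> readBuildEnvironment() {
    std::map<std::string, std::string> env;
    for (const char* name : {"DEPOT_TOOLS_WIN_TOOLCHAIN", "vs2022_install", "PATH"}) {
        if (const char* value = std::getenv(name))
            env[name] = value;
    }
    return env;
}
```

A tiny console program could print each entry of checkBuildEnvironment(readBuildEnvironment()) before you run gclient; separating the check from the environment read keeps the logic testable.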

From here on, we will be using the command prompt, and not PowerShell. This is because some of the commands that are part of Google’s tools are batch files that only run properly in the command prompt.

From the command prompt, run the command gclient. This will initialize depot_tools. Next, navigate to the folder into which you want the V8 code downloaded. For me, this is c:\shares\projects\google. The download process will automatically create a subfolder named v8. Run the following command.

fetch --nohistory v8

This command can take a while to complete. After it completes you will have a new directory named v8 that contains the source code. Navigate to that directory.

cd v8

The online documentation that I see from Google for V8 is for version 9. I wanted to compile version 12.0.174.

git checkout 12.0.174

Update 2025 March 7

Reviewing the instructions now, I find that the above command fails. It may be necessary to fetch the version tags first. The following commands do that and then check out version 13.6.9.

git fetch --tags
git checkout 13.6.9

For now, I am only building V8 for Windows on x64. Eventually I’ll build it for ARM64 as well. Run the following commands; they create the build directories and configurations for the different targets.

python3 .\tools\dev\v8gen.py x64.release
python3 .\tools\dev\v8gen.py x64.debug
python3 .\tools\dev\v8gen.py arm64.release
python3 .\tools\dev\v8gen.py arm64.debug

The build arguments for each environment are in a file named args.gn. Let’s update the configuration for the x64 debug build. To open the build configuration, type the following.

notepad out.gn\x64.debug\args.gn

This will open the configuration in notepad. Replace the contents with the following.

is_debug = true
target_cpu = "x64"
v8_enable_backtrace = true
v8_enable_slow_dchecks = true
v8_optimized_debug = false
v8_monolithic = true
v8_use_external_startup_data = false
is_component_build = false
is_clang = false

Chances are the only differences between the above and the initial version of the file are the lines from v8_monolithic onward. Save the file. You are ready to start your build. To kick off the build, use the following command.

ninja -C out.gn\x64.debug v8_monolith

Update 2024 September 3 – Compiling this now, I’m encountering a different error. It appears the compiler I’m using takes issue with some of the nested #if directives in the source code. There was one in src/execution/frames.h around line 1274 that was problematic. It involved a line concerning enabling V8 Drumbrake. Nope, I don’t know what that is. This was a call to DCHECK, which is not used in production builds, so I just removed it. I encountered similar errors in other files, including src/diagnostics/objects-debug.cc and src/wasm/wasm-objects.cc.

This will also take a while to run, but it will fail. A third-party component (ICU) fails to compile because of a line in a file named fmtable.cpp. You’ll have to alter a function to fix the problem. Open the file .\v8\third_party\icu\source\i18n\fmtable.cpp. Around line 59, you will find the following code.

static inline UBool objectEquals(const UObject* a, const UObject* b) {
     // LATER: return *a == *b
     return *((const Measure*)a) == *((const Measure*)b);
}

You’ll need to change it so that it contains the following.

static inline UBool objectEquals(const UObject* a, const UObject* b) {
     // LATER: return *a == *b
     return *((const Measure*)a) == *b;
}
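Why does comparing against *b help? My best guess, which I have not confirmed against the compiler’s diagnostic, is C++20’s rewritten comparison candidates. As far as I can tell, ICU’s Measure declares operator== taking a const UObject&, so Measure == Measure can match both x.operator==(y) and the reversed y.operator==(x) equally well, which compilers may flag as ambiguous. Comparing a Measure with a plain UObject leaves only the forward candidate. The sketch below reproduces that pattern with hypothetical stand-in types (Base and Quantity, not the real ICU classes) and should compile under C++17 or C++20.

```cpp
#include <cassert>

// Hypothetical stand-ins for ICU's UObject and Measure (not the real classes).
struct Base {
    virtual ~Base() = default;
};

struct Quantity : Base {
    explicit Quantity(double v) : value(v) {}
    double value;

    // Like Measure, equality is declared against the base type.
    bool operator==(const Base& other) const {
        const Quantity* q = dynamic_cast<const Quantity*>(&other);
        return q != nullptr && q->value == value;
    }
};

// Mirrors the edited objectEquals: Quantity == Base has only one viable
// candidate, whereas Quantity == Quantity can also match the reversed
// (rewritten) candidate in C++20.
inline bool objectEquals(const Base* a, const Base* b) {
    return *static_cast<const Quantity*>(a) == *b;
}
```

Because operator== still receives the full object through a base reference, the dynamic_cast inside it sees the real type, so the shortened comparison behaves the same as the original.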

Save the file, and run the build command again. While that’s running, go find something else to do. Have a meal, fly a kite, read a book. You’ve got time. When you return, the build should have been successful.

Hello World

Now, let’s make a hello world program. Google already has a V8 hello world example that we can use to check that our build was successful. We will use it for now, since I’ve not discussed anything about the V8 object library yet. Open Microsoft Visual Studio and create a new C++ Console App. Replace the code in the .cpp file that it provides with Google’s code.

// Copyright 2015 the V8 project authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "libplatform/libplatform.h"
#include "v8-context.h"
#include "v8-initialization.h"
#include "v8-isolate.h"
#include "v8-local-handle.h"
#include "v8-primitive.h"
#include "v8-script.h"

int main(int argc, char* argv[]) {
    // Initialize V8.
    v8::V8::InitializeICUDefaultLocation(argv[0]);
    v8::V8::InitializeExternalStartupData(argv[0]);
    std::unique_ptr<v8::Platform> platform = v8::platform::NewDefaultPlatform();
    v8::V8::InitializePlatform(platform.get());
    v8::V8::Initialize();

    // Create a new Isolate and make it the current one.
    v8::Isolate::CreateParams create_params;
    create_params.array_buffer_allocator =
        v8::ArrayBuffer::Allocator::NewDefaultAllocator();
    v8::Isolate* isolate = v8::Isolate::New(create_params);
    {
        v8::Isolate::Scope isolate_scope(isolate);

        // Create a stack-allocated handle scope.
        v8::HandleScope handle_scope(isolate);

        // Create a new context.
        v8::Local<v8::Context> context = v8::Context::New(isolate);

        // Enter the context for compiling and running the hello world script.
        v8::Context::Scope context_scope(context);

        {
            // Create a string containing the JavaScript source code.
            v8::Local<v8::String> source =
                v8::String::NewFromUtf8Literal(isolate, "'Hello' + ', World!'");

            // Compile the source code.
            v8::Local<v8::Script> script =
                v8::Script::Compile(context, source).ToLocalChecked();

            // Run the script to get the result.
            v8::Local<v8::Value> result = script->Run(context).ToLocalChecked();

            // Convert the result to an UTF8 string and print it.
            v8::String::Utf8Value utf8(isolate, result);
            printf("%s\n", *utf8);
        }

        {
            // Use the JavaScript API to generate a WebAssembly module.
            //
            // |bytes| contains the binary format for the following module:
            //
            //     (func (export "add") (param i32 i32) (result i32)
            //       get_local 0
            //       get_local 1
            //       i32.add)
            //
            const char csource[] = R"(
        let bytes = new Uint8Array([
          0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, 0x01, 0x07, 0x01,
          0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, 0x03, 0x02, 0x01, 0x00, 0x07,
          0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, 0x0a, 0x09, 0x01,
          0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b
        ]);
        let module = new WebAssembly.Module(bytes);
        let instance = new WebAssembly.Instance(module);
        instance.exports.add(3, 4);
      )";

            // Create a string containing the JavaScript source code.
            v8::Local<v8::String> source =
                v8::String::NewFromUtf8Literal(isolate, csource);

            // Compile the source code.
            v8::Local<v8::Script> script =
                v8::Script::Compile(context, source).ToLocalChecked();

            // Run the script to get the result.
            v8::Local<v8::Value> result = script->Run(context).ToLocalChecked();

            // Convert the result to a uint32 and print it.
            uint32_t number = result->Uint32Value(context).ToChecked();
            printf("3 + 4 = %u\n", number);
        }
    }

    // Dispose the isolate and tear down V8.
    isolate->Dispose();
    v8::V8::Dispose();
    v8::V8::DisposePlatform();
    delete create_params.array_buffer_allocator;
    return 0;
}

If you try to build this now, it will fail. You need to do some configuration first. Here is a quick list of the configuration changes. If you don’t understand what to do with these, that’s fine. I’ll walk you through applying them.

VC++ Directories:
	Include Directories: v8\include
	Library Directories <Debug>: v8\out.gn\x64.debug\obj
	Library Directories <Release>: v8\out.gn\x64.release\obj

C/C++
	Code Generation
		Runtime Library <Debug>: /MTd
		Runtime Library <Release>: /MT
	Preprocessor
		Preprocessor Definitions: V8_ENABLE_SANDBOX;V8_COMPRESS_POINTERS;_ITERATOR_DEBUG_LEVEL=0;

Linker
	Input
		Additional Dependencies: v8_monolith.lib;dbghelp.lib;Winmm.lib;

Right-click the project in Solution Explorer and select “Properties.” From the pane on the left, select VC++ Directories. In the Configuration drop-down at the top, select All Configurations. On the right there is a field named Include Directories. Select it, and add the full path to your v8\include directory. For me, this is c:\shares\projects\google\v8\include. If you built in a different path, it will be different for you. After adding the value, select Apply. You will generally want to press Apply after each field that you change.

Change the Configuration drop-down at the top to Debug. In the Library Directories entry, add the full path to your v8\out.gn\x64.debug\obj folder and click Apply. Change the Configuration drop-down to Release, and in Library Directories add the full path to your v8\out.gn\x64.release\obj folder.

From the pane on the left, expand C/C++ and select Code Generation. With the Configuration drop-down set to Debug, set Runtime Library to Multi-threaded Debug (/MTd). Switch to Release and set it to Multi-threaded (/MT).

Change the Configuration drop-down back to All Configurations, select Preprocessor under C/C++, and add the following values to Preprocessor Definitions.

V8_ENABLE_SANDBOX;V8_COMPRESS_POINTERS;_ITERATOR_DEBUG_LEVEL=0;

Keep the Configuration drop-down on All Configurations. Expand Linker and select Input. For Additional Dependencies, enter v8_monolith.lib;dbghelp.lib;Winmm.lib;

With that entered, press OK. You should now be able to run the program. It passes two small scripts to the JavaScript engine, executes them, and prints the results: Hello, World! and 3 + 4 = 7.

What’s Next

My next set of objectives is to demonstrate how to project a C++ object into JavaScript. I also want to start reducing the size of these files. On a machine that only uses the V8 binaries, the entire set of build tools is not needed. At the end of the above process, the v8 folder holds 12 GB of files. If you copy out only the build outputs and headers needed for other projects, that drops to 3 GB. Further reductions could come from changing some of the compilation options.

Mastodon: @j2inet@masto.ai
Instagram: @j2inet
Facebook: @j2inet
YouTube: @j2inet
Telegram: j2inet
Twitter: @j2inet

Making a Web Crawler using the Android Web Client

Source Code

Like many others, my coworkers and I have been called back to work in the office for part of the week. Returning to the office hasn’t been without its challenges, especially since the environment has substantially changed. At the end of one week, I was asked to collect some information on ads served to the browser in certain countries. To gather this information, I used a VPN to browse from a different country, and I created a web crawler using JavaScript and Node. It created a browser instance, followed links starting from a specific set of pages, kept track of the resources that the pages loaded, and downloaded content accessed from certain domains. The app worked fine and collected the information that I needed. On Monday, when I was in the office, I was asked to produce a similar dataset as seen from a different country. I started my software to do this task, only to find that the network now actively blocks VPN connections.

I thought about driving back home to complete the task, but decided to just make a new web crawler to run from my Android tablet. That’s what I did. I made an app with a WebView and had it load each one of my starting pages. For each page that loaded, there were two sets of data that I needed to capture: the resources that the page requested, and the links that were in the page. To retrieve this information, I needed a WebViewClient for the WebView. The WebViewClient is an object whose methods are called to let you intercept, or be notified of, what the WebView is doing. I was only concerned with a few of its methods.

onPageFinished – Fires once a page has finished loading
onLoadResource – Fires when a page is requesting a resource, such as an image

When a page finishes loading, I grab the links. There is no API specifically for querying the page’s DOM. There is, however, a method on the WebView to execute JavaScript and return the result as a string. I inject a small function into the page that grabs the links, and then I extract them from the JSON array of strings that comes back. This is the JavaScript.

(function extractLinks(){
     // Collect the href of every <a> element on the page.
     var list = Array.from(document.getElementsByTagName('a'));
     for(var i=0;i<list.length;++i) {
           list[i] = list[i].href;
     }
     return list;
})()

To execute the JavaScript in the WebView, I use the WebView’s evaluateJavascript() method. The method accepts a ValueCallback object, which receives the result as a JSON-encoded string. I convert that to a String array and save the links. The two references to the dataHandler object are to a class that I defined. The two methods of interest here are LinksExtracted(String[]) and PageLoadComplete(). The LinksExtracted method receives all of the URLs of the links in the page, and the dataHandler is responsible for saving them. PageLoadComplete is used to create a demarcation in the data between pages. Note that this method of capturing links isn’t perfect; after a page loads, it could dynamically adjust its HTML to remove some links and add others. For my application, this limitation is acceptable.

    override fun onPageFinished(view: WebView?, url: String?) {
        super.onPageFinished(view, url)

        // Inject a function that returns the href of every <a>; the result comes back JSON-encoded.
        view!!.evaluateJavascript("(function extractLinks(){var list = Array.from(document.getElementsByTagName('a')); for(var i=0;i<list.length;++i) { list[i] = list[i].href; } return list;})()",
            object : ValueCallback<String> {
                override fun onReceiveValue(value: String?) {
                    if (value != null && value != "null") {
                        // Decode the JSON array of strings into a list of URLs.
                        val gson = GsonBuilder().create()
                        val theList = gson.fromJson<ArrayList<String>>(value, object :
                            TypeToken<ArrayList<String>>(){}.type)
                        if (theList != null) {
                            dataHandler.LinksExtracted(theList.toTypedArray())
                        }
                    }
                    // Mark the end of this page's data, even if no links were found.
                    dataHandler.PageLoadComplete()
                }
            }
        )
    }
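The Gson call above turns the JSON text into a list. To show what that decoding involves, here is a minimal C++ sketch that parses a JSON array of strings such as ["https://a.example/","https://b.example/"]. It is not a general JSON parser: it decodes only the simple escapes (\" \\ \/ \n \t \r \b \f), rejects \uXXXX escapes, and returns std::nullopt for malformed input, including the string null that the Kotlin code also checks for. The function name is hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>
#include <vector>

// Minimal decoder for a JSON array of strings, e.g. ["https://a/","https://b/"].
// Only simple escapes are decoded; \uXXXX is rejected; malformed input
// (including the literal null) yields std::nullopt.
std::optional<std::vector<std::string>> parseJsonStringArray(const std::string& json) {
    std::vector<std::string> items;
    std::size_t i = 0;
    auto skipSpace = [&] {
        while (i < json.size() && std::isspace(static_cast<unsigned char>(json[i]))) ++i;
    };

    skipSpace();
    if (i >= json.size() || json[i] != '[') return std::nullopt;
    ++i;
    skipSpace();
    if (i < json.size() && json[i] == ']') return items;  // empty array

    while (i < json.size()) {
        skipSpace();
        if (i >= json.size() || json[i] != '"') return std::nullopt;
        ++i;  // opening quote
        std::string item;
        while (i < json.size() && json[i] != '"') {
            char c = json[i++];
            if (c != '\\') { item += c; continue; }
            if (i >= json.size()) return std::nullopt;
            char e = json[i++];
            switch (e) {
                case '"': case '\\': case '/': item += e; break;
                case 'n': item += '\n'; break;
                case 't': item += '\t'; break;
                case 'r': item += '\r'; break;
                case 'b': item += '\b'; break;
                case 'f': item += '\f'; break;
                default: return std::nullopt;  // \uXXXX and invalid escapes
            }
        }
        if (i >= json.size()) return std::nullopt;  // unterminated string
        ++i;  // closing quote
        items.push_back(item);

        skipSpace();
        if (i < json.size() && json[i] == ',') { ++i; continue; }
        if (i < json.size() && json[i] == ']') return items;
        return std::nullopt;
    }
    return std::nullopt;
}
```

Returning an empty optional for anything unexpected mirrors the null checks in the Kotlin callback: a page that yields no usable result simply contributes no links.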

The links are persisted to an SQLite database. To do this, I’ve defined a data class for holding a row of data.

package net.j2i.webcrawler.data
import kotlinx.serialization.Serializable

@Serializable
data class UrlReading(val sessionID:Long=0L, val pageRequestID:Long = 0L, val url:String = "", val timestamp:Long = -1L) {
}

The sessionID will be the same for all values captured during the same run of the program. pageRequestID increments every time a new page loads. url contains the information of interest, the URL. And timestamp contains the time at which the URL was captured.

Creating the database and inserting data into it is fairly plain-vanilla code. I won’t post it here, but if you would like to see it, it’s on GitHub at this link: https://github.com/j2inet/sample-webcrawler/blob/main/app/src/main/java/net/j2i/webcrawler/data/UrlReadingDataHelper.kt

When the data is to be extracted, the program will write it to a CSV file with headers. To minimize the memory demand for this, I have a method on the data helper that will write the data as a cursor is reading it.

    fun writeAllRecords(os: OutputStreamWriter): List<UrlReading> {

        os.write("SessionID,PageRequestID,Timestamp,URL\r\n")

        val readings = mutableListOf<UrlReading>()
        val db = writableDatabase
        val projection = arrayOf(
            BaseColumns._ID,
            UrlReadingsContract.COLUMN_NAME_SESSION_ID,
            UrlReadingsContract.COLUMN_NAME_PAGE_REQUEST_ID,
            UrlReadingsContract.COLUMN_NAME_URL,
            UrlReadingsContract.COLUMN_NAME_TIMESTAMP
        )
        val sortOrder = "${UrlReadingsContract.COLUMN_NAME_TIMESTAMP} ASC"
        val cursor = db.query(
            UrlReadingsContract.TABLE_NAME,
            projection,
            null,
            null,
            null,
            null,
            sortOrder
        )
        // use {} closes the cursor once every row has been written.
        cursor.use {
            with(it) {
                while (moveToNext()) {
                    val reading = UrlReading(
                        sessionID = getLong(getColumnIndexOrThrow(UrlReadingsContract.COLUMN_NAME_SESSION_ID)),
                        pageRequestID = getLong(getColumnIndexOrThrow(UrlReadingsContract.COLUMN_NAME_PAGE_REQUEST_ID)),
                        url = getString(getColumnIndexOrThrow(UrlReadingsContract.COLUMN_NAME_URL)),
                        // Read the timestamp from its own column, not the URL column.
                        timestamp = getLong(getColumnIndexOrThrow(UrlReadingsContract.COLUMN_NAME_TIMESTAMP)),
                    )

                    // Comma-separate every field to match the header; quote the URL and double any quotes in it.
                    val quotedUrl = "\"" + reading.url.replace("\"", "\"\"") + "\""
                    val line = "${reading.sessionID},${reading.pageRequestID},${reading.timestamp},$quotedUrl\r\n"
                    os.write(line)
                    readings.add(reading)
                }
            }
        }
        return readings
    }
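A URL can contain commas or double quotes, which would shift or split columns in a naive CSV row. Here is a minimal C++ sketch of RFC 4180-style quoting: a field is wrapped in double quotes only when it contains a comma, a quote, CR or LF, and any embedded quotes are doubled. The function names are hypothetical, and the column order follows the SessionID, PageRequestID, Timestamp, URL header.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Quote a CSV field only when needed, doubling any embedded double quotes.
std::string csvField(const std::string& value) {
    if (value.find_first_of(",\"\r\n") == std::string::npos) return value;
    std::string out = "\"";
    for (char c : value) {
        if (c == '"') out += "\"\"";
        else out += c;
    }
    out += '"';
    return out;
}

// One CRLF-terminated row: SessionID, PageRequestID, Timestamp, URL.
std::string csvRow(std::int64_t sessionId, std::int64_t pageRequestId,
                   std::int64_t timestamp, const std::string& url) {
    return std::to_string(sessionId) + "," + std::to_string(pageRequestId) + "," +
           std::to_string(timestamp) + "," + csvField(url) + "\r\n";
}
```

Quoting only when necessary keeps typical URLs readable in the output, while still letting spreadsheet tools split the unusual ones correctly.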

The program adds the links it finds to a list. When going to the next page, it randomly selects from this list (and removes the selected item). However, the program first visits all of the initial URLs that it was given before selecting at random. Without this, the links found on the first page loaded could crowd out the other initial pages, so those pages might not be visited at all or would have less influence on which pages get visited. The initial URLs are added to the list, and a count of them is saved.

        UrlList.add("https://msn.com")
        UrlList.add("https://yahoo.com");
        linearLoadCount = UrlList.count()

The method for loading random URLs initially dequeues URLs from the beginning of the list. After all of the initial URLs have been visited, it selects URLs at random.

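    fun openRandomSite() {
        // Nothing left to visit (nextInt(0) would throw).
        if (UrlList.isEmpty()) return
        val index = if (linearLoadCount > 0) {
            // Still visiting the initial URLs: take them in order from the front.
            --linearLoadCount
            0
        } else {
            // After that, pick any remaining URL at random.
            random.nextInt(UrlList.count())
        }
        val nextUrl = UrlList[index]
        UrlList.removeAt(index)
        mainWebView!!.loadUrl(nextUrl)
    }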
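The selection policy described above (seed pages first, in order, then random picks, removing each URL as it is chosen) can be sketched in C++ as a small class. The names UrlFrontier, add and next are hypothetical. The sketch uses std::mt19937 with a caller-supplied seed so runs are reproducible, and it returns std::nullopt once the list is empty rather than failing.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: the first seeds.size() picks come from the front of
// the list in order; after that, each pick is a uniformly random element.
// Every picked URL is removed from the list.
class UrlFrontier {
public:
    UrlFrontier(std::vector<std::string> seeds, unsigned rngSeed)
        : urls_(std::move(seeds)), linearCount_(urls_.size()), rng_(rngSeed) {}

    // Links discovered on a page are appended behind the seed pages.
    void add(const std::string& url) { urls_.push_back(url); }

    std::optional<std::string> next() {
        if (urls_.empty()) return std::nullopt;  // nothing left to visit
        std::size_t index = 0;
        if (linearCount_ > 0) {
            --linearCount_;  // still on the seed pages: take the front
        } else {
            std::uniform_int_distribution<std::size_t> pick(0, urls_.size() - 1);
            index = pick(rng_);
        }
        std::string url = urls_[index];
        urls_.erase(urls_.begin() + static_cast<std::ptrdiff_t>(index));
        return url;
    }

    std::size_t size() const { return urls_.size(); }

private:
    std::vector<std::string> urls_;  // declared first: linearCount_ reads its size
    std::size_t linearCount_;
    std::mt19937 rng_;
};
```

Because discovered links are appended at the end, the seed pages always sit at the front of the list until they have all been visited.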

To keep the pages cycling, the PageLoadComplete() handler queues the next call to load a random page (with a delay).

            override fun PageLoadComplete() {
                ++pageSessionID;
                mainHandler.postDelayed(object:Runnable {
                    override fun run() {
                        openRandomSite()
                    }
                },NAVIGATE_DELAY)
            }

It took less time to write this than it would have taken to drive home. The initial set of URLs is hard-coded in the source. This was written to be used only once, so I skipped practices that would have made the program more generally useful. Nevertheless, I think it might be useful to someone. You can find the complete source code on GitHub.

https://github.com/j2inet/sample-webcrawler


USA Testing Emergency Alert System on 4 October 2023 around 2:20 pm

On 4 October 2023 around 2:20 PM Eastern Time, the USA is testing its emergency alert system. The test will be broadcast over radio (including TV) and sent to mobile phones. Expect phones around you to be blaring at about this time. Don’t worry, this is only a test.

If you are likely to be in a situation where you cannot afford or tolerate your phone going off, you might want to keep your phone powered off around this time. Some environments, such as courthouses, have rules requiring phones to be silenced or turned off (I believe a phone going off in court in Atlanta can get someone in trouble for contempt of court). Even if you’ve muted your phone, this alert might not respect that setting. While some phones expose settings to silence other alerts, the national alert system’s setting has been unalterable on the phones that I’ve examined over the years.

When the test goes off, don’t be alarmed. If you have one of those emergency alert radios, it might be a good opportunity to see how well it works.
