Scanner | FIE Security Scanner

Scanning Overview

After receiving an extension ID, our system downloads the corresponding extension package (either a .zip or .crx file) from sources such as the Chrome Web Store or Chrome Stats. Once downloaded, the contents are extracted so that we can analyze each component of the extension individually.
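The extraction step can be sketched as follows. This is a minimal illustration rather than the project's actual code: the function name `extract_package` is hypothetical, and it assumes a .crx file is a ZIP archive with a signing header in front, which Python's `zipfile` module tolerates because it locates the archive from the end of the file.

```python
import zipfile
from pathlib import Path


def extract_package(package_path: str, dest_dir: str) -> list[str]:
    """Unpack a .zip or .crx package and return the extracted file paths.

    Assumption: a .crx is a ZIP archive preceded by a signing header.
    zipfile finds the archive through its end-of-central-directory record,
    so the prepended header does not need to be stripped by hand.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(package_path) as archive:
        archive.extractall(dest)
    # Paths are returned relative to the extraction root, sorted for stable output.
    return sorted(
        p.relative_to(dest).as_posix() for p in dest.rglob("*") if p.is_file()
    )
```

A call such as `extract_package("ext.crx", "unpacked/")` would then hand the file list to the sorting step.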

The extracted files are sorted into several categories based on their type: JavaScript, HTML, CSS, JSON, and a catch-all bin for any remaining file types. Organizing files this way allows each file to be routed to a specialized scanner designed for that specific format.
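A rough sketch of this routing step is shown below. The category names, the suffix table (including `.htm`), and the function name `categorize_files` are assumptions made for illustration; the real scanner may key on different criteria.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical suffix-to-category table; anything not listed goes to "other".
CATEGORY_BY_SUFFIX = {
    ".js": "javascript",
    ".html": "html",
    ".htm": "html",
    ".css": "css",
    ".json": "json",
}


def categorize_files(paths):
    """Group file paths by category so each group can go to its own scanner."""
    buckets = defaultdict(list)
    for path in paths:
        suffix = Path(path).suffix.lower()  # case-insensitive match on the extension
        buckets[CATEGORY_BY_SUFFIX.get(suffix, "other")].append(path)
    return dict(buckets)
```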

Each scanner extracts features that may indicate malicious behavior. These include structural characteristics (such as formatting patterns or entropy) and behavioral indicators (such as suspicious function calls or risky permissions). All extracted features are aggregated into a centralized Extension class, which holds the metadata, file contents, and collected features, ready to be passed to our machine learning models for classification.
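One possible shape for such a container is sketched here. Only the class name `Extension` comes from the description above; every field and method name is a hypothetical choice for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Extension:
    """Container for one scanned extension (field names are illustrative)."""

    extension_id: str
    metadata: dict = field(default_factory=dict)
    files: dict = field(default_factory=dict)     # path -> file contents
    features: dict = field(default_factory=dict)  # "scanner.feature" -> value

    def add_features(self, scanner: str, values: dict) -> None:
        # Prefix each feature with its scanner so names cannot collide.
        for name, value in values.items():
            self.features[f"{scanner}.{name}"] = value

    def feature_vector(self, names):
        # Fixed-order numeric vector for a model; missing features default to 0.
        return [float(self.features.get(name, 0)) for name in names]
```

Prefixing feature names per scanner keeps, for example, a CSS count and an HTML count with the same short name from overwriting each other.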

JavaScript Scanner

JavaScript files are first run through JSBeautify to reverse minification and restore a readable structure. The beautified code is then parsed with Esprima into an Abstract Syntax Tree (AST), which allows us to systematically analyze relationships between functions, variables, and expressions.

Structural Features

Structural features identify suspicious formatting patterns that may indicate packed or obfuscated code, for example very long lines, a low whitespace percentage, or unusually high string entropy.

Average line length
Frequency of specific characters
Average word size
String entropy
Keyword density
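The structural features listed above could be computed roughly as follows. This is a sketch under stated assumptions: entropy is taken as Shannon entropy over the whole file's characters (the scanner might instead measure individual string literals), the keyword list is a small illustrative subset, and the function names are hypothetical.

```python
import math
import re
from collections import Counter

# Illustrative subset of JavaScript keywords, not an exhaustive list.
JS_KEYWORDS = {"var", "let", "const", "function", "return", "if", "else",
               "for", "while", "new", "this", "typeof", "try", "catch"}


def shannon_entropy(text: str) -> float:
    """Shannon entropy in bits per character; 0.0 for empty input."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def structural_features(source: str) -> dict:
    lines = source.splitlines() or [""]
    # Identifier-like tokens stand in for "words".
    words = re.findall(r"[A-Za-z_$][\w$]*", source)
    total_chars = len(source)
    return {
        "avg_line_length": sum(len(line) for line in lines) / len(lines),
        "whitespace_ratio": sum(ch.isspace() for ch in source) / total_chars if total_chars else 0.0,
        "avg_word_size": sum(len(w) for w in words) / len(words) if words else 0.0,
        "entropy": shannon_entropy(source),
        "keyword_density": sum(w in JS_KEYWORDS for w in words) / len(words) if words else 0.0,
    }
```

Packed code tends to show up as a very high average line length combined with a low whitespace ratio.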

Behavioral Features

Behavioral features capture potentially dangerous functionality. We traverse the AST for CallExpression nodes to locate risky APIs commonly abused in malicious extensions.

Code generation functions
DOM change methods
Event handlers
Number of HTTPS scripts
Modification callbacks
XMLHttpRequests
eval calls
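The traversal can be illustrated with a small walker over an ESTree-style AST held as nested dicts and lists. The API groupings in `RISKY_CALLS` and all function names are assumptions for this sketch. It also inspects NewExpression nodes, because `new XMLHttpRequest()` is represented that way in ESTree rather than as a CallExpression.

```python
# Hypothetical grouping of risky API names into feature counters.
RISKY_CALLS = {
    "eval_calls": {"eval"},
    "code_generation": {"Function", "setTimeout", "setInterval"},
    "dom_changes": {"appendChild", "insertBefore", "replaceChild", "write"},
    "event_handlers": {"addEventListener"},
    "xml_http_requests": {"XMLHttpRequest"},
}


def iter_nodes(node):
    """Yield every typed node in an ESTree-style AST, depth-first."""
    if isinstance(node, dict):
        if "type" in node:
            yield node
        for value in node.values():
            yield from iter_nodes(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_nodes(item)


def callee_name(call):
    """Return the called function or method name, or None if it is dynamic."""
    callee = call.get("callee") or {}
    if callee.get("type") == "Identifier":
        return callee.get("name")
    if callee.get("type") == "MemberExpression":
        prop = callee.get("property") or {}
        if prop.get("type") == "Identifier" and not callee.get("computed"):
            return prop.get("name")          # obj.method(...)
        if prop.get("type") == "Literal" and isinstance(prop.get("value"), str):
            return prop["value"]             # obj["method"](...)
    return None


def behavioral_features(ast):
    counts = {group: 0 for group in RISKY_CALLS}
    for node in iter_nodes(ast):
        if node["type"] not in ("CallExpression", "NewExpression"):
            continue
        name = callee_name(node)
        for group, names in RISKY_CALLS.items():
            if name in names:
                counts[group] += 1
    return counts
```

Matching on the final property name is deliberately loose: it catches `document.write(...)` but would also count an unrelated `logger.write(...)`, so the counts are signals rather than proof.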

HTML Scanner

HTML files are read with Python's built-in file handling and analyzed with rule-based detection and pattern matching. The scanner looks for elements commonly used in web-based attacks (such as hidden frames, embedded external content, or inline JavaScript execution) that could inject malicious scripts or silently redirect users.

Suspicious Objects & XSS Vectors

num_object_tags
num_embed_tags
num_applet_tags
num_inline_event_handlers
num_javascript_urls
num_data_urls
num_external_script_src
num_meta_refresh

Iframes & Forms

num_iframe_tags
num_external_iframe_src
num_form_tags
num_external_form_actions
num_password_inputs

Feature extraction works by scanning the HTML structure and counting occurrences of specific tags, attributes, and URL patterns.
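A minimal sketch of this counting approach, using the standard library's `html.parser`, is given below. The feature names match the lists above, but the detection rules (what counts as "external", which attributes are checked for URLs, treating any `on*` attribute as an event handler) are assumptions, as are the class and function names.

```python
from collections import Counter
from html.parser import HTMLParser

COUNTED_TAGS = ("object", "embed", "applet", "iframe", "form")
URL_ATTRS = ("src", "href", "action", "data")


def is_external(url: str) -> bool:
    # Assumption: absolute and protocol-relative URLs count as external.
    return url.strip().lower().startswith(("http://", "https://", "//"))


class HtmlFeatureCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; valueless attrs are None.
        attr = {name: (value or "") for name, value in attrs}
        if tag in COUNTED_TAGS:
            self.counts[f"num_{tag}_tags"] += 1
        # Inline handlers are attributes such as onclick or onload.
        self.counts["num_inline_event_handlers"] += sum(
            1 for name in attr if name.startswith("on"))
        for name in URL_ATTRS:
            value = attr.get(name, "").strip().lower()
            if value.startswith("javascript:"):
                self.counts["num_javascript_urls"] += 1
            elif value.startswith("data:"):
                self.counts["num_data_urls"] += 1
        if tag == "script" and is_external(attr.get("src", "")):
            self.counts["num_external_script_src"] += 1
        if tag == "iframe" and is_external(attr.get("src", "")):
            self.counts["num_external_iframe_src"] += 1
        if tag == "form" and is_external(attr.get("action", "")):
            self.counts["num_external_form_actions"] += 1
        if tag == "input" and attr.get("type", "").lower() == "password":
            self.counts["num_password_inputs"] += 1
        if tag == "meta" and attr.get("http-equiv", "").lower() == "refresh":
            self.counts["num_meta_refresh"] += 1


def html_features(html: str) -> Counter:
    """Return feature counts; features that never occur read as 0."""
    parser = HtmlFeatureCounter()
    parser.feed(html)
    parser.close()
    return parser.counts
```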

Manifest Scanner

Every Chrome extension includes a manifest.json file that defines its configuration and permissions. We load this file using Python's built-in json module, which converts the data into a Python dictionary for straightforward key-based extraction.

Features Extracted

Permissions requested by the extension
Whether a Content Security Policy (CSP) is defined
Domains from which external content is allowed to load
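The three features above might be pulled from the parsed dictionary along these lines. The sketch assumes that the allowed domains are read from the CSP source list, which is one plausible interpretation; the function and key names are hypothetical. It handles both manifest versions: Manifest V2 stores the CSP as a string, while Manifest V3 stores it as an object and lists host patterns under `host_permissions`.

```python
import json
import re


def manifest_features(manifest_text: str) -> dict:
    manifest = json.loads(manifest_text)

    permissions = list(manifest.get("permissions", []))
    # Manifest V3 keeps host patterns in a separate key.
    permissions += manifest.get("host_permissions", [])

    csp = manifest.get("content_security_policy")
    if isinstance(csp, dict):
        # Manifest V3: object keyed by context (for example extension_pages).
        csp_text = " ".join(str(value) for value in csp.values())
    else:
        # Manifest V2: a single policy string, or absent.
        csp_text = csp or ""

    # Assumption: external sources are the http(s) origins named in the CSP.
    allowed_sources = sorted(set(re.findall(r"https?://[^\s;'\"]+", csp_text)))

    return {
        "permissions": permissions,
        "num_permissions": len(permissions),
        "has_csp": bool(csp_text),
        "allowed_sources": allowed_sources,
    }
```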

Why It Matters

Extensions requesting excessive permissions or allowing content from untrusted domains may pose significant security concerns. The manifest provides high-signal metadata that complements the behavioral features extracted from code files.

CSS Scanner

CSS files are analyzed using regex-based pattern matching, scanning line by line to track the frequency of properties and directives that can sometimes be abused to load external resources or alter page behavior unexpectedly.

Features Extracted

background-image properties
behavior properties
@import rules

Extraction Method

Regular expressions detect the presence of each pattern across all CSS files in the extension. The scanner records how frequently each feature appears, and the results are added to the feature set stored in the Extension class.
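A line-by-line regex counter in this spirit might look like the following. The exact patterns and the function name `css_features` are assumptions. The `behavior` pattern uses a lookbehind so that the unrelated standard property `scroll-behavior` is not counted.

```python
import re

# Hypothetical patterns, one per tracked feature.
CSS_PATTERNS = {
    "background_image": re.compile(r"background-image\s*:", re.IGNORECASE),
    # The lookbehind skips longer property names such as scroll-behavior.
    "behavior": re.compile(r"(?<![\w-])behavior\s*:", re.IGNORECASE),
    "import": re.compile(r"@import\b", re.IGNORECASE),
}


def css_features(css_text: str) -> dict:
    """Count how often each pattern appears, scanning one line at a time."""
    counts = {name: 0 for name in CSS_PATTERNS}
    for line in css_text.splitlines():
        for name, pattern in CSS_PATTERNS.items():
            counts[name] += len(pattern.findall(line))
    return counts
```

Running it once per CSS file and summing the dictionaries would give extension-wide totals.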
