Datascan data model

Design

All ONYPHE categories related to Attack Surface Management (ASM) share a common data model, based on the Datascan category. Datascan, Vulnscan and Riskscan form the backbone of our ASM capabilities.

datascan is the broadest category. It contains results from active Internet scanning, storing application-level responses from every open service we probe. From a single service, datascan can capture the raw response, protocol fingerprints, TLS certificate details, detected products and OS, extracted URLs and IPs, and application-specific metadata. It supports a wide range of protocols: HTTP, SMB, FTP, Telnet, RDP, VNC, databases, industrial protocols (Modbus), and many more.

vulnscan builds on the same base data model as datascan but is exclusively focused on vulnerability detection. It targets known vulnerable products and services, applying two complementary detection techniques: check-based detection (active, non-intrusive exploit-derived probes that conclusively confirm or deny a vulnerability) and version-based detection (version fingerprinting cross-referenced against CVE lists). The tag and cve fields are central to vulnscan: they tell you exactly which CVEs were detected and how confident the detection is.
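The distinction between the two detection techniques can be applied when post-processing results. The following is a minimal sketch, not an official client: it assumes each record is a Python dict in which `tag` and `cve` are lists of strings, and it relies on the `vulnerable` tag marking check-based confirmation, as in the sample query further down. The function name and the sample records are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def split_by_confirmation(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Separate check-confirmed records from those that only list CVEs.

    Assumption: a record tagged "vulnerable" was confirmed by a check, and a
    record with CVEs but without that tag comes from version matching only.
    """
    confirmed: List[Record] = []
    unconfirmed: List[Record] = []
    for record in records:
        tags = record.get("tag") or []
        if "vulnerable" in tags:
            confirmed.append(record)
        elif record.get("cve"):
            unconfirmed.append(record)
    return confirmed, unconfirmed


# Hypothetical records, shaped after the field names in this document.
sample = [
    {"ip": "198.51.100.1", "tag": ["vulnerable"], "cve": ["CVE-2021-44228"]},
    {"ip": "198.51.100.2", "tag": [], "cve": ["CVE-2021-44228"]},
]
confirmed, unconfirmed = split_by_confirmation(sample)
print([r["ip"] for r in confirmed])
print([r["ip"] for r in unconfirmed])
```

Records with neither the tag nor any CVE are dropped from both lists, which keeps the output focused on vulnerability findings.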

riskscan is an enriched subset of datascan, ctiscan, and vulnscan. After comparing scan results with a CTI-informed risk baseline, riskscan retains records that match at least one meaningful risk condition, such as an exposed database, a critical CVE, a sensitive protocol (RDP, SMB, Telnet), or a compromised device. Every riskscan entry carries one or more risk::* tags that classify the risk type. The goal is to give security teams a pre-filtered, actionable view of their Internet-exposed attack surface, without having to write complex queries from scratch.
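Because every riskscan entry carries risk-prefixed tags, results can be summarized by risk type once they have been retrieved. The sketch below assumes records are Python dicts with a `tag` list of strings; the helper names and sample data are hypothetical and only illustrate the idea.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List

RISK_PREFIX = "risk::"


def risk_types(record: Dict[str, Any]) -> List[str]:
    """Return the risk types of one record, without the "risk::" prefix."""
    tags = record.get("tag") or []
    return sorted(t[len(RISK_PREFIX):] for t in tags if t.startswith(RISK_PREFIX))


def count_risks(records: Iterable[Dict[str, Any]]) -> Counter:
    """Count how many records carry each risk type."""
    counts: Counter = Counter()
    for record in records:
        counts.update(risk_types(record))
    return counts


# Hypothetical records; tag values mirror those used in the sample queries.
sample = [
    {"ip": "198.51.100.1", "tag": ["risk::criticalcve", "risk::vpnserver", "cisa::kev"]},
    {"ip": "198.51.100.2", "tag": ["risk::criticalcve"]},
]
print(risk_types(sample[0]))
print(count_risks(sample).most_common())
```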


The shared data model is structured around a few organizing principles:


All three categories share the same field types, with the same search implications:

Sample queries

Find all exposed RDP services on a specific domain

category:datascan protocol:rdp domain:example.com

Find databases exposed without authentication

category:datascan device.class:database tag:open

Find a specific CVE across all vulnerable services

category:vulnscan cve:CVE-2021-44228

Find confirmed vulnerable services (check-based detection)

category:vulnscan tag:vulnerable

Find services with known exploited vulnerabilities (CISA KEV)

category:vulnscan tag:"cisa::kev"

Find all risks for a given IP

category:riskscan ip:198.51.100.1

Find all critical CVE risks

category:riskscan tag:risk::criticalcve

Find SMB null session exposures

category:riskscan tag:risk::smbnullsession

Find VPN servers as potential ransomware entry points

category:riskscan tag:risk::vpnserver

Find exposed management interfaces with login

category:riskscan tag:risk::loginmanagement

Find services with obsolete software

category:datascan tag:obsolete

Find HTTP services whose page title contains "admin"

category:datascan app.http.title.text:admin

Find services by product name and version

category:datascan product:apache productversion:2.4.29

Find open S3-compatible buckets

category:riskscan tag:risk::openbucket

Find exposed SCADA and industrial devices

category:riskscan tag:risk::sensitivedevice

Find services with expired TLS certificates

category:riskscan tag:risk::certexpired

Search by certificate subject common name

category:datascan subject.commonname:"*.example.com"

Find services with anonymous FTP access

category:datascan app.ftp.anonymous:true

Search for a specific HTTP response body hash

category:datascan app.http.bodymd5:"d41d8cd98f00b204e9800998ecf8427e"

Find services in a specific ASN

category:datascan asn:AS15169

Find compromised devices

category:riskscan tag:risk::compromised
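The queries above all follow the same shape: a category term followed by space-separated field:value terms. A small helper can assemble such strings programmatically. This is only a sketch: the function name is hypothetical, and the quoting rule (wrap values containing whitespace in double quotes, reject values that contain a double quote) is an assumption rather than documented query syntax.

```python
from typing import Any, Mapping


def build_query(category: str, terms: Mapping[str, Any]) -> str:
    """Build a "category:... field:value ..." string from a mapping of terms."""
    parts = [f"category:{category}"]
    for field, value in terms.items():
        # Booleans are written in lowercase, as in app.ftp.anonymous:true.
        text = str(value).lower() if isinstance(value, bool) else str(value)
        if '"' in text:
            # Escaping rules are not covered here, so refuse rather than guess.
            raise ValueError(f"value for {field!r} contains a double quote")
        if any(ch.isspace() for ch in text):
            text = f'"{text}"'  # assumed quoting rule for values with spaces
        parts.append(f"{field}:{text}")
    return " ".join(parts)


print(build_query("datascan", {"protocol": "rdp", "domain": "example.com"}))
print(build_query("datascan", {"app.ftp.anonymous": True}))
```

Passing the terms as a mapping rather than keyword arguments allows dotted field names such as `device.class`.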

Fields

Common fields

@timestamp

@category

tag

source

Network identification

ip

alternativeip

ipv6

port

transport

protocol

protocolversion

status

reason

tls

hostname

host

domain

tld

subdomains

forward

reverse

url

Geolocation (physical)

country

city

asn

organization

subnet

location

Extended geolocation (logical, via whois)

The geolocus sub-object contains geolocation information derived from whois data, providing a logical view of the IP address ownership as opposed to the physical hosting location.

geolocus.asn

geolocus.continent

geolocus.continentname

geolocus.country

geolocus.countryname

geolocus.domain

geolocus.isineu

geolocus.latitude

geolocus.longitude

geolocus.location

geolocus.netname

geolocus.organization

geolocus.subnet
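One practical use of having both views is spotting assets whose physical hosting country differs from the country recorded in whois. The sketch below assumes a record is a Python dict where `geolocus` is either a nested dict or flattened into dotted keys; both layouts are handled because the exact result layout is not specified here. The helper names and sample values are hypothetical.

```python
from typing import Any, Dict, Optional


def get_field(record: Dict[str, Any], path: str) -> Optional[Any]:
    """Resolve a dotted field name against a flat or nested record."""
    if path in record:  # flat layout, e.g. {"geolocus.country": "US"}
        return record[path]
    current: Any = record
    for key in path.split("."):  # nested layout, e.g. {"geolocus": {"country": "US"}}
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def country_mismatch(record: Dict[str, Any]) -> bool:
    """True when both countries are present and differ."""
    physical = get_field(record, "country")
    logical = get_field(record, "geolocus.country")
    return physical is not None and logical is not None and physical != logical


# Hypothetical records illustrating both layouts.
nested = {"ip": "198.51.100.1", "country": "FR", "geolocus": {"country": "US"}}
flat = {"ip": "198.51.100.2", "country": "FR", "geolocus.country": "FR"}
print(country_mismatch(nested), country_mismatch(flat))
```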

Scanner node

The node sub-object contains information about the ONYPHE scanner that collected the data.

node.id

node.groupid

node.country

node.physicalcountry

Software identification

product

productvendor

productversion

productversionpatch

OS identification

os

osvendor

osversion

osversionpatch

osbits

osdistribution

osdistributionversion

Device classification

The device sub-object classifies the scanned asset into a functional category and associates it with a detected product.

device.class

device.product

device.productvendor

device.productversion

device.productversionpatch

Vulnerability indicators

These fields are populated in all three categories but are especially central to vulnscan.

cve

cvecount

cpe

cpecount

Application data

data

datamd5

datammh3

summary

summarymd5

summarymmh3

app.length

TLS certificate

These fields are populated when a TLS handshake was performed and a certificate was retrieved.

serial

ca

basicconstraints

wildcard

version

keyusage

extkeyusage

fingerprint.md5

fingerprint.sha1

fingerprint.sha256

issuer.commonname

issuer.organization

issuer.organizationalunit

issuer.country

issuer.city

issuer.email

issuer.serial

subject.commonname

subject.altname

subject.organization

subject.organizationalunit

subject.country

subject.city

subject.email

subject.serial

publickey.algorithm

publickey.length

publickey.exponent

signature.algorithm

validity.notbefore

validity.notafter
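The validity fields make it possible to check certificate expiry on the client side. The sketch below is an illustration under two stated assumptions: that `validity` is a nested dict in the record, and that `validity.notafter` is an ISO 8601 timestamp string. Neither is confirmed here, so adjust the parsing to the format actually returned. The function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 string (assumed format), defaulting to UTC."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def is_expired(record: Dict[str, Any], now: Optional[datetime] = None) -> Optional[bool]:
    """Return True/False for expiry, or None when no certificate date is present."""
    not_after = (record.get("validity") or {}).get("notafter")
    if not_after is None:
        return None
    reference = now if now is not None else datetime.now(timezone.utc)
    return parse_timestamp(not_after) < reference


# Hypothetical record and a fixed reference date so the result is reproducible.
record = {"validity": {"notbefore": "2023-01-01T00:00:00Z", "notafter": "2024-01-01T00:00:00Z"}}
reference = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(is_expired(record, now=reference))
```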

Application-specific fields

The app object contains protocol-specific sub-objects populated based on the detected application protocol. Only the sub-object corresponding to the detected protocol is populated for a given record.

app.extract.ip

app.extract.hostname

app.extract.domain

app.extract.url

app.extract.file

app.http.realm

app.http.headermd5

app.http.headermmh3

app.http.bodymd5

app.http.bodymmh3

app.http.title

app.http.keywords

app.http.description

app.http.copyright

app.http.component

app.http.header

app.http.tracker.ga

app.http.tracker.gaw

app.http.tracker.gtm

app.http.tracker.gpub

app.http.tracker.fbq

app.http.tracker.snaptr

app.http.tracker.newrelic

app.smb.workgroup

app.smb.nullsession

app.smb.servername

app.smb.share

app.ftp.anonymous

app.telnet.fingerprint

app.database.name

app.database.count

app.elasticsearch.clustername

app.elasticsearch.luceneversion

app.mongodb.name

app.dns.versionbind

app.modbus.function

app.modbus.code

app.modbus.objectcount

app.modbus.product

app.modbus.productvendor

app.modbus.productversion

app.modbus.productversionpatch

app.modbus.information

app.ntp.leap

app.ntp.version

app.ntp.mode

app.ntp.stratum

app.rtsp.realm

app.favicon.url

app.favicon.filename

app.favicon.format

app.favicon.length

app.favicon.image

app.favicon.imagemd5

app.favicon.imagemmh3
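Since the populated sub-objects vary from record to record, it can help to inspect which ones are present before reading protocol-specific fields. The sketch below assumes `app` is a nested dict inside a record represented as a Python dict; the function name and the sample record are hypothetical.

```python
from typing import Any, Dict, List


def populated_app_objects(record: Dict[str, Any]) -> List[str]:
    """List the names of non-empty sub-objects under "app".

    Scalar entries such as app.length are skipped because they are not
    sub-objects.
    """
    app = record.get("app") or {}
    return sorted(
        name for name, value in app.items() if isinstance(value, dict) and value
    )


# Hypothetical FTP record using field names from the list above.
record = {
    "protocol": "ftp",
    "app": {"length": 42, "ftp": {"anonymous": True}, "http": {}},
}
print(populated_app_objects(record))
```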

Company context

The company sub-object provides business context about the organization associated with the scanned asset. This information is correlated externally and may not always be present.

company.name

company.globalrank

company.fortunerank

company.sector

company.industry

company.country