ONYPHE Query Language (OQL)
OQL can be used with the following APIs:
OQL allows you to search for data using filters and boolean operators. A number of integrations exist in various languages if you want to avoid developing your own integration with our APIs. See the integrations chapter.
You can use it either from the CLI tools or from the Web interface, which leverages the Search API under the hood.
General OQL syntax
The syntax is as follows:
category:<CATEGORY> filter1:<VALUE1> filter2:<VALUE2> -<FUNCTION1>:<FUNCTION_VALUE1> -<FUNCTION2>:<FUNCTION_VALUE2>
- category: choose which category of information you want to query. For instance, category:datascan, category:vulnscan or any other available category;
- filter: you can pass as many filters as you need;
- function: you may also pass as many functions as needed.
Examples:
- Search historical data for a given domain & protocol:
category:datascan domain:google.com protocol:rdp -monthago:3
- Identify all exposed VPN servers seen in the last 30 days:
category:datascan device.class:"vpn server"
NOTE: field values are NOT case sensitive, while field names ARE case sensitive, but field names are always available in lowercase.
NOTE2: if you need to pass values containing space characters, you have to enclose them in double quotes. Examples: device.class:"vpn server" (quotes required), device.class:database (no quotes needed).
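As an illustration of the general syntax and the quoting rule, the following Python sketch assembles a query string. The build_oql and format_value helpers are hypothetical (they are not part of any ONYPHE library); they only produce the text you would type in the Web interface or pass to the CLI tools. Because filters are stored in a dict, each field appears only once here.

```python
from typing import Dict, Optional


def format_value(value: str) -> str:
    """Wrap a value in double quotes when it contains whitespace (hypothetical helper)."""
    if any(ch.isspace() for ch in value):
        return f'"{value}"'
    return value


def build_oql(category: str, filters: Dict[str, str],
              functions: Optional[Dict[str, str]] = None) -> str:
    """Assemble 'category:<CATEGORY> field:value ... -function:value ...'."""
    parts = [f"category:{category}"]
    for field, value in filters.items():
        # Field names are case sensitive (always lowercase); values are not.
        parts.append(f"{field}:{format_value(value)}")
    for name, value in (functions or {}).items():
        # Functions are prefixed with a dash.
        parts.append(f"-{name}:{value}")
    return " ".join(parts)


print(build_oql("datascan", {"domain": "google.com", "protocol": "rdp"}, {"monthago": "3"}))
# category:datascan domain:google.com protocol:rdp -monthago:3
print(build_oql("datascan", {"device.class": "vpn server"}))
# category:datascan device.class:"vpn server"
```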
Supported boolean operators
OQL supports the following boolean operators:
- AND: implied by default between all filters;
- NOT: by prefixing a field name with ! character;
- OR: by prefixing a field name with ? character.
NOTE: the OR boolean operator is available starting from Lion Views.
Examples:
- Search for an exposed protocol AND a specific domain name:
category:datascan protocol:rdp domain:google.com
- Search a domain for all identified assets, except those belonging to a given organization:
category:datascan domain:google.com !organization:google
- Search a domain for either the rdp or the ssh protocol:
category:datascan ?protocol:rdp ?protocol:ssh domain:google.com
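The prefix rules above are easy to apply programmatically. This minimal Python sketch uses hypothetical helpers (and_filter, not_filter, or_filters) that only prepend "!" or "?" to field names; it assumes values contain no spaces, so no quoting is handled here.

```python
from typing import Iterable


def and_filter(field: str, value: str) -> str:
    # AND is implied: a plain field:value term.
    return f"{field}:{value}"


def not_filter(field: str, value: str) -> str:
    # NOT: prefix the field name with "!".
    return f"!{field}:{value}"


def or_filters(field: str, values: Iterable[str]) -> str:
    # OR: prefix each field name with "?".
    return " ".join(f"?{field}:{v}" for v in values)


query = " ".join([
    "category:datascan",
    or_filters("protocol", ["rdp", "ssh"]),
    and_filter("domain", "google.com"),
])
print(query)  # category:datascan ?protocol:rdp ?protocol:ssh domain:google.com

exclusion = " ".join([
    "category:datascan",
    and_filter("domain", "google.com"),
    not_filter("organization", "google"),
])
print(exclusion)  # category:datascan domain:google.com !organization:google
```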
Full-text search vs exact search
By default, all fields are searchable with exact values only. This means you must enter the exact value for a filter. For instance, to search against protocol:rdp, you have to give the exact rdp string.
For specific fields, you can search for words in a full-text index. The following fields are full-text enabled in the datascan/riskscan/vulnscan data model:
- data: the raw data we have collected. For instance, the raw application response to an application request;
- summary: the most important words, taken as a subset of the data field;
- app.http.title: the HTML title of an HTTP response body;
- app.http.description: the HTML description from an HTTP response body;
- app.http.keywords: the HTML keywords from an HTTP response body;
- app.http.copyright: the HTML copyright from an HTTP response body.
Fields with full-text search enabled in Ctiscan are indicated with a .text suffix. For example: app.data.text.
Therefore, only the fields listed above can be used to perform full-text searches; all others only accept exact values.
Examples:
- Full-text search on HTML title:
category:datascan app.http.title:confluence
- Exact search of aforementioned software:
Listing all available filters
You can browse the Web interface to find the fields you need to refine your search, either from the displayed tabs or from the JSON tab. In fact, all fields displayed in the JSON output can be used as filters, except reserved fields prefixed with @, such as @timestamp.
IP vs CIDR or network searches
When you need to find assets on a specific network block, you can use CIDR notation. However, to avoid I/O-intensive searches, you cannot specify networks larger than /16. You may use the splitsubnet CLI procedure to automatically split CIDR searches into smaller subnets.
- Search a specific IP address:
- Search a specific network block:
category:datascan ip:8.8.8.0/24
- You can even search for an entire ASN, if that makes sense to you:
Not all fields support CIDR searches. The following fields are capable of that:
- ip: the asset IP address, i.e. the address we connected to and from which the data content was collected;
- alternativeip: the result of DNS resolution of the hostname bound to the given address;
- app.extract.ip: when IP addresses are identified in the raw data content, we extract them and make them searchable from this field.
NOTE: the subnet field is NOT capable of CIDR searches; you have to pivot from this field's value and use it against the ip field.
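Because of the /16 limit, a larger block has to be queried in pieces. The splitsubnet CLI procedure does this for you; as a rough local equivalent (not the CLI's actual implementation), Python's standard ipaddress module can split an IPv4 block into /16 subnets and generate one ip: query per subnet. The split_cidr helper is hypothetical and assumes IPv4 input.

```python
import ipaddress


def split_cidr(cidr: str, max_prefix: int = 16) -> list:
    """Split an IPv4 network larger than /max_prefix into /max_prefix subnets."""
    network = ipaddress.ip_network(cidr, strict=False)
    if network.version != 4:
        raise ValueError("this sketch only handles IPv4 networks")
    if network.prefixlen >= max_prefix:
        # Already small enough: query it as-is.
        return [str(network)]
    return [str(subnet) for subnet in network.subnets(new_prefix=max_prefix)]


queries = [f"category:datascan ip:{subnet}" for subnet in split_cidr("10.0.0.0/14")]
for q in queries:
    print(q)
# category:datascan ip:10.0.0.0/16
# category:datascan ip:10.1.0.0/16
# category:datascan ip:10.2.0.0/16
# category:datascan ip:10.3.0.0/16
```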
How hostnames are split
Our approach to building an Attack Surface Discovery & Attack Surface Management inventory is domain-based. To achieve that goal, we split hostnames (Fully-Qualified Domain Names, or FQDNs, sometimes called subdomains) into different components. Concretely, an FQDN is split into the following fields:
- tld: the Top-Level Domain part, which may be a first-level or second-level TLD, or even a regional or sector-level one. Example: net for sam.probe.onyphe.net;
- domain: the domain name, which includes the tld. Example: onyphe.net for sam.probe.onyphe.net;
- subdomains: when an FQDN has several dots in it, this field may contain an array of values. Example: probe.onyphe.net for sam.probe.onyphe.net;
- host: the hostname part, like sam for sam.probe.onyphe.net.
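To make the split concrete, here is a minimal Python sketch. It uses a tiny, hypothetical TLD set instead of the IANA-based list, and it assumes that every intermediate suffix between the host and the domain becomes one subdomains value; ONYPHE's actual rules may differ.

```python
# Hypothetical, tiny TLD list for illustration only.
KNOWN_TLDS = frozenset({"net", "com", "uk", "co.uk"})


def split_fqdn(fqdn: str, tlds: frozenset = KNOWN_TLDS) -> dict:
    labels = fqdn.lower().rstrip(".").split(".")
    # Scan from the longest suffix to the shortest; the first known one wins.
    tld = None
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in tlds:
            tld = candidate
            break
    if tld is None:
        raise ValueError(f"unknown TLD in {fqdn!r}")
    tld_len = len(tld.split("."))
    if len(labels) <= tld_len:
        raise ValueError(f"{fqdn!r} has no domain part")
    domain = ".".join(labels[-(tld_len + 1):])
    # Assumption: each suffix strictly between the FQDN and the domain is a subdomain.
    subdomains = [".".join(labels[i:]) for i in range(1, len(labels) - tld_len - 1)]
    host = labels[0] if len(labels) > tld_len + 1 else None
    return {"tld": tld, "domain": domain, "subdomains": subdomains, "host": host}


print(split_fqdn("sam.probe.onyphe.net"))
# {'tld': 'net', 'domain': 'onyphe.net', 'subdomains': ['probe.onyphe.net'], 'host': 'sam'}
```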
If you are unsure how to query for a specific domain-based value, you can always perform an OR query:
NOTE: to perform this split, we rely on a list of TLDs built from the IANA list. Our list is also available on our GitHub.
Search functions
To refine your searches, a number of functions are available. They may help you identify assets exposed in the past, reverse the sort order of results or refine your search for specific assets.
NOTE: functions are only available with Enterprise licenses.
Time range functions
These functions allow you to search through historical data.
-hourago
Query data collected some hours ago. A typical use case is to automate a search every hour that looks for specific gems in the previous hour of collected information. To query the previous hour:
- -hourago:1:
category:datascan protocol:rdp -hourago:1
To query the current hour:
- -hourago:0:
category:datascan protocol:rdp -hourago:0
NOTE: an hour starts at minute 00 and ends at minute 59.
NOTE2: you can increase the hour counter as far as your license allows. For Lynx Views, that may be up to 30 days of data, so -hourago:720.
-dayago
In the same way, you may want to execute searches at the day granularity level. To query the previous day of data:
- -dayago:1:
category:datascan protocol:rdp -dayago:1
To query the current day:
- -dayago:0:
category:datascan protocol:rdp -dayago:0
NOTE: a day starts at 00:00 and ends at 23:59.
NOTE2: you can increase the day counter as far as your license allows. For Lynx Views, that may be up to 30 days of data, so -dayago:30.
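The hour and day boundaries described above amount to truncating the current time and stepping back N units. This Python sketch computes the window a given -hourago:N or -dayago:N would cover. It assumes UTC and second-level precision for the end bound; the time zone ONYPHE uses for these boundaries is not stated here, and hour_window/day_window are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone


def hour_window(n: int, now: datetime) -> tuple:
    """Start and end of the clock hour selected by -hourago:n (assumed UTC)."""
    start = now.replace(minute=0, second=0, microsecond=0) - timedelta(hours=n)
    return start, start + timedelta(minutes=59, seconds=59)


def day_window(n: int, now: datetime) -> tuple:
    """Start and end of the calendar day selected by -dayago:n (assumed UTC)."""
    start = now.replace(hour=0, minute=0, second=0, microsecond=0) - timedelta(days=n)
    return start, start + timedelta(hours=23, minutes=59, seconds=59)


now = datetime(2024, 3, 1, 14, 37, tzinfo=timezone.utc)
start, end = hour_window(1, now)
print(start.isoformat(), end.isoformat())  # 2024-03-01T13:00:00+00:00 2024-03-01T13:59:59+00:00
start, end = day_window(1, now)
print(start.isoformat(), end.isoformat())  # 2024-02-29T00:00:00+00:00 2024-02-29T23:59:59+00:00
```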
-weekago
Same as before, at the week granularity level. To query the previous week of data:
- -weekago:1:
category:datascan protocol:rdp -weekago:1
To query the current week:
- -weekago:0:
category:datascan protocol:rdp -weekago:0
NOTE: a week starts on Monday at 00:00 and ends on Sunday at 23:59.
NOTE2: you can increase the week counter as far as your license allows. For Lynx Views, that may be up to 30 days of data, so -weekago:4.
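Since a week starts on Monday, the window for -weekago:N is anchored on the Monday of the current week rather than on "now minus 7 days". A minimal sketch, assuming UTC; week_window is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def week_window(n: int, now: datetime) -> tuple:
    """Monday 00:00 to Sunday 23:59:59 of the week selected by -weekago:n (assumed UTC)."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    # weekday() returns 0 for Monday, so this rewinds to the current week's Monday.
    monday = midnight - timedelta(days=midnight.weekday())
    start = monday - timedelta(weeks=n)
    end = start + timedelta(days=6, hours=23, minutes=59, seconds=59)
    return start, end


now = datetime(2024, 3, 1, 14, 37, tzinfo=timezone.utc)  # a Friday
start, end = week_window(1, now)
print(start.isoformat(), end.isoformat())  # 2024-02-19T00:00:00+00:00 2024-02-25T23:59:59+00:00
```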
-monthago
Same as before, at the month granularity level. To query the previous month of data:
- -monthago:1:
category:datascan protocol:rdp -monthago:1
To query the current month:
- -monthago:0:
category:datascan protocol:rdp -monthago:0
NOTE: a month starts on the 1st at 00:00 and ends on the last day of the month at 23:59.
NOTE2: you can increase the month counter as far as your license allows. For Lion Views, that may be up to 90 days of data, so -monthago:3.
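Months have varying lengths, so stepping back N months is best done with calendar arithmetic rather than a fixed number of days. This sketch counts months from year zero to handle year boundaries, and finds the last second of a month as "first day of the next month minus one second". It assumes UTC; month_window is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def month_window(n: int, now: datetime) -> tuple:
    """1st 00:00 to last day 23:59:59 of the month selected by -monthago:n."""
    # Months elapsed since year 0, minus n; divmod splits it back into year/month.
    total = now.year * 12 + (now.month - 1) - n
    year, month_index = divmod(total, 12)
    start = datetime(year, month_index + 1, 1, tzinfo=now.tzinfo)
    next_year, next_index = divmod(total + 1, 12)
    end = datetime(next_year, next_index + 1, 1, tzinfo=now.tzinfo) - timedelta(seconds=1)
    return start, end


now = datetime(2024, 3, 1, 14, 37, tzinfo=timezone.utc)
start, end = month_window(1, now)
print(start.isoformat(), end.isoformat())  # 2024-02-01T00:00:00+00:00 2024-02-29T23:59:59+00:00
```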
-since
Sometimes, you may want to query on the full time range allowed by your license. Please note that this function is subject to some limitations based on your license.
For instance, Eagle Views can use -since:7M with the Search API but not with the Export API. Griffin Views can use the full time range on all APIs, up to 48 months of historical data for the relevant categories. To search for all exposed rdp services over the full 7-month time range:
- -since:7M:
category:datascan protocol:rdp -since:7M
Wildcard functions
OQL can also search using wildcards. This is only possible against exact search fields, not against full-text search fields. These functions also have the same limitations as the -since function: you can only use them against the last 30 days of data with Eagle Views, but over the full time range with Griffin Views.
Wildcards accept the same syntax as usual UNIX shells:
- ?: substitute exactly one unknown character;
- *: substitute zero or more unknown characters.
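Python's fnmatch module implements the same shell-style ? and * semantics, which makes it handy for checking locally what a pattern will and will not match before running an I/O-intensive query. Note that fnmatch also supports [seq] character sets, which are not documented for OQL; candidate names are lowercased here because OQL values are not case sensitive.

```python
from fnmatch import fnmatchcase

candidates = ["google.com", "goggle.com", "g00gle.com", "gogle.com", "googlee.com"]
pattern = "g??gle.com"  # "?" stands for exactly one character
matches = [name for name in candidates if fnmatchcase(name.lower(), pattern)]
print(matches)  # ['google.com', 'goggle.com', 'g00gle.com']

# "*" stands for zero or more characters, as in the phishing hostname example.
print(fnmatchcase("www.google.com.evil.example", "*.google.com.*"))  # True
```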
-wildcard
The syntax for wildcard functions is as follows:
category:<CATEGORY> -wildcard:<FIELD_NAME>,<SEARCH STRING>
category:<CATEGORY> -wildcard:<FIELD_NAME>,"<SEARCH STRING>" # with quotes if the string contains spaces
One of the use cases for wildcard searches is to identify typosquatting or phishing hostnames or domains. You may want to identify domains that look like yours, or to search against all TLDs for a given domain:
- Search typosquatting against google.com:
category:resolver -wildcard:domain,g??gle.com !domain:google.com
- Search phishing hostnames for google.com:
category:datascan -wildcard:hostname,*.google.com.* -notwildcard:domain,google.*
WARNING: this request is I/O intensive. You may receive request timeout errors. Feel free to relaunch your search until it succeeds.
- Search all TLDs for google:
category:resolver -wildcard:domain,google.*
-orwildcard
You may also want to pass multiple wildcard conditions. Simply replace your -wildcard functions with multiple -orwildcard functions:
- Search some typosquatting against google.com:
category:resolver -orwildcard:domain,g?ogle.com -orwildcard:domain,googl?.com !domain:google.com
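Writing every -orwildcard condition by hand gets tedious. This Python sketch generates one pattern per character position of the first label (replacing that character with ?) and joins them into a single query, mirroring the example above. single_char_variants and typosquat_query are hypothetical helpers; a leading ? (as in ?oogle.com) may be especially expensive to evaluate, so expect the timeouts mentioned in the wildcard section.

```python
def single_char_variants(domain: str) -> list:
    """Replace each character of the first label with '?', one position at a time."""
    label, _, rest = domain.partition(".")
    suffix = "." + rest if rest else ""
    return [label[:i] + "?" + label[i + 1:] + suffix for i in range(len(label))]


def typosquat_query(domain: str) -> str:
    functions = " ".join(f"-orwildcard:domain,{p}" for p in single_char_variants(domain))
    # Exclude the legitimate domain itself.
    return f"category:resolver {functions} !domain:{domain}"


print(single_char_variants("google.com"))
# ['?oogle.com', 'g?ogle.com', 'go?gle.com', 'goo?le.com', 'goog?e.com', 'googl?.com']
print(typosquat_query("google.com"))
```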
-notwildcard
You can even exclude some wildcards:
- Search some typosquatting against google.*:
Regular Expressions
-regexp
Similar in syntax to wildcard functions, -regexp allows for powerful queries within exact match fields. Regular expressions can’t be used against full-text search enabled fields. The Ctiscan data model includes both exact match (.raw suffix) and full-text versions (.text suffix) of certain key fields, such as the HTML title. This allows for either full-text or regular expression searches against those fields.
The syntax for regexp functions is as follows:
category:<CATEGORY> -regexp:<FIELD_NAME>,"<REGULAR_EXPRESSION>"
- Search for certificates likely to be used in phishing attacks against google.com:
category:ctl -regexp:domain,"g[^\\.o]ogle[a-z0-9-]*\\.[a-z\.]{1,}" -since:1w
Backslashes used to escape special characters within the expression must themselves be escaped in OQL. Therefore, a regular expression matching a literal full stop (period) requires two backslashes (\\.) to be interpreted correctly.
WARNING: regexp requests can be I/O intensive. You may receive request timeout errors. Feel free to relaunch your search until it succeeds.
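Because of the double-escaping rule, it can help to write the expression once with single backslashes, test it locally, and only then double the backslashes for OQL. In this sketch, to_oql_regexp is a hypothetical helper, and re.fullmatch is used on the assumption that the regexp must match the whole field value; the server-side regex dialect may differ from Python's re, so a locally valid pattern is not guaranteed to be accepted.

```python
import re

# The expression as written for Python: raw string, single backslashes.
pattern = r"g[^\.o]ogle[a-z0-9-]*\.[a-z\.]{1,}"


def to_oql_regexp(field: str, regex: str) -> str:
    # Every backslash must itself be escaped inside OQL, so "\." becomes "\\.".
    escaped = regex.replace("\\", "\\\\")
    return f'-regexp:{field},"{escaped}"'


# Local sanity check before sending the query.
for name in ["gxogle-login.com", "google.com", "gaogle.co.uk"]:
    print(name, bool(re.fullmatch(pattern, name)))
# gxogle-login.com True
# google.com False
# gaogle.co.uk True

query = "category:ctl " + to_oql_regexp("domain", pattern) + " -since:1w"
print(query)  # category:ctl -regexp:domain,"g[^\\.o]ogle[a-z0-9-]*\\.[a-z\\.]{1,}" -since:1w
```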
-orregexp
You may also want to pass multiple regular expression conditions, or combine a regexp function with OR conditions. Simply replace your -regexp functions with multiple -orregexp functions:
- Search some typosquatting against google.com OR apple.com:
-notregexp
Results can be excluded by regular expression using -notregexp.
- Search for certificates likely to be used in phishing attacks against google.com excluding a selection of TLD patterns:
Other functions
-exists
The use case for this function is to identify assets on which a specific field is set. For instance, you may want to identify assets with an identified CVE, whatever that CVE is. The datascan and vulnscan categories are the most relevant ones for this function.
- Identify potential CVEs:
category:datascan domain:google.com -exists:cve
category:vulnscan domain:google.com -exists:cve
-notexists
Does the opposite of the -exists function. For instance, you may want to check that an asset has been scanned for vulnerabilities and is not vulnerable.
- Identify assets not vulnerable to any of the verified CVEs we check:
category:vulnscan domain:google.com -notexists:cve
-orexists
You may also want to search for different existing fields with the -orexists function. A use case would be to search for an existing CVE or an existing product:
- Search for CVEs impacting an asset or identified CPEs:
category:vulnscan domain:google.com -orexists:cve -orexists:cpe
-fields
This function has been designed to reduce the volume of data before applying some local processing, or before integrating results into a SIEM whose license price is based on the volume of indexed data. Sometimes, you may only be interested in the IP addresses from a specific search, so you want to receive only the ip field as a result.
- Fetch a list of IP addresses with some specific open ports and only fetch ip & port information:
category:datascan ?port:3389 ?port:3390 ?port:3391 -fields:ip,port
-sort
By default, the latest result is displayed first. In some cases, you may want to identify the oldest result instead.
- Search for the oldest asset compromised in the ESXiArgs campaign:
category:datascan app.http.title:"How to Restore Your Files" -since:7M -sort:0
-tlsexpired
Forgetting to renew a certificate is a thing. Enterprises failing to decommission their assets is also a thing. By searching for expired certificates, you can find lost treasure.
- Search expired certificates:
category:datascan domain:google.com -tlsexpired:1
Dorkpedia
You may be wondering how to search for specific products or devices. The Dorkpedia is for you. It also provides a list of dorks to help you identify the most important risks exposed by your assets.
OQLv2 (version 2)
Although retro-compatible with OQLv1 queries, OQLv2 is a full rewrite of the ONYPHE application engine, which allows for new and more powerful features. In this initial version, the following capabilities have been added:
- Condition groups
- Regular Expressions
- Clearer error conditions and feedback on incorrect queries
OQLv2 features are available for ASM-level and Ctiscan licenses. See the Pricing page or contact us for more information.
Condition groups
As with OQLv1, Boolean conditions in OQLv2 are specified as follows:
- AND: implied by default between all filters;
- NOT: by prefixing a field name with ! character;
- OR: by prefixing a field name with ? character.
Condition groups let you control precedence when the query is parsed and executed. Parentheses are used to start and end a group, and a space is required inside the group after the opening parenthesis.
The syntax is as follows:
category:<CATEGORY> ( ?filter1:<VALUE1> ?filter2:<VALUE2> -<OR_FUNCTION1>:<FUNCTION_VALUE1> ) filter3:<VALUE3> -<FUNCTION2>:<FUNCTION_VALUE2>
In this example, the expression within the parentheses is executed and then joined as a Boolean AND with the other filters and functions in the query. Multiple Condition groups can be specified, as follows:
category:<CATEGORY> ( ?filter1:<VALUE1> ?filter2:<VALUE2> -<OR_FUNCTION1>:<FUNCTION_VALUE1> ) ( ?filter3:<VALUE3> ?filter4:<VALUE4> -<NOT_FUNCTION2>:<FUNCTION_VALUE2> )
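When queries are generated by a script, a small helper keeps the spacing inside groups consistent with the syntax shown above. The group helper below is hypothetical, and the tld and protocol values are only illustrative.

```python
def group(*terms: str) -> str:
    """Wrap terms in a condition group, with a space inside each parenthesis."""
    return "( " + " ".join(terms) + " )"


query = " ".join([
    "category:datascan",
    group("?protocol:rdp", "?protocol:ssh"),
    group("?tld:nl", "?tld:uk"),
    "-monthago:1",
])
print(query)
# category:datascan ( ?protocol:rdp ?protocol:ssh ) ( ?tld:nl ?tld:uk ) -monthago:1
```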
- Search Ctiscan for EITHER medical devices OR SCADA devices using the http protocol AND having EITHER a .nl TLD OR a .uk TLD from EITHER DNS OR TLS certificates:
- Search Riskscan for the same devices (but using the Datascan/Riskscan data model)
New error conditions
- syntax error: validate: missing closing group after: Condition group was opened but not closed.
- syntax error: validate: closing group never opened after: Condition group was closed but matching opening parenthesis was not found
- syntax error: validate: invalid word found:: the indicated operator or word is invalid in the query
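The first two errors are about unbalanced parentheses, which can be caught client-side before a query is sent. This Python sketch is a hypothetical pre-check, not ONYPHE's parser: its messages only mirror the documented wording, it assumes parentheses are written as standalone tokens (as in the documented syntax), and it uses shlex.split so that quoted values containing spaces stay in one token (shlex raises ValueError on unbalanced quotes).

```python
import shlex


def check_groups(query: str) -> list:
    """Report unbalanced condition groups in an OQLv2 query string."""
    errors = []
    open_positions = []  # token indexes of "(" not yet closed
    tokens = shlex.split(query)
    for i, token in enumerate(tokens):
        if token == "(":
            open_positions.append(i)
        elif token == ")":
            if open_positions:
                open_positions.pop()
            else:
                errors.append("closing group never opened after: " + " ".join(tokens[:i]))
    for i in open_positions:
        errors.append("missing closing group after: " + " ".join(tokens[: i + 1]))
    return errors


print(check_groups("category:datascan ( ?tld:nl ?tld:uk ) protocol:http"))  # []
print(check_groups("category:datascan ( ?tld:nl ?tld:uk protocol:http"))
# ['missing closing group after: category:datascan (']
```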