Extracting your data

This guide only applies to Next-Gen WAF customers with access to the Next-Gen WAF control panel.

Next-Gen WAF stores requests that contain attacks and anomalies, with some qualifications. To extract this data in bulk for ingestion into your own systems, use the request feed API endpoint. It provides a feed of recent data and is suitable for calling from, for example, an hourly cron job.

This functionality is typically used by security operations center (SOC) teams to automatically import data into security information and event management (SIEM) solutions such as Datadog, ELK, and other commercial systems.

Data extraction vs searching

We have a separate API endpoint for searching request data. It is intended for finding requests that meet certain criteria, not for bulk data extraction:

| Feature | Data Extraction | Searching |
| --- | --- | --- |
| Endpoint path | /feed/requests | /requests |
| Returned requests | All, as filtered by signals | Only as specified by query syntax |
| Max results | All requests, 1000 at a time | 1000 requests |
| Time window | All requests from the specified time period and as paginated according to the limit query parameter | Up to 7 days worth of requests at a time |
| Data retention | Access data from up to 7 days ago | Access data from up to 30 days ago |

Time span restrictions

The following restrictions are in effect when using this endpoint:

The until parameter has a maximum of five minutes in the past. This is to allow our data pipeline sufficient time to process incoming requests (see Delayed data below).

The from parameter has a minimum value of 24 hours and five minutes in the past.

Both the from and until parameters must fall on full minute boundaries.

Both the from and until parameters require Unix timestamps with second level detail (e.g., 1445437680).
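The restrictions above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the helper names previous_full_hour and validate_window are hypothetical, and the check that from precedes until is an assumption the guide does not state explicitly.

```python
from datetime import datetime, timedelta, timezone

MIN_DELAY_SECONDS = 5 * 60                 # `until` must be at least this far in the past
MAX_AGE_SECONDS = 24 * 60 * 60 + 5 * 60    # `from` must be no further in the past than this


def previous_full_hour(now: datetime) -> tuple:
    """Return (from_ts, until_ts) for the last complete hour before `now`."""
    until_dt = now.replace(minute=0, second=0, microsecond=0)
    from_dt = until_dt - timedelta(hours=1)
    return int(from_dt.timestamp()), int(until_dt.timestamp())


def validate_window(from_ts: int, until_ts: int, now: datetime) -> None:
    """Raise ValueError if the window breaks one of the documented restrictions."""
    now_ts = int(now.timestamp())
    if from_ts % 60 or until_ts % 60:
        raise ValueError("from and until must fall on full minute boundaries")
    if from_ts >= until_ts:
        # Assumption: not stated in the guide, but an empty window is not useful.
        raise ValueError("from must be earlier than until")
    if until_ts > now_ts - MIN_DELAY_SECONDS:
        raise ValueError("until must be at least five minutes in the past")
    if from_ts < now_ts - MAX_AGE_SECONDS:
        raise ValueError("from must be within the last 24 hours and five minutes")
```

Called at five minutes past the hour with a timezone-aware UTC datetime, previous_full_hour yields a window that passes validate_window, because its until value is exactly five minutes old.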

Delayed data

A five-minute delay is enforced to allow time to collect and aggregate data across all of your running agents, and then ingest, analyze, and augment the data in our systems. The five-minute delay is a tradeoff between timeliness and completeness of the data.

Pagination

This endpoint returns data in batches of 1,000 requests by default, or of the size specified in the limit query parameter. If the specified time span contains more requests than fit in one batch, the response provides a next URL for retrieving the next batch. Each next URL is valid for one minute from the time it's generated.

Retrieved data can vary in size, sometimes greatly. To avoid exceeding URL size limitations, send the next parameter and its value as POST parameters in a POST request using a Content-Type of application/x-www-form-urlencoded.
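To send the next value as a POST parameter, it first has to be pulled out of the next URL. The sketch below assumes the response shape used by the Python example in this guide (a next object with a uri field whose query string carries a next parameter). The helper name next_cursor is hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def next_cursor(response_body: dict) -> Optional[str]:
    """Return the value of the `next` query parameter, or None on the last page."""
    # Assumed shape: {"next": {"uri": "/api/v0/.../feed/requests?next=<value>"}}
    uri = (response_body.get("next") or {}).get("uri", "")
    if not uri:
        return None
    values = parse_qs(urlsplit(uri).query).get("next")
    return values[0] if values else None
```

The returned value can be passed as the next form field of the follow-up POST request. Note that parse_qs percent-decodes the value, which fits a client that re-encodes form data when sending it.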

Sort order

As a result of our data warehousing implementation, the data you get back from this endpoint will be complete for the time span specified, but is not guaranteed to be sorted. Once all data for the given time span has been accumulated, it can be sorted using the timestamp field, if necessary.
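Sorting after accumulation might look like the sketch below. It assumes each record carries its timestamp field as an ISO 8601 string such as 2025-01-01T10:15:00Z. That format is an assumption, and the sample records are invented for illustration.

```python
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; a trailing 'Z' is treated as UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def sort_requests(records: list) -> list:
    """Return a new list of records ordered by their `timestamp` field."""
    return sorted(records, key=lambda record: parse_timestamp(record["timestamp"]))


# Hypothetical records, in the unsorted order a feed might return them.
sample = [
    {"id": "b", "timestamp": "2025-01-01T10:45:00Z"},
    {"id": "a", "timestamp": "2025-01-01T10:15:00Z"},
    {"id": "c", "timestamp": "2025-01-01T10:59:30Z"},
]
ordered_ids = [record["id"] for record in sort_requests(sample)]
```

Sorting only once every page has been collected matches the guarantee above: the data is complete for the time span, but order is not guaranteed within or across batches.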

Rate limiting

Limits for concurrent connections to this endpoint:

Two per site (also known as a workspace)

Five per corp (also known as an account)
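A client that fetches several sites in parallel can stay within these limits by gating each call. The following is a minimal sketch for a single process using threads. The FeedConnectionLimiter class is hypothetical and does not coordinate across separate processes or hosts.

```python
import threading
from contextlib import contextmanager

SITE_LIMIT = 2   # concurrent connections per site (workspace)
CORP_LIMIT = 5   # concurrent connections per corp (account)


class FeedConnectionLimiter:
    """Caps concurrent feed calls per site and per corp within one process."""

    def __init__(self):
        self._corp = threading.BoundedSemaphore(CORP_LIMIT)
        self._sites = {}
        self._lock = threading.Lock()

    def _site_semaphore(self, site: str) -> threading.BoundedSemaphore:
        # Create the per-site semaphore on first use; the lock avoids a race.
        with self._lock:
            if site not in self._sites:
                self._sites[site] = threading.BoundedSemaphore(SITE_LIMIT)
            return self._sites[site]

    @contextmanager
    def connection(self, site: str):
        # Always acquire in the same order (site, then corp) to avoid deadlock.
        with self._site_semaphore(site), self._corp:
            yield
```

Each worker thread would wrap its request in `with limiter.connection(site_name):`, so a third call for the same site, or a sixth call across the corp, blocks until a slot is released.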

Example usage

A common way to use this endpoint is to set up a cron job that runs at five minutes past each hour and fetches the previous full hour's worth of data. Running at five minutes past the hour keeps the until value within the five-minute restriction described above. In the example below, we calculate the previous full hour's start and end timestamps and use them to call the API.

Python

import calendar
import json
import os
from datetime import datetime, timedelta, timezone

import requests

# Credentials and the target corp and site come from environment variables.
NGWAF_EMAIL = os.getenv('NGWAF_USER_EMAIL')
NGWAF_TOKEN = os.getenv('NGWAF_TOKEN')
NGWAF_CORP = os.getenv('CORP_NAME')
NGWAF_SITE = os.getenv('SITE_NAME')

if not NGWAF_EMAIL or not NGWAF_TOKEN or not NGWAF_CORP or not NGWAF_SITE:
    raise EnvironmentError(
        "Please set NGWAF_USER_EMAIL, NGWAF_TOKEN, CORP_NAME, and SITE_NAME environment variables."
    )

base_url = 'https://dashboard.signalsciences.net/api/v0'
feed_url = f'{base_url}/corps/{NGWAF_CORP}/sites/{NGWAF_SITE}/feed/requests'
headers = {
    'x-api-user': NGWAF_EMAIL,
    'x-api-token': NGWAF_TOKEN
}

# Previous full hour in UTC, as Unix timestamps on full minute boundaries.
until_time = datetime.now(timezone.utc).replace(minute=0, second=0, microsecond=0)
from_time = until_time - timedelta(hours=1)
until_time = calendar.timegm(until_time.utctimetuple())
from_time = calendar.timegm(from_time.utctimetuple())

get_url = f'{feed_url}?from={from_time}&until={until_time}'
print(f"Fetching data from: {get_url}")
print(f"from_time: {from_time}, until_time: {until_time}")


def fetch_paginated_data(url):
    data_list = []

    # First page: GET with the from and until query parameters.
    response_raw = requests.get(url, headers=headers)
    if response_raw.status_code != 200:
        raise RuntimeError(f"Failed to fetch data from {url}. Status Code: {response_raw.status_code}")
    response = response_raw.json()
    data_list.extend(response.get('data', []))
    next_uri = response.get('next', {}).get('uri', '')

    # Later pages: POST the next value as form data to avoid URL size limits.
    post_headers = {**headers, 'Content-Type': 'application/x-www-form-urlencoded'}
    while next_uri:
        next_value = next_uri.split('next=')[-1]
        post_response_raw = requests.post(feed_url, headers=post_headers, data={'next': next_value})
        if post_response_raw.status_code != 200:
            raise RuntimeError(
                f"Failed to fetch paginated data from {feed_url}. Status Code: {post_response_raw.status_code}"
            )
        post_response = post_response_raw.json()
        data_list.extend(post_response.get('data', []))
        next_uri = post_response.get('next', {}).get('uri', '')

    return data_list


data = fetch_paginated_data(get_url)
print(json.dumps(data, indent=4))

