Suggestive signals: how to tell good bot traffic from bad
Bots get a bad rap, but not all are built to cause harm. Simply put, bots are software programs that perform tasks without human intervention, and they account for more than 40% of all web traffic.
But while some bots are benign search engine crawlers or website health monitors, others are on the prowl with nefarious intent, looking to execute account takeovers and compromise APIs. These are the ones you need to put on your radar, but how do you tell them apart in order to allow the good bots and block the bad ones?
Bad bots run code repeatedly to generate automated web traffic, unleashing floods of web requests against login pages and other key transactional pages of retailers, banks, or any organization that stores valuable personal or financial data and makes it accessible via a web application or API. Let’s examine three of the more common scenarios in which a threat actor leverages a bot to achieve a fraudulent goal, and what signals to look for in your traffic.
Scenario 1: Content scraping
Web scraping bots automatically gather and copy data from other websites. They can disguise themselves as innocuous search engine crawlers as they scan content, but these search bot imposters steal content without the knowledge or permission of the website owner.
Attackers can repurpose this content in various exploitative ways. They might republish copyrighted television shows or paywalled news articles, or they might duplicate blog posts to steal SEO value and organic traffic. Similarly, they’ve been known to use scraped content to gather product pricing or inventory data to gain a competitive advantage or compile contact information to sell to other businesses as sales targets.
Inventory scraping bots visit product pages, perform searches, and scrape site data. You can identify these bots through pre-identified bad IP ranges published by sources like SANS. And in the case of our next-gen WAF (formerly Signal Sciences), they can be spotted through our Network Learning Exchange, proprietary technology that identifies suspicious traffic from sources confirmed to be malicious and protects customers against subsequent attacks originating from those IP addresses.
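As a minimal sketch of blocklist-based identification, the check below tests a client IP against known-bad CIDR ranges using Python's standard ipaddress module. The ranges shown are documentation examples, not a real threat feed:

```python
import ipaddress

# Hypothetical blocklist of CIDR ranges; in practice these would come from a
# threat-intelligence source such as a SANS blocklist or a WAF's shared signals.
BAD_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("198.51.100.0/24", "203.0.113.0/24")
]

def is_known_bad(ip: str) -> bool:
    """Return True if the client IP falls inside any blocklisted range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BAD_RANGES)
```

In production, the range list would be refreshed continuously as the feed identifies new malicious sources.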
Scenario 2: Credential stuffing
When breaches occur, they often result in large dumps of user credentials on the dark web. Threat actors can purchase the stolen usernames and passwords and use automated bots to rapidly test the combinations in the authentication flows of major retail and financial websites. This process is known as “credential stuffing.” Once valid credentials are found, they’re deployed against other sites to take over website accounts and lock out legitimate users. Attackers can also take personally identifiable information (PII) and stored payment methods from those accounts to commit further fraud.
The key here is to monitor authentication events, which requires visibility into where account takeover (ATO) happens: at the point of account creation and login. Define a baseline for the “normal” or expected volume of request traffic over a specific time frame to provide a guidepost for what is not normal. Then, when authentication events like login attempts or password resets spike above an expected threshold, alerts can notify stakeholders and automated blocking can prevent these abnormal requests from reaching an app or API endpoint. Monitoring web request activity in this manner also empowers DevOps and security teams to quickly identify the malicious activity causing requests to spike.
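The baseline-and-threshold approach above can be sketched as a small sliding-window monitor. The baseline rate and spike multiplier below are illustrative assumptions; in practice the baseline would be learned from historical traffic:

```python
from collections import deque

class AuthSpikeMonitor:
    """Flags when authentication events exceed a baseline-derived threshold.

    baseline_per_minute is assumed to be derived from historical traffic;
    here it is a fixed illustrative value.
    """
    def __init__(self, baseline_per_minute: int, spike_factor: float = 3.0):
        self.threshold = baseline_per_minute * spike_factor
        self.events = deque()  # timestamps of recent auth events

    def record(self, now: float) -> bool:
        """Record one login/password-reset event; True means traffic is abnormal."""
        self.events.append(now)
        # Keep only events inside the last 60-second window.
        while self.events and now - self.events[0] > 60:
            self.events.popleft()
        return len(self.events) > self.threshold
```

When record() returns True, the caller would raise an alert or trigger automated blocking for the endpoint.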
Scenario 3: API abuse
APIs function as the backbone of modern web, cloud, and mobile applications, so it’s no surprise that attackers use bots to mimic legitimate API consumers. Gartner estimates that by 2022, APIs will be the most frequent attack vectors for enterprise web application data breaches. Clearly, API security must be part of any strategic security plan.
APIs transfer a variety of data as organizations carry out their business operations. Automated bots probe APIs in an attempt to extract sensitive data like PII or credit card numbers. For example, adversaries use bots against public-facing APIs by spoofing X-Forwarded-For (XFF) header information to execute account takeover attacks.
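To illustrate why spoofed XFF headers matter, the sketch below resolves a client IP while only believing hops added by proxies you operate. The trusted_proxies allowlist is a hypothetical example, and a production resolver would also verify that the direct peer (remote_addr) is itself a trusted proxy before trusting the header at all:

```python
def client_ip(xff_header: str, remote_addr: str, trusted_proxies: set) -> str:
    """Resolve the real client IP from X-Forwarded-For without trusting
    attacker-controlled entries.

    Anything an attacker prepends to the header is discarded; only the
    rightmost address not added by one of our own proxies is believed.
    """
    hops = [h.strip() for h in xff_header.split(",")] if xff_header else []
    # Walk right to left: trusted proxies append to the end of the header.
    for hop in reversed(hops):
        if hop not in trusted_proxies:
            return hop
    return remote_addr
```

A naive implementation that takes the leftmost XFF entry would let a bot rotate fake source IPs on every request, defeating per-IP rate limits.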
Detecting and blocking malicious requests is key to preventing attackers from abusing APIs and causing service disruption, data leakage, or account lockouts. Defeating API abuse requires visibility into where and how attackers are attempting to manipulate your application’s business logic, including authentication events. In order to surface those real-time insights, you’ll need to instrument and monitor your application for key application transaction events, which vary based on the API’s function.
For example, gift card cracking occurs when bot operators attempt to brute force an API that lets users check their gift card balances, with the end goal of validating gift card numbers. An abnormally high volume of requests against the gift card API, along with a high rate of failures from a single IP, can indicate a brute-force attempt in progress.
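A minimal detector for this signal might count balance-check failures per source IP. The failure threshold here is an arbitrary illustration, not a recommended setting:

```python
from collections import Counter

class GiftCardAbuseDetector:
    """Flags IPs that repeatedly fail gift card balance checks.

    max_failures is a hypothetical threshold chosen for illustration.
    """
    def __init__(self, max_failures: int = 20):
        self.max_failures = max_failures
        self.failures = Counter()  # failed balance checks per source IP

    def record_check(self, ip: str, valid: bool) -> bool:
        """Record one balance-check result; True means the IP looks abusive."""
        if not valid:
            self.failures[ip] += 1
        return self.failures[ip] > self.max_failures
```

A legitimate customer mistypes a card number once or twice; a bot cycling through candidate numbers fails hundreds of times from the same source.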
Scalping bots are deployed against a purchase flow in an attempt to buy discounted or limited-edition items. The fraudsters use stolen credit cards or stored-value cards to purchase merchandise and then sell it elsewhere at a premium or in high volume. An indicator of this type of bot would be higher-than-expected “add to cart” activity from a single IP.
The key to bot mitigation
Although visibility is important to protecting your web apps and APIs, it alone is not enough to stop bad bot-generated traffic. You also need accurate insights into the traffic targeting applications or API endpoints, as well as advanced rate limiting that can both detect and prevent non-human behavior on applications and APIs.
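As one sketch of such rate limiting, the sliding-window limiter below caps requests per IP per time window, which would catch both abnormal login bursts and excessive “add to cart” activity from a single IP. The limit and window values are illustrative assumptions, not product defaults:

```python
from collections import defaultdict, deque

class PerIPRateLimiter:
    """Cap the number of requests each source IP may make per time window."""
    def __init__(self, limit: int = 100, window_s: float = 60.0):
        self.limit = limit
        self.window_s = window_s
        self.hits = defaultdict(deque)  # request timestamps per IP

    def allow(self, ip: str, now: float) -> bool:
        """Return True if this request is under the cap, False to block it."""
        q = self.hits[ip]
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] > self.window_s:
            q.popleft()
        if len(q) >= self.limit:
            return False  # over the cap: block or challenge this request
        q.append(now)
        return True
```

Blocked requests could instead be challenged (for example with a CAPTCHA) rather than dropped outright, depending on the endpoint's risk tolerance.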
Look for a solution that allows you to rapidly establish application-specific rules to help prevent app and API abuse, matching on request attributes such as user agent, path, method, scheme, POST or query parameters, request cookies, and more. When those signals are tripped, automated actions can follow, including blocking the web request, alerting your teams, or taking other appropriate action.
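A rule of this kind can be sketched as a simple predicate over request attributes. The attribute names, the example scraper rule, and the “block” action below are hypothetical illustrations, not any vendor's actual rule syntax:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Request:
    method: str
    path: str
    user_agent: str
    query: dict = field(default_factory=dict)

def scraper_rule(req: Request) -> Optional[str]:
    """Return "block" when a request trips the rule, else None.

    Hypothetical rule: GET requests to product pages from a scripted
    HTTP client's user agent are treated as scraper traffic.
    """
    if (req.method == "GET"
            and req.path.startswith("/products")
            and "python-requests" in req.user_agent.lower()):
        return "block"
    return None
```

In a real deployment, tripping such a rule would also tag the request so teams can review which signals fire most often and tune the rules over time.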