Glossary

Data terms made simple.


A

Anonymous proxy

An anonymous proxy is a server that acts as an intermediary between a user's device and the internet, masking the user's IP address to enhance privacy. It allows users to browse the web without revealing their identity, helping to bypass restrictions and maintain anonymity while accessing online content.

Alternative data

Alternative data refers to non-traditional data sources used to gain insights and inform decision-making. This can include social media activity, satellite imagery, web scraping, and transaction data, among others. Businesses leverage alternative data to enhance analytics, improve forecasting, and gain a competitive edge in various industries, particularly finance and marketing.

AI

Artificial intelligence (AI) refers to the simulation of human intelligence in machines designed to think and learn like humans. It encompasses various technologies, including machine learning, natural language processing, and robotics, enabling systems to perform tasks such as problem-solving, decision-making, and language understanding, often improving over time through experience.

AI agents

AI agents are software programs that use artificial intelligence to perform tasks autonomously or assist users. They can analyze data, make decisions, and interact with users through natural language processing. Common examples include virtual assistants, chatbots, and recommendation systems, which enhance user experience and streamline processes across various applications.

Antidetect browser

An antidetect browser is a specialized web browser that masks or randomizes identifying attributes (such as user‑agent strings, canvas/WebGL fingerprints, and IP addresses) so each session appears as a distinct, generic user.

API

An API (Application Programming Interface) is a set of rules and specifications that allows one application to access the features or data of another application. This enables different software systems to communicate and interact with each other.


B

Brand safety

Brand safety refers to the measures and practices that ensure a brand's advertisements do not appear alongside inappropriate, harmful, or controversial content. It aims to protect a brand's reputation by maintaining a safe environment for its messaging, thereby fostering consumer trust and ensuring effective marketing outcomes.

Botnet

A botnet is a network of compromised computers or devices, controlled remotely by a cybercriminal. These infected machines, often called 'bots' or 'zombies,' can be used to perform malicious activities, such as launching distributed denial-of-service (DDoS) attacks, sending spam, or stealing data, without the owners' knowledge.

BeautifulSoup

Beautiful Soup is a Python library used for parsing HTML and XML documents. It creates parse trees from page source code, making it easier to navigate, search, and modify the parse tree. Ideal for web scraping, it helps developers extract data from websites efficiently and handle poorly formatted markup.
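A minimal sketch of how this looks in practice, assuming the `bs4` package is installed and using a small hardcoded HTML snippet as illustrative input:

```python
from bs4 import BeautifulSoup

# A small, hardcoded HTML snippet to parse (illustrative data only)
html = """
<html><body>
  <h1>Products</h1>
  <ul>
    <li class="item">Widget</li>
    <li class="item">Gadget</li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Navigate and search the parse tree
heading = soup.h1.get_text()
items = [li.get_text() for li in soup.find_all("li", class_="item")]
print(heading, items)
```

In a real scraper the `html` string would come from an HTTP response rather than a literal.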

Bandwidth sharing

Bandwidth sharing refers to the practice of distributing available network bandwidth among multiple users or devices. This allows for efficient use of internet resources, enabling simultaneous connections and data transfer. While it enhances accessibility, excessive sharing can lead to reduced speeds and performance for individual users, especially during peak usage times.

Bandwidth

Bandwidth refers to the maximum rate of data transfer across a network or internet connection, measured in bits per second (bps). It determines how much information can be transmitted simultaneously, affecting the speed and quality of online activities such as streaming, gaming, and downloading. Higher bandwidth allows for faster and more efficient data communication.
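A quick back-of-the-envelope calculation shows how bandwidth bounds transfer time (figures chosen for illustration):

```python
# Rough transfer-time estimate: file size divided by bandwidth
file_size_bits = 1 * 8 * 10**9   # a 1 GB file expressed in bits (decimal units)
bandwidth_bps = 100 * 10**6      # a 100 Mbps connection
seconds = file_size_bits / bandwidth_bps
print(seconds)  # 80.0
```

Real-world transfers are slower than this ideal figure because of protocol overhead and network congestion.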

Backconnect proxy

A backconnect proxy is a type of proxy server that automatically rotates IP addresses for each request, allowing users to maintain anonymity and avoid detection while web scraping or accessing restricted content. This technology enhances security and reduces the risk of IP bans by distributing traffic across multiple IPs.
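The rotation itself happens server-side at the backconnect gateway, but the effect resembles this client-side sketch (the addresses are from the documentation range, not real proxies):

```python
from itertools import cycle

# Hypothetical pool of proxy endpoints; a real backconnect gateway
# performs this rotation for you behind a single entry address.
proxy_pool = cycle([
    "203.0.113.10:8080",
    "203.0.113.11:8080",
    "203.0.113.12:8080",
])

def proxy_for_request():
    """Return the next proxy in the rotation for each outgoing request."""
    return next(proxy_pool)

print([proxy_for_request() for _ in range(4)])
```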

Browser fingerprinting

Browser fingerprinting is a technique websites use to collect attributes of a visitor's browser, such as installed fonts, screen resolution, and rendering behavior, and combine them into a unique digital fingerprint for tracking purposes.

Bot traffic

Bot traffic is any traffic to a website or app that is generated by automated software programs (known as bots) rather than humans. Bots can simulate human behavior by performing tasks like browsing web pages, clicking links, filling out forms, or even making purchases.

Bots

A bot (short for "robot") is a software program that performs automated tasks over a network. Bots follow instructions to carry out actions, often mimicking human behavior, but at a much faster pace.

Bot detection

Bot detection is the method by which a website identifies bot traffic. Websites can use a number of techniques, such as CAPTCHAs, browser fingerprinting, and behavioral analysis, to distinguish bot traffic from traffic generated by real people.


C

CSV

A CSV, or Comma-Separated Values file, is a plain text format used to store tabular data. Each line represents a data record, with fields separated by commas. CSV files are commonly used for data exchange between applications, making it easy to import and export data in spreadsheets and databases.
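Python's standard library can read this format directly; a minimal sketch with made-up data:

```python
import csv
import io

# Parse a small CSV document from an in-memory string (illustrative data)
raw = "name,price\nwidget,9.99\ngadget,4.50\n"
reader = csv.DictReader(io.StringIO(raw))
rows = list(reader)
print(rows[0]["name"], rows[0]["price"])
```

Note that `csv` reads every field as a string; numeric conversion is up to you.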

CSS selectors

CSS selectors are patterns used to select and style HTML elements in a web page. They allow developers to apply specific styles based on element types, classes, IDs, attributes, and more. By targeting elements effectively, CSS selectors enable precise control over the appearance and layout of web content.
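A few common selector patterns (the property values here are purely illustrative):

```css
/* Type selector: all <p> elements */
p { color: navy; }

/* Class and ID selectors */
.price { font-weight: bold; }
#header { background: #eee; }

/* Attribute selector: links whose href starts with https:// */
a[href^="https://"] { text-decoration: underline; }

/* Descendant combinator: list items inside a nav element */
nav li { display: inline-block; }
```

The same selector syntax is also used by scraping tools to locate elements, not just to style them.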

Cron job

A cron job is a scheduled task in Unix-based systems that automates the execution of scripts or commands at specified intervals. It allows users to run tasks like backups, updates, or maintenance scripts without manual intervention, enhancing efficiency and ensuring regular system operations.
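A crontab entry has five time fields (minute, hour, day of month, month, day of week) followed by the command; the paths below are hypothetical:

```
# m  h  dom mon dow  command
30   2  *   *   *    /home/user/backup.sh            # every day at 02:30
0    *  *   *   *    /usr/bin/python3 /opt/sync.py   # at the top of every hour
```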

Client

In computing, a client refers to a device or software application that requests services or resources from a server. Clients can be computers, smartphones, or applications that interact with servers over a network, enabling users to access data, applications, or services hosted remotely.

CGNAT

CGNAT, or Carrier-Grade Network Address Translation, is a technology used by Internet Service Providers (ISPs) to manage IP address allocation. It allows multiple users to share a single public IP address by translating private IP addresses within a local network. This helps conserve IPv4 addresses and facilitates efficient network management.

cURL

cURL (short for Client URL) is a command-line tool you can use to transfer data to or from a server using various network protocols. It’s a versatile tool that allows you to make different types of requests, like downloading files, sending data, and interacting with APIs.

CAPTCHA

A CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is a test that determines whether traffic to a website originates from a human or a bot. CAPTCHAs work by presenting challenges that are difficult for computers to solve but easy for humans.


D

Dynamic content

Dynamic content refers to web or digital content that changes based on user behavior, preferences, or real-time data. It personalizes the user experience by delivering tailored information, such as product recommendations or targeted messages, enhancing engagement and relevance.

Data collection

Data collection is the systematic process of gathering, measuring, and analyzing information from various sources to gain insights, inform decisions, and support research. It involves methods such as surveys, interviews, observations, and digital tracking, ensuring the data is accurate, relevant, and reliable for effective analysis and interpretation.

Device fingerprint

A device fingerprint is a unique identifier generated by collecting specific attributes of a device, such as its operating system, browser type, installed plugins, and screen resolution. This data helps websites and applications recognize and track devices for security, fraud prevention, and personalized user experiences without relying on cookies.
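Conceptually, the collected attributes are combined into a single stable identifier; a simplified sketch (real fingerprinting libraries use far more signals and more robust matching):

```python
import hashlib

def device_fingerprint(attributes: dict) -> str:
    """Hash a sorted set of device attributes into a stable identifier."""
    canonical = "|".join(f"{k}={v}" for k, v in sorted(attributes.items()))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

# Two different (made-up) devices yield two different fingerprints
fp_a = device_fingerprint({"os": "Windows 11", "browser": "Chrome 126", "screen": "1920x1080"})
fp_b = device_fingerprint({"os": "macOS 14", "browser": "Safari 17", "screen": "2560x1600"})
print(fp_a, fp_b)
```

Sorting the attributes before hashing makes the identifier independent of the order in which signals were collected.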

Data parsing

Data parsing is the method used to extract data from unstructured sources and convert it into a structured format. This makes it easier to analyze, send, and integrate.
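For example, a regular expression can turn a line of semi-structured text into a structured record (the log format below is invented for illustration):

```python
import re

# An unstructured log line parsed into a structured dict
line = "2024-05-01 12:30:45 ERROR disk full on /dev/sda1"
pattern = r"(?P<date>\S+) (?P<time>\S+) (?P<level>\w+) (?P<message>.+)"
record = re.match(pattern, line).groupdict()
print(record)
```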

Dataset

A dataset is a collection of data that's organized and stored in a structured format that makes the data easy to analyze or use.

Datacenter proxies

A datacenter proxy is a type of proxy server that is hosted in a data center, rather than being tied to a residential internet connection. Datacenter proxies are known for their high speeds and scalability, since they often run on powerful servers with significant bandwidth.


P

Proxylist

A proxy list is a compilation of proxy servers, which act as intermediaries between a user's device and the internet. These servers mask the user's IP address, enhance privacy, and enable access to restricted content. Proxy lists are commonly used for web scraping, bypassing geo-blocks, and improving online anonymity.
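Proxy lists are often distributed as plain text in a `host:port` format; a sketch of parsing one (the addresses are from the documentation range, not real proxies):

```python
# Parse a plain-text proxy list in the common "host:port" format
raw_list = """
203.0.113.5:3128
203.0.113.8:8080
198.51.100.2:1080
"""

proxies = []
for line in raw_list.strip().splitlines():
    host, port = line.strip().rsplit(":", 1)
    proxies.append((host, int(port)))

print(proxies)
```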

Protocol

A protocol in networking is a set of rules and conventions that govern how data is transmitted and received over a network. It ensures reliable communication between devices by defining formats, timing, and error handling, enabling interoperability and efficient data exchange across diverse systems and platforms.

Price monitoring

Price monitoring is the process of tracking and analyzing the prices of products or services over time. It helps businesses understand market trends, competitor pricing, and consumer behavior, enabling them to make informed pricing decisions, optimize profit margins, and enhance competitive positioning in the marketplace.

Port

A port in networking is a virtual point of connection that allows data to flow between devices over a network. It is identified by a number, ranging from 0 to 65535, and is used by protocols to differentiate between multiple services or applications running on a single device, facilitating communication and data exchange.

Puppeteer

Puppeteer is a Node.js library developed by Google that provides a high-level API for controlling headless Chrome or Chromium browsers. It allows developers to automate web tasks such as testing, scraping, and rendering web pages, enabling efficient interaction with web applications programmatically. Puppeteer is widely used for web automation and performance monitoring.

Python

Python is a popular programming language known for its clear syntax and readability. It's used for a wide range of tasks, from building websites and analyzing data to automating tasks and creating artificial intelligence.
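A small taste of that readability, filtering and summing a list in a couple of lines (sample figures only):

```python
# Filter and aggregate a list with a list comprehension
prices = [9.99, 4.50, 120.00, 15.25]
affordable = [p for p in prices if p < 20]
total = sum(affordable)
print(affordable, round(total, 2))
```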

Proxy server

A proxy server is an intermediary server that sits between your device (or application) and the internet, forwarding your requests and responses while hiding your IP address from the online resources you access. This can help you access restricted content and bypass some bot detection measures when web scraping.


R

Robots.txt

Robots.txt is a text file placed on a website's server that instructs web crawlers and search engine bots on which pages to crawl or avoid. It helps manage site indexing, control bandwidth usage, and protect sensitive information from being accessed or displayed in search results.
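Python's standard library can evaluate these rules; a minimal sketch using made-up rules and a placeholder domain:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, supplied as a list of lines
rules = [
    "User-agent: *",
    "Disallow: /private/",
    "Allow: /public/",
]

rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("*", "https://example.com/private/page"))  # False
print(rp.can_fetch("*", "https://example.com/public/page"))   # True
```

Note that robots.txt is advisory: well-behaved crawlers honor it, but nothing technically enforces it.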

Rate limits

Rate limits in web scraping refer to restrictions set by websites on the number of requests a user can make within a specific timeframe. These limits help prevent server overload, protect against abuse, and ensure fair access for all users. Adhering to rate limits is crucial for ethical scraping and maintaining access to web resources.

Rate limiting

Rate limiting is a technique used in network management to control the amount of incoming or outgoing traffic to or from a server. It restricts the number of requests a user can make in a given timeframe, preventing abuse, ensuring fair usage, and maintaining optimal performance and security of web applications and APIs.
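One common implementation is a token bucket: requests spend tokens, tokens refill at a fixed rate, and requests are rejected when the bucket is empty. A minimal sketch (illustrative parameters):

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.tokens = float(capacity)
        self.updated = time.monotonic()

    def allow(self) -> bool:
        """Consume one token if available; otherwise reject the request."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=1.0, capacity=3)   # 3-request burst, then 1 request/second
results = [bucket.allow() for _ in range(4)]
print(results)  # first three allowed, fourth rejected
```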

Reverse proxies

A reverse proxy is a type of proxy server that sits in front of one or more web servers, intercepting all client requests before they reach the origin server. This allows reverse proxies to perform various functions like load balancing, security filtering, and caching to improve the performance, security, and reliability of the web server(s).

Residential proxies

Residential proxies are a type of proxy server that routes your internet traffic through a residential IP address. This means your online activity appears to originate from a real home or individual user, rather than a data center or business.


W

WebRTC

WebRTC (Web Real-Time Communication) is an open-source technology that enables real-time audio, video, and data sharing directly between web browsers without the need for plugins. It facilitates peer-to-peer connections, enhancing communication applications like video conferencing, online gaming, and file sharing, while ensuring low latency and high-quality interactions.

Web data

Web data refers to information collected from websites, including text, images, videos, and user interactions. It encompasses structured data (like databases) and unstructured data (like social media posts). This data is crucial for analytics, marketing strategies, and improving user experiences by providing insights into online behavior and trends.

Web Application Firewall

A Web Application Firewall (WAF) is a security solution designed to monitor, filter, and protect web applications from malicious traffic and attacks, such as SQL injection and cross-site scripting. By analyzing HTTP requests and responses, a WAF helps safeguard sensitive data and ensures the integrity and availability of web applications.

Web crawler

A web crawler, also known as a spider or bot, is an automated program that systematically browses the internet to index content from websites. It collects data for search engines, helping them understand and rank web pages based on relevance and quality, ultimately improving search results for users.

wget

wget is a free command-line utility that you can use to download files from the internet. It’s a robust tool that’s able to handle unstable network connections and supports various protocols, including HTTP, HTTPS, and FTP.

Web scraping

Web scraping is the process of collecting data from the web and aggregating it into one place. Although this can be a manual process (i.e., copying and pasting from websites yourself), "web scraping" generally refers to automating that process.
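The core of the automated step is extracting structured data from HTML. A self-contained sketch using only the standard library (the HTML is hardcoded here so the example runs offline; a real scraper would fetch it over HTTP):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag encountered."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

html = '<p>See <a href="/docs">docs</a> and <a href="/blog">blog</a>.</p>'
parser = LinkExtractor()
parser.feed(html)
print(parser.links)  # ['/docs', '/blog']
```

In practice, libraries like Beautiful Soup make this kind of extraction more convenient than the raw parser shown here.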

Web scraper

A web scraper is a tool that automatically extracts data from websites. It finds the information you want from web pages and puts it into a format you can use, such as a spreadsheet.

