How to Use Residential IPs for Data Journalism
In 2013, the British newspaper The Guardian published a series of reports on Edward Snowden, an American government contractor who had leaked classified files from the country’s most secretive spy shop, the National Security Agency. The contents of those files, and their implications, were difficult for most Americans to grasp. However, The Guardian took special steps to make the story simple with a series of detailed graphs, images, and charts on its Datablog.
With Snowden’s story and many others, The Guardian became a pioneer of data journalism, a new breed of journalism dedicated to discovering and sharing the stories hidden in large data sets of all kinds. Today, data journalism is a fast-growing field in which reporters work to make sense of the ever-expanding flow of data from billions of devices in constant communication around the world.
Mining the stores of Big Data can pose challenges for journalists who rely on personal devices with a residential Internet Protocol (IP) address. That’s because a high volume of searches from a single residential IP can look like suspicious automated activity, and it can get a user flagged or blocked entirely. Residential IP proxies can help data journalists get the information they’re looking for safely and anonymously.
Data Journalism: Investigative Reporting in the Age of the Internet of Things
News reporting has always depended on facts and hard data — the building blocks of stories that change the lives of people and even the world. But we’re now living in the age of the Internet of Things, when torrents of data are constantly being generated and shared, leaving consumers overwhelmed and confused.
Data journalists specialize in extracting insights from data and crafting stories that make those insights real and relevant to readers’ lives. These stories combine an impressive array of visuals such as charts, animations, and infographics with the traditional journalistic techniques of narrative and reportage to help readers make sense of the news.
The data that forms the backbone of data journalism comes from a multitude of sources, and collecting that data often involves web scraping: using dedicated software to request web pages and extract data from them automatically. Web scraping and other information-gathering techniques are staples of doing business online, and scraping is legal within certain limits, even when done without the site’s knowledge or consent. But whether you’re collecting statistics for an article or tracking the performance of your closest competitor, scraping sites isn’t always easy.
Websites have ways to detect the activity of scraper bots or crawlers, and their security systems can recognize the scraper’s IP address and either block it or feed it false information. Repeated, high-volume searches and scraping activity from a single, identifiable IP address can also get a user blocked or reported, shutting down essential avenues for collecting the data behind a story.
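To make the extraction step of web scraping concrete, here is a minimal sketch that pulls the rows of an HTML table into Python lists using only the standard library. The sample page, the place names, and the figures in it are invented for illustration; a real scraper would first download the page and would usually need to handle messier markup.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None outside <tr>
        self._cell = None   # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return the table rows found in an HTML string."""
    parser = TableCellParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Invented sample page standing in for a downloaded document.
SAMPLE_HTML = """
<table>
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>Exampleville</td><td>1,000</td></tr>
  <tr><td>Sampleton</td><td>2,500</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in extract_rows(SAMPLE_HTML):
        print(row)
```

Once the rows are in this form, they can be written to a spreadsheet or fed into a charting tool for the visual side of the story.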
What Are Residential IPs, and What Do They Mean for Data Journalists?
An Internet Protocol, or IP, address is a numeric address assigned to a device or network connected to the Internet. Residential IP addresses are assigned by an Internet Service Provider, or ISP, to home connections used by physical devices such as smartphones and computers, but data centers and other internet-capable entities also have their own IP addresses.
Every time a device such as a computer or smartphone accesses the Internet, its IP address is visible to the sites it contacts, revealing the device’s approximate location and other details such as its ISP. That means that when you’re collecting data using scraper software or other tools, your “real” residential IP address is revealed, with its attendant security risks and potential for blocking. Search engines also track the number of searches made from a single IP within a set period of time, and a large amount of search activity that suggests a bot can get that address blocked.
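The per-IP tracking described above can be pictured with a simplified sketch of a sliding-window counter. This is only an illustration of the general idea, not the method of any particular search engine; the class name, the limit of three requests, and the 60-second window are all assumptions chosen for the example.

```python
from collections import defaultdict, deque


class RequestCounter:
    """Allow at most `limit` requests per IP within `window_seconds`."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip, now):
        """Record a request at time `now` (seconds); return False if over the limit."""
        recent = self.hits[ip]
        # Forget requests that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False  # too many requests from this IP: flag or block it
        recent.append(now)
        return True


if __name__ == "__main__":
    counter = RequestCounter(limit=3, window_seconds=60)  # hypothetical threshold
    for second in range(5):
        # 203.0.113.5 is a documentation-only address used as a stand-in.
        print(second, counter.allow("203.0.113.5", now=second))
```

Because the count is kept per address, the same number of requests spread across several addresses would stay under the limit, which is why a single busy residential IP stands out.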
To avoid those and other problems, journalists, marketers, and many others with legitimate reasons to access data on the Web are turning to reliable, secure residential IP proxies. Along with data mining and web scraping, those reasons could include monitoring competitor activity, getting access to geo-specific content, or conducting market research.
Residential IPs Protect Identities and Locations
Residential proxies provide a way to conceal a user’s actual IP address by routing traffic through an intermediary that replaces the original IP address with that of another residential IP. Because most online services recognize residential IP addresses as belonging to real people, a residential IP proxy makes it appear that you’re simply a different individual conducting a search from a different IP. And because residential proxies can originate from many different locations, they can also allow you to access geo-restricted content, and many providers let you limit the addresses you use to a specific country or region.
Since search engines typically allow only a specific number of searches per minute from the same IP, many proxy providers offer proxy rotation, which hides your real IP address behind a pool of proxy addresses that switch at regular intervals to make it appear that searches are originating from multiple users.
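As a rough sketch of what rotation means in practice, the snippet below hands each request the next proxy from a pool and wraps around when the pool runs out. Providers usually handle rotation on their side; this client-side version, the function name, and the example.com proxy addresses are assumptions made purely for illustration.

```python
import itertools

# Hypothetical proxy endpoints; a real provider supplies its own hosts and ports.
PROXY_POOL = [
    "http://proxy-a.example.com:8000",
    "http://proxy-b.example.com:8000",
    "http://proxy-c.example.com:8000",
]


def assign_proxies(urls, pool):
    """Pair each URL with the next proxy in the pool, wrapping around at the end."""
    if not pool:
        raise ValueError("proxy pool is empty")
    rotation = itertools.cycle(pool)
    return [(url, next(rotation)) for url in urls]


if __name__ == "__main__":
    searches = [f"https://search.example.com/?q=term{i}" for i in range(4)]
    for url, proxy in assign_proxies(searches, PROXY_POOL):
        print(url, "via", proxy)
```

With three proxies and four searches, the fourth search goes out through the first proxy again, so no single address carries every request.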
Residential IP proxies aren’t the only option for hiding a user’s real residential IP, although they are generally considered the safest one. Datacenters also provide proxies that cloak a user’s real, residential IP, but datacenter proxies are easily recognized by most online security systems. These proxies are often sold cheaply in bulk, and they’re popular among the web’s bad actors, such as identity thieves and scammers. Other proxy providers route users’ requests through mobile proxy networks so that traffic appears to originate from a personal mobile device.
Although these other types of proxies also cloak a user’s actual IP address, IT experts warn that they aren’t always secure and might easily be recognized by a site’s security systems. Residential IP proxies, which use actual residential IP addresses, are a safer and more effective option for legitimate users, who include not only journalists but also marketers, content creators, and advertisers tracking the performance of campaigns.
How to Set Up Residential IPs for Scraping and Monitoring
Residential IPs are sold and managed through many online proxy providers, and they can be purchased in a variety of ways such as by subscription or in bulk. These IPs originate from countries around the world, so users can use them to access content that’s not available in their area. Proxy users can typically customize their proxy setup from an account dashboard, with options for setting the number and speed of proxy rotations, specifying which area’s addresses to use, and selecting a “sticky” proxy that stays the same for a specified amount of time.
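The difference between a rotating and a “sticky” proxy can be sketched as follows. In practice this choice is normally a dashboard setting handled by the provider; the class below is a hypothetical client-side stand-in that keeps one address for a fixed number of seconds before moving to the next, and the proxy addresses are placeholders.

```python
import itertools


class StickySelector:
    """Keep returning the same proxy until `sticky_seconds` have passed."""

    def __init__(self, pool, sticky_seconds):
        if not pool:
            raise ValueError("proxy pool is empty")
        self._rotation = itertools.cycle(pool)
        self.sticky_seconds = sticky_seconds
        self._current = None
        self._since = None  # time at which the current proxy was selected

    def get(self, now):
        """Return the proxy to use at time `now` (in seconds)."""
        expired = self._since is not None and now - self._since >= self.sticky_seconds
        if self._current is None or expired:
            self._current = next(self._rotation)
            self._since = now
        return self._current


if __name__ == "__main__":
    # Hypothetical pool and a 10-minute sticky session.
    selector = StickySelector(
        ["http://proxy-a.example.com:8000", "http://proxy-b.example.com:8000"],
        sticky_seconds=600,
    )
    for t in (0, 300, 600, 900):
        print(t, selector.get(now=t))
```

A sticky session is useful when a site expects a series of requests, such as paging through results, to come from one visitor.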
Once you establish an account with a reputable proxy provider, you can connect to the proxy server from your web browser. To do that, you’ll need to configure your browser settings. The exact menu names vary by browser, version, and operating system, but on a typical Windows setup, under “Settings,” choose “Advanced” and then select “Internet Properties.” Here you’ll find the current LAN and proxy settings. Enable “Proxy Settings” and enter the IP address and port number of the proxy server you want to connect to.
When properly configured, this server will act as the intermediary between you and the World Wide Web. Some browsers, such as Google Chrome, also support extensions that enable setting up your proxy server connection.
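The same address-and-port setup can be done in a script instead of a browser. The sketch below uses Python’s standard urllib.request module; the helper names, the 203.0.113.10 address, port 8080, and the credentials are placeholders, and your provider’s dashboard would supply the real values.

```python
import urllib.request
from urllib.parse import quote


def make_proxy_url(host, port, username=None, password=None):
    """Build a proxy URL from the address, port, and optional credentials."""
    if username is None:
        return f"http://{host}:{port}"
    # Percent-encode credentials so characters like '@' do not break the URL.
    credentials = f"{quote(username, safe='')}:{quote(password or '', safe='')}"
    return f"http://{credentials}@{host}:{port}"


def build_proxy_opener(proxy_url):
    """Return an opener that sends HTTP and HTTPS requests through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch(opener, url, timeout=10):
    """Download a page through the proxy and return the raw bytes."""
    with opener.open(url, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # Placeholder address and port; nothing is fetched in this example.
    proxy_url = make_proxy_url("203.0.113.10", 8080)
    opener = build_proxy_opener(proxy_url)
    print("Requests made with this opener would go through", proxy_url)
```

Calling fetch(opener, "https://example.com/") would then route that request through the configured proxy rather than directly from your own address.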
Because residential IPs are assigned to devices owned by individual users, they signal to websites that any searches originating from that IP are from a real person doing legitimate business online. But when that legitimate business requires high-volume searching, residential IP proxies can help journalists and other professionals protect their privacy, search safely, and sidestep the red flags of suspicious activity.