Is Web Scraping Legal?
Modern business runs on data. It is how a business keeps tabs on competitors, gains insights into its industry (including where it ranks), and provides important information to customers. In fact, the business models of many enterprises depend on big data. And one of the most common, cost-effective ways to gather this data is by scraping it from web pages.
Web scraping is the automated process of downloading web pages and extracting specific data from them. Its legal status is a gray area. Is it legal? Is it illegal? In this article, we will try to answer that question, describe some legal cases that involved web scraping, and give you some tips to help make sure you aren't breaking the law.
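To make that definition concrete, here is a minimal sketch in Python of the two steps: downloading a page and extracting specific data from it. It uses only the standard library (`urllib.request` and `html.parser`). The function names, the user-agent string, the `https://example.com/products` URL, and the idea that the wanted data sits in elements with a class of `price` are all hypothetical, chosen for illustration; real pages use their own markup. The extractor also assumes each matching element contains plain text rather than nested tags.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def fetch(url, user_agent="example-scraper/0.1"):
    """Step 1: download a page. The user-agent string is a hypothetical placeholder."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=10) as response:
        # Use the charset the server declares, falling back to UTF-8 (an assumption).
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class ClassTextExtractor(HTMLParser):
    """Step 2: collect the text of elements whose class list contains target_class.

    Assumes each matching element contains plain text, not nested tags.
    """

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.inside = False   # True while reading a matching element
        self.current = []     # text fragments of the element being read
        self.results = []     # finished extracted strings

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self.inside = True
            self.current = []

    def handle_endtag(self, tag):
        if self.inside:
            self.results.append("".join(self.current).strip())
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.current.append(data)


def extract_by_class(html, target_class="price"):
    """Return the text of every element carrying the (hypothetical) class."""
    parser = ClassTextExtractor(target_class)
    parser.feed(html)
    parser.close()
    return parser.results


if __name__ == "__main__":
    # Offline demo on a hard-coded snippet; against a live site you would call
    # extract_by_class(fetch("https://example.com/products")) instead.
    sample = ('<ul><li><span class="price">$19.99</span></li>'
              '<li><span class="price sale">$5.00</span></li></ul>')
    print(extract_by_class(sample))  # ['$19.99', '$5.00']
```

Separating the download step from the parsing step makes the extraction logic easy to test on saved HTML without sending any requests to the site.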
Before we get started, please remember that this article is meant to provide information about web scraping and should not be considered legal advice.
Is It Legal or Illegal to Scrape a Website?
So is scraping a website legal or illegal? The good news is that some types of web scraping are perfectly legal. If you want to scrape your own website, for example, you can do that all day long. Many businesses do exactly that when they run software that audits their site for errors, checks whether it is search-engine friendly, or looks for security vulnerabilities. Without this kind of automation, they would have to check the site manually, which is tedious and time-consuming.
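As a rough sketch of what such audit software does, the Python example below parses the HTML of one page from your own site and reports a few common problems. The specific checks (an empty title, a missing meta description, images without an alt attribute) and the function names are illustrative assumptions, not a standard list; real audit tools check far more, and fetching each page could reuse an HTTP client such as Python's `urllib.request`.

```python
from html.parser import HTMLParser


class PageAudit(HTMLParser):
    """Record a few basic on-page signals while parsing one HTML document."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.has_meta_description = False
        self.images_missing_alt = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # HTMLParser lowercases tag and attribute names
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            # Count the description only if it actually has content.
            self.has_meta_description = bool((attrs.get("content") or "").strip())
        elif tag == "img" and "alt" not in attrs:
            # alt="" is valid for decorative images, so only a missing attribute is flagged.
            self.images_missing_alt += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def audit_page(html):
    """Return a list of human-readable problems found in one page of your site."""
    parser = PageAudit()
    parser.feed(html)
    parser.close()
    problems = []
    if not parser.title.strip():
        problems.append("missing or empty <title>")
    if not parser.has_meta_description:
        problems.append("missing meta description")
    if parser.images_missing_alt:
        problems.append(f"{parser.images_missing_alt} <img> tag(s) without an alt attribute")
    return problems


if __name__ == "__main__":
    page = "<html><head><title>Home</title></head><body><img src='logo.png'></body></html>"
    print(audit_page(page))
    # ['missing meta description', '1 <img> tag(s) without an alt attribute']
```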
And even scraping a site you don't own can be legal, as long as the information you are scraping is available to anyone browsing the internet. But it can also be illegal. Here are specific cases where scraping a website could break the law:
- If you must log in to a website to get the data you need, that data is not available to the public, so scraping it could be illegal. Sites that give you an account usually require you to agree to their Terms of Service, which is legally binding, and many of these terms prohibit automated data collection.
- Just because you are scraping publicly available data doesn't mean you can do anything you want with it. Many artists and writers put their work online, but that work is still protected by copyright. Any type of creative work can be copyrighted, including pictures, videos, articles, and blog posts.
- Even if you don't plan to sell the data or use it for anything beyond personal use, you could break the law if you have agreed to Terms of Service that forbid web scraping. And on some sites, simply browsing them may count as agreeing to their Terms of Service.
Historical Web Scraping Cases
The legality of web scraping has had a bumpy ride since the early days of the internet. At first, courts frowned on web scraping. More recent cases, however, have given automated data collection more leeway and set the tone for future web scraping cases.
eBay, Inc. vs. Bidder's Edge, Inc. (2000)
In this early web scraping case, eBay sought a preliminary injunction against Bidder's Edge, a company that aggregated listings from multiple auction sites and used a crawler to scrape data from eBay's site. eBay claimed that this crawling amounted to trespass to chattels. The court granted the injunction because eBay's users agreed to its Terms of Service and because bots could damage eBay's systems. The parties later settled out of court.
Intel Corp. vs. Hamidi (2003)
Just a few years later, courts' views began to change. Intel brought this case against Hamidi, a former employee who, after leaving the company, sent emails critical of Intel's employment practices to Intel employees. Intel claimed that these emails constituted trespass to chattels under California law. The California Supreme Court ruled that Hamidi hadn't violated the law, because the emails did not damage or impair Intel's computer systems.
Although this case was not about web scraping, it turned on the same legal theory used in many web scraping cases. Since that ruling, it has been cited as precedent in many other scraping cases that ultimately came down in favor of the parties doing the scraping.
Facebook, Inc. vs. Power Ventures, Inc. (2009)
Fast-forward to 2009, and we see Facebook winning a case against a web scraper. Power Ventures let users aggregate their information from all of their social media sites, including Facebook. Facebook sued the company for copyright infringement, trademark infringement, unfair competition, and violation of the Computer Fraud and Abuse Act (CFAA). The judge sided with Facebook because Facebook's Terms of Service prohibited automated data retrieval.
Ryanair vs. PR Aviation (2015)
The tide changed again a few years later, starting with Ryanair vs. PR Aviation. This case was heard in the Netherlands. PR Aviation is a company that compares flight prices and shows the results to users. Because Ryanair's site was one of those it scraped, Ryanair took PR Aviation to court, claiming that PR Aviation had breached a contract. The court ruled in favor of PR Aviation because there was no formal contract between the two companies and the scraped data was publicly accessible.
Ryanair vs. Expedia (2019)
A few years later, Ryanair tried the same thing against a better-known flight price comparison company, Expedia. Ryanair sent Expedia a cease-and-desist letter, and when Expedia ignored it and kept scraping data from the Ryanair site, Ryanair sued Expedia for violating the Computer Fraud and Abuse Act. The court ruled that the CFAA applied in this case, and Ryanair and Expedia then settled out of court.
hiQ Labs vs. LinkedIn (2019)
In the same year, the U.S. Court of Appeals for the Ninth Circuit ruled on a dispute between hiQ Labs and LinkedIn over the scraping of public LinkedIn profiles. The company used this data to provide businesses with information about their employees, and it had been doing so for years when LinkedIn sent it a cease-and-desist letter. In response, hiQ Labs sought an injunction to stop LinkedIn from blocking its access, and it won. The court decided that the data hiQ Labs was scraping was available to the public and that companies like LinkedIn should not be able to decide who can use this public data.
Web Scraping Best Practices
Here are some tips to help you stay on the right side of the law if you plan to scrape websites. Again, this is only general advice; consult a lawyer before you start automated data retrieval.
- Check whether the site you plan to scrape provides an API for the data you need. If it does, use the API instead of scraping. This benefits the site by putting less load on its servers, and it benefits you because you can usually retrieve data more efficiently through an API.
- If the site has a robots.txt file, respect its rules. If it disallows crawling the pages you want, don't scrape them.
- Respect the Terms of Service (ToS) of the website you plan to scrape. If it disallows automated data collection, then don't scrape the data.
- Always check the copyright status of the data you are scraping, since many types of creative work are protected by copyright. If you plan to publish this data, you will generally need permission from the copyright holder first.
- It is also important to use a residential proxy provider when scraping. Even when you are within your legal rights to scrape a site, its owners can still block or blacklist your IP address. A residential proxy provider lets you switch to a new IP address whenever you need to, which makes it less likely that your scraper will be blocked.
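The robots.txt tip can be automated with Python's standard-library `urllib.robotparser` module. The sketch below parses robots.txt text you have already downloaded, filters a list of URLs down to the ones it permits, and waits between requests, using the nonstandard `Crawl-delay` directive when a site sets one. The user-agent name, the one-second default delay, the helper names, and the caller-supplied `fetch` function are hypothetical choices for illustration. Note that robots.txt says nothing about a site's Terms of Service or copyright, so the other tips still apply.

```python
import time
from urllib.robotparser import RobotFileParser

USER_AGENT = "example-scraper"  # hypothetical name your scraper identifies itself with


def load_rules(robots_txt):
    """Parse the text of a robots.txt file that has already been downloaded."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def allowed_urls(rules, urls, user_agent=USER_AGENT):
    """Keep only the URLs that robots.txt permits for this user agent."""
    return [url for url in urls if rules.can_fetch(user_agent, url)]


def polite_delay(rules, user_agent=USER_AGENT, default=1.0):
    """Seconds to wait between requests: the site's Crawl-delay if set, else a default."""
    delay = rules.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


def crawl(urls, rules, fetch, user_agent=USER_AGENT):
    """Fetch permitted URLs one at a time, pausing between requests.

    `fetch` is any caller-supplied function that takes a URL and returns the page.
    """
    delay = polite_delay(rules, user_agent)
    pages = {}
    for i, url in enumerate(allowed_urls(rules, urls, user_agent)):
        if i:
            time.sleep(delay)  # pause between requests, not before the first one
        pages[url] = fetch(url)
    return pages


if __name__ == "__main__":
    rules = load_rules("User-agent: *\nDisallow: /private/\nCrawl-delay: 5")
    urls = ["https://example.com/products", "https://example.com/private/accounts"]
    print(allowed_urls(rules, urls))  # ['https://example.com/products']
    print(polite_delay(rules))  # 5.0
```

To check a live site instead of a hard-coded string, `RobotFileParser` also offers `set_url()` and `read()`, for example `rp.set_url("https://example.com/robots.txt")` followed by `rp.read()`.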
So is web scraping legal or illegal? The answer is, and always has been, "it depends." Court rulings on web scraping have swung back and forth over the years. The most recent cases seem to give more leeway to people and businesses that scrape publicly accessible data. Still, it is safer to use a residential proxy provider when you scrape sites.