
The 10 Commandments of Ethical Data Collection (From a Technical Standpoint)

Ethical Data Collection: The 10 Commandments

There is a fine line between what is legal and what is ethical. Technology moves at breakneck speed, and lawmakers have a hard time keeping up with it. Many things you can do with modern technology fall into a gray area that hasn't yet been touched by legislation or regulation. They are legal, but should they be?

Online data collection and web scraping are some of those gray areas. This is the reason why ethical best practices should be applied to data collection processes. In other words, people and businesses involved in data collection need to police themselves with a set of rules that define when this ethical gray area is actually black.

Creating commandments to promote ethical computer and network usage has precedent. The Ten Commandments of Computer Ethics were created in 1992. They state:

  1. Thou shalt not use a computer to harm other people.
  2. Thou shalt not interfere with other people's computer work.
  3. Thou shalt not snoop around in other people's computer files.
  4. Thou shalt not use a computer to steal.
  5. Thou shalt not use a computer to bear false witness.
  6. Thou shalt not copy or use proprietary software for which you have not paid (without permission).
  7. Thou shalt not use other people's computer resources without authorization or proper compensation.
  8. Thou shalt not appropriate other people's intellectual output.
  9. Thou shalt think about the social consequences of the program you are writing or the system you are designing.
  10. Thou shalt always use a computer in ways that ensure consideration and respect for other humans.

Those commandments are a good starting point, but for data collection, we think there should be more. Here at Spider, we have taken the initiative to create a new set of commandments that we believe should be applied to the data collection process.

1. Opt-In and Opt-Out

The first commandment comes down to choice. Getting the consent of the users in a peer-to-peer network is essential. They should know exactly what they are signing up for when they opt in, with nothing hidden in pages of legalese or small print. And opting out should be just as simple as opting in.

Users should be able to opt out at any time. This is essential to internet transparency and will help ensure the internet continues to be a platform for the decentralized flow of information.
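To make the symmetry between opting in and opting out concrete, here is a minimal sketch in Python. The class name `ConsentRegistry`, its methods, and the `terms_version` field are hypothetical names chosen for illustration, not part of any real product. The sketch assumes consent is tracked per peer and that no peer is ever opted in by default.

```python
from datetime import datetime, timezone


class ConsentRegistry:
    """In-memory record of which peers have explicitly opted in (illustrative only)."""

    def __init__(self):
        # peer_id -> (terms_version, time of opt-in); empty means nobody has consented
        self._opted_in = {}

    def opt_in(self, peer_id: str, terms_version: str) -> None:
        # Store which version of the terms the peer actually agreed to
        self._opted_in[peer_id] = (terms_version, datetime.now(timezone.utc))

    def opt_out(self, peer_id: str) -> None:
        # One call, no preconditions: opting out is as simple as opting in
        self._opted_in.pop(peer_id, None)

    def has_consented(self, peer_id: str) -> bool:
        return peer_id in self._opted_in
```

Any component that routes traffic would check `has_consented` first, so a peer that has never opted in, or has opted out, is simply never used.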

2. Perform KYC (Know Your Customer) Identity Verification

KYC or Know Your Customer is a practice carried out by banks and financial institutions. It comes down to verifying the identity of customers in order to reduce the risk of theft, money laundering, and financial fraud. This same type of due diligence should apply to the customer of a data collection network to ensure that the business they represent, its website, its email accounts, and its social media accounts are legitimate.

3. Blocking API Endpoints That Could Be Misused

An API, or Application Programming Interface, makes it easy for one piece of software to interact with another in a network. However, APIs can also become a target of abuse. Users of a data collection network can leverage the power of software to create fake social media accounts, fake reviews, and fake ad clicks. API endpoints that have the potential to be abused in this manner should be blocked from access.
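One way to picture endpoint blocking is a deny list checked before any request leaves the network. The sketch below is an assumption about how such a check might look; the hosts, paths, and the `is_blocked` helper are all made up for illustration.

```python
import re
from urllib.parse import urlsplit

# Hypothetical deny rules: (host, path pattern). These hosts and paths are examples only.
BLOCKED_ENDPOINTS = [
    ("social.example", re.compile(r"^/api/v\d+/accounts/create$")),  # account creation
    ("reviews.example", re.compile(r"^/api/reviews/submit$")),       # review submission
    ("ads.example", re.compile(r"^/click(/|$)")),                    # ad click tracking
]


def is_blocked(url: str) -> bool:
    """Return True if the URL targets an endpoint on the deny list."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path or "/"
    for blocked_host, pattern in BLOCKED_ENDPOINTS:
        # Match the host itself and any of its subdomains
        host_matches = host == blocked_host or host.endswith("." + blocked_host)
        if host_matches and pattern.search(path):
            return True
    return False
```

Matching on both host and path keeps the rule narrow: ordinary public pages on the same site stay reachable while the abusable endpoint does not.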

4. Traffic Rate Limiting

Data collection should not interfere with the standard usage of a network or website. Even when a data collection process is not even close to becoming a Denial-of-Service attack, it still will have some effect. It can affect the performance of a website and create a bad user experience for legitimate users. It can skew analytics being used on the site, as well as insights gathered from that data. A data collection network should know its targets and keep all data collection within the target's normal traffic levels by throttling the process.
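Throttling of this kind is commonly implemented with a token bucket. The following is a minimal sketch, not a description of any particular provider's implementation; the `TokenBucket` class and its parameters are illustrative, and the right rate for a given target is something the operator would have to determine.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity` requests, refilled at `rate_per_sec` tokens per second."""

    def __init__(self, rate_per_sec: float, capacity: int, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock            # injectable so the behavior can be tested without sleeping
        self.tokens = float(capacity)
        self.updated = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; return False when the request should wait."""
        now = self.clock()
        elapsed = now - self.updated
        # Refill in proportion to elapsed time, never beyond capacity
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

In practice one bucket would be kept per target site, so that a busy target and a small one each get a limit that fits their normal traffic.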

5. Monitoring Global Network Usage

Automated data collection can be, and has been, misused. Therefore, monitoring the usage of a data collection network is an important part of conducting your business ethically. An out-of-control data collection process can resemble or actually become a Denial-of-Service attack. Monitoring the network can prevent this by throttling usage before it affects services or websites on a network.

A customer's usage of a network should also be monitored to validate that they are using it in a way that fits the details in their KYC process. If a customer claims they are using the service for competitive intelligence but instead are using the network to create fraudulent ad clicks, then their account should be shut down. With global network monitoring, this is possible.
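As a rough illustration of checking usage against a declared use case, the sketch below counts each customer's requests by traffic category and flags accounts whose traffic mostly falls outside what they declared during KYC. The category names, the `ALLOWED_CATEGORIES` mapping, and the thresholds are assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical mapping from declared use case to the traffic categories it justifies
ALLOWED_CATEGORIES = {
    "competitive_intelligence": {"ecommerce", "search"},
    "ad_verification": {"ads", "search"},
}


class UsageMonitor:
    def __init__(self, max_off_profile_share: float = 0.2, min_requests: int = 10):
        self.max_off_profile_share = max_off_profile_share
        self.min_requests = min_requests          # avoid judging on tiny samples
        self._counts = defaultdict(Counter)       # customer_id -> category -> request count

    def record(self, customer_id: str, category: str) -> None:
        self._counts[customer_id][category] += 1

    def off_profile_share(self, customer_id: str, declared_use: str) -> float:
        """Fraction of the customer's requests outside their declared use case."""
        counts = self._counts[customer_id]
        total = sum(counts.values())
        if total == 0:
            return 0.0
        allowed = ALLOWED_CATEGORIES.get(declared_use, set())
        off_profile = sum(n for cat, n in counts.items() if cat not in allowed)
        return off_profile / total

    def should_review(self, customer_id: str, declared_use: str) -> bool:
        total = sum(self._counts[customer_id].values())
        if total < self.min_requests:
            return False
        return self.off_profile_share(customer_id, declared_use) > self.max_off_profile_share
```

A flag from `should_review` would be a prompt for a human to look at the account, not an automatic shutdown.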

6. Peer Consent

Residential proxy providers depend on the devices of average internet users to provide IP addresses that their customers can use. To maintain this alliance, these providers need to furnish detailed terms of use documents to these peers. Peers should know what they signed up for, and they should never be opted in by default. This cooperation should also be rewarded. Users provide devices that Internet traffic can be routed through and should be compensated for this service.

7. Blacklisting Domains That Aren't Public

Legal cases over online data collection have been settled in favor of the data collector only when the data was freely available to any internet user. Domains that don't contain publicly available information should be blacklisted by default because they can be the target of abusive activities.
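"Blacklisted by default" can be read as a default-deny policy: a target is reachable only if its domain has been reviewed and confirmed to serve public content. The sketch below assumes that reading; the `PUBLIC_DOMAINS` set and the `is_allowed_target` helper are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical list of domains reviewed and confirmed to serve public content
PUBLIC_DOMAINS = frozenset({"shop.example", "news.example"})


def is_allowed_target(url: str, public_domains=PUBLIC_DOMAINS) -> bool:
    """Default deny: only reviewed public domains (and their subdomains) are allowed."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in public_domains)
```

Because anything not on the list is refused, internal hosts, raw IP addresses, and unreviewed domains are blocked without needing a rule of their own.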

8. Only Idle Resources

The peers on a residential proxy network have signed up to route traffic through their devices. This does not mean that this relationship should be abused. These resources should only be used under specific conditions so that the peers don't experience any degradation of their normal device usage. Network resources should be used only when the device is idle, connected to a Wi-Fi network, and has enough battery power to handle the usage.
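The three conditions translate directly into a gate that is checked before any traffic is routed through a peer. This is a minimal sketch: the `DeviceState` fields, the 50 percent battery threshold, and the decision to treat a charging device as having enough power are assumptions, not values from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceState:
    is_idle: bool
    on_wifi: bool
    battery_percent: int
    is_charging: bool = False   # assumption: a charging device counts as having enough power


def may_route_traffic(state: DeviceState, min_battery_percent: int = 50) -> bool:
    """Route traffic only when the device is idle, on Wi-Fi, and has enough power."""
    has_power = state.is_charging or state.battery_percent >= min_battery_percent
    return state.is_idle and state.on_wifi and has_power
```

All three conditions must hold at once, so a device that wakes up or leaves Wi-Fi stops being eligible the next time the gate is checked.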

9. Set Network Limitations

A data collection network should know the average internet usage of the peers who signed up to provide IP addresses to the network. The usage of that device on the network should be limited to fit within that standard usage range. For example, if the peer only browses the internet an hour a day and doesn't do much else, then sending six hours of traffic through that device daily should not be allowed.
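A cap like this can be derived from the peer's own recent usage. The sketch below measures usage in bytes per day rather than hours, which is an assumption made to keep the arithmetic simple; `daily_cap_bytes`, `PeerBudget`, and the `fraction` parameter are illustrative names.

```python
from statistics import mean


def daily_cap_bytes(recent_daily_usage_bytes, fraction: float = 1.0) -> int:
    """Cap routed traffic at a fraction of the peer's own average daily usage."""
    if not recent_daily_usage_bytes:
        return 0   # no usage history: route nothing until a baseline exists
    return int(mean(recent_daily_usage_bytes) * fraction)


class PeerBudget:
    """Tracks how much of today's cap has been spent on routed traffic."""

    def __init__(self, cap_bytes: int):
        self.cap_bytes = cap_bytes
        self.used_bytes = 0

    def try_spend(self, nbytes: int) -> bool:
        # Refuse any transfer that would push the peer past its daily cap
        if self.used_bytes + nbytes > self.cap_bytes:
            return False
        self.used_bytes += nbytes
        return True
```

For example, a peer averaging 200 bytes a day with `fraction=0.5` gets a cap of 100; a light user therefore gets a proportionally small budget, which is the point of the commandment.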

10. Adherence to GDPR Rules

The GDPR, or General Data Protection Regulation, is often described as the toughest privacy and security law in the world, and it should not be overlooked. Personally Identifiable Information (PII) should only be collected when the user consents to it. If they haven't consented, no data at all should be gathered.
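In a collection pipeline, this rule can be enforced as a filter that runs before anything is stored. The sketch below is one possible reading, not legal guidance: the field names in `PII_FIELDS`, the `user_id` key, and the idea that consent is available as a set of user IDs are all assumptions for illustration.

```python
# Hypothetical set of field names treated as personally identifiable
PII_FIELDS = frozenset({"name", "email", "phone", "ip_address"})


def contains_pii(record: dict) -> bool:
    return any(field in PII_FIELDS for field in record)


def filter_by_consent(records, consented_ids):
    """Drop every record that carries PII for a user who has not consented."""
    kept = []
    for record in records:
        if contains_pii(record) and record.get("user_id") not in consented_ids:
            continue   # no consent: the whole record is discarded, not just the PII fields
        kept.append(record)
    return kept
```

Records with no personal fields, such as a product price, pass through unchanged, while a record with PII and no matching consent is dropped in full.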

Conclusion

Residential proxy and real-time crawler providers can choose to overlook the ethics of how they interact with a network and the devices on that network and claim that everything is fine as long as it is legal. Here at Spider, we don't think that is enough. We created these 10 Commandments for Ethical Data Collection and make sure we adhere to them each and every day.

When it comes to your business, it is important to exercise due diligence when choosing a proxy and data collection provider to ensure that the data you collect has long-term value, is legally viable, and doesn't risk the safety of your own devices, networks, and systems.

It is also important to know that these commandments are just a starting point. They are not written in stone. The technology involved in data collection and residential proxies is evolving rapidly, and these commandments will change with it to ensure that Spider and the businesses that use our services remain transparent and trustworthy.
