Data scraping has revolutionized the process of collecting and formatting data.

Scraping programs allow researchers, statisticians, and other data users to collect information from nearly any public online webpage in a matter of seconds.

Furthermore, many scraping programs can function dynamically. Such dynamic programs do not simply scrape the source webpage a single time; rather, dynamic scrapers repeatedly pull data from the desired online source, allowing users to create data spreadsheets that update themselves automatically.

This dynamic function can be particularly useful for industries that rely on quick, real-time updates for large sets of data, such as trade and investment firms that need to continuously monitor price movements.
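
As a rough illustration of the repeated-pull behavior described above, here is a minimal Python sketch that uses only the standard library. It does not describe how any particular product works. The `data-price` HTML attribute, the function names, and the polling parameters are hypothetical assumptions chosen for illustration. Each round downloads the page, extracts a price, and appends a timestamped row, so the list of rows grows over time like a self-updating spreadsheet.

```python
import re
import time
import urllib.request
from datetime import datetime, timezone
from typing import Callable, Optional

# Hypothetical page format: assumes the price appears as e.g. data-price="123.45".
PRICE_PATTERN = re.compile(r'data-price="([0-9]+(?:\.[0-9]+)?)"')


def fetch_page(url: str) -> str:
    """Download a page's HTML as text (a single, one-time scrape)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def parse_price(html: str) -> Optional[float]:
    """Return the first price found in the HTML, or None if the pattern is absent."""
    match = PRICE_PATTERN.search(html)
    return float(match.group(1)) if match else None


def dynamic_scrape(
    url: str,
    rounds: int,
    interval_s: float,
    fetch: Callable[[str], str] = fetch_page,
    sleep: Callable[[float], None] = time.sleep,
) -> list[tuple[str, Optional[float]]]:
    """Re-scrape `url` `rounds` times, appending one timestamped row per pull.

    The returned rows stand in for a spreadsheet that updates itself.
    """
    rows: list[tuple[str, Optional[float]]] = []
    for i in range(rounds):
        html = fetch(url)
        stamp = datetime.now(timezone.utc).isoformat()
        rows.append((stamp, parse_price(html)))
        if i < rounds - 1:
            sleep(interval_s)  # wait before the next pull
    return rows
```

For example, `dynamic_scrape("https://example.com/quote", rounds=3, interval_s=60)` would pull a (hypothetical) quote page three times, one minute apart. The `fetch` and `sleep` parameters are injectable so the loop can be exercised without network access; a production price monitor would also need error handling for failed requests.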

Many data scraping programs are also accessible and inexpensive: Microsoft Excel includes a built-in tool for importing data from webpages, and several free scraping extensions are available in the Chrome Web Store. Scraping technology is improving rapidly, but these improvements have raised ethical concerns about how scraping programs might be used.

Scraping programs can be engineered to extract information from any public webpage, including personal information that users share publicly on social media platforms such as Facebook, Twitter, Instagram, and YouTube.

In other words, if you upload any personal information to a public social media profile, a scraping program could potentially retrieve and store such information in an instant. This could include pictures, names, locations, phone numbers, and email addresses.

The possibility of personal information being discreetly scraped and stored is alarming and raises several questions: Is this legal? How can I prevent it? Is it happening right now?

There are legal and corporate regulations that address these questions and concerns.

The Computer Fraud and Abuse Act (CFAA) forbids obtaining information from a computer by accessing it “without authorization” or by exceeding “authorized access.” Furthermore, Twitter, Facebook, YouTube, and Venmo explicitly prohibit scraping of user information in their terms of service, such as Facebook’s Automated Data Collection Terms.

Does this mean that your social media profiles are protected from scrapers? Not exactly. Unfortunately, the protection offered by the CFAA does not necessarily apply to public social media profiles; profiles set to a “Public” setting technically grant “authorized access” to all web visitors, including automated scrapers.

Social media users can limit unwanted scraping by switching their profile settings from “Public” to “Private.” This reduces the amount of information that is publicly available, and because a private profile no longer grants “authorized access” to every web visitor, it may also bring that information under the legal protection of the CFAA.

But what if you would rather have a public profile?

Do company regulations protect public profiles from being scraped?

In practice, no.

While Twitter, Facebook, and other social media companies prohibit scraping on their platforms, programmers and their software can simply ignore these rules and scrape user information regardless.

A current and noteworthy example of such software is Clearview AI: a state-of-the-art facial recognition application that has recently caused controversy regarding the future of data scraping technology.

Alexander Fleiss

Alexander Fleiss is the CEO of Rebellion Research and a scientist, teacher, and AI researcher.