Stack Overflow, the popular question-and-answer platform for developers, played an ingenious prank on April 1 for April Fools' Day: every time a user tried to copy code posted on the platform, a message appeared warning that they had no "free copy/pastes" left and encouraging them to obtain a key to solve the problem.
But while preparing this joke, they realized that if they could detect when someone was trying to copy code from one of their pages, they could also compile statistics showing which code snippets were copied the most.
So they set up custom events in a homegrown web tracking tool to monitor when one of the site's users copied text from it. These events made it possible to collect data such as whether the text came from a question, an answer or a comment, whether it came from a block of code or from plain text, and the reputation of the 'copier'.
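As a rough illustration of the kind of record such events could produce, here is a minimal sketch. The field names (`post_type`, `region`, `reputation`) and the helpers `tally` and `anonymous_share` are hypothetical, chosen only to mirror the dimensions described above; the article does not describe Stack Overflow's actual schema, and the sample data is made up.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class CopyEvent:
    """One copy action; every field name here is an illustrative assumption."""
    post_type: str             # "question", "answer" or "comment"
    region: str                # "code" or "text"
    reputation: Optional[int]  # None stands for an anonymous user


def tally(events: Iterable[CopyEvent]) -> Counter:
    """Count copies per (post_type, region) pair."""
    return Counter((e.post_type, e.region) for e in events)


def anonymous_share(events: Iterable[CopyEvent]) -> float:
    """Fraction of copies made by anonymous users (reputation is None)."""
    events = list(events)
    if not events:
        return 0.0
    return sum(e.reputation is None for e in events) / len(events)


# Made-up sample data, not real Stack Overflow figures.
sample = [
    CopyEvent("answer", "code", None),
    CopyEvent("answer", "code", 1520),
    CopyEvent("question", "code", None),
    CopyEvent("answer", "text", None),
]
counts = tally(sample)
share = anonymous_share(sample)
```

With records shaped like this, each of the breakdowns reported in the article (by post type, by code versus plain text, by reputation) becomes a simple grouping over the event list.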
They then collected that data for two full weeks, between March 26 and April 9, a period in which 40,623,987 copies were made. In the process they found some curious results (and others, of course, that were quite expected):
Most of the copy-pastes came from anonymous users (86%).
The higher the user's 'reputation' on the site, the lower their copy count.
Copy-pastes were more frequent on working days and during working hours.
One in four users visiting a Stack Overflow question copied a code snippet within five minutes of reaching the page.
Curiously, the majority (52.4%) of the copies came from unaccepted answers, although accepted answers earn a higher rate of copies per post.
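The last finding can look contradictory at first, so here is a small worked sketch of how both statements can hold at once. Only the 52.4% split comes from the article; the total of 1,000 copies and the post counts are invented purely for illustration, on the assumption that unaccepted answers outnumber accepted ones.

```python
def copies_per_post(copies: int, posts: int) -> float:
    """Average number of copies each post receives."""
    return copies / posts


# Hypothetical figures: only the 52.4% share is taken from the article.
total_copies = 1000
unaccepted_copies = 524                            # 52.4% of all copies
accepted_copies = total_copies - unaccepted_copies

# Invented post counts: assume twice as many unaccepted answers as accepted.
unaccepted_posts = 400
accepted_posts = 200

unaccepted_rate = copies_per_post(unaccepted_copies, unaccepted_posts)
accepted_rate = copies_per_post(accepted_copies, accepted_posts)

print(f"unaccepted: {unaccepted_rate:.2f} copies per post")
print(f"accepted:   {accepted_rate:.2f} copies per post")
```

Under these assumed numbers, unaccepted answers collect more copies in total simply because there are more of them, while each accepted answer is copied more often on average.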
Looking at the tags that generate the most copy-pastes, we find the following ranking:
1. | html | css |
3. | python |
4. | python | pandas |
6. | python | pandas | dataframe |
7. | python | matplotlib |
8. | git |
9. | php |
10. | jquery |
"If you've ever felt bad about copying code from our site instead of writing it from scratch, don't worry! Why reinvent the wheel when someone else has already done the hard work? [...] Knowledge reuse is not a bad thing: it helps you learn, get working code faster, and reduce your frustration."
What are the most copied posts?
Code block in an answer: "With a post score of 3,497 and 11,829 copies, I am pleased to announce that 'How to iterate over rows in a DataFrame in Pandas' received the most copies. Answered in 2013, this question continues to help thousands of people every week."
Plain text in an answer: "As for the most copied answer with plain text, we have 'TypeError: this.getOptions is not a function [closed]', with a post score of 218 and 1,570 total copies. Although we could not confirm it, we suspect that what is being copied is the 'firstname.lastname@example.org'."
Code block in a question: "And as the most copied question, with a post score of 2,147 and 3,665 copies, we have 'How to create an HTML button that acts like a link'."
Via | StackOverflow Blog