
if I ever needed to show proof of my time there it would only be a Google search away

On that topic, has anyone discovered why Google deep-six'd Usenet archives it acquired with Deja News?

You used to be able to find specific posts from specific posters with by: and other operators. Sometime in the aughts it degraded quickly to the point where I can't find threads from which I have explicit excerpts and full author names.

Does someone high up in Google have an embarrassing Usenet history? Did it just fall into disrepair?



> Does someone high up in Google have an embarrassing usenet history?

I don't know if that's true (it probably is), but personally I am glad that almost all of my old Usenet posts have vanished. I was horrified when Deja News started up. That was the moment I realized the internet is forever and decided to never use my real name or to upload any pictures of myself to anything connected to the internet.

Of course, I'm still screwed because I use a smart phone and probably several entities have that data and could connect the dots, but the average person I encounter can find out very little about me with just my name.


> I don't know if that's true (it probably is), but personally I am glad that almost all of my old Usenet posts have vanished.

Same. If I could go back I wouldn't share any details about my life. Best case scenario, nothing happens. But everything can (and will) be used against you by the court of public opinion. The last few years shredded any '90s idealism I had about the internet.

> The internet is forever

That must be scary for upcoming generations. Now many of our early screw ups and thoughts are recorded in some form or another and there's little way to move on. There's no time/distance element that allows you to grow out of whatever you – or others – uploaded. In the worst cases (revenge porn, false accusations, lolcows) you can't put your old life behind at all.


It's a lot better now. In the past we had Usenet, public forever. Now our walled gardens give us the ability to share semi-privately and hide most info that hasn't been leaked yet.


> That must be scary for upcoming generations.

Not on a long time scale, I think. Services will disappear, disks will be reformatted, bit-rot will erode archives.


I take the opposite approach: I always use my name on the basis I never need to be worried about being deanonymized. Also keeps me from posting something I might regret.


Unfortunately we often don’t know that we’ll regret something when we do it.


I take an entirely different approach: My name is basically a globally-unique identifier to me, and most of my aliases are probably easily traced back to me, so I am forced to conduct all of my online business with the threat of being "found out"


I think that might be the same approach...


I posted a lot of crap on Usenet. While nothing that could be held against me, there were definitely a number of nerd wars I engaged in back in 1992.


mac vs. windows: FIGHT!

iOS has joined the room


Nahh back then it was Mac vs Amiga. Windows was so bad back then it wasn’t even a contender.


Ahem. You're all missing the point. But that's probably because you're all reading this using Emacs.


Nope nn over telnet.


I have been thinking about that these past few months and asking myself when and why I started using my real name online even though I was raised in the pseudonym era of the 90's. I think it's around when Skype was introduced and it made sense to use real name for discoverability.


I split the difference. My name, namrog84, looks like a pseudonym and number from a previous era.

However, it's my last name backwards: Gorman. And I was born in '84, so it's not just a random throwaway. Also, namrog without a number is often taken.

So I still use it professionally because it's still closely related to my identity. Yet the majority of people would never know if they didn't know my name.


If the long shot ever occurred that some decade-old posts surfaced, the simple solution is to lie.

Repeat after me: "no I did not make those posts"


Google allowed you to request your posts be deindexed. I helped somebody get some particularly embarrassing messages deindexed. It’s been more than a few years, so I forget the process, but I do remember that verifying their authorship was somewhat cumbersome.


I actually convinced the guy who founded Deja News to delete most of my early posts in 1996. They still lived on in some quotes, though.


Which doesn’t help if anyone replied to them.


This is the problem.

I've mentioned on multiple occasions that the current post-Snowden security and privacy movement is creating a serious threat to the preservation of Internet history and, to some extent, a threat to the understanding of human civilization in the digital age.

My personal interest is Internet culture and communities, and I'm not amused by this comment. Let me talk about the problem briefly.

Online communication from 1970 to 1995 was almost completely public and archived indefinitely. You can still read every single comment by every hacker from the late 80s in Usenet archives, sometimes even back to the ARPAnet era. There are a million posts to read, with no spam and no low-effort posting at all (by modern standards, even many flame wars seem to be high-quality). You can easily lose days, months or even years in the Usenet archive.

Records like these are often the only remaining records of the online communities, a snapshot of great historical and cultural value. To me, even the controversial political flamewars are interesting as they reveal parts of the history I would not know otherwise (I guess if someone in 2055 rereads today's Reddit threads about Donald Trump, he/she may have a similar feeling).

On the other hand, you also have the names, addresses, and even phone numbers of almost everyone who posted on Usenet. It was not a big problem when access to Usenet/Internet was exclusive to members of academia, and at a time when there was almost no systematic, organized abuse of personal information. But today is different: we have big and little brothers who have "Collect Them All" as their slogan, and they are actively trying to exploit the available information to the maximum extent.

What is the response then? People (at least many in the hacking community) start to prefer private, semi-private, or in-group communities over public communication, often protected by cryptography. Some people also actively erase/purge their footprints: for example, some delete every single post when they leave a community, no matter how insightful those posts are, and others may even deliberately insert misleading or false information. And we have something roughly similar to Vernor Vinge's True Names (which describes an underground hacking community in cyberspace). Good, now personal privacy and security are more or less protected by the cryptographic barrier.

But what is it doing then? We are now creating an unprecedented, HUGE GAP of information in history. Within our lifetime, we are entering a new digital Dark Age that no one has seen before.

Centralized and/or proprietary services often delete information when they go out of service, too, so we need to archive them, desperately. You can't imagine how many resources/memories that are extremely valuable to members of some communities exist solely on a single web server/service provider. I remember reading a post on Schneier's blog saying that a website containing numerous posts about wine culture was gone forever when its hard drive failed, and one commenter said that he uses the w3m/lynx CLI browsers and records everything he reads to his hard drive so he will never lose a single piece of information he has seen.

Is it an act of little brother surveillance? Arguably, it can be seen as one. But is it justified? I would say yes, and even say we need more people doing this, systematically. Naturally, archive.org was born in this way.

But then it faces the same issue. On one hand, much archived information can be abused; on the other hand, the more archive-refusing people we have, the more damage is done to the historical record.

I don't know how to solve this problem.

The only way I can think of is: (1) Cypherpunks were correct. Anonymity is crucial in the information age, and we should have more of it: never use your True Name or reveal personal information unless absolutely necessary, use an anonymous network (e.g. Tor) if possible, discard identities periodically (but if you delete posts, it still reverts to the original problem...), (2) encouraging further development and application of anonymity, and (3) training people to assume that every piece of information published publicly cannot be removed and may be abused, and that they should be able to withstand all possible consequences of it. But now "life" and "information" have simply become inseparable: if you are active in a community you have to post something...

I don't know the solution.

---

Appendix 1: what you can see in a Usenet archive.

You can see people's reactions to the rumors that Apple would release new 68000-based machines, how Larry Wall was releasing patch-2.0, the debates about the audio fidelity of vacuum tube/transistor amplifiers, how /bin/sh on System V was having a problem with "CFLAGS=-g make", the first-hand perspective on the impact of the Great Renaming, Richard Stallman announcing the GNU project and Linus's flamewar on microkernels with Tanenbaum, early Sci-Fi fandom culture posted from a 4.3-BSD machine (beta version!) and how it influenced the hacker culture, how the anti-spam movement gained momentum due to its intellectual challenges, raw discussions of fringe political movements (some interesting ones related to tech include the Cypherpunk movement, and the Extropians, an early sect of transhumanism), FAQs on almost every subject written by the active participants of the community who have probably spent hundreds of hours, some weird "emergent" memes/phenomena created out of nothing by the collective community, and you can also have a laugh at thousands of forgotten Internet memes, like alt.religion.kibology and the Usenet Oracle.

The only downside is that no external resources are accessible, and nobody is going to reply to you. It feels like Otomo Katsuhiro's animation Memories, where the protagonists are trapped inside a 3D hologram simulation of the past, created by the supercomputer in the abandoned space station.


Most communication throughout history has been ephemeral, and lost as soon as the people relevant died without relaying it to someone else.

Consider the 1800s, where much of our understanding of the attitudes of the day comes from newspapers and archived letters. Then consider how many more were discarded once they had served their purpose.

Today it's possible to archive all of that ephemeral information, but it has never been necessary.

Usenet, for example, was thought to be ephemeral because at best you had a few months worth of posts archived on your server and maybe your local machine, so if you said something boneheaded, it was going to naturally fall off the internet sooner rather than later. As it turns out, that was an incorrect view of the world.

Most forums are treated by their users the same way; a place for people to meet and talk about things in quasi-realtime, but not to archive those discussions for all time. Of course, as it turns out, those discussions are archived for all time, or until the forum closes or has a catastrophic data loss (e.g. this one we're on now).


> Consider the 1800s, where much of our understanding of the attitudes of the day comes from newspapers and archived letters. Then consider how many more were discarded once they had served their purpose.

Don't you think the world would be a somewhat better place if we had more records, and therefore more understanding, of what people thought in these times?

Also consider the class implications here. Letter-writing (and archiving) was something more often practiced by folks who were "elite" in terms of wealth and/or education. The thoughts and opinions of these classes have disproportionate representation in our understanding, and those of people in lower classes are underrepresented or erased entirely.


Just wanted to say that I agree with your sentiment. It's supremely aggravating to browse e.g. Reddit and find some potentially interesting posts wiped out by some memory-erasing bot. And it is amazing how much we managed to preserve from historical periods, despite (as siblings point out) the theoretical ephemerality of information. I think it is important for subsequent generations to be able to study social phenomena of the past, and at least have a chance to be wiser. We could move beyond anecdotes, speculation and quasi-historical garbage, such as the "decline of America mimics decline of the Roman empire" meme that regularly pops up in newspapers and such.

From my experience people are fascinated when they find some saved communication/memorabilia from years back, and the only downside for them is the fear of being judged. The best dream situation would be where you can have changeable identities, but there is an incentive to build respect for them over time (like on HN or Reddit), and there is a strong social convention that you could not be in any way harassed for what you said with such identity. You can also change it, like a mask (maybe in some minimal intervals), and it will be respected. Maybe also you can't refer to anyone's extra-online identity, and it concerns also people from the outside unless they're public figures.

Or, that you can freely say anything under your real name under some online circumstances. It would be similar to carnival in traditional culture, with role reversal, rule suspension and all that. Although for this to work it should have an aura of unseriousness and inherent lack of consequence for the normal reality.

I know what total anonymity with no continual identity leads to (vide *chans), but it is not even what we are talking about here. We could do better with some better social conventions, I think, i.e. have good amounts of both freedom and historical preservation.


99.999+% of history has been lost, and that's OK. The modern fetish of archiving everything is not necessary or healthy.


> 99.999+% of history has been lost, and that's OK. The modern fetish of archiving everything is not necessary or healthy.

There are people who spend their lives trying to read between the lines of what survived, in order to attempt to answer some question that could have been easily answered from some of the lost material.

The "modern fetish" of archiving everything is an attempt to avoid culling material that may later turn out to have been valuable. All but 1% of what's saved will always be worthless, it's just impossible to know for certain which 1% that will turn out to be.


Awesome comment. I feel as you do, and believe that early culture, those wide ranging thoughts have more value than we might realize right now.

A global sense of US. That is what it was. People becoming aware, the world smaller...

I do not know the answer either, other than I too oppose Real Names type efforts.


> Online communication from 1970 to 1995 was almost completely public, archived indefinitely.

Only if one thinks that Usenet is all that there was. It wasn't.


Even if you leave out email, IRC, etc. I'm sure the volume of discussions that took place on commercial online services like Compuserve and the thousands upon thousands of BBS systems far outweighed Usenet--and almost all of that is long gone (for better or worse). Personally I sort of wish I had more archives from my BBS days but so it goes.


You've probably already heard of this but there's always http://textfiles.com


I'd be really interested to hear some of your techniques for finding these interesting things in the actual archive. How do you find stuff? Do you have to know what you're looking for and search by exact text match?


Interesting perspective. With regard to the historical value, though, I think that the volume of information cheapens the value of most of it. The kinds of things that move people and events so that future generations can understand what happened are still preserved -- maybe not all the details, but enough. Of course, there could be some great catastrophe that wipes out most of the collective memory, but the broad outlines are still there and will continue to be there in some form, because ultimately the really important memories are preserved in the minds of those who were affected by them and passed along in some form to the next generation.

So, the burning of the library at Alexandria was a catastrophe for our understanding of ancient civilization, but enough was preserved that we still have what is essentially a Greco-Roman understanding and practice of government, philosophy, history, etc.

Closer to our day, I remember things that were told to me about events that happened over 100 years ago by people who lived through them or who knew people who lived through them. For example, my mom has told me about how her Uncle Henry wheezed when she was a kid in the 30s because he was exposed to chemical warfare in the trenches of France during the Great War. My dad's stories of growing up on a farm during the Depression and World War II likewise tie in nicely with things I have read about the larger events at the time; e.g., he once mentioned how they received extra gas ration stamps once they bought a tractor because they were farmers producing food needed for the war effort. (They farmed with horses until 1942.)

So I feel like I have a pretty good understanding of what life was like for my parents and grandparents because of what they told me. Do I have anything like a complete record of it? No, but I don't need it. Do my children need it? Sure, but they don't really understand what actual Nazis were because that word has been misused and misapplied ever since Gore Vidal used that tactic to "win" an argument with William F. Buckley 50 years ago. And I can tell them about that and why it's important, but it doesn't seem to register because they were never taught any details of what happened in the 30s in Europe, and don't really have a sense of what that generation suffered through as a consequence of the toxic ideologies that flourished during that era.

So please forgive me, but while long posts like this on the internet might be interesting to a tiny percentage of us, for most of the people who live in the future they will at the very best be reduced to a one-sentence quote by some future Ken Burns-like documentarian read by some future Morgan Freeman-like actor.


> So, the burning of the library at Alexandria was a catastrophe for our understanding of ancient civilization, but enough was preserved that we still have what is essentially a Greco-Roman understanding and practice of government, philosophy, history, etc.

I think that's a bold statement. This is circular reasoning: since we know resources that we have, obviously we know what they do contain. I remember university lectures on ancient Roman history, conducted by a serious researcher in the field, where he said how our general understanding of Roman political system could be changed if we had a couple inches less (or more) of papyrus of Festus: an author who lived way after the interesting stuff but managed to survive. In fact historical writings that are preserved for us seem to have seriously warped our understanding (in a pro-Senate & optimates, conservative way). Only relatively recently we are trying to correct that by reaching for some more obscure sources and reading more closely. Many important things we will never know.

Obviously people caring about history will be always a small portion of the population, just as with many other pursuits.


I don't know. We've got a lot of the main writers, and we've got references to some of the books that were lost forever in the fire. I agree that there was some valuable information lost in the fire. But in the main, I think the broad outlines were preserved, or we probably wouldn't even have the concept of circular reasoning.

Obviously there was a period where much of that was irrelevant anyway, after the barbarian hordes overran the weak and degenerate remnants of the Roman empire. When the revival came over half a millennium later, did they get it all right? Probably not, but it was a definite change from what existed in the interim. And some of that culture was preserved through the Roman church, though in a muted way.

More to the point of the original discussion, are you really concerned that anything of lasting value would be lost if the entire internet were deleted? Maybe technical information, sure, but culturally not so much imo.


TBH internetization of reality is still very much an ongoing process, likely to expand much further in the future. At least if we want to extrapolate; there might be a backlash, who knows (I can imagine the internet gaining a boring, authoritarian image like television around the turn of the millennium). "Big" political events had an important online aspect in recent years. New cultural phenomena, art genres etc. emerge here.

I have to admit, imagining nuking the entire internet is funny intellectually. It would leave us with a low-connectivity, low-res "slow" version of history, something more resembling a "long 1990s" or what a stubborn pre-millennial person may still be experiencing now. Would it be warped beyond recognition? Probably not, not yet. You could still reconstruct Western societies in broad strokes. But I think the cultural history at least would suffer significantly.


Or don't post things that you'd be embarrassed to be associated with? Easy for me to say of course. My digitized articles from an undergrad newspaper at least went through editors. And anything from my BBS days is almost assuredly lost to the ages. But I've always used my True Name and don't have any problem with that.


Well, sure, but I was still pretty immature 25 years ago. And for younger people, the norms are shifting so rapidly that it's not certain that something posted in jest today won't be considered career-ending heresy in five years.

This is the real reason the internet isn't nearly as interesting as it once was. The concept of 'thinking out loud' that I grew up with is nearly dead.


That's certainly fair. TBH I'm glad there isn't a public record of everything I might have written on a chat board or other public forum as a teenager or college student. At a minimum, there would be things that would require explanations along the lines of "Times were different."

That said, I'm generally pro-True Name unless there's some strong reason for anonymity.


You wrote a whole paragraph explaining exactly what is wrong with True Name.


True to a degree. There are still some places left. I think it fascinating that people often wish for more content moderation...

I don't know why they don't just go to work or something like that. Anything that provides a rigid cage for anything that could be deemed controversial. But why force it on places that are optional to visit?


> I think it fascinating that people often wish for more content moderation...

It's very easy for overmoderation to prevent useful/enjoyable discussion.

It's equally easy for a lack of moderation to prevent useful/enjoyable discussion.

Restaurants are optional places to visit, too, but I'd find it hard to enjoy my meal if people were shouting threats at me or showing me images of child porn.


Yes, the point was that not everything needs to be a restaurant.


From pg's excellent <http://www.paulgraham.com/say.html>:

> Do you have any opinions that you would be reluctant to express in front of a group of your peers? If the answer is no, you might want to stop and think about that. If everything you believe is something you're supposed to believe, could that possibly be a coincidence? Odds are it isn't. Odds are you just think what you're told[....]

> The most important thing is to be able to think what you want, not to say what you want. And if you feel you have to say everything you think, it may inhibit you from thinking improper thoughts.


That's your choice. I prefer to use a different username on each site to make it more complicated to connect the dots.


> Or don't post things that you'd be embarrassed to be associated with?

I can't predict what will be embarrassing or harmful for me to be associated with in ten years. The only way to be (sort of) sure would be to censor everything I say on the internet to be as inoffensive as possible to everyone, which would suck.


Google used to have a timeline showing key points in Usenet history. The first mention of Madonna. The first reference to AIDS. The first mention of the Gulf war. There were articles in Wired on how they recovered this stuff from long-lost tapes. This was such an important piece of internet history and they just abandoned it.


My best guess is the company just doesn't see much value in doing the work. It's a niche community and by definition 20+ year old information. It's important historically maybe but not to Google's business. No one's going to buy or see ads on 1990s rec.arts.sf.tv.babylon5 content.

archive.org has several significant Usenet collections. Henry Spencer's UTZOO collection, stuff from Giganews, and some sort of dump for Google Groups / Deja itself. I don't think anyone's built a useful interface for them yet.

(Ben Swartz made the donation from Google: https://www.bensw.com/blog/Aaron-5-Years-Later/ )


I think that's about right. For better or worse, at least the Google of today does not see itself in a historical archivist role to any significant degree.


I wish they'd hand the data off to some group that actually would do something with it, like the internet archive.


"Does someone high up in Google have an embarrassing usenet history?"

To me it's a lot simpler than that: on Usenet one could find people exchanging opinions about a product while on today's web searches all we find is companies selling that product. Just try searching for anything and see how deep you need to go until you get something that resembles a legit conversation about something rather than people selling it. It was about monetizing every search results page, and that goal became clear to me when they removed the discussion search option from the search engine; that was the final nail in the coffin for the Internet as we knew it.

Some background: https://www.seroundtable.com/google-search-filters-gone-1799...


There was no business model for Google with an open, decentralized Usenet. They tried to replace it with Google Groups and later Google Plus.


I hated Google Groups, the interface seemed so overengineered compared to the simplicity of Usenet. I believe you couldn't even view the (plaintext) posts with Javascript disabled.

I have a suspicion that Stack Overflow's success can be attributed at least somewhat to the experience that was browsing comp.lang.* using Google Groups.


I haven't looked in a while, but for a long time, the single most-voted-for bug in Google's public bug tracker was "give me some API access to the message content in Google Groups."

Ironically, there is already a very simple API mechanism that could have been used to provide exactly all the information people wanted... NNTP. All Google would have to do is provide an NNTP server endpoint to Google Groups (even beyond its Usenet mirror). It's not even that hard to write an NNTP server: it's probably the easiest server to implement of POP3, SMTP, IMAP, and NNTP.
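
To put some weight behind "not even that hard", here is a minimal sketch of a read-only NNTP-style server in Python. Everything named in it is hypothetical and only for illustration: the in-memory `ARTICLES` store, the `handle_command` and `serve` helpers, the `example.test` group and port 1119. It only handles GROUP, ARTICLE (by number) and QUIT, with response codes as I understand them from RFC 3977, so treat it as a toy and not as anything Google Groups does or would need to do.

```python
import socketserver

# Hypothetical in-memory store: group name -> {article number: (message-id, text)}
ARTICLES = {
    "example.test": {
        1: ("<1@example.invalid>", "Subject: hello\r\n\r\nFirst post."),
    }
}


def handle_command(line, state):
    """Return the list of response lines for one client command.

    `state` is a dict holding the currently selected group (or None).
    """
    parts = line.strip().split()
    if not parts:
        return ["500 Unknown command"]
    cmd, args = parts[0].upper(), parts[1:]

    if cmd == "GROUP" and len(args) == 1:
        group = ARTICLES.get(args[0])
        if group is None:
            return ["411 No such newsgroup"]
        state["group"] = args[0]
        numbers = sorted(group)
        low, high = (numbers[0], numbers[-1]) if numbers else (0, 0)
        return [f"211 {len(numbers)} {low} {high} {args[0]}"]

    if cmd == "ARTICLE" and len(args) == 1 and args[0].isdigit():
        if state.get("group") is None:
            return ["412 No newsgroup selected"]
        entry = ARTICLES[state["group"]].get(int(args[0]))
        if entry is None:
            return ["423 No article with that number"]
        message_id, text = entry
        # Dot-stuff lines beginning with "." and end the block with a lone "."
        body = ["." + l if l.startswith(".") else l for l in text.split("\r\n")]
        return [f"220 {args[0]} {message_id}", *body, "."]

    if cmd == "QUIT":
        return ["205 Closing connection"]
    return ["500 Unknown command"]


class NNTPHandler(socketserver.StreamRequestHandler):
    """One connection: send a greeting, then answer commands line by line."""

    def handle(self):
        state = {"group": None}
        # 201 = service available, posting prohibited (this sketch is read-only)
        self.wfile.write(b"201 Read-only sketch server ready\r\n")
        for raw in self.rfile:
            line = raw.decode("utf-8", "replace")
            for out in handle_command(line, state):
                self.wfile.write(out.encode("utf-8") + b"\r\n")
            if line.strip().upper() == "QUIT":
                break


def serve(port=1119):
    # 1119 is an arbitrary unprivileged port; real NNTP listens on 119.
    with socketserver.TCPServer(("127.0.0.1", port), NNTPHandler) as server:
        server.serve_forever()
```

Calling `serve()` and pointing telnet at localhost:1119 should be enough to poke at it by hand. A real server would also need LIST, HEAD, OVER, message-id lookups and so on, but the protocol stays line-oriented the whole way, which is roughly why it is easy to implement.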


It explains why they never had a good Usenet client but why is the content not really searchable anymore?


I'd imagine that a body of content that's largely from the pre-ecommerce internet isn't of any use to Google.


Seems like the real problem is that there is no defense against spam with an open, decentralized Usenet. NNTP was designed for a more innocent era.


As soon as spam started hitting USENET there really was no way to shut it down. The entire system was held together by "netiquette" and when that died off during the Eternal September, it was only a matter of time before it collapsed.


In an article by the Atlantic it's stated that the books at least are kept around.

> it’s 50 or 60 petabytes on disk, and the only people who can see it are half a dozen engineers on the project who happen to have access because they’re the ones responsible for locking it up.

https://www.theatlantic.com/technology/archive/2017/04/the-t...

They invested non-trivial amounts of money in scanning those books; deleting them would throw that money away. As for Usenet, it's similar: storing the data is cheap, and acquiring it again is probably next to impossible, so it's unlikely they threw it away unless, I don't know, it contains proof that Sergey and Larry stole their ideas from some Usenet post or something.


[ "rafael juarez" usenet wesleyan ] returns a post I made in 1990: https://groups.google.com/d/msg/rec.arts.movies/YnOmYAjQq8I/...

and other queries for my name return usenet posts in Google Groups: https://groups.google.com/d/msg/comp.os.linux.development.ap...

but yes, I agree the search results are not nearly as good as they once were. the search engineers who were keen on getting this running years ago have moved on to other things.


Why does “moving onto other things” mean that everything gets worse? It’s not like the contents of the 1990 Usenet posts are in flux.

Is google just hiring so many incompetent engineers that they wreck any project made 5+ years ago?


I think it's kind of insulting to say that. More realistically, maintaining things in the Google environment has a cost: the products are always changing, so just keeping an existing system running well takes a ton of time, and that time comes at the cost of launching new features. Realistically, Usenet search results never represented a huge amount of traffic, and traffic is what gets attention at Google.


That reeks of incompetence though. Systems shouldn’t be in so much flux that existing products keep breaking. Poor abstraction and tight coupling.

> and traffic is what gets attention at google.

This is likely what resulted in the huge collapse in engineering quality. People that spew out shit with flashy features for product managers get rewarded and move up while people who stabilize and maintain products are cast aside.

The size of the engineering group at google is more than 10x what it was back in the glory days of stuff like gmail, reader, maps, etc. Google now just makes incremental worse changes to existing products (look how slow maps and gmail are) and kills off things that the terrible engineering org can’t keep up with.

There are definitely great engineers at google, but the ratio of garbage to them has become untenable and Google is well on its path to mediocrity. A great example of this is that you have to do an interview again if you want to work on something actually innovative like self driving cars.


In the early days Google wanted to make information accessible to anyone and that fit right into that strategy and goal.

Unfortunately priorities at Google have changed significantly, and the focus is on things that can help sell more targeted ads at scale. Actually providing value is not that important anymore I'm afraid.


I'm imagining Jason Scott/textfiles on an Indiana Jones-like mission, where he finds a hard drive containing all of this, but then has to run to escape, including running from a boulder and jumping over snakes.


The reality is not far from this - AIUI, the Internet Archive does have a fairly sizable archive of very early Usenet that was extracted from backup tapes stored at a random zoology department somewhere in Canada. So dangerous snakes were very likely involved, at least.


A random zoology department somewhere in Canada??!

https://en.wikipedia.org/wiki/Henry_Spencer

That's like saying seismo was a random nuclear warhead detonation monitoring facility somewhere in Northern Virginia.

https://en.wikipedia.org/wiki/Rick_Adams_(Internet_pioneer)

If utzoo was involved, then they probably mounted a scratch monkey, at least.

https://en.wikipedia.org/wiki/Scratch_monkey

https://edp.org/monkey.htm


Here's the story: https://www.salon.com/2002/01/08/saving_usenet/

One of the interesting points the story brings up is that you never know what future people will care about. In this case cultural and political discussions are typically more interesting than arcana about bug fixes in long ago systems.


Is it hosted anywhere, or just in a vault?


WAS I CALLED


I wonder if we could get Kibo to show up.


Before I left a website I was writing for (on amicable terms), I asked if I could have a db dump of the posts authored by me. It's in my archive now, so that I know I can get back to years of writing even if the website won't be there anymore in the future.


PSA: If you write for pubs, especially ones that have paywalls of some sort, it's definitely worth keeping your own copies of anything you've written. Fortunately I either kept direct copies of most things I wrote (or they were mirrored on a tech news site) at one job over the course of about 8 years or it would all pretty much be gone.

Even if your pieces are still online and/or backed up on the Internet Archive, they can still be very hard to track down in a systematic way.

I have pretty much everything I care about from the past 20 years or so but a lot of it would be effectively gone had I not specifically saved it.


> I have pretty much everything I care about from the past 20 years or so but a lot of it would be effectively gone had I not specifically saved it.

While you have it - for "us" it is "effectively gone" - unless you uploaded it somewhere.

In the future, after you die, it may become "absolutely gone" - as your relatives or whomever likely won't care, and will discard it.

Unless you make some effort to preserve it. Most people don't - they never did; it was always left up to future people to do so.

For instance, my mom and dad saved a ton of old photos and letters between them, many while he was in the service during the Vietnam War. My mom saved them - but not in a great way. She basically put them all in a box, then that inside a trash bag, and we found it in my old childhood home after both she and my dad had passed away. So I saved them.

But - unless I scan them - they will be gone forever. In a sense, though, that's ok - because they were never meant to be public anyhow.

But for stuff on the internet - it was public; it was most likely meant to be public. But if you save it personally, then the site or whatever goes away, what the public had is gone. Things like that didn't really happen in the past - I mean, they did - but there was always the possibility of recovery in the future.

More than a few times was something thought "lost forever" - only to be recovered; it was found hidden away somewhere, or buried in some dusty archive in a back room of a basement storage area at a university or something. Sometimes found inside a wall.

But the digital? That is rare indeed. Even precious artifacts like tapes and manuals from times past get thrown out all the time, usually by relatives of some passed on engineer, often by their widows, as "so much of this old junk".

Occasionally, though, it gets saved (usually by bitsavers) - and if we are really lucky, it gets scanned or pulled from the tapes - old archives of code, sometimes of other stuff...

But we lose more than we save.

I'm just rambling now - this kind of thing, this loss of our computing history in general (it's more than just "the early internet" - we've lost and thrown away so much of our "early computing" history, it's not funny)...

Think about the inventions that have changed humanity in radical ways, that have altered our society greatly: The steam engine, the automobile, the airplane, etc. How many museums and such are there that celebrate, archive, and curate information about them?

The computer?

How many museums are there for it in comparison?

The difference is stark. A thought I just now had: I wonder if it is like this because the computer represents such a different "threat" to humans - that is, it's the closest we've come to a representation of "us" - our minds. And - as it becomes more tightly integrated into our society, if not our body collective (and literal) - I wonder if we unconsciously seek to marginalize it due to fear in some manner; fear of the other? fear of the artificial? fear of the usurper? Maybe we seek to forget what we have wrought?


>While you have it - for "us" it is "effectively gone" - unless you uploaded it somewhere.

The stuff I have that I considered still relevant and interesting, I put on my website and it is presumably mirrored in The Wayback Machine by now although I have not actually checked.

I agree with your basic points. I have a lot of stuff that's scanned and online but it's a job.

For example, I have a large format book chronicling a year in product development at Data General in the mid-80s. It's really a fascinating snapshot of the time. I'm guessing it's not the only copy still around but it's probably one of a relatively few and should really be online.


This is interesting because I know many companies would not allow you to do that. Your work belongs to them.


You may or may not have the rights to post it yourself online, but very few pubs are going to have a problem with you keeping your own copies for personal use.


Exactly. Those were the terms, basically.


Because they're Google. One might think that Google would see preserving the world's information as part of their mission. I believe they once said something along those lines. But that's pretty much gone by the wayside. And it's probably a reflection of the way Google is managed that even projects with absolutely trivial costs relating to things like RSS and Usenet just fade away because no one wants to be associated with such non-strategic things. Scholar and Books also pretty much went by the wayside though that wasn't really Google's fault.


Yeah. To "organize the world's information and make it universally accessible and useful"

https://www.google.com/search/howsearchworks/mission/

I agree they're not really faithful to it. I mean don't get me wrong. Google Search and Maps are two inventions that are so insanely useful, they may standalone be responsible for decades of faster progress. But I do think that if they were really true to the mission statement, they'd take things such as archival more seriously.

(If you explained the entire internet to an alien, and then told them that Google aren't the ones running the Internet Archive, they'd say "Seriously?")


>(If you explained the entire internet to an alien, and then told them that Google aren't the ones running the Internet Archive, they'd say "Seriously?")

+1000

Mind you. At some level, I'm happier that it's a non-profit pursuing this as its mission rather than Google. But, given the vast amounts of money that Google spends on all sorts of things, I don't really grok the mindset that doesn't really prioritize the preservation of information--and the organization of that information--as at least a sideline.

There are admittedly growing headwinds, especially in the case of Europe, about what information can be preserved, but that hasn't really been a big issue until recently.


It might be that they figure the Internet Archive is doing a great job, and there's no use to do the same job twice. They do contribute, being one of the largest (the largest?) book sponsors.


I suspect that at least part of it is that mirroring copyrighted content is a gray area of law. The fact that the Internet Archive is a non-profit archive may give them some leeway and, in any case, they're a less tempting target than Google would be. Look at the ongoing issues that Google has around news sites for example.


Ideally the Internet Archive stays out of jurisdictions that can force them to remove most types of content. The non-profit aspect won't help them in the least.

Being all over the planet in terms of business, infrastructure and physical presence is where Google acting as archive would fail very badly. They might be the absolute last organization you want serving as that entity.

The IA in theory could operate all of its infrastructure and organization out of a preferential jurisdiction (or a few, so as to have backups in case one favorable location goes bad legally/politically), and archive anything it wants to from around the world while entirely ignoring the local laws from a given place (eg the EU, or China, or Brazil, New Zealand, or Turkey, or wherever).


>The non-profit aspect won't help them in the least. //

AIUI one of the tests for Fair Use [in USA] looks at whether use is non-commercial (not the same as non-profit; commercial use can be free-gratis, for example; nor is it a sufficient condition in itself), so it could be a key element in a court decision I feel; what's more important perhaps is that people are less inclined to sue non-profits because of the potential harm to their own public image.

Google are pretty canny, I'd expect them to let IA lead - eg assumed consent with old books - in order to set a non-binding precedent so that they can go to the press should they be challenged and say "well we just followed what the noble souls of IA are doing, and this court decision will harm the IA".

Last I looked, IIRC, Papua New Guinea wasn't signatory to copyright treaties, but I think they were planning on signing. There's probably a country in similar circumstances that would be a reasonable place for holding a backup archive that includes the stuff less liberal regimes want you to ditch.


Though the legal situation is a bit murky even in the US. After all, I can't set up a "Comics Archive" and start populating it with all sorts of copyrighted comic strips and expect not to hear from the publishers. But as a non-profit who isn't making money off the content it mirrors, respects robots.txt even retroactively, and will generally honor takedown requests that are remotely legit, it gets cut a lot of slack that a corporation doing this for profit-making purposes wouldn't.


> Though the legal situation is a bit murky even in the US.

I don't think it's all that murky.

It's my understanding that IA is allowed to have all that copyrighted stuff because it took the effort to legally register as a real library.


As far as I know, there's no such registry. There are exceptions under Section 108 for institutions that fit a certain definition of library, but from what I can tell as a non-lawyer, they don't allow the kind of indiscriminate reproduction that the IA engages in: https://www.law.cornell.edu/uscode/text/17/108


From Wikipedia: "The Archive is a member of the International Internet Preservation Consortium and was officially designated as a library by the state of California in 2007."

Related newspaper article: http://old.post-gazette.com/pg/07175/796164-96.stm


Copyright is federal law, I don't think the State of California can exempt any institution from it.


The short answer is that libraries do not get a magical exemption to make copies of copyrighted works although they have some limited exemptions (that seem to have mostly been written with physical artifacts in mind). For example, a library cannot rip a DVD and make it available to the public with no usage restrictions.

IANAL but there is maybe an argument to be made that the IA can mirror web sites for preservation purposes but then could only make it available to one researcher at a time.


Ideally, any archive respects the wishes of copyright holders and we don't need to rely on legislation. I certainly want control over my data, and thankfully most legal systems are on my side. My rights over my data trump other people's need to preserve absolutely everything, no matter how trivial. Like the collection of personal letters my grandparents wanted destroyed after their death, which did not end up in a library vault and the historical significance of which is lost to time.

I pity future historians who will need to wade through the petabytes of crap like so much landfill because we outsourced curating it to the future. Because just maybe the rubbish I spout on my personal blog will be of interest to future generations (hint, it isn't, and I'll be spinning in my grave from embarrassment if it is). I doubt they will wade through it, since we have the ability to leave future generations actual historic records and not force them to learn about us from fragments decoded by archaeologists.


>there's no use to do the same job twice

And why bother looking both ways before crossing the street, or even testing your backups?


Checking left and right are two jobs. And how often do you test each of your backups? It might get corrupted at any moment - is perpetual validation the answer?


Sure, why not? I do regular automated validation of my backups against each other using "rsync --checksum --dry-run" and get notified if anything beyond a tiny threshold is out of whack. (The threshold being due to small files updated between the two backup runs)
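For anyone without rsync handy, here is a rough pure-Python sketch of the same idea: hash every file in two backup trees, list the paths that differ or exist on only one side, and complain only when the count goes past a threshold. This is not the script described above; the function names (`compare_backups`, `backups_agree`) and the `max_mismatches` default are made up for illustration, and it assumes both backups are reachable as local directories.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its content digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_backups(a: Path, b: Path) -> list[str]:
    """List relative paths that are missing on one side or differ in content."""
    snap_a, snap_b = snapshot(a), snapshot(b)
    return sorted(
        rel
        for rel in snap_a.keys() | snap_b.keys()
        if snap_a.get(rel) != snap_b.get(rel)
    )


def backups_agree(a: Path, b: Path, max_mismatches: int = 5) -> bool:
    """True when the number of differing files stays within the threshold.

    The threshold (a hypothetical default) tolerates small files that
    changed between the two backup runs.
    """
    return len(compare_backups(a, b)) <= max_mismatches
```

Run it from cron or similar and send yourself a notification whenever `backups_agree` returns False. It reads every byte of both trees, so on large backups it is slow in the same way a full checksum comparison always is.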


It says "organize", not "preserve". They are organizing information following the Marie Kondo method.


"Organize" is just a weasel word for "censor".


Curation is a method of limiting and reducing access, it's true.

I think you're being downvoted because many folks are touchy about the word censor being applied to non-government organizations.

OTOH, when is it that an organization becomes the de facto governing body?


> I think you're being downvoted because...

...because I hope that's the most uncharitable interpretation of someone else's words that I read today.


I find that I am happier when I presume goodwill and allow for the risk that I am disappointed. Most of the time people don't disappoint.


Yeah, their new motto might more accurately be stated as "to organize the world's information, to feed it into our machine learning models, and to throw away anything we deem unprofitable."


> I mean don't get me wrong. Google Search and Maps are two inventions that are so insanely useful, they may standalone be responsible for decades of faster progress

I'm wondering if I'm the only one finding Google Search lacking lately. Increasingly, its search results are seriously out of date and linking to dead and/or ad-heavy sites. I've come to use DDG and, recently, Startpage.com more than Google Search.


Thank God Google isn't running the Internet Archive. Everyone should donate to them so they can remain an independent nonprofit.


> To "organize the world's information and make it universally accessible and useful"

They still do a few things like that, very occasionally. When they got rid of Freebase (notice a pattern?) they did preserve that data and it was used to seed Wikidata, which now in turn feeds their "SERP boxes". But I absolutely agree that there's zero focus otherwise on that core enabler of their business-- they're just coasting on their earlier (very substantial) efforts, and managing to stay afloat somehow.


Internet Archive has a clear mission and pure motives though. As an independent non-profit they are free from “business” motivations.


Does Google allow the Internet Archive to archive all of its YouTube videos?


I think someone would have to donate a datacenter


There's a lot of tech billionaires out there. Just saying.


Could the Internet Archive afford to do so?


Even if they can not, I suppose it's nice to not have to ask for permission for every video.


> If you explained the entire internet to an alien, and then told them that Google aren't the ones running the Internet Archive, they'd say "Seriously?"

They are running an internet archive. That's what their cache is. There's more than one internet archive; why would Google necessarily be running all of them?


> Google Search and Maps are two inventions that are so insanely useful, they may standalone be responsible for decades of faster progress.

I don't know about this. Google Search was a great improvement in ranking by relevance, but an important invention itself? As for maps: as far as I'm aware, Google Maps was "just" a combination of two existing technologies - online maps and car navigation systems. They launched Maps in 2005, even Germany had online maps by 2000. Granted, they didn't look as good as Google's and the UX was inferior, but it wasn't a horses vs cars situation imho.


And the iPhone is "just" a more expensive version of its predecessors with a slightly better touchscreen and UX, and Dropbox is "just" a ftp mount with svn on top of it.

Also cars are just faster horses so that works too :)

Honestly I find it hard to argue that Maps and Search haven't been some of the internet's biggest worldwide productivity boosts.


The question is what you would have if that product never existed. That's different from measuring how good the product is in a vacuum.

Without the iPhone it may have taken another year or two but the wave of full-screen smartphones had already started.

For Dropbox I'm unsure but they definitely have a lot of competitors doing the same thing at this point.

Maps... had much better scrolling than competitors? Being pretty isn't revolutionary.

Google search itself might qualify.

Cars are not faster horses, but if you removed any particular car company from history we would still have cars.


> ...but the wave of full-screen smartphones had already started.

I’ve heard it argued that someone would have gotten there eventually, but my suspicion is that without a big player committing all their resources to marketing and selling it a similar device would have failed to make headway. Phone manufacturers would never have pushed like Apple did against the headwinds of physical keyboards and flash and operator-managed app distribution (and phone crippling).

But I’ve never heard someone argue that the design was already established and going to become a tidal wave. There were a few devices which vaguely resembled a few superficial elements. What examples can you provide to illustrate a wave was already underway?


The first of its kind was the LG Prada, which came out slightly before the iPhone and sold a million units. The technology had all come together just enough to make devices like this possible, and they were starting out at barely good enough.

Batteries, screens, CPUs, all of those were advancing rapidly whether phones used them or not. And 3G was spreading rapidly. Even if it took two more iterations of Moore's law, the market was growing more and more feasible every month. Even half-baked attempts 2-3 years down the line could easily have been more compelling than the original iPhone.

I guess a chunk depends on how critical the operator-independent apps were, but let's not forget that the iPhone was AT&T-only for years.


LG had a touch screen, but it didn’t have a full OS that allowed it to do the things that the iPhone could do.

LG would have never built an entire ecosystem.

Apple was AT&T only in the US.


> Maps... had much better scrolling than competitors? Being pretty isn't revolutionary.

It's a combination of being always immediately available and having so much useful information about every place, all presented in a UI that is accessible.

The digital maps we had in our country pale in comparison to current day Google Maps. It even has graphs showing working hours and crowding level and reviews, and picture-perfect 3D simulation. That surely wasn't possible for any GPS or phone maps of that period.


Started by who? Early pre-iPhone Android prototypes were modeled after the BlackBerry and RIM was preaching the need for keyboards long after the iPhone was introduced. Microsoft was also aiming phones at businesses.


I agree with you insofar as "inventions" seems to imply that they invented the concept of Search Engines and Mapping Software from scratch, which is obviously not true (both existed for a long time before Google entered the market), but in both cases Google managed to offer a superior service for free (for the end user at least). I wouldn't call them inventions, but they're better implementations of an existing concept.

Compare Google Maps to the mapping and GPS services of yore: you'd have to pay a fortune to get the same feature set. I remember when you had to pay to add regions to your GPS and then pay again to update them later.

The word "disruption" is quite a buzzword these days but in this case Google truly disrupted both these markets. Everybody had to play catch-up after that. I remember how all other search engines started copying Google's slick interface when they realized they were losing badly.


I do agree that they were vastly superior, but I believe that the new and disruptive thing that Google brought to the table in those markets was the monetization model. They offer valuable services without any visible price tag because the data those services generate are driving the profits from ads. I don't know if they were the first to do this, but that's an invention in my book.


The big innovation with Google Maps was the interactive zooming/panning interface, and it was a huge improvement over any other mapping site of the time. The popular map service of the time, MapQuest, had clunky scroll and zoom buttons around the sides of the map image, and moving around or zooming would reload the entire page. The GMaps interface wasn't just a marginal improvement over the existing pages, it was a completely new way of doing it that was 100x easier to use.


That's right, the online maps problem was solved in 2000, so we should just keep using those 19-year-old maps.


That's a gross misrepresentation of what I'm saying.

My point is: the technology was there. Better UX, nicer integration into other services etc, those are valuable and good, and they alone can be enough to win a huge market share, but they are not innovations as in "introducing new concepts". A smartphone capable of using GPS to show your position on a map is great and way more accessible than a Magellan NAV 1000, but it's the GPS itself that is the actual breakthrough, the big innovation.


>One might think that Google would see OWNING the world's information as part of their mission

FTFY

But also, I do like the theory that early high level Google hires across the org likely have embarrassing Usenet histories -- I mean, c'mon - I've been 'online' since the early 90s...

I've said some cringeworthy stuff: for example - I used to actively participate in the Haiku thread on Craigslist in the early 2000s -- and I thought I was /r/IamReallySmart

It was fun at the time - but I put way too much time into writing long Haikus, and once spent a ton of time attempting to write my masterpiece, a palindromatic senryu...

((HAHAH I just went back to CL and read some of my Haiku I wrote on there... from 2003!))

EDIT: for self deprecation, here is what I wrote on 9/11/2003 ABOUT 9/11/2001:

https://forums.craigslist.org/?ID=8734800


This makes me think that if any company has a mission to digitize/preserve the world's information, they should now be required to set up a proper non-profit organization/foundation around that. That would prevent the company from later on exploiting it or killing it off due to a conflict of interest.


> One might think that Google would see preserving the world's information as part of their mission.

Why would one think that? It seems obvious that their objective is to control information at best.


They rolled it into Google Groups iirc. Then it just vanished altogether.


It's still there, but certainly less searchable than before:

https://groups.google.com/forum/#!forum/comp.os.linux


As I recall it, they tried to incorporate Usenet into Google Groups. It was a mess, and instead of fixing it, they probably discovered that the non-Usenet functions got more use, and abandoned it.

This is a feature of development based on sprints vs. a bigger vision. That design/development model has a lot going for it, but it only works well when the scope of the system is narrow.




