Monday, August 3, 2020

Datasketch

May 22, 2017

Why every journalist should use the Internet Archive's Wayback Machine


Juan Pablo Marín Díaz


@jpmarindiaz

"I would certainly be open to closing areas where we are at war with somebody. I sure as hell don’t want to let people that want to kill us and kill our nation use our internet. Yes, sir, I am." - Donald Trump, CNN, December 15, 2015.

 

Right after Donald Trump took office, the White House website took down much of its content, including all of its Spanish-language pages and its pages about civil rights and the LGBT community. Cases like this show why it is useful to be able to access copies of content after it disappears. Government sites are not the only ones at risk: discontinued sites, fake news and tweets can also be deleted at any moment. It is very useful, in fact necessary, to check sources against the exact information they contained at the moment you consulted them. It is also necessary to be able to share that information through a trusted third party, not only through screenshots, which anyone can easily manipulate. If you want a trusted source to keep the actual contents of a page as it was when you visited it, you can use the Internet Archive and its Wayback Machine.

 

The Wayback Machine is one of the services of the Internet Archive. The Archive is essentially a collection of historical snapshots of the internet, and its main goal is to provide universal access to all knowledge. It began in 1996 as a project to download every public page available on the internet and keep it as a reference. It now holds more than 286 billion pages saved over time, which amounts to more than 9 petabytes of data, and it currently adds more than 20 terabytes every week. For reference, 2 petabytes is roughly the amount of information held by all US academic research libraries.

 

The Wayback Machine lets anyone explore historical captures (snapshots) of web pages. For example, you can see the Washington Post homepage as it looked on September 12, 2001.

 

 

The Archive is an attempt to keep our digital memory alive, so that we don't lose years of collective intelligence and knowledge to a catastrophe or accident, as happened to the Library of Alexandria.

 

In November 2016, the Archive set out to keep a full copy of its data on servers in another country. It currently has partial copies in Alexandria, Egypt, and in Amsterdam, the Netherlands. The nature of Donald Trump's statements during his presidential campaign pushed the nonprofit to build an additional full copy of the Archive in Canada, in case of institutional failure in the United States.

 

Many pages are not captured every single day, so if you need a snapshot from a specific moment you have to save the page manually. The process is rather simple: visit http://archive.org/web/ and save your page. You will get a link containing the page address and the time you saved it, which you can share or publish so your readers know exactly where and when the information was captured.
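
The links the Wayback Machine hands out follow a predictable pattern, which makes them easy to build or check by hand. The sketch below assumes the commonly used forms `https://web.archive.org/save/<url>` for requesting a capture and `https://web.archive.org/web/<timestamp>/<url>` for a snapshot, where the timestamp is 14 digits in UTC (year, month, day, hour, minute, second). The helper names `save_url` and `snapshot_url` are hypothetical, and the details of Save Page Now (login requirements, rate limits) may change over time.

```python
from datetime import datetime, timezone

# Base address of the Wayback Machine (assumed URL patterns, see above).
WAYBACK = "https://web.archive.org"


def save_url(page_url: str) -> str:
    """Return the Save Page Now address that requests a fresh capture of page_url."""
    return f"{WAYBACK}/save/{page_url}"


def snapshot_url(page_url: str, when: datetime) -> str:
    """Return a Wayback link for page_url at the given moment.

    The Wayback Machine identifies captures by a 14-digit UTC timestamp
    (YYYYMMDDhhmmss); if no capture exists at exactly that second, it is
    expected to redirect to the closest one it has.
    """
    if when.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK}/web/{stamp}/{page_url}"


if __name__ == "__main__":
    print(save_url("https://www.washingtonpost.com/"))
    print(snapshot_url("washingtonpost.com", datetime(2001, 9, 12, tzinfo=timezone.utc)))
```

Publishing a snapshot link like the second one, instead of a screenshot, lets readers open the exact capture themselves.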

 



You can also save pages straight from your browser with the Wayback Machine extension for Chrome. Besides saving the page to the Archive, the extension can show you the latest snapshot of a page when it is no longer available.
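
Finding the latest capture of a page, as the extension does, can also be done with the Internet Archive's Wayback Availability API at `https://archive.org/wayback/available`, which returns JSON describing the closest archived snapshot (the most recent one if no `timestamp` is given). The sketch below assumes the response shape documented for that API (`archived_snapshots.closest` with `available`, `url` and `timestamp` fields); the function names are hypothetical, and only `latest_snapshot` touches the network.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Wayback Availability API endpoint.
API = "https://archive.org/wayback/available"


def availability_query(page_url: str, timestamp: Optional[str] = None) -> str:
    """Build the API request URL; timestamp is optional, e.g. '20010912'."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{API}?{urlencode(params)}"


def closest_snapshot(payload: dict) -> Optional[str]:
    """Extract the snapshot link from an API response, or None if nothing is archived."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def latest_snapshot(page_url: str) -> Optional[str]:
    """Ask the API for the most recent capture of page_url (requires network access)."""
    with urlopen(availability_query(page_url), timeout=30) as response:
        return closest_snapshot(json.load(response))
```

Keeping the parsing in `closest_snapshot`, separate from the network call, makes it easy to check the logic against a saved response. Calling `latest_snapshot("example.com")` should return a `web.archive.org/web/...` link when the page has been archived, and `None` otherwise.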

 

So, go ahead and use the Wayback Machine to document your publications and sources before they are taken down.

 

 


Juan Pablo Marín Díaz


Juan Pablo is a data scientist. He has worked on applied computational statistics in fields such as macroeconomics, hydrology and data journalism.
