HTML and the World Wide Web

From Cultures of the Book at Penn

HTML: The Building Blocks of the Web

Just about everything a person views on the World Wide Web is built with a markup code called HTML. HTML, or Hypertext Markup Language, is at its core a way to structure hypertext content so that it can be presented to users in a readable fashion. Developed by Tim Berners-Lee at CERN in the early 1990s, it was intended to “[enable] researchers from remote sites in the world to organize and pool together information.”[1] It was developed in conjunction with the World Wide Web itself to aid in research, but that project would grow into an infrastructure that a large share of the world’s population engages with every day.

HTML functions via a series of tags that are read by a computer program, typically a web browser. These tags are denoted by angle brackets (“<” and “>”) and enclose the content being marked up. For example, text placed after a “<title>” tag is marked as the title of the page being displayed, and the tag applies to everything up to its corresponding closing tag, “</title>”. This style of tag was not Berners-Lee’s own invention, however; much of the design of HTML was taken from another markup language, Standard Generalized Markup Language, or SGML.[1] Specifically, the paired tag system using angle brackets was lifted from SGML, as were some tags already in use in SGML documents, like <title> and <p> (paragraph). HTML, however, provided its own expansion upon SGML, as Berners-Lee himself notes, by allowing the navigation of hypertext documents while also being easier to use than SGML, making it a simple language that would also keep the Web consistently easy to navigate for any user.[2]
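The paired-tag system can be seen by feeding a small page to a parser and listing what it encounters. The following is a minimal sketch in Python using the standard library's html.parser module; the sample page, the TagLogger class, and the log_tags function are hypothetical names invented for illustration, not anything drawn from Berners-Lee's own software.

```python
from html.parser import HTMLParser

# Hypothetical sample page; the markup and its wording are invented for illustration.
SAMPLE_PAGE = (
    "<html><head><title>My First Page</title></head>"
    "<body><p>Hello, <b>Web</b>!</p></body></html>"
)


class TagLogger(HTMLParser):
    """Records each opening tag, closing tag, and run of text in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # Called for an opening tag such as <title>.
        self.events.append(("open", tag))

    def handle_endtag(self, tag):
        # Called for the matching closing tag such as </title>.
        self.events.append(("close", tag))

    def handle_data(self, data):
        # Called for the text enclosed between tags.
        self.events.append(("text", data))


def log_tags(markup):
    """Parse the markup and return the ordered list of (kind, value) events."""
    parser = TagLogger()
    parser.feed(markup)
    parser.close()
    return parser.events


if __name__ == "__main__":
    for kind, value in log_tags(SAMPLE_PAGE):
        print(kind, value)
```

Every "open" event in the listing is eventually answered by a "close" event for the same tag, and the text a reader actually sees sits between the two.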

Wikipedia page on "HTML" using Google Chrome's "DevTools" to highlight the HTML elements that comprise the page, specifically the first paragraph.

Using the Wikipedia page on HTML as an example, one can see how HTML works in action using the developer tools in browsers such as Google Chrome or Mozilla Firefox (which can be opened by hitting the F12 key). Looking through the tools, readers can see how every element on the page is formed through a series of HTML tags. The opening paragraph, for instance, is at its base a simple "<p>" tag containing the text of the paragraph. Within it are various other tags, like "<b>" to bold the opening text of the article as per Wikipedia's style guide. The "<a>" tag, with its "href" attribute, is used to create hyperlinks to other articles on Wikipedia, such as "markup language" or "web pages" in the sample image. The table which houses the primary image for the article, as well as the technical specifications, is formed through a nested series of tags, going from "<table>" to "<tbody>" to over a dozen "<tr>" tags that form the individual rows of the table (one of which has further subdivisions that allow for the embedding of the image itself). One table is a complex orchestration of tags in and of itself, to say nothing of the nesting that structures the article as a whole, which allows almost 10,000 words and a series of images to be rendered and displayed to the user by their browser. This incredibly complex series of tags is required just to produce a single Wikipedia page, a relatively simple web page compared to websites that rely on advanced scripting for animated elements or other similarly complex designs.
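The nesting described here can be made visible by printing each tag indented according to how many other tags enclose it, much as the browser's developer tools do. The sketch below uses Python's standard html.parser module; the sample markup is a heavily simplified, hypothetical stand-in for the Wikipedia page (the file name logo.png and the cell text are invented), and it assumes well-formed input in which every non-void tag is closed.

```python
from html.parser import HTMLParser

# Elements such as <img> have no closing tag, so they never enclose anything.
VOID_ELEMENTS = {"img", "br", "hr", "meta", "link", "input"}

# Hypothetical, simplified markup loosely modeled on the page described above.
SAMPLE = (
    '<p><b>HTML</b> is the standard '
    '<a href="/wiki/Markup_language">markup language</a>.</p>'
    '<table><tbody>'
    '<tr><td><img src="logo.png"></td></tr>'
    '<tr><td>Filename extension</td></tr>'
    '</tbody></table>'
)


class OutlineBuilder(HTMLParser):
    """Builds an indented outline of tags, two spaces per level of nesting."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        label = tag
        href = dict(attrs).get("href")
        if href:
            # Show where a hyperlink points, taken from its href attribute.
            label += f" -> {href}"
        self.lines.append("  " * self.depth + label)
        if tag not in VOID_ELEMENTS:
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS:
            self.depth -= 1


def build_outline(markup):
    """Return (outline lines, deepest nesting level) for well-formed markup."""
    builder = OutlineBuilder()
    builder.feed(markup)
    builder.close()
    return builder.lines, builder.max_depth


if __name__ == "__main__":
    lines, deepest = build_outline(SAMPLE)
    print("\n".join(lines))
    print("deepest nesting:", deepest)
```

Even in this tiny sample the image sits four levels down, inside a table, a table body, a row, and a cell; a full article repeats that pattern thousands of times.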

It's important to note that the average reader would not necessarily know this page was the Wikipedia page for HTML by looking at the raw tags and elements. Perhaps by looking at the text deep within the tags for long enough it would be possible, but the tags themselves overwhelm the text that actually makes up the page. HTML, this series of nested tags, was never intended to be read by humans. Tim Berners-Lee himself admits he “never intended HTML source code…to be seen by users.”[2] It is certainly a language simple enough to write and to be processed by a web browser, but it is important to stress that it is only meant to form the structure of the content that is actually read by people. Thus it makes for a unique kind of material text, as the material itself is invisible to almost every user. Countless people read web pages without any idea as to how they are formed; when working as intended, the HTML is practically nonexistent to them.

HTML isn’t a static language, either. It is intimately tied to the Web’s history, and as the Web itself has grown and changed since the early 1990s, HTML has undergone revisions as well. HTML is developed and revised using standards published by the World Wide Web Consortium (W3C), which formed in 1994 in response to the growing popularity of the Web and the need to provide a central structure to “lead the Web to its full potential.”[2] The W3C’s goal is thus to look at the needs of developers to find what changes need to be made to HTML, proposing the new “standard” that essentially governs the very structure of the Web itself. For instance, the most recent iteration of HTML is HTML5, which was developed over a number of years before its final release in 2014. HTML5 introduced new dedicated tags like “<video>” to seamlessly embed video elements into webpages.[3] As the W3C would likely argue, the act of standardizing and revising HTML serves to provide upkeep to the Web while also satisfying its growing needs as new technologies come into popular use.

Limits of the HTML Web System: Decay, Deletion, and DRM

Yet HTML and its method of standardization are not without their limits and criticisms. Most notably, HTML being a standardized language means that it doesn’t necessarily support every single element a developer would wish to use. It only has a limited number of tags, and old tags can be made obsolete between versions of HTML.[4] Thus, older webpages using now-obsolete code may not display properly, and that history of browsing experience risks being lost to the public consciousness. There are alternatives that work around this problem, like Extensible Markup Language (XML), which allows for the development of custom tags that remain valid across its versions. HTML5 can even be written in an XML syntax; unfortunately, XML still has its limits. XML does not give a program any way to know what custom tags mean; programs that interpret them have to be developed separately.[2] Even if new tags are made to support certain content, that content cannot be read by every user unless time and labor go into producing a program that each user would then have to adopt, which isn’t a workable solution to this problem.
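The limit of custom tags can be illustrated with a short Python sketch using the standard library's xml.etree.ElementTree module. The vocabulary below (the review, game, author, and pullquote tags), the HANDLERS table, and the render function are all hypothetical, invented purely for illustration. The parser accepts any well-formed tag, but a tag only does something if someone has written code that gives it meaning.

```python
import xml.etree.ElementTree as ET

# Hypothetical custom vocabulary; none of these tags belong to any real standard.
SAMPLE_XML = (
    "<review>"
    "<game>Riot: Civil Unrest</game>"
    "<author>Dante Douglas</author>"
    "<pullquote>placeholder text</pullquote>"
    "</review>"
)

# The only tags this particular program knows how to present.
HANDLERS = {
    "game": lambda text: f"Game reviewed: {text}",
    "author": lambda text: f"Written by {text}",
}


def render(xml_source):
    """Turn each child element into a display line, if a handler exists for it."""
    root = ET.fromstring(xml_source)
    output = []
    for child in root:
        handler = HANDLERS.get(child.tag)
        if handler is None:
            # Well-formed XML, but this program was never taught what the tag means.
            output.append(f"[no handler for <{child.tag}>]")
        else:
            output.append(handler(child.text or ""))
    return output


if __name__ == "__main__":
    for line in render(SAMPLE_XML):
        print(line)
```

The pullquote element parses without complaint yet produces nothing useful, because this program has no handler for it; every reader of such a document would need software that does.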

ReadySet review of “Riot: Civil Unrest,” written by Dante Douglas.

The current structure of the Web also invites its own share of problems, chief among them the preservation of its texts. One example is ReadySet, a vertical under the ZAM media network. ReadySet’s web page had the typical setup for a gaming news publication: large headline images showing featured stories or reviews, a scrolling list of recently published articles, a navigation bar connecting to the individual categories of coverage the website provided, and a search bar to look for a specific piece the user might want to access. The articles themselves included related articles to allow for further reading, as well as buttons for a variety of social media websites so that articles could easily be shared across the Web for others to find. Overall, ReadySet’s website was a polished setup that made it easy for users to find a specific piece of content while also providing a more general structure for them to simply browse. ReadySet no longer exists.

ReadySet’s web page as of October 1, 2018.

Typing in the site's URL only brings users to a static page informing them that the site they are trying to visit has been closed. The decision to shut down came about suddenly, announced in a tweet from the site’s official Twitter account.[5] Freelancers who had pitched to the site were also informed in these tweets that they had less than a month to back up their articles before the site was taken down. Many took this advice, saving an assortment of pages to places like the Internet Archive. Some of the work produced by ReadySet will survive this way, and its Twitter account went on to note that a local backup of everything would be kept, but a vast swath of the content ReadySet’s contributors produced is simply inaccessible to its readership now, with no real sign of coming back. ReadySet’s entire presence, years of coverage, is non-existent to virtually everybody. In a digital world with access to seemingly everything, an entire body of cultural work can disappear in an instant, with little hope of recovering it.

Any page on the Web, regardless of the content coded in for display, can suddenly be at risk of becoming defunct for a host of reasons. As ReadySet’s shuttering shows, that reason can be financial, as its holding company ZAM saw no further profit in keeping the site running. Even web browsers themselves can leave countless webpages unsupported, as many browser game developers found when Google announced an update in May 2018 for its browser, Google Chrome. This update, intended to push back against one of the Web’s widely perceived nuisances, auto-playing videos with audio, was pushed out without notice over a weekend and began to affect browser games as well.[6] Games in which audio was an important part of the experience were suddenly kicked to the curb. Many developers were confused as to what had happened, and others were suddenly at risk of losing income if the projects they depended on for a living couldn’t work in Chrome. The implementation of this update would be delayed, but it would still occur, essentially putting the onus on developers to make sure their games could actually be played as they were meant to rather than lose a core feature. It’s troubling for a company with such clout as Google to unilaterally make changes that affect the open standards developers are familiar with. HTML's functionality on the Web depends on browsers to interpret and render it, which means that changes to HTML itself cannot override a browser's own design; if Google chooses to remove a feature, it is difficult to counter that decision. Google's changes thus suggest an unstable future for lasting media and art on the Web.

Lastly, there is a developing problem with HTML that threatens to dismantle the open, sharing nature of the Web, a problem that comes directly from the W3C itself. As part of its recommendation process for updating the HTML standards, the W3C has laid the foundation for implementing Digital Rights Management within HTML itself. Digital Rights Management, or DRM, is essentially a form of copy-protection that is meant to act as a deterrent to piracy of copyrighted works by controlling who has access to particular types of content. In practice, it can take the form of limiting the number of devices that have access to a program or other piece of content. Its implementation, whatever the concern about piracy, could restrict the openness of the Web as well as stunt preservationist efforts. Content on the Web could be taken down or restricted for countless users even in cases where there is no copyright infringement, such as fair use of a copyrighted work. Content shared on social media, such as viral memes, could struggle to exist under such a structure, as it depends on rapid sharing and alteration of previously existing work. These protocols being embedded into HTML itself could ultimately dismantle the very culture of communication that has been cultivated through the Web.

The Electronic Frontier Foundation, which works to protect the open Web, condemned this move, going so far as to resign from the W3C when the consortium refused to back down from the recommendation, which the EFF saw as antithetical to its philosophy.[7] The recommendation, in the EFF’s eyes, represented a kind of hostile corporate takeover of the Web, with the W3C capitulating to corporate interests at the expense of the wider populace. Berners-Lee’s original vision for the Web was of a space for sharing documents between users, and the implementation of something like DRM at such a grand scale would be a step towards tearing down that vision.[2]

Conclusion

In the digital age, HTML has become a more popular way to present content than even books while also being able to display so much more. It has helped form the very structure of the Web itself, presenting a vast array of content to just about every person on Earth with a computer or mobile device. Yet the Web is also in many ways averse to its own history: it is incredibly susceptible to erasure and decay, and it can be turned in an instant into an exclusionary device that locks out users. As the ever-expanding primary source of information today, the HTML-backed Web is poorly suited to serve as a sustainable means of preservation. ReadySet is but one example of years of work and media disappearing into the ether. An enormous part of the modern cultural record could be gone in a single moment, and so much of it already is.

Notes

  1. “A history of HTML.” World Wide Web Consortium, 1998. https://www.w3.org/People/Raggett/book4/ch02.html
  2. Berners-Lee, Tim, and Mark Fischetti. Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web by Its Inventor. Harper San Francisco, 1999.
  3. Lardinois, Frederic. “W3C Declares HTML5 Standard Complete.” TechCrunch, 28 Oct. 2014. https://techcrunch.com/2014/10/28/w3c-declares-html5-standard-done/
  4. “HTML5 Differences from HTML4.” W3C, World Wide Web Consortium, 9 Dec. 2014, www.w3.org/TR/html5-diff/
  5. @ReadySetZam. “Hi there @ReadySetZAM freelancers and staff. Thanks for your patience while we sorted through this mess of a week. The tough decision has been made to close the site on October 1. Folks working on content for the site should try to pitch the content elsewhere!” Twitter, 10 Sept. 2018, 7:55 p.m., https://twitter.com/ReadySetZam/status/1039346592669609985
  6. Klepek, Patrick. “Google's Attempt at Fixing Autoplay Videos Has Broken Countless Games.” Waypoint, VICE, 8 May 2018, https://waypoint.vice.com/en_us/article/xwmqdk/googles-attempt-at-fixing-autoplayvideos-has-broken-countless-games
  7. Doctorow, Cory. “An open letter to the W3C Director, CEO, team and membership.” Electronic Frontier Foundation, 18 Sept. 2017. https://www.eff.org/deeplinks/2017/09/open-letterw3c-director-ceo-team-and-membership

License

This page is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.