An Overview of HTML 5, PhoneGap, and Mobile Apps
Understanding how web languages are used for apps and how they work with native code. Originally published at www.danbricklin.com
In my discussions with business IT people, I'm finding that the use of HTML 5 and perhaps "hybrid" architecture is becoming more and more commonly accepted. However, I'm also finding out that most people don't understand exactly what this means, nor have a clear picture that lets them differentiate between variations of that architecture supported by the multitude of development environments. This essay fills in a lot of the details to help you understand that picture.
HTML, JavaScript, and CSS - The evolution of the three browser languages
When browsers first came into being back in the early 1990s, web pages were specified using a single computer language, HTML, the HyperText Markup Language. HTML covered the content (text and images), the presentation of that content (layout, fonts, etc.), and behavior (links, buttons, typing areas, etc.). Everything was right there in the one specification of that language, and if you knew that language you could create any type of web page that the browser could display. To change what was displayed on the screen, the author would have you switch between a series of different pages. These pages were often created dynamically by a program accessed through a web server, with each successive page having different HTML. People were pretty happy with being able to so easily create usable presentations of information, and do basic interactions and transactions. If developers knew the HTML language, and knew how to write programs in a language supported by web servers (initially Perl, but later PHP, Java, ColdFusion, and other languages), they could bring all sorts of useful functionality to users around the world who could connect to their servers through the Internet.
Early HTML was very bare-bones. The main content was text. It had very simple visual markup, such as letting you specify text as normal body text, more pronounced headings, monospaced computer program listings, or list items. Over time, different browsers added additional capabilities. One of the earliest was in-line images, added by the Mosaic browser, which brought a new type of content to the page. Later we saw the addition of the FONT tag, to let the author more precisely specify the look of text, and the TABLE tag, which could be used both for more complex lists and for control of positional layout. Netscape drove a lot of this expansion of the language, attracting many graphic design-oriented people to its camp.
Many web sites were created that depended upon these new features, leading to notices of "Best viewed in Netscape". The Internet community continued to expand the standard language, and the various browser developers (most influentially at the time Microsoft and Netscape) moved in that direction. This brought on new versions of HTML: HTML 2, HTML 3.2, and HTML 4.
The initial "HTML does it all" architecture quickly became a limitation. Interaction designers and programmers wanted more control over the behavior and the ability to add functionality. Visual designers wanted more control over the "look" of the pages. Netscape developed a programming language, distinct from HTML, called "JavaScript" to address some of the functionality issues. The name played off the fame, at the time, of the Java language (and Netscape got Sun's permission to use the name variant), but JavaScript was its own language. JavaScript code could be included directly within an HTML page, or could be loaded from separate JavaScript-only files referenced in the HTML and shared among multiple pages on a web site. The main attraction of JavaScript was that it allowed the programmer to dynamically access and change the data in the browser that controlled the display on the screen. Microsoft followed Netscape's introduction of JavaScript in Netscape 2.0 with a very similar language implementation in Internet Explorer 3.0.
The data in the browser that represents the web page is organized, conceptually, as a series of software objects accessed in a program through lists and parent-child relationships (programmers often call this a tree). This access to the browser data is called the Document Object Model (the DOM). Typical DOM elements and data values were paragraphs of text, form elements, and the URLs of images. The JavaScript coding syntax and notation were designed to make it very easy to refer to the DOM elements and their attributes and to manipulate them.
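The real DOM exists only inside a browser, so the following TypeScript sketch models a tiny piece of one. It shows the idea of a tree of objects that script code can search and change. The `ToyNode` class and its methods are hypothetical stand-ins. In a browser, the equivalent work would use built-in DOM features such as `document.getElementsByTagName` and an element's `textContent` property.

```typescript
// Toy model of a DOM tree (NOT the browser's real DOM API): each node
// has a tag name, some text, attributes, and a list of child nodes.
class ToyNode {
  children: ToyNode[] = [];
  parentNode: ToyNode | null = null;

  constructor(
    public tag: string,
    public text: string = "",
    public attrs: { [attrName: string]: string } = {}
  ) {}

  // Attach a child, recording the parent-child link in both directions.
  append(child: ToyNode): ToyNode {
    child.parentNode = this;
    this.children.push(child);
    return child;
  }

  // Depth-first search for all descendants with a given tag.
  findAllByTag(tag: string): ToyNode[] {
    let found: ToyNode[] = [];
    for (const child of this.children) {
      if (child.tag === tag) found.push(child);
      found = found.concat(child.findAllByTag(tag));
    }
    return found;
  }
}

// Roughly: <body><p>Welcome</p><p>Time: unknown</p><img src="logo.gif"></body>
const pageBody = new ToyNode("body");
pageBody.append(new ToyNode("p", "Welcome"));
const clockPara = pageBody.append(new ToyNode("p", "Time: unknown"));
const logoImage = pageBody.append(new ToyNode("img", "", { src: "logo.gif" }));

// What early scripts typically did: change a paragraph's text and an image's URL.
clockPara.text = "Time: 10:42";
logoImage.attrs["src"] = "logo-large.gif";

console.log(pageBody.findAllByTag("p").map((p) => p.text).join(" | "));
// Welcome | Time: 10:42
```

The parent links and child lists are what let code walk up and down the tree, which is the "parent-child relationships" idea described above.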
JavaScript's syntax also made it easy for the capabilities of the objects handled by the browser (blocks of text, links, etc.) to be enhanced in later versions of the browser and then accessed without making changes to the JavaScript syntax. Early JavaScript was used to perform operations like data validation of input fields (cutting down on the need to go back to the server for such functionality and then display a new web page with error information after the delay of a server request) and simple visual changes like showing drop-down menus and preloading images for various UI purposes (improving perceived speed by loading them before they were requested).
In addition to being a normal computer programming language with variables, arrays, conditionals, loops, etc., JavaScript gives the programmer access to various browser "events". For example, the "load" event lets the programmer execute JavaScript code only after the entire web page has been loaded. This could be used to wait for all of the web page to be read from the server and the DOM to be created by the browser, and then update the text displayed on certain parts of the screen, such as putting in the current time, or doing layout based on the size of the screen. The "click" event lets the programmer specify exactly which actions to take when an HTML item is clicked with a mouse. The "resize" event lets the programmer change layout when the browser window is resized by the user.
Microsoft provided a method in Internet Explorer to access native code on the client computer (the one running the browser) with functionality called "ActiveX". Netscape provided similar functionality through a plug-in capability. Unlike HTML and JavaScript code, which were somewhat independent of the browser maker, these facilities were more browser specific. Access to native code let a web page use functionality that was not part of the browser but was available to native code.
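The event-driven style described above can be sketched without a browser. In this TypeScript sketch, `ToyEventTarget` is a hypothetical stand-in for the objects on which real browsers let scripts register handlers (with `addEventListener`). The "browser" is simulated by calling `dispatch` directly. The ZIP-code check is an invented example of the kind of input validation that saved a round trip to the server.

```typescript
// A handler receives optional details about the event (all strings here).
type ToyHandler = (detail: { [key: string]: string }) => void;

// Toy event target (hypothetical): scripts register handlers by event name,
// and the simulated browser calls them when that event happens.
class ToyEventTarget {
  private handlers: { [eventType: string]: ToyHandler[] } = {};

  addEventListener(eventType: string, handler: ToyHandler): void {
    if (!this.handlers[eventType]) this.handlers[eventType] = [];
    this.handlers[eventType].push(handler);
  }

  dispatch(eventType: string, detail: { [key: string]: string } = {}): void {
    (this.handlers[eventType] || []).forEach((handler) => handler(detail));
  }
}

// Simulated page state: a typed-in ZIP code, a message area, and a layout.
const pageState = { zip: "", message: "", layout: "two columns" };
const browserWindow = new ToyEventTarget();
const submitButton = new ToyEventTarget();

// "load": runs once the whole page has arrived.
browserWindow.addEventListener("load", () => {
  pageState.message = "Page ready";
});

// "resize": switch layouts based on the new window width.
browserWindow.addEventListener("resize", (detail) => {
  pageState.layout = Number(detail["width"]) < 600 ? "one column" : "two columns";
});

// "click": validate locally instead of waiting for the server to reject it.
submitButton.addEventListener("click", () => {
  pageState.message = /^\d{5}$/.test(pageState.zip)
    ? "Sending to server..."
    : "Please enter a 5-digit ZIP code";
});

browserWindow.dispatch("load");
pageState.zip = "12a45";
submitButton.dispatch("click");
console.log(pageState.message); // Please enter a 5-digit ZIP code
```

The handlers do nothing until their events fire. That is the essential difference from a program that simply runs top to bottom.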
Audio and video, for instance, were first provided on web pages through added native code. Probably the most popular added native code was the Flash plug-in from Adobe, which allowed the execution of content created with the Flash development environment, including video. Video sites such as YouTube initially relied on it.
The community of developers making browsers produced yet another language, CSS (Cascading Style Sheets), for addressing the presentation and layout of HTML pages. Rather than explicitly coding positioning information in the HTML with table cells, and colors, fonts, and other visual aspects with special HTML tags and attributes on those tags, authors could use a more general-purpose method. It used a hierarchy of definitions written in the CSS language, included either as part of an HTML page or referenced in a separate file on the web server and loaded by the browser. When processed for any particular HTML element, those definitions specified how that element was to be presented. This separated the definition of the presentation (the CSS) from the content (the HTML). By careful use of the "cascading" hierarchy of the CSS definitions, a web page author could easily change the look of a page to suit different situations (such as making it look different when printed than when viewed on the screen). The CSS language lent itself to enhancement over time in many areas, such as fonts, colors, borders, backgrounds, and more. Those enhancements could be applied to all different types of HTML elements without changing the definition of the HTML language, or even the HTML of an existing page.
Support for CSS arrived slowly in the browsers. To meet the needs of web page designers, the behavior of CSS on the different browsers had to be as identical as possible in all cases. Retrofitting such specifications (which were evolving) into existing browser implementations was difficult.
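The way the cascade picks a winner when several definitions apply to the same element can be sketched in a few lines. This TypeScript model is deliberately simplified and hypothetical. It understands only single tag, `.class`, and `#id` selectors, plus a per-rule medium of screen, print, or all. Within those limits it follows the CSS rule that a more specific selector beats a less specific one, and that among equally specific rules the one appearing later wins.

```typescript
// Simplified CSS rule (hypothetical model): one simple selector, a medium,
// and a set of property declarations.
interface ToyRule {
  selector: string; // "p", ".note", or "#title"
  media: "all" | "screen" | "print";
  declarations: { [property: string]: string };
}

interface ToyElement {
  tag: string;
  id?: string;
  classes: string[];
}

function selectorMatches(selector: string, el: ToyElement): boolean {
  if (selector.charAt(0) === "#") return el.id === selector.slice(1);
  if (selector.charAt(0) === ".") return el.classes.indexOf(selector.slice(1)) >= 0;
  return el.tag === selector;
}

// Specificity of a single simple selector: id beats class beats tag.
// (Real CSS counts ids, classes, and tags separately across compound selectors.)
function specificity(selector: string): number {
  if (selector.charAt(0) === "#") return 100;
  if (selector.charAt(0) === ".") return 10;
  return 1;
}

function computeStyle(
  el: ToyElement,
  sheet: ToyRule[],
  medium: "screen" | "print"
): { [property: string]: string } {
  const style: { [property: string]: string } = {};
  const winningSpecificity: { [property: string]: number } = {};
  for (const rule of sheet) {
    if (rule.media !== "all" && rule.media !== medium) continue;
    if (!selectorMatches(rule.selector, el)) continue;
    const spec = specificity(rule.selector);
    for (const prop in rule.declarations) {
      // ">=" lets a later rule of equal specificity override an earlier one.
      if (winningSpecificity[prop] === undefined || spec >= winningSpecificity[prop]) {
        style[prop] = rule.declarations[prop];
        winningSpecificity[prop] = spec;
      }
    }
  }
  return style;
}

const styleSheet: ToyRule[] = [
  { selector: "p", media: "all", declarations: { color: "black", "font-size": "12pt" } },
  { selector: ".note", media: "screen", declarations: { color: "blue" } },
  { selector: ".note", media: "print", declarations: { color: "gray" } },
  { selector: "p", media: "all", declarations: { "font-size": "11pt" } },
];
const notePara: ToyElement = { tag: "p", classes: ["note"] };

console.log(JSON.stringify(computeStyle(notePara, styleSheet, "screen")));
// {"color":"blue","font-size":"11pt"}
console.log(JSON.stringify(computeStyle(notePara, styleSheet, "print")));
// {"color":"gray","font-size":"11pt"}
```

The same HTML element comes out blue on screen and gray when printed, with no change to the HTML itself. This is the separation of presentation from content described above.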
Eventually, though, a reasonable subset of a "full" implementation of CSS could be found in the major browsers.
Finally, another key capability arrived. Microsoft, seeking smoother interaction for a browser-based way to access its Outlook mail and calendaring system, introduced a new JavaScript capability in its Internet Explorer browser. This capability, called XMLHttpRequest (XHR) and also known as an asynchronous HTTP request, is now in all browsers. It lets a web page's JavaScript make a request to a web server and then receive the web server's response a while later (fractions of a second to seconds) for processing, without erasing the browser page and displaying another web page. Additionally, the request and response handling is done without blocking any other use of the browser page, such as user scrolling. JavaScript could then be used to update the currently displayed page with new information. This made Microsoft's OWA (Outlook Web Access) system act much more like a native code email system. The combination of JavaScript and asynchronous HTTP requests was later given the popular name "Ajax".
At this point, around 1999 or so, we had three languages that were used to create content for both viewing and interacting in a browser. Those languages were: HTML, with the content and the glue for tying the pieces together through references to other files; JavaScript, for providing interactivity and asynchronous server communication; and CSS, for allowing precise control of the look and layout of each element.
Over the next few years more and more users installed or upgraded to browsers capable of using those capabilities. There were three major browsers: Internet Explorer, Firefox, and Opera. The first two had the lion's share of users. Firefox arose when Netscape rewrote its main visual display code and released it as Open Source. That code, called the Gecko engine, moved into a separate organization, the Mozilla Foundation, which used it to create Firefox.
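The asynchronous request pattern described above can be simulated outside a browser. In this TypeScript sketch, `ToyNetwork` is a hypothetical stand-in for XMLHttpRequest. Its `send` method returns immediately, and the response is handed to a callback later, when `deliverAll` simulates its arrival. In a browser, the same shape appears as an XMLHttpRequest's `open` and `send` calls plus an `onload` handler that reads `responseText`. The mail URL and message data are made up.

```typescript
type ToyCallback = (responseText: string) => void;

// Toy network layer (hypothetical): send() queues a request and returns at
// once, so the page stays usable; deliverAll() later simulates the responses.
class ToyNetwork {
  private pending: { url: string; onResponse: ToyCallback }[] = [];

  // `server` is a made-up function standing in for a real web server.
  constructor(private server: (url: string) => string) {}

  send(url: string, onResponse: ToyCallback): void {
    this.pending.push({ url: url, onResponse: onResponse });
  }

  deliverAll(): void {
    const arrived = this.pending;
    this.pending = [];
    arrived.forEach((req) => req.onResponse(this.server(req.url)));
  }
}

// The page currently on screen: a list of messages and a scroll position.
const mailPage = { messages: ["Welcome"], scrollTop: 0 };

const network = new ToyNetwork((url) =>
  url === "/mail/new" ? JSON.stringify(["Lunch?", "Status report"]) : "[]"
);

// Ask for new mail; only the message list changes when the answer comes back.
network.send("/mail/new", (text) => {
  const newMessages: string[] = JSON.parse(text);
  mailPage.messages = mailPage.messages.concat(newMessages);
});

mailPage.scrollTop = 120; // the user keeps scrolling while the request is pending
network.deliverAll();
console.log(mailPage.messages.join(", ")); // Welcome, Lunch?, Status report
```

The scroll position survives because the page was never replaced. Only the part that needed new data was updated.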
Internet Explorer's visual display code is called the Trident layout engine.
The combination of these three languages and asynchronous requests fueled a change in the experience of using the web. Email systems, like Google's Gmail in 2004, and productivity applications, from the early Halfbrain.com spreadsheet in 1999 to Google Docs around 2006, changed the feel of the web from "click, wait, click, wait" to desktop-like interaction but with real-time access to data on shared servers. Seeing Google Maps (released in 2005) for the first time was a real game changer for many people and inspired them to better exploit the capabilities of the browser for their own applications.
A problem for developers, though, was the wide variety of different browsers and browser versions in use. Each successive release of a browser had better, more complete, and more "standards compliant" implementations of HTML, JavaScript, and CSS. However, many users were behind, running older implementations. New browsers, like Apple's Safari in 2003, based on the Open Source WebKit visual display code, were added to the mix. Developers needed to test their web pages on many browsers and put in special code to deal with the differences, or drop back to lesser capabilities if functionality was missing.
To help deal with the complexity of writing code for common functionality such as drop-down menus, or changing the visual attributes of many parts of a web page, libraries of prewritten JavaScript code became popular. One of the most popular is the jQuery library. It simplified the support of different browsers, hiding the code to do that in library functions you could include on your web site without knowing much JavaScript. Web site developers who had only a basic knowledge of JavaScript could produce quite complex effects on their web pages by simply copying and reusing a few lines of code and including the jQuery library on their web server for loading by the browser along with their web pages.
Recent advances and the name HTML5
The latest browsers have pushed things further. At this point, a large percentage of all browser users are running relatively recent versions, and those versions are becoming better at getting the users to continue to upgrade with new releases. The benefits of these new versions include increasingly complete support for HTML5. This version of the standards specification improves on the older HTML 4, which helped usher in the CSS era. HTML 5 includes native video and audio as well as the CANVAS element and other means for supporting vector graphics. This means that the standard browser content can now be text, image, audio, video, and vector graphics. These features lessen the need for native code add-ins for this functionality, making it more standards-based and consistent across multiple browsers.
The newer browsers have much faster JavaScript execution than older ones (this is in addition to any performance gains from upgrading hardware over the years). There have also been additions to the JavaScript language that facilitate faster execution in certain circumstances. In some cases JavaScript code, which had previously been run by a somewhat slow interpreter, can now approach the speed of regular compiled code in other languages. The CSS specification has also continued to advance, with new features such as graphics transformations and animation that the browsers execute using the graphics processors that are part of modern computers and were previously used mainly for gaming and simple screen effects. I wrote a recent essay and a blog post with examples of the advancement of JavaScript in the browser: "JavaScript rolls on (as does SocialCalc)" and "More JavaScript news". These additions to HTML, JavaScript, and CSS are making it possible for web applications to have more and more of the full capabilities and user interface that you would expect from an application written in native code.
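The CSS transformations mentioned above come down to small matrix calculations, which is the kind of work a graphics processor does well. This worked sketch is in TypeScript, and the helper names are invented for illustration. It follows the CSS convention in which `matrix(a, b, c, d, e, f)` maps a point (x, y) to (a*x + c*y + e, b*x + d*y + f). In that convention, a list such as `translate(100px, 0) rotate(90deg)` is multiplied left to right, so the rightmost function is applied to the point first.

```typescript
// CSS 2D transform matrix(a, b, c, d, e, f): maps (x, y) to
//   (a*x + c*y + e, b*x + d*y + f)
interface Matrix2D {
  a: number; b: number; c: number; d: number; e: number; f: number;
}

function translateMatrix(tx: number, ty: number): Matrix2D {
  return { a: 1, b: 0, c: 0, d: 1, e: tx, f: ty };
}

// rotate(angle) in CSS equals matrix(cos, sin, -sin, cos, 0, 0). Because the
// screen's y axis points down, positive angles turn clockwise on screen.
function rotateMatrix(degrees: number): Matrix2D {
  const r = (degrees * Math.PI) / 180;
  return { a: Math.cos(r), b: Math.sin(r), c: -Math.sin(r), d: Math.cos(r), e: 0, f: 0 };
}

// m1 x m2: the combined transform applies m2 to a point first, then m1.
function multiply(m1: Matrix2D, m2: Matrix2D): Matrix2D {
  return {
    a: m1.a * m2.a + m1.c * m2.b,
    b: m1.b * m2.a + m1.d * m2.b,
    c: m1.a * m2.c + m1.c * m2.d,
    d: m1.b * m2.c + m1.d * m2.d,
    e: m1.a * m2.e + m1.c * m2.f + m1.e,
    f: m1.b * m2.e + m1.d * m2.f + m1.f,
  };
}

function applyToPoint(m: Matrix2D, x: number, y: number): [number, number] {
  return [m.a * x + m.c * y + m.e, m.b * x + m.d * y + m.f];
}

// transform: translate(100px, 0) rotate(90deg), applied to the point (10, 0)
const combined = multiply(translateMatrix(100, 0), rotateMatrix(90));
const moved = applyToPoint(combined, 10, 0);
console.log(Math.round(moved[0]), Math.round(moved[1])); // 100 10
```

Swapping the order to `rotate(90deg) translate(100px, 0)` moves the same point to roughly (0, 110) instead. This is why the order of functions in a CSS transform list matters.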
More and more people are using browser based email and productivity applications instead of native code ones for their main daily productivity work. While HTML 5 is just part of this mix of improvements to HTML, JavaScript, and CSS, the term "HTML5" is sometimes used as a proxy for all three since they come together in the same browsers. For example, a browser that supports the HTML 5 video elements probably also supports the CSS transformation and animation properties and has fast JavaScript execution.
Mobile
The early mobile phones and PDAs that included Internet connectivity and HTML web browsing posed new challenges. These included the Palm/Handspring Treo, the phones and handhelds based on Microsoft software, the Symbian-based devices, and many cell phones with mobile versions of the Opera browser built in. Compared to today's smartphones, these devices had limited processing and display capabilities, limited memory, and limited bandwidth. They also had quite small screens, both physically and in number of pixels. Supporting web browsing on such devices often meant having a separate web address for mobile (sometimes auto-detected from the identification information browsers have always sent to the server). Web page developers would use quite different layouts (e.g., much less information at a time, fewer pretty pictures, terser text, single columns) and make little use of complex CSS or JavaScript (either of which might or might not be supported). Alternatively, the normal web site might be viewed with many visual aspects not displayed and some JavaScript functionality missing (for example, with no mouse cursor, you can't have changes that occur when you "hover" over an element). Users of such devices were initially quite pleased to have any web access at all on those devices. It was quite liberating compared to needing to be at a desktop or laptop computer. Over time, though, they missed having a fuller experience.
A major shift occurred when Apple released the first iPhone in June 2007. It had a very powerful CPU for a phone, as well as a graphics processing unit for special effects like smooth 3-D animation. For the iPhone's browser, Apple used the same WebKit browser engine as its desktop Safari browser, giving it fairly full support for HTML, JavaScript, and CSS. The LCD display had a reasonable number of pixels compared to the older CRT displays that many web sites were designed for.
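The separate mobile sites mentioned above were often chosen on the server by looking at the User-Agent header that browsers send with each request. The following TypeScript sketch shows that idea. The keyword list, the sample User-Agent strings, and the m.example.com address are all invented for illustration. Real detection lists were much longer and needed constant upkeep.

```typescript
// Hypothetical keywords suggesting an older mobile browser (illustrative only).
const mobileHints: string[] = ["Opera Mini", "Symbian", "Windows CE", "BlackBerry", "Palm", "Mobile"];

function looksMobile(userAgent: string): boolean {
  return mobileHints.some((hint) => userAgent.indexOf(hint) >= 0);
}

// Returns the address to redirect to, or null to serve the regular page.
// m.example.com is a placeholder, not a real mobile site.
function mobileRedirect(userAgent: string, path: string): string | null {
  return looksMobile(userAgent) ? "http://m.example.com" + path : null;
}

console.log(mobileRedirect("ExampleBrowser/1.0 (Symbian OS)", "/news"));
// http://m.example.com/news
console.log(mobileRedirect("ExampleBrowser/1.0 (Windows NT 5.1)", "/news"));
// null
```

A simple keyword check like this can misfire. The iPhone's browser includes the word "Mobile" in its User-Agent string, so it would be sent to the reduced site even though it can display the full one.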
Apple was able to customize the Open Source WebKit and related code to add additional functionality. The graphics processing unit was used to provide smooth zooming. The iPhone browser could "pretend" to the web page that it had a larger screen, displaying the same total content as a normal laptop screen but zoomed out. This zoomed-out view might make fine text almost unreadable, but it was acceptable for navigating to a part you could then zoom in on instantly with a simple two-finger "unpinch" gesture. The browser was also enhanced to recognize various touch gestures, such as dragging to scroll, or tapping to select a link or zoom specific amounts. The standard browser controls, like drop-down lists and text fields, were modified to use standard iPhone controls, with more space to fit your fingers.
Apple initially told application developers that the only way they could create applications for the iPhone was as "web apps": HTML, JavaScript, and CSS accessed from the web and neither installed nor stored on the iPhone other than as favorite links of some kind. To aid the creation of these web apps, Apple made some additions to HTML and JavaScript in the iPhone browser. They added a way for the HTML code to control the automatic zooming (the "viewport" META tag). They also added some new events. The "mouse" event behavior was changed to support the need to wait for recognition of gestures such as pinch and drag-scroll. No longer could JavaScript use "mousedown" or "mousemove" events to produce behavior like drag and drop or highlighting. Instead, there were new "touch" events to meet those needs.
Users were happy: they now had a nice portable device with which they could surf the vast majority of the web sites they were used to. While they might have to zoom in and out, much of the functionality and almost all of the visual appeal they had come to expect survived.
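Here is a rough sketch of drag logic written with touch events instead of the "mousedown" and "mousemove" events described above. The `ToyTouchEvent` shape is a simplified, hypothetical stand-in. It mirrors what scripts typically read from real touch events: a list of current touches, each with `clientX` and `clientY`. Here the events are built by hand rather than delivered by a browser. In a real page, a touchmove handler would usually also call the event's `preventDefault()` so the browser does not scroll the page at the same time.

```typescript
// Simplified, hypothetical touch event shapes (not the browser's own types).
interface ToyTouchPoint {
  clientX: number;
  clientY: number;
}

interface ToyTouchEvent {
  type: "touchstart" | "touchmove" | "touchend";
  touches: ToyTouchPoint[]; // fingers currently on the screen
}

// Tracks how far a single-finger drag has moved an item.
class DragTracker {
  offsetX = 0;
  offsetY = 0;
  private startX = 0;
  private startY = 0;
  private dragging = false;

  handle(ev: ToyTouchEvent): void {
    if (ev.type === "touchstart" && ev.touches.length === 1) {
      // Remember where the finger landed, relative to the current offset.
      this.dragging = true;
      this.startX = ev.touches[0].clientX - this.offsetX;
      this.startY = ev.touches[0].clientY - this.offsetY;
    } else if (ev.type === "touchmove" && this.dragging && ev.touches.length === 1) {
      this.offsetX = ev.touches[0].clientX - this.startX;
      this.offsetY = ev.touches[0].clientY - this.startY;
    } else if (ev.type === "touchend" || ev.touches.length > 1) {
      // Lifting the finger ends the drag; a second finger suggests a pinch instead.
      this.dragging = false;
    }
  }
}

const tracker = new DragTracker();
tracker.handle({ type: "touchstart", touches: [{ clientX: 50, clientY: 80 }] });
tracker.handle({ type: "touchmove", touches: [{ clientX: 70, clientY: 95 }] });
tracker.handle({ type: "touchend", touches: [] });
console.log(tracker.offsetX, tracker.offsetY); // 20 15
```

Ignoring drags that involve two fingers leaves room for the browser's own pinch gesture, which is the kind of conflict the new touch events had to manage.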
Flash, the main alternative to JavaScript for web page functionality, was missing, but Safari supported enough of the HTML 5, JavaScript, and CSS standards to provide some alternatives. Web site developers were happy: their regular web sites could be accessed wirelessly on mobile handheld devices, and their mobile versions still worked as before. Also, this very popular platform had up-to-date support for HTML, JavaScript, and CSS.
In July 2008 Apple started to allow developers outside of Apple to write applications in "native code" that could be installed and stored on the iPhone and used without needing an Internet connection. These applications had to be written in the main Apple development language, Objective-C, a language dating back at least to the days of the NeXT operating system developed in the 1980s, but accessing the new functionality provided by Apple in the iPhone operating system (later named iOS).
In 2009, new mobile phones from a variety of manufacturers became available running another operating system, Android, which was spearheaded by Google. Android had its own built-in browser, which also used the Open Source WebKit browser engine. This meant that, in general, Android browsing had the same standards-compliant behavior as iOS for regular web sites. (Web page developers let out a sigh of relief.) Android also allowed native development, this time in the Java (not JavaScript) language. The Android operating system programming environment is powerful, like the iOS environment, but is different and not directly compatible.
Mobile App Development
Creating apps for iOS (iPhone, iPod touch, and iPad) is similar to programming for a desktop or laptop computer. You write code using the Objective-C language, which looks somewhat similar to other languages derived from the original C programming language used to create Unix. The language adds syntax for sending messages to software objects, using the "[" and "]" characters and named arguments to object methods. For example, to set the image on a button, you might use the statement:
[stopButton setImage:[UIImage imageNamed:@"stop.png"] forState:UIControlStateNormal];
Programs make use of libraries of code provided by Apple as part of the iOS operating system. For example, the "setImage:forState:" method of the UIButton class shown here is provided as part of iOS. Those libraries are quite extensive, providing objects and methods for everything from buttons and lists to timers, strings, network I/O, cameras, GPS, and much more. The definitions of how to use these libraries are called the Application Program Interface (API) for iOS. Once the code for an app is written, it is compiled on a Mac, and then the resulting ".ipa" file (an iOS App Store Package) may be installed on an iOS device. The process for this is described below.
For Android apps, the process is similar. The language, though, is Java, the same language commonly used for many other purposes, including for writing corporate server-based applications. Java has a more common syntax and was first released to the public in 1995. The functionality provided by the Android code libraries, the Android API, is extensive, much like that of iOS. The files for installing apps are ".apk" files (Android application package). Writing apps in native code has the advantage of having direct access to all of the functionality that the device manufacturer provides to developers through the operating system API.
This API usually lets a developer access most of what the manufacturer uses for their built-in apps for the device. Together with native code for speed, using the API for access to functionality gives rise to smooth games, interfaces that use the operating system's standard visual controls, and integration of much of the hardware capabilities of the device including touch, accelerometers, GPS, cameras, camera illumination light, etc. Another benefit of native code apps is that they may be installed on the device and reside in storage on the device. This lets them be started using the device operating system's standard means of launching apps (e.g., the home screen on iOS) and execute without requiring, in most cases, an Internet connection. One of the disadvantages of native code apps is that they require coding specific to the operating system, such as in choice of language (Objective-C, Java) and API specifics (the way you implement buttons is different in Android from the way you do it in iOS). This can be a problem when there are limits on the time and labor available to create an app that needs to run on multiple platforms, or even on one platform. Some common parts of computer applications, such as mixing text, images, and controls in a fluid manner as you usually find on web pages, are not as simple to code as they are in languages designed for such functionality, like HTML, JavaScript, and CSS.