June 12, 2010

BBC News, pt I

For the inaugural WhitherApps project, we’re going to look at the (excellent) BBC News iPhone/iPad app and see if it could have been built as a web app, rather than in native code.

To kick off with, we’re going to try and emulate the iPad version, in both orientation modes. What we’re basically aiming for is this:

The BBC News iPad app in landscape mode

And this:

The BBC News iPad app in portrait mode

Let’s see how we get on. I’ve decided to break the process down into 4 steps:

  • Figure out how the app pulls content from the BBC
  • Make a wireframe web page that behaves like the app
  • Stitch the content into the wireframe
  • Decorate the wireframe so it looks just like the real thing

This gives me a chance to assess the feasibility of the undertaking before I get too swept up with pushing pixels around.

Behind the scenes

Obviously the app does not ship with the news in it. My assumption is that it is little more than a client that renders a feed of content and images hosted somewhere on the web: architecturally, a customized browser-like client. But we do need to figure out where that data comes from, and how we might be able to use it in a real browser-based app.

I’m using my iPad on a home WiFi network. A quick look at the device’s network settings shows that I can configure an HTTP proxy. This seems like the easiest way to see the traffic coming to and from the app: I run up a Squid proxy on my Mac (which is conveniently on the same WiFi network) and point the iPad’s proxy settings at my Mac’s local IP address and Squid’s port (3128 by default).

As a proxy, Squid does lots of clever things. For our purposes, though, we really just want to see the requests the device is making, so I tail Squid’s access.log file, which shows me each HTTP request from the device as it is made:

GET http://www.live.bbc.co.uk/moira/feeds/ipad/news/en/v1 - application/json
GET http://www.bbc.co.uk/moira/feed/news_world/front_page - application/atom+xml
GET http://bbc.112.2o7.net/b/ss/bbcwnewsiphone/0/OIP-2.0/s82818894? - text/html
GET http://cdnedge.bbc.co.uk/nol/ifs_news/hi/front_page/ticker.json - text/javascript
GET http://www.bbc.co.uk/moira/feed/news_world/americas - application/atom+xml
GET http://static.bbc.co.uk/moira/img/ipad/thumbnail/48058000/jpg/_48058752_009512185-2.jpg - image/jpeg
...

This looks very promising. Firstly, the app seems to be using plain HTTP – no proprietary protocols here – so this will work well if we come to use some sort of AJAX technique ourselves. Secondly, the URLs are self-explanatory, so it’s easy to see how things are working. There’s some sort of initialising JSON at the start, then an Atom feed of the front-page news (and shortly afterwards the ‘Americas’ page), a JSON news ticker, what looks like an analytics beacon (the 2o7.net request), and then a whole bunch of thumbnails that are used for the navigation icons at the top (or side) of the app.
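Reading access.log by eye gets tedious once the app starts pulling down thumbnails. Here’s a minimal Python sketch of a filter that extracts the method, URL and content type from each line. It assumes Squid’s default native log format (timestamp, elapsed time, client address, result code/status, bytes, method, URL, ident, hierarchy/peer, content type); the function names are my own.

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple, Optional


class Request(NamedTuple):
    method: str
    url: str
    content_type: str


def parse_native_line(line: str) -> Optional[Request]:
    """Parse one line of Squid's native access.log format.

    Assumed field order: time, elapsed, client, code/status, bytes,
    method, URL, ident, hierarchy/peer, content type. Any extra fields
    (such as logged headers) after the content type are ignored.
    """
    fields = line.split()
    if len(fields) < 10:
        return None  # blank, truncated or differently formatted line
    return Request(method=fields[5], url=fields[6], content_type=fields[9])


def requests_from(lines: Iterable[str]) -> Iterator[Request]:
    """Yield a Request for every line that parses, skipping the rest."""
    for line in lines:
        req = parse_native_line(line)
        if req is not None:
            yield req


def count_by_type(lines: Iterable[str]) -> Counter:
    """Tally requests per content type (JSON vs Atom vs JPEG, and so on)."""
    return Counter(req.content_type for req in requests_from(lines))


def print_requests(log_path: str) -> None:
    """Print 'METHOD URL content-type' for each request in the log file."""
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for req in requests_from(log):
            print(req.method, req.url, req.content_type)
```

Pointing print_requests at your own access.log (its location varies by install) should give a listing much like the one above, and count_by_type gives a quick sense of how much of the traffic is feeds versus images.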

(Incidentally, if you do do this sort of thing, prepare to be intrigued by the amount of background HTTP that the iPad is sending to Apple!)

Playing it back

My first impulse, of course, is to fire up a web browser (or, in this case, wget on the command line) and see what the payloads of some of these responses look like. So I try the initial JSON file:

~ > wget http://www.live.bbc.co.uk/moira/feeds/ipad/news/en/v1
--2010-06-13 13:13:50--  http://www.live.bbc.co.uk/moira/feeds/ipad/news/en/v1
Resolving www.live.bbc.co.uk (www.live.bbc.co.uk)... 212.58.246.160
Connecting to www.live.bbc.co.uk (www.live.bbc.co.uk)|212.58.246.160|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2010-06-13 13:13:53 ERROR 403: Forbidden.

And this doesn’t seem so good. How does the iPad app get the content over HTTP (via my Mac) when my Mac itself can’t?

Well, like all self-respecting mobile technologists, I have something of a fetish for HTTP user-agents. I wonder how the iPad app identifies itself when it makes requests to the BBC server? I go back to Squid, alter the configuration slightly so that the HTTP headers are logged, and refresh the iPad app. A whole load of new HTTP goes past, but this time I can see that the app’s requests include:

User-Agent: BBC News 1.2.1 (iPad; iPhone OS 3.2; en_US)

Call it a hunch, but I wonder how wget on my Mac will get on if I spoof the user-agent header?

~ > wget --user-agent="BBC News 1.2.1 (iPad; iPhone OS 3.2; en_US)" http://www.live.bbc.co.uk/moira/feeds/ipad/news/en/v1
--2010-06-13 13:18:20--  http://www.live.bbc.co.uk/moira/feeds/ipad/news/en/v1
Resolving www.live.bbc.co.uk (www.live.bbc.co.uk)... 212.58.246.160
Connecting to www.live.bbc.co.uk (www.live.bbc.co.uk)|212.58.246.160|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6199 (6.1K) [application/json]
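For anyone who would rather script this than shell out to wget, the same trick is a one-header change with Python’s standard urllib. This is a sketch rather than anything the app itself does: the constant and function names are mine, and these 2010 endpoints may well no longer respond this way.

```python
import urllib.request

# User-agent string the iPad app sent, as captured in the Squid header log.
IPAD_APP_UA = "BBC News 1.2.1 (iPad; iPhone OS 3.2; en_US)"
BOOTSTRAP_URL = "http://www.live.bbc.co.uk/moira/feeds/ipad/news/en/v1"


def build_request(url: str, user_agent: str = IPAD_APP_UA) -> urllib.request.Request:
    """Build a GET request that identifies itself as the iPad app."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url: str, user_agent: str = IPAD_APP_UA, timeout: float = 10.0) -> bytes:
    """Fetch url with the spoofed user-agent.

    Raises urllib.error.HTTPError for error responses such as the 403 above.
    """
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
        return resp.read()
```

Calling fetch(BOOTSTRAP_URL) would then return the raw JSON bytes, assuming the server still keys its access check on the user-agent.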

Bingo! That’s all it takes to stop being ‘forbidden’ and get our initial 6KB’s worth of JSON. It looks like this:

{
 "name": "WWW iPad Application Bootstrap",
 "version": "1.0.4",
 "published": "2010-06-07 11:09:39 Etc/GMT",
 "ticker_url": "http://cdnedge.bbc.co.uk/nol/ifs_news/hi/front_page/ticker.json",
 "live_feed_uri_template": "http://www.bbc.co.uk/worldservice/meta/mobile/iphone/%7Bbandwidth%7D",
 "ugc_sms_number": "+447725100100",
 "ugc_email": "talkingpoint@bbc.co.uk",
 "feedback_email": "iphone-feedback@bbc.co.uk",
 "faq_url": "http://www.bbc.co.uk/moira/html/%7bdevice%7d/news/faq/en",
 "feedback_url": "http://www.bbc.co.uk/moira/html/%7bdevice%7d/news/feedback/en",
 "conditions_url": "http://www.bbc.co.uk/moira/html/%7bdevice%7d/news/tandc/en",
 "privacy_url": "http://www.bbc.co.uk/moira/html/%7bdevice%7d/news/privacy/en",
 "copyright": "BBC © 2010",
 "feeds": [
 {
  "type": "group",
  "title": "More",
  "feeds": [
  {
   "title": "Top Stories",
   "feed_url": "http://www.bbc.co.uk/moira/feed/news_world/front_page",
   "default": true,
   "movable": 0
  },
  {
   "title": "Americas",
   "feed_url": "http://www.bbc.co.uk/moira/feed/news_world/americas",
   "default": true
  },
...
  {
   "title": "Audio & Video",
   "feed_url": "http://www.bbc.co.uk/moira/feed/avod/iphone/news/en/v1"
  },
  {
   "type": "group",
   "title": "News in Other Languages",
   "feeds": [
   {
    "title": "Mundo",
    "feed_url": "http://www.bbc.co.uk/worldservice/syndication/mobileiq/iphone/mundo/homepage/full.xml",
    "logo_url": "bbcimage://logomundo/wsdownload.bbc.co.uk/worldservice/images/branding/languages/iphone/mundo_125x19.png"
   },
...
   {
    "title": "Urdu",
    "feed_url": "http://www.bbc.co.uk/worldservice/syndication/mobileiq/iphone/urdu/homepage/full.atom",
    "logo_url": "bbcimage://logourdu/wsdownload.bbc.co.uk/worldservice/images/branding/languages/iphone/urdu_117x28.png"
   }

   ]
  }
  ]
 }
 ]
}

This looks good. The first thing the app does, then, is find out where all the critical feeds are stored and how they should be structured in the navigation menu. We can see where to get the ticker data for the top of the page, and we can even get some idea of how the ‘News in Other Languages’ feeds will be fetched.
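Since the navigation is just a tree of groups and feeds, flattening it is a short recursive walk. A minimal Python sketch follows; it assumes that an entry with "type": "group" holds its children under "feeds" and that every other entry is a leaf with a feed_url. Reading "default": true as ‘load on first launch’ is my guess, though it does fit the front page and Americas feeds being the first ones requested in the Squid log.

```python
import json
from typing import Any, Dict, Iterator, List, Tuple

Feed = Dict[str, Any]


def walk_feeds(feeds: List[Feed], path: Tuple[str, ...] = ()) -> Iterator[Tuple[Tuple[str, ...], Feed]]:
    """Yield (group path, leaf feed) pairs, recursing into "group" entries."""
    for feed in feeds:
        if feed.get("type") == "group":
            yield from walk_feeds(feed.get("feeds", []), path + (feed["title"],))
        elif "feed_url" in feed:
            yield path, feed


def menu(bootstrap: Feed) -> List[Tuple[str, str, str]]:
    """Flatten the menu into (section, title, feed_url) rows."""
    return [(" / ".join(path), feed["title"], feed["feed_url"])
            for path, feed in walk_feeds(bootstrap.get("feeds", []))]


def default_feed_urls(bootstrap: Feed) -> List[str]:
    """feed_urls flagged "default": true (assumed: loaded on first launch)."""
    return [feed["feed_url"] for _, feed in walk_feeds(bootstrap.get("feeds", []))
            if feed.get("default") is True]


def load_bootstrap(raw: bytes) -> Feed:
    """Decode the bootstrap response body (UTF-8 JSON)."""
    return json.loads(raw.decode("utf-8"))
```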

Let’s start by looking at the main feed for what must be the front page:

http://www.bbc.co.uk/moira/feed/news_world/front_page

Again, this requires the user-agent to be spoofed, and the response is as follows:

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns='http://www.w3.org/2005/Atom' xmlns:media='http://search.yahoo.com/mrss/' xmlns:dc='http://purl.org/dc/elements/1.1/'>
  <title>BBC News | News Front Page | World Edition</title>
  <updated>2010-06-13T20:29:08+00:00</updated>
  <id>urn:news-bbc-co-uk:section:bbc_news:front_page:world_edition</id>
  <author>
    <name>BBC</name>
  </author>
  <entry>
    <title>Thousands flee Kyrgyzstan unrest</title>
    <summary>Escalating ethnic violence in Kyrgyzstan that has killed nearly 100 people prompts tens of thousands to flee to Uzbekistan.</summary>
    <category label='World/Asia Pacific' term='World/Asia Pacific' />
    <updated>2010-06-13T17:51:40+00:00</updated>
    <id>urn:news-bbc-co-uk:story:8737578</id>
    <link rel='alternate' href='http://news.bbc.co.uk/1/hi/world/asia_pacific/10304165.stm' type='text/html' title='Thousands flee Kyrgyzstan unrest' />
    <media:thumbnail url='bbcimage://urn%3Anews-bbc-co-uk%3Astory%3A8737578/static.bbc.co.uk/moira/img/%7bdevice%7d/thumbnail/48063000/jpg/_48063440_48063241.jpg' />
    <content type='xhtml'>
      <div xmlns='http://www.w3.org/1999/xhtml' class='body'>
        <div class='fullwidth_img'>
          <a href='bbcvideo://urn%3Anews-bbc-co-uk%3Astory%3A8737578/www.bbc.co.uk/moira/avod/%7bdevice%7d/av/urn-news.bbc.co.uk-story-8737578/urn-news.bbc.co.uk-media-48063655/news/world/604000/604036/%7bbandwidth%7d'>
            <img alt='Soldiers in central Osh' src='bbcimage://urn%3Anews-bbc-co-uk%3Astory%3A8737578/static.bbc.co.uk/moira/img/%7bdevice%7d/styfull/48063000/jpg/_48063696_jex_721239_de27-1.jpg' class='fullwidth_512x288' />
          </a>
        </div>
        <p>Escalating ethnic violence in Kyrgyzstan has prompted tens of thousands of ethnic Uzbeks to flee the country.</p>
...
        <div class='inline_img'>
          <img alt='Map of Kyrgyzstan' src='bbcimage://urn%3Anews-bbc-co-uk%3Astory%3A8737578/static.bbc.co.uk/moira/img/%7bdevice%7d/styhalf/48063000/gif/_48063789_kyrgyz_osh_jalal_0610.gif' class='inline_226x170' />
        </div>
...
      </div>
    </content>
  </entry>
...
</feed>

Excellent. This looks like a very straightforward Atom feed, containing thumbnails for the navigation and fairly simple XHTML content. The style of BBC articles is to have small, bite-sized paragraphs with small inline images and occasional videos. We’ll probably have to sort out all the styling ourselves, but the markup looks clean and workable.
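Pulling the useful bits out of each entry needs nothing more than a namespace-aware XML parser. Here’s a sketch using Python’s xml.etree.ElementTree with the namespaces declared in the feed; Story and parse_feed are hypothetical names, and thumbnails come back still in their bbcimage:// form.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple, Optional

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "media": "http://search.yahoo.com/mrss/",
    "xhtml": "http://www.w3.org/1999/xhtml",
}


class Story(NamedTuple):
    title: str
    summary: str
    link: Optional[str]
    thumbnail: Optional[str]    # still a bbcimage:// URL at this point
    body: Optional[ET.Element]  # the <div class='body'> XHTML element


def parse_feed(xml_text: str) -> List[Story]:
    """Extract one Story per <entry> in an Atom feed."""
    root = ET.fromstring(xml_text)
    stories = []
    for entry in root.findall("atom:entry", NS):
        link = entry.find("atom:link[@rel='alternate']", NS)
        thumb = entry.find("media:thumbnail", NS)
        stories.append(Story(
            title=entry.findtext("atom:title", default="", namespaces=NS),
            summary=entry.findtext("atom:summary", default="", namespaces=NS),
            link=link.get("href") if link is not None else None,
            thumbnail=thumb.get("url") if thumb is not None else None,
            body=entry.find("atom:content/xhtml:div", NS),
        ))
    return stories
```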

The bbcimage://urn URLs for the images and thumbnails look a bit strange, but we’ve already seen the HTTP traffic when they’re fetched, so we can figure out how they’ll need to be rewritten. At first glance it looks like:

bbcimage://urn%3Anews-bbc-co-uk%3Astory%3A8737578/static.bbc.co.uk/moira/img/%7bdevice%7d/thumbnail/48063000/jpg/_48063440_48063241.jpg

Will become:

http://static.bbc.co.uk/moira/img/ipad/thumbnail/48063000/jpg/_48063440_48063241.jpg

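…which is a fairly simple transformation: drop the bbcimage:// scheme and the leading URN segment, substitute ipad for the %7bdevice%7d ({device}) placeholder, and prefix http://.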
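Here’s that rule as a small Python function. It is inferred from a single observed request, so treat it as an assumption; it matches the placeholder in either hex case, since the bootstrap JSON uses both %7b and %7B. Applying it to the logo_url values in the bootstrap JSON is a further guess, as I haven’t watched those being fetched.

```python
import re

# The {device} placeholder, percent-encoded; match %7b/%7B in any case.
_DEVICE_PLACEHOLDER = re.compile(r"%7bdevice%7d", re.IGNORECASE)


def rewrite_bbcimage_url(url: str, device: str = "ipad") -> str:
    """Map bbcimage://<key>/<host>/<path> to http://<host>/<path>.

    <key> (a URN, or a label such as "logomundo") is assumed to matter
    only to the app and is discarded. Non-bbcimage URLs pass through.
    """
    prefix = "bbcimage://"
    if not url.startswith(prefix):
        return url
    _key, _, host_and_path = url[len(prefix):].partition("/")
    return "http://" + _DEVICE_PLACEHOLDER.sub(lambda _m: device, host_and_path)
```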

What next?

OK, so we’ve figured out how to bootstrap our app, get the structure of the navigation and news categories, and then receive some HTML-like content from the BBC feeds. So far so good: it looks quite simple, and certainly something that a self-respecting web app is going to be able to do.

The user-agent spoofing may not be possible from an AJAX request in a web browser, so I might need to build a small proxy on a server somewhere, which means my web app will ultimately have to be more than a static HTML file. But then I suspected I would need one anyway, to get around the browser’s cross-domain (same-origin) restrictions on AJAX requests, so that’s no big deal. (If the BBC were hosting the web app themselves, both problems would be moot.)
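To give a flavour of how small that proxy could be, here’s a minimal sketch using only Python’s standard library. The /proxy?url=&lt;feed URL&gt; interface, the host allowlist (taken from the hosts in the captured traffic) and the port are all my own assumptions; the allowlist is there so the proxy doesn’t become an open relay.

```python
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional
from urllib.parse import parse_qs, urlsplit

APP_UA = "BBC News 1.2.1 (iPad; iPhone OS 3.2; en_US)"
# Hosts seen in the captured traffic; anything else is refused.
ALLOWED_HOSTS = {"www.live.bbc.co.uk", "www.bbc.co.uk", "cdnedge.bbc.co.uk", "static.bbc.co.uk"}


def target_from_path(path: str) -> Optional[str]:
    """Return the validated ?url= target of a /proxy request, or None."""
    parts = urlsplit(path)
    if parts.path != "/proxy":
        return None
    urls = parse_qs(parts.query).get("url")
    if not urls:
        return None
    target = urlsplit(urls[0])
    if target.scheme != "http" or target.hostname not in ALLOWED_HOSTS:
        return None
    return urls[0]


class BBCProxyHandler(BaseHTTPRequestHandler):
    """Relays GET /proxy?url=... to the BBC with the app's user-agent."""

    def do_GET(self) -> None:
        target = target_from_path(self.path)
        if target is None:
            self.send_error(400, "Missing or disallowed url parameter")
            return
        req = urllib.request.Request(target, headers={"User-Agent": APP_UA})
        try:
            with urllib.request.urlopen(req, timeout=10) as upstream:
                status = upstream.status
                ctype = upstream.headers.get("Content-Type", "application/octet-stream")
                body = upstream.read()
        except urllib.error.HTTPError as err:  # upstream refused, e.g. 403
            self.send_error(err.code)
            return
        except OSError:  # DNS failure, refused connection, timeout
            self.send_error(502, "Upstream fetch failed")
            return
        self.send_response(status)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Run the proxy; the app's own HTML/JS would be served from this origin too (not shown)."""
    HTTPServer(("", port), BBCProxyHandler).serve_forever()
```

With serve() running, the page could request, say, /proxy?url=http://www.bbc.co.uk/moira/feed/news_world/front_page with an ordinary same-origin AJAX call, provided the page itself is served from the same host and port.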

So, in the next installment, I’m going to look at the structure and user interface of the app and see how easy it is to synthesize the same overall look and feel. Having seen the data behind it, I now suspect that much of the app uses Safari browser components anyway… so I’m quietly confident!

Stay tuned.

Comments (8)

  1. June 13, 2010

    [...] This post was mentioned on Twitter by Des Traynor, John Keyes, David Crowley, mobile jones, James Pearce and others. James Pearce said: Rewriting famous native iPad apps in HTML5: http://is.gd/cO1Ri (and please retweet – I might need help!) [...]

  2. August 7, 2010

    [...] it (using the bbcRewriteMediaUrl function to transform its rather strange syntax, as observed in part I), and add the title of the article beneath it. We have to use getElementsByTagNameNS to get the [...]

  3. August 19, 2010
    Steve Souders said...

    I love this write-up. This is what I do everyday! I just started proxying my iPhone through Fiddler, which allows me to see the headers and responses. (See http://conceptdev.blogspot.com/2009/01/monitoring-iphone-web-traffic-with.html .) Fiddler is Windows only, but you might try Charles on Mac.

  4. August 21, 2010

    [...] has already produced three blog posts rewriting the BBC iPhone app but with HTML5 (Part I, Part II, Part III). I encourage you to read them. He’s already gotten impressively far; here [...]

  5. August 22, 2010

    [...] app into a web app based on HTML5. He has already started 3 blog post on the BBC News apps, Part1, Part2 and Part3. If you see the screenshot below he has gotten pretty far. BBC News (HTML5) BBC [...]

  6. August 24, 2010
    Marcio Silva said...

    Just read Part I and even though I don’t understand everything you are doing I loved the idea and hope to learn a lot…

    Regards,

    Marcio

  7. August 26, 2010
    Iman said...

    wow, thats utterly genius, nice one! : ) wish you the best of luck in doing more of these.
