Apple has begun rolling out its long-in-the-making augmented reality (AR) city guides, which use your iPhone’s camera and display to show you where you are going. They also hint at part of the future Apple sees for active uses of AR.
Through the looking glass, we see clearly
The new AR guide is available in London, Los Angeles, New York City, and San Francisco. Now, I’m not terribly convinced that most people will feel particularly comfortable waving their $1,000+ iPhones in the air while they weave their way through tourist spots, though I’m sure there are some people out there who really hope they do (and they don’t all work at Apple).
But many will give it a try. What does it do?
Apple announced its plan to introduce step-by-step walking guidance in AR when it unveiled iOS 15 at WWDC in June. The idea is powerful, and it works like this:
- Grab your iPhone.
- Point it at buildings that surround you.
- The iPhone will analyze the images you provide to recognize where you are.
- Maps will then generate a highly accurate position to deliver detailed directions.
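Apple hasn’t published the internals of the Maps feature, but ARKit’s existing geo-tracking API works along the same lines: it localizes the device by comparing the camera feed against Apple’s collected imagery of the area. A minimal sketch, assuming a view controller with an `ARSCNView` already on screen, might look like this:

```swift
import ARKit
import UIKit

// Sketch only: ARGeoTrackingConfiguration (iOS 14+) provides the kind of
// camera-based localization the AR walking directions appear to build on.
// It is limited to supported cities, so availability must be checked first.
final class GeoARViewController: UIViewController {
    let arView = ARSCNView()

    override func viewDidAppear(_ animated: Bool) {
        super.viewDidAppear(animated)

        // Ask ARKit whether geo-tracking is supported at the user's location.
        ARGeoTrackingConfiguration.checkAvailability { available, error in
            guard available else {
                print("Geo-tracking unavailable: \(error?.localizedDescription ?? "unsupported area")")
                return
            }
            // Run a session that matches camera frames against Apple's
            // localization imagery to refine the device's position.
            let configuration = ARGeoTrackingConfiguration()
            DispatchQueue.main.async {
                self.arView.session.run(configuration)
            }
        }
    }
}
```

Once the session is geo-localized, an app can place anchors at real-world coordinates — which is roughly what an arrow floating over Oxford Street requires.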
To illustrate this in the UK, Apple highlights an image showing Bond Street Station with a big arrow pointing right along Oxford Street. Words beneath this picture let you know that Marble Arch station is just 700 meters away.
This is all useful stuff. Like so much of what Apple does, it draws on a range of the company’s smaller innovations, particularly (but not entirely) the Neural Engine in Apple’s A-series iPhone processors. To recognize what the camera sees and provide accurate directions, the Neural Engine must be making use of a host of machine learning tools Apple has developed. These include image classification and alignment APIs, trajectory detection APIs, and possibly text recognition, text detection, and horizon detection APIs. That’s the pure image analysis part.
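Several of those capabilities are already exposed to developers through the Vision framework. A hedged sketch — assuming `cgImage` is a camera frame you already hold — showing two of the requests mentioned above, image classification and horizon detection:

```swift
import Vision

// Sketch: run Vision's built-in classification and horizon detection on
// one frame, two of the analyses the article suggests Maps may combine.
func analyze(cgImage: CGImage) throws {
    let classify = VNClassifyImageRequest()
    let horizon = VNDetectHorizonRequest()

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try handler.perform([classify, horizon])

    // Top classification labels for the scene.
    if let labels = classify.results?.prefix(3) {
        for observation in labels {
            print(observation.identifier, observation.confidence)
        }
    }

    // Estimated tilt of the horizon, useful for aligning AR overlays.
    if let result = horizon.results?.first {
        print("Horizon angle (radians):", result.angle)
    }
}
```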
This is coupled with Apple’s on-device location detection, mapping data, and (I suspect) its existing database of street scenes to provide the user with near-perfectly accurate directions to a chosen destination.
This is a great illustration of the kinds of things you can already achieve with machine learning on Apple’s platforms — Cinematic Mode and Live Text are two more excellent recent examples. Of course, it’s not hard to imagine pointing your phone at a street sign while using AR directions in this way to receive an instant translation of the text.
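The street-sign scenario is within reach of today’s APIs: Vision’s text recognition (the engine behind Live Text) can pull the words off a sign, ready to hand to a translation step. A minimal sketch, again assuming `cgImage` is a captured camera frame:

```swift
import Vision

// Sketch of the "point at a street sign" idea: recognize the sign's text,
// returning the strings that a translation feature could then consume.
func readSign(cgImage: CGImage) throws -> [String] {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate  // favor accuracy over speed

    try VNImageRequestHandler(cgImage: cgImage, options: [:])
        .perform([request])

    // Keep the best candidate string from each recognized text region.
    return request.results?.compactMap { $0.topCandidates(1).first?.string } ?? []
}
```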
John Giannandrea, Apple’s senior vice president for machine learning, in 2020 spoke to its importance when he told Ars Technica: “There’s a whole bunch of new experiences that are powered by machine learning. And these are things like language translation, or on-device dictation, or our new features around health, like sleep and hand washing, and stuff we’ve released in the past around heart health and things like this. I think there are increasingly fewer and fewer places in iOS where we’re not using machine learning.”
Apple’s array of camera technologies speaks to this. That you can edit images in Portrait or Cinematic mode even after the event also illustrates it. All these technologies will work together to deliver the Apple Glass experiences we expect the company to begin bringing to market next year.
But that’s just the tip of what’s possible, as Apple continues to expand the number of available machine learning APIs it offers developers. Existing APIs include the following, all of which may be augmented by CoreML-compatible AI models:
- Image classification, saliency, alignment, and similarity APIs.
- Object detection and tracking.
- Trajectory and contour detection.
- Text detection and recognition.
- Face detection, tracking, landmarks, and capture quality.
- Human body detection, body pose, and hand pose.
- Animal recognition (cat and dog).
- Barcode, rectangle, horizon detection.
- Optical flow to analyze object motion between video frames.
- Person segmentation.
- Document detection.
- Seven natural language APIs, including sentiment analysis and language identification.
- Speech recognition and sound classification.
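To give a flavor of the natural language side of that list, here is a short sketch using Apple’s NaturalLanguage framework for two of the APIs named above, language identification and sentiment analysis:

```swift
import NaturalLanguage

let text = "The new AR directions in Maps are genuinely impressive."

// Language identification: guess the dominant language of the text.
let recognizer = NLLanguageRecognizer()
recognizer.processString(text)
print(recognizer.dominantLanguage?.rawValue ?? "unknown")

// Sentiment analysis: score the paragraph from -1.0 (negative)
// to 1.0 (positive).
let tagger = NLTagger(tagSchemes: [.sentimentScore])
tagger.string = text
let (sentiment, _) = tagger.tag(at: text.startIndex,
                                unit: .paragraph,
                                scheme: .sentimentScore)
print(sentiment?.rawValue ?? "no score")
```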
Apple grows this list regularly, but there are plenty of tools developers can already use to augment app experiences. This short collection of apps shows some ideas. Delta Air Lines, which recently deployed 12,000 iPhones to in-flight staff, also makes an AR app to help cabin crew.
Steppingstones to innovation
We all think Apple will introduce AR glasses of some kind next year.
When it does, the newly introduced Maps feature surely shows part of its vision for those devices. The feature also gives the company an opportunity to use private, on-device analysis to compare its own existing collections of images of geographical locations against imagery gathered by users — which can only help it develop increasingly complex ML/image interactions.
We all know that the larger the sample size, the more likely it is that AI can deliver good, rather than garbage, results. If that is the intent, Apple must surely hope to convince its billion users to use whatever it introduces, improving the accuracy of the machine learning systems it uses in Maps. It likes to build its next steppingstone on the back of the one it made before, after all.
Who knows what’s coming down that road?