My Vacation in Android Land
Recently I was pulled into a project to build an Android app at Endless. While Android is Linux, it’s quite a bit different from “traditional” Linux [1]. In our case, we’re trying to assemble a Python app into an Android app using python-for-android (aka p4a). As you might imagine, this adds a couple more layers to the mix, which is always fun.
Magnets P4A Apps, How Do They Work?
I was pretty mystified by p4a for the first couple weeks, but now that I’ve been down in the weeds for a while I mostly understand how it works. An Android app is typically written as a Java class with several class methods defining the interface to the app.
Python is obviously not Java, so… what’s going on? You can access C or C++ objects using Java’s JNI. What p4a does is build a tiny shared library that provides the JNI entry points and runs Python by embedding the interpreter. The embedded Python interpreter then runs the Python application code. There’s not a lot of code to do this, but it does some things that look a bit unsavory. I’m oversimplifying it and I didn’t even mention services, but I found that to be interesting.
The way the actual Python application is delivered also seems unique. Since the Python code doesn’t fit in with the Java classes and native libraries used for the program, a tarball is shipped as an asset containing all the Python packages and C extension modules. Essentially a virtualenv tree or pip install target directory. At runtime the tarball is expanded in the app’s installation directory. Python itself is only delivered in the libpython3.x.so shared library that the above libmain.so links to.
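To make the delivery model concrete, here is a minimal sketch of the "expand a tarball, then import from it" step in plain Python. It is not p4a's actual startup code; the function name and both paths are hypothetical.

```python
import sys
import tarfile
from pathlib import Path


def unpack_python_bundle(bundle: Path, target: Path) -> Path:
    """Extract a tarball of Python packages and make it importable."""
    target.mkdir(parents=True, exist_ok=True)
    # Newer Pythons want an explicit extraction filter; older ones lack it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(bundle) as tar:
        tar.extractall(target, **kwargs)
    # Putting the directory on sys.path is what makes it behave like a
    # "pip install --target" tree.
    if str(target) not in sys.path:
        sys.path.insert(0, str(target))
    return target
```

After this call, any module or package at the top level of the tarball can be imported as usual.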
Fun with External Storage
The application we’re developing downloads a lot of content, so naturally we want to use it with external storage like an SD card rather than fill the internal storage. In current Android a model referred to as scoped storage is used to mediate access to external storage. In a nutshell, an app has full privileges to its own private directory but has to request permissions to read or write anywhere else.
Using the app private directory seems fine as the app isn’t really intending to share this content or use arbitrary files from external storage [2]. However, we were seeing unexpected errors in this setup. Thus began my descent into Android filesystem handling.
At first I was convinced these were issues in the filesystem since the SD card was formatted with exFAT and on Chromebooks that uses fuse-exfat rather than the more recently added in-kernel exfat module. It certainly wouldn’t be the first time we’ve encountered issues with a FUSE filesystem. However, most of the issues persisted even when formatting the card as regular FAT, which uses the kernel’s fat module.
The first issue I saw was an SQLite error: sqlite3.OperationalError: disk I/O error. After adding in a bunch of hacks to get a more detailed error out, it became sqlite3.OperationalError: disk I/O error (5386): No such device. The 5386 value maps to the SQLITE_IOERR_SHMMAP extended error code, and No such device is ENODEV. After looking closer, the SQLite database was set up to use write-ahead logging, which relies on mmap with a MAP_SHARED mapping. Per the documentation, the error can be interpreted as:
ENODEV
The underlying filesystem of the specified file does not support memory mapping.
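For anyone wondering how 5386 turns into SQLITE_IOERR_SHMMAP: SQLite extended result codes keep the primary code in the low byte and a qualifier in the bits above it. A small sketch of the arithmetic (the helper name is made up for illustration):

```python
SQLITE_IOERR = 10  # primary result code for I/O errors, from sqlite3.h


def split_sqlite_error(code):
    """Return (primary code, extended qualifier) for an extended result code."""
    return code & 0xFF, code >> 8


primary, qualifier = split_sqlite_error(5386)
# sqlite3.h defines SQLITE_IOERR_SHMMAP as (SQLITE_IOERR | (21 << 8)).
```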
As mentioned above, I was sure this was a fuse-exfat issue and started working on making the app only upgrade the database to write-ahead logging when it was supported. And then suddenly that error stopped occurring. Maybe this was after a ChromeOS upgrade, but yay?
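The "only use write-ahead logging when it is supported" idea could look roughly like the following. This is a sketch rather than the code from our app: the function names are hypothetical, and it assumes that a successful MAP_SHARED mapping of a scratch file is a good enough predictor for SQLite's shared-memory file.

```python
import mmap
import os
import sqlite3
import tempfile


def supports_shared_mmap(directory: str) -> bool:
    """Probe whether files in `directory` can be mapped with MAP_SHARED."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.ftruncate(fd, mmap.PAGESIZE)  # mmap cannot map an empty file
        try:
            with mmap.mmap(fd, mmap.PAGESIZE, flags=mmap.MAP_SHARED):
                return True
        except OSError:
            # ENODEV is the failure seen above; treat any error as "no WAL".
            return False
    finally:
        os.close(fd)
        os.unlink(path)


def open_database(path: str) -> sqlite3.Connection:
    """Open the database, switching to WAL only if the probe succeeds."""
    conn = sqlite3.connect(path)
    if supports_shared_mmap(os.path.dirname(os.path.abspath(path))):
        conn.execute("PRAGMA journal_mode=WAL")
    return conn
```

The probe costs one temporary file per call, so a real app would probably run it once per storage location and remember the answer.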
The fun continued when that error went away but a new failure showed up saying that the app private directory wasn’t writable. That doesn’t make sense as the app should have permission to write to the directory and actually already was writing to the directory. This was a failing os.access(path, os.W_OK) call.
For the moment I commented out that sanity check to see if the app would otherwise work. Next it failed with EPERM trying to read app private directory contents with os.listdir. Again, this makes no sense as listing the directory entries should be allowed.
Then I got a bit lucky. By now I had found that scoped storage on external storage was implemented in newer Android with FUSE [3] in conjunction with a component called MediaProvider. Since os.listdir is really opendir followed by readdir, I think I searched for MediaProvider opendir and got a link to the MediaProvider code. With “traditional” Linux I’m familiar with how to get the code for random components. In Android I had no idea how to do this, so this was hugely helpful.
With the MediaProvider code in hand, I was able to figure out exactly what was happening for these syscalls when they go through Android’s custom FUSE daemon. With that I was able to open an issue detailing the problem along with the commit that likely regressed it.
Since I have no control over when that will be fixed, it was on to some workarounds. I ended up monkey patching os.access and os.listdir so that if they fail with EPERM and the path was in the app private directory, they would squash the failure. For os.access that was easy enough to just return True, but for os.listdir the best I could do was return an empty list since I wasn’t aware of another way to get the directory entries. That obviously isn’t very helpful, but at least for our app it appears that the worst that will happen is it will keep making backups when it shouldn’t.
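A rough sketch of that monkey patching, written so the wrappers can be exercised without Android. All names here are hypothetical and this is not the exact code from our app. One assumption worth flagging: os.access reports failure by returning False and does not expose an errno, so this version overrides the answer only when the path exists inside the private directory.

```python
import errno
import os


def _inside(path, root):
    """True if `path` is `root` or somewhere beneath it."""
    try:
        path = os.path.realpath(os.fspath(path))
    except TypeError:  # e.g. a file descriptor passed to os.listdir
        return False
    if isinstance(path, bytes):
        path = os.fsdecode(path)
    root = os.path.realpath(root)
    return path == root or path.startswith(root + os.sep)


def tolerant_access(orig_access, private_dir):
    def access(path, mode, **kwargs):
        if orig_access(path, mode, **kwargs):
            return True
        # Assumption: an existing path in our own directory is accessible.
        return _inside(path, private_dir) and os.path.exists(path)
    return access


def tolerant_listdir(orig_listdir, private_dir):
    def listdir(path="."):
        try:
            return orig_listdir(path)
        except PermissionError as err:
            # PermissionError covers EPERM and EACCES; only squash EPERM.
            if err.errno == errno.EPERM and _inside(path, private_dir):
                return []  # best effort: the real entries are unavailable
            raise
    return listdir


def install_workarounds(private_dir):
    """Monkey patch os.access and os.listdir for one directory tree."""
    os.access = tolerant_access(os.access, private_dir)
    os.listdir = tolerant_listdir(os.listdir, private_dir)
```

Calling install_workarounds once at startup, before anything else touches the directory, is enough. Errors for paths outside the private directory still propagate unchanged.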
Sadness with Threads
From an Android app you usually need to access some Java class to properly integrate it. The way this is done in p4a apps is to use pyjnius. Something like:
from jnius import autoclass
Bar = autoclass('org.foo.Bar')
Bar.do_something()
This uses the JNI’s FindClass function to load the Java class then uses the Class reflection methods to create a Python class through the entire Java class hierarchy. It’s all pretty impressive from both the Java and Python sides.
What you often need to do in a p4a app is access the Java class that wraps the app itself. In p4a this is the org.kivy.android.PythonActivity class along with some other auxiliary classes. These classes need to be loaded using the app ClassLoader rather than the Java system ClassLoader since the app classes are not exposed to the rest of the system.
While hacking on some features in p4a, I found that the test app could not load the PythonActivity class when using the webview bootstrap. As our app uses the webview bootstrap and makes use of jnius for loading several Java classes, I really wanted to understand that issue more.
At first I thought that the webview JNI code was probably doing something wrong. This code is not used with the default sdl2 bootstrap where SDL’s JNI code is used and the test app is able to resolve the app classes. In fact, the webview JNI code is a copy of SDL’s code from roughly 5 years ago. In the intervening years the SDL code had been refactored and become a bit more sophisticated, so I figured it was handling this case more correctly.
However, after reworking the code in p4a in the same way, nothing changed. It was also odd that use of jnius class loading wasn’t completely broken. It would work when used early in the test app. I had read the JNI FAQ about FindClass failing with one part sticking out:
You can get into trouble if you create a thread yourself (perhaps by calling pthread_create and then attaching it with AttachCurrentThread). Now there are no stack frames from your application. If you call FindClass from this thread, the JavaVM will start in the “system” class loader instead of the one associated with your application, so attempts to find app-specific classes will fail.
That seemed important to note, but the test app wasn’t using threads. Or was it? The webview test app uses Flask with its simple WSGI server. After looking closer, Flask runs werkzeug’s WSGI server with threaded=True by default. Since the test app tries to load the Java classes within Flask’s view handlers, it was using the JNI from a new native thread every time. Therefore, it was using the system class loader and not the app class loader. In retrospect, I should have thought of this earlier as Flask is going to have abysmal performance unless the request handlers are running from threads or asynchronously.
For the purposes of the test app, I just changed it to run Flask with threaded=False so it could preserve its behavior of only using Java classes within the view handlers. However, I think the general solution for p4a is just to document that you need to resolve Java classes before starting any threads.
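One way to follow that advice is to resolve everything up front and hand the results to worker threads. The sketch below is an illustration only: the JavaClassCache class is hypothetical, and the loader argument stands in for jnius.autoclass so the code runs without Android.

```python
import threading


class JavaClassCache:
    """Resolve classes once, up front, and hand them out to worker threads."""

    def __init__(self, loader):
        # `loader` stands in for jnius.autoclass; it is a parameter so the
        # sketch can run anywhere.
        self._loader = loader
        self._classes = {}

    def preload(self, *names):
        """Call this from the main thread before any thread is started."""
        for name in names:
            self._classes[name] = self._loader(name)

    def get(self, name):
        # Deliberately no lazy loading: a lookup from a worker thread would
        # go through the wrong class loader, so fail loudly instead.
        try:
            return self._classes[name]
        except KeyError:
            raise RuntimeError(f"{name} was not preloaded") from None
```

In a p4a app this would presumably be something like JavaClassCache(autoclass) followed by a preload('org.kivy.android.PythonActivity') call before the web server starts, with view handlers only ever calling get.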
Packing Up?
Not yet. Right now I’m working on getting a PR to enable use of Android’s WorkManager in p4a over the finish line since that’s desired for our app and will become more important for Android 12. In fact, that was the motivation for making the test app work in webview mode as I need something simple to iterate with to make sure the new features work.
I think I’ll be happy to return home to “traditional” Linux, but it’s been a fun learning exercise in Android land. It’s a place I’d been interested in visiting for a while since I use it on my phone and it’s actually orders of magnitude more frequently used than “traditional” Linux.