Core Data Image Caching
It is a common pattern for an app to do something with data that is not local. Maintaining decent app performance while fetching the data is one challenge (and not the focus of this article). Keeping the data, and enough of it, for later use and to prevent having to get it again, is another challenge. There are obvious solutions, some easy and some not, and all with their own advantages and disadvantages.
What is Caching?
Caching is, quite simply, keeping something around for later use that was “hard” to get in the first place. This could be external data, a calculated value, or anything that you’d rather not have to acquire more than once if you can avoid it.
Caching should not be considered a boundless process, however. By that I mean you can’t just cache everything forever! You have to have a way to purge the cache periodically based on some algorithm that makes sense for your application. A common (and simple) algorithm is Least Recently Used, or LRU. The theory is that the items in your cache that have gone the longest without being used are the candidates to be purged from the cache, based on some threshold of non-use. You can imagine that on a mobile device purging the cache is important due to the inherent space limitations. You could also choose to cache a limited amount of data (but you’d still need to know what to drop if the cache got too big).
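To make the purge idea concrete, here is a minimal sketch of an in-memory cache that records when each key was last used and drops anything unused since a cutoff date. It is an age-threshold purge in the spirit of LRU as described above, not a capacity-bound LRU. The class name LRUSketchCache and its methods are hypothetical, ARC is assumed, and the current time is passed in explicitly only to keep the example easy to follow.

```objc
#import <Foundation/Foundation.h>

// Hypothetical in-memory cache that remembers when each key was last used.
@interface LRUSketchCache : NSObject
- (void)setObject:(id)object forKey:(NSString *)key at:(NSDate *)now;
- (id)objectForKey:(NSString *)key at:(NSDate *)now;
- (NSUInteger)purgeEntriesUnusedSince:(NSDate *)cutoff;
- (NSUInteger)count;
@end

@implementation LRUSketchCache {
    NSMutableDictionary *_objects;   // key -> cached object
    NSMutableDictionary *_lastUsed;  // key -> NSDate of the last access
}

- (instancetype)init {
    if ((self = [super init])) {
        _objects = [NSMutableDictionary dictionary];
        _lastUsed = [NSMutableDictionary dictionary];
    }
    return self;
}

- (void)setObject:(id)object forKey:(NSString *)key at:(NSDate *)now {
    _objects[key] = object;
    _lastUsed[key] = now;
}

- (id)objectForKey:(NSString *)key at:(NSDate *)now {
    id object = _objects[key];
    if (object) {
        _lastUsed[key] = now;  // a cache hit counts as a use
    }
    return object;
}

// Removes every entry whose last use is earlier than the cutoff.
// Returns the number of entries removed.
- (NSUInteger)purgeEntriesUnusedSince:(NSDate *)cutoff {
    NSMutableArray *stale = [NSMutableArray array];
    for (NSString *key in _lastUsed) {
        NSDate *used = _lastUsed[key];
        if ([used compare:cutoff] == NSOrderedAscending) {
            [stale addObject:key];
        }
    }
    [_objects removeObjectsForKeys:stale];
    [_lastUsed removeObjectsForKeys:stale];
    return stale.count;
}

- (NSUInteger)count {
    return _objects.count;
}
@end
```

Note that the purge still walks every key, which is fine for a handful of entries but is exactly the kind of work that gets expensive as the cache grows.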
Goals of Caching
The reasons to cache are pretty straightforward:
- Performance
- Performance
- Performance
But how is this achieved? It depends partially on what a reasonable definition of “performance” is for your app. Performance as it relates to the cache will be evident in several different places:
- Initial cache load
- Cache access
- Cache purge
- Cache save
As distinct as I’ve made the items in this list, it turns out that each of them is related to the others to some degree. For example, cache purging is something you might execute during cache load. The frequency of cache save should be related to something like how “dirty” the cache is (how many things have changed since the last save) or how much time has elapsed since the last save.
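One way to express the "how dirty, how long" rule for saving is a small policy object. This is only a sketch: the class name CacheSavePolicy, its method names, and the idea of combining a change count with a time limit are assumptions made for illustration, and ARC is assumed. With Core Data, the context's own hasChanges property can answer the "is anything dirty at all" part.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper that decides when a cache is worth saving.
@interface CacheSavePolicy : NSObject
- (instancetype)initWithDirtyLimit:(NSUInteger)dirtyLimit
                       maxInterval:(NSTimeInterval)maxInterval
                        startingAt:(NSDate *)now;
- (void)noteChange;                  // call when an entry is added, updated, or removed
- (BOOL)shouldSaveAt:(NSDate *)now;  // YES once the cache is dirty enough or stale enough
- (void)didSaveAt:(NSDate *)now;     // call after a successful save
@end

@implementation CacheSavePolicy {
    NSUInteger _dirtyLimit;       // save after this many unsaved changes
    NSTimeInterval _maxInterval;  // or after this many seconds with unsaved changes
    NSUInteger _dirtyCount;
    NSDate *_lastSave;
}

- (instancetype)initWithDirtyLimit:(NSUInteger)dirtyLimit
                       maxInterval:(NSTimeInterval)maxInterval
                        startingAt:(NSDate *)now {
    if ((self = [super init])) {
        _dirtyLimit = dirtyLimit;
        _maxInterval = maxInterval;
        _lastSave = now;
    }
    return self;
}

- (void)noteChange {
    _dirtyCount++;
}

- (BOOL)shouldSaveAt:(NSDate *)now {
    if (_dirtyCount == 0) {
        return NO;   // nothing to write
    }
    if (_dirtyCount >= _dirtyLimit) {
        return YES;  // too many unsaved changes
    }
    // A few changes have been waiting too long
    return [now timeIntervalSinceDate:_lastSave] >= _maxInterval;
}

- (void)didSaveAt:(NSDate *)now {
    _dirtyCount = 0;
    _lastSave = now;
}
@end
```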
Implementation Options
There are several options for implementing a cache. Your choice depends a lot on what kind of data you are caching, how much data you need to cache, how you will manage your cache (i.e. purging), and probably a lot of other things specific to your app.
The obvious choices include:
- dictionary
- files
- database
- or some combination of these
Let’s look at each one.
Dictionary
Using a dictionary seems like an obvious choice. Dictionaries are easy to use and easy to understand, and for very small data sets they can be a good choice. But once your data set gets “large”, you will start to notice performance issues, especially when you need to load it, purge it, or save it. Reading and writing a dictionary is easy, but as your data set grows these actions become slow because you’re reading and writing the entire dictionary every time.
Pros: Fast and easy, especially for small data sets
Cons: Not scalable, i.e. performance slows as the data set grows.
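A rough sketch of why this does not scale: persisting a dictionary typically means serializing all of it, and loading means reading all of it back. The function names below are hypothetical, and the sketch assumes the cache maps URL strings to NSData values so that it can be written as a property list.

```objc
#import <Foundation/Foundation.h>

// Writes every entry in one go; the cost grows with the total size of the cache.
static BOOL SaveWholeCache(NSDictionary *cache, NSString *path) {
    return [cache writeToFile:path atomically:YES];
}

// Reads every entry back, even if only one image is needed right now.
// Falls back to an empty cache when the file is missing or unreadable.
static NSMutableDictionary *LoadWholeCache(NSString *path) {
    NSMutableDictionary *cache = [NSMutableDictionary dictionaryWithContentsOfFile:path];
    return cache ?: [NSMutableDictionary dictionary];
}
```

Changing a single entry still means calling SaveWholeCache again and rewriting the entire file.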
Files
Individual files could be an improvement. If you’re caching say, thumbnail images, you could devise a scheme for writing each file to the file system in a manner that later allows you to find it again.
This has the advantage that you’re only reading and writing a single file at a time. It has the disadvantages that you could be putting a heavy load on the file system and that it is hard, without additional metadata stored elsewhere, to manage the cache in terms of purging old data or limiting its size.
This approach is a little more complicated than using a dictionary, but still simple enough to understand and to deal with.
Pros: Still simple, and should be relatively fast to store and retrieve data except in rare cases
Cons: Hard to manage cache purging
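One possible scheme for finding a file again is to derive its name from a hash of the image URL. The sketch below uses the FNV-1a hash only because it is short enough to show in full; that choice, the ".img" suffix, and the function names are assumptions for illustration. A cryptographic hash would be the safer choice if collisions matter.

```objc
#import <Foundation/Foundation.h>

// FNV-1a 64-bit hash over the UTF-8 bytes of a string.
static uint64_t FNV1aHash(NSString *string) {
    NSData *data = [string dataUsingEncoding:NSUTF8StringEncoding];
    const uint8_t *bytes = data.bytes;
    uint64_t hash = 0xcbf29ce484222325ULL;  // FNV offset basis
    for (NSUInteger i = 0; i < data.length; i++) {
        hash ^= bytes[i];
        hash *= 0x100000001b3ULL;           // FNV prime
    }
    return hash;
}

// Maps an image URL to a stable file path inside the given cache directory.
static NSString *CachePathForURL(NSString *imageUrl, NSString *cacheDirectory) {
    NSString *name = [NSString stringWithFormat:@"%016llx.img",
                      (unsigned long long)FNV1aHash(imageUrl)];
    return [cacheDirectory stringByAppendingPathComponent:name];
}
```

The same URL always maps to the same path, so a lookup is just a file-existence check. What this does not give you is any record of when a file was last used, which is the purging problem noted above.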
Database
You probably wondered when I’d get to CoreData!
A database affords a nice abstraction to your data, not to mention random access to the data and the delegation of certain “hard” tasks to the underlying database.
CoreData actually is useful for caching, I have learned, because of the benefits of database-like operations. (Lest I be reminded of it in the comments: CoreData is not a database. 🙂 )
On more than one occasion now I have used a CoreData-based cache for storing image data locally. The performance benefits alone were worth it. But I didn’t start with CoreData. I actually started with a dictionary. In one app we store about 3MB or more of image data, with a 30 day expiration on individual entries. It turned out that trying to read or write this much data as a dictionary was slow! And worse, the entire dictionary had to be traversed to find entries to expire, which made things worse.
An Example
By way of example, I wanted to point out just a few things about my CoreData image cache, now in use in a couple apps.
The model started out pretty simply:
I was a CoreData noobie when I first developed this, so I defined the image data attribute as Binary Data. This worked, but required that I do the work in my code to transform the data into a UIImage. It turns out CoreData can do this sort of transformation for you using the attribute type Transformable. More on that later.
The imageUrl attribute is actually the key for each record. And the lastUsedTimestamp is the key (so to speak) to managing cache purging.
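For readers who prefer code to the model editor, an equivalent model can be built programmatically. This is a sketch, not the model file used in the app: the attribute names imageUrl, data, and lastUsedTimestamp come from the text, while the entity name CacheEntry and the helper function names are assumptions. ARC is assumed.

```objc
#import <CoreData/CoreData.h>

// Builds one optional attribute of the given type.
static NSAttributeDescription *MakeAttribute(NSString *name, NSAttributeType type) {
    NSAttributeDescription *attribute = [[NSAttributeDescription alloc] init];
    attribute.name = name;
    attribute.attributeType = type;
    attribute.optional = YES;
    return attribute;
}

// Builds a model with a single entity holding the URL key, the raw image
// bytes, and the timestamp that drives purging.
static NSManagedObjectModel *MakeImageCacheModel(void) {
    NSEntityDescription *entity = [[NSEntityDescription alloc] init];
    entity.name = @"CacheEntry";  // hypothetical entity name
    entity.properties = @[
        MakeAttribute(@"imageUrl", NSStringAttributeType),
        MakeAttribute(@"data", NSBinaryDataAttributeType),
        MakeAttribute(@"lastUsedTimestamp", NSDateAttributeType),
    ];

    NSManagedObjectModel *model = [[NSManagedObjectModel alloc] init];
    model.entities = @[entity];
    return model;
}
```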
When my app starts, and the model and associated components are initialized, the first thing that is done is to purge the cache of anything that has expired. This is done using a predefined fetch request:
And the following code:
[code lang="objc"]
// Fetch *expired* entries (older than 30 days) and delete them
NSDate *cutoff = [[NSDate date] dateByAddingTimeInterval:-60*60*24*30];
NSDictionary *subs = [NSDictionary dictionaryWithObjectsAndKeys:cutoff, @"DATE", nil];
NSFetchRequest *request = [managedObjectModel fetchRequestFromTemplateWithName:@"AllExpired" substitutionVariables:subs];
NSError *error = nil;
NSArray *results = [managedObjectContext executeFetchRequest:request error:&error];
if (results == nil) {
    // The fetch itself failed; there is nothing to purge this time around
    NSLog(@"Expired-entry fetch failed: %@", error);
}
for (NSManagedObject *object in results) {
    // Deletions are committed the next time the context is saved
    [managedObjectContext deleteObject:object];
}
[/code]
There are many things to like about this code (I think). For one, it is compact. There is no need to iterate over the entire cache to find the entries in which we’re interested. In fact, we just let CoreData deal with that. And, we actually have to touch a very small number of objects (if any) to delete them. In the end, CoreData manages which objects are dirty, or deleted, and does the right thing.
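The stored fetch request template boils down to a predicate with a $DATE placeholder. A sketch of what such a predicate could look like when built in code is below. The exact format string is an assumption (the template itself is defined in the model editor), though the DATE variable and the lastUsedTimestamp attribute both come from the article. The function name ExpiredPredicate is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Builds "last used before the cutoff" by filling in the $DATE placeholder,
// the same substitution the fetch request template performs.
static NSPredicate *ExpiredPredicate(NSDate *cutoff) {
    NSPredicate *predicateTemplate =
        [NSPredicate predicateWithFormat:@"lastUsedTimestamp < $DATE"];
    return [predicateTemplate predicateWithSubstitutionVariables:@{ @"DATE": cutoff }];
}
```

The result can be assigned to the predicate property of an NSFetchRequest, or used with filteredArrayUsingPredicate: on an array of plain dictionaries to try it out without a Core Data stack.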
For what it’s worth, the code to fetch entries based on their URL is similarly compact. We use a different predefined fetch request for that.
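As a rough idea of what a lookup by URL might look like, here is a sketch that builds the fetch request in code rather than from a stored template. The entity name CacheEntry and the function name are assumptions, and refreshing lastUsedTimestamp on every hit is my reading of how a "last used" timestamp would be maintained rather than something taken from the app's code.

```objc
#import <CoreData/CoreData.h>

// Returns the cache entry for the given URL, or nil when there is none
// (or when the fetch fails, in which case *error is set).
static NSManagedObject *FetchEntryForURL(NSManagedObjectContext *context,
                                         NSString *imageUrl,
                                         NSError **error) {
    NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"CacheEntry"];
    request.predicate = [NSPredicate predicateWithFormat:@"imageUrl == %@", imageUrl];
    request.fetchLimit = 1;  // imageUrl acts as the key, so one match is enough

    NSArray *results = [context executeFetchRequest:request error:error];
    NSManagedObject *entry = results.firstObject;

    // Record the use so the purge step leaves this entry alone for now.
    // Messaging nil is a no-op, so a miss needs no special handling.
    [entry setValue:[NSDate date] forKey:@"lastUsedTimestamp"];
    return entry;
}
```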
As an optimization, I wanted to stop storing Binary Data (and converting it in my code to UIImage) and instead use a Transformable. It seemed straightforward enough: create a new version of the model, change the data type from Binary Data to Transformable, and let Lightweight Migration take care of the rest. Unfortunately, that didn’t work. Here’s the model I ultimately came up with:
as well as the mapping model for the migration:
This allowed Lightweight Migration to work, but I had to do a little extra work in my code to move the data from data to the new attribute image of type Transformable:
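[code lang="objc"]
// Lazily upgrade an entry that still carries old-style binary data
if (entry.data) {
    [entry setImage:[UIImage imageWithData:entry.data]];
    entry.data = nil;  // avoid keeping two copies and re-converting next time
}
[/code]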
The variable entry is a managed object newly fetched from CoreData. For the case where there is old data hanging around, we need to set the image attribute to a UIImage created using that data. CoreData will take care of converting back to binary data, and store it that way. And in the future, the image attribute will just be an image we can manipulate.
This is perhaps not the most efficient way to have done this, but it seemed to work and I couldn’t find another solution. We also set entry.data to nil so that we don’t have two copies of the binary data in CoreData and so we don’t do this “upgrade” every time for every image.
It’s worth noting, at least, that this little data migration is only done as images are called for by the UI. So the pain of this is spread out as the app is used. And there is always a chance that cache entries will be aged out before they are ever converted with this code. I take solace in that. 🙂
Conclusions
CoreData rocks! Once you get past the standard but necessary boilerplate code you must write to use CoreData in your app, it is quite nice to use and really frees you from worrying too much about how to manage the data and lets you focus on how you need to use the data in your app. For caching purposes, I think it is a really nice alternative and it seems to be working well for me so far.
What are your thoughts on caching with CoreData? Let me know in the comments.
Would it not be better to store the image data on disk and just use the core data objects to keep track of the metadata (url, last used time stamp), and then override the prepareForDeletion method to clean up the files? Storing the files so you can find them easily and have them unique is easy – just make a hash out of either the url or the core data id (make sure it’s a permanent one and not the temporary one :P)
That way you can fetch and loop through and do whatever with the Cache records, without making core data fetch the image data into ram constantly. Should be a reasonable performance boost without much effort…
Hi Tom- Thanks for that suggestion; it’s a nice optimization. One detail I left out of my article is that once I fetch a managed object from CoreData, I keep it in an in-memory dictionary. So once the image data (as part of the managed object) is read in, it’s in memory. So the next time it’s needed, that I/O is not required. That being said, your suggestion is still valid and moves in the direction of combining available caching techniques into interesting hybrid solutions. Thanks!
Did you keep the “data” field because it already existed, for compatibility with previous versions, or for some other reason? I think I would want to fetch the content for a URL and make it into an image straight away, since the request is presumably happening because we need to show that thing on the screen right now.
It was a move of desperation. 🙂 My hope was that the lightweight migration would automagically migrate the original model (url, data, timestamp) to the new model (url, image, timestamp) using a mapping model that mapped data -> image. But that did not work. So I made the new model include all the old model’s attributes plus a new attribute, image. The mapping model maps the old attributes directly to their counterparts in the new model and sets the new image attribute to 0 (or nil, effectively). Then in code, I check for a nil image value in the managed object and I transform in code from the data attribute to the image attribute, and set the data attribute to nil. This is perhaps not the “right” way to have done this, but it worked.
This is easier if you use relationships for Large Data Objects (BLOBs). As with all performance issues I would advise using the simplest method until it’s a performance problem. I’ve been much happier when I get it to work first and then use profile-driven optimizations. Additionally, after 20 years of programming, it’s also amazing how quickly optimizations in libraries or OSes make application-level optimizations obsolete.
Hi Todd- Thanks for your comment. As for using the simplest methods first, I agree. However, in this particular case, where I started using an NSDictionary, I found that reading and writing it became a real bottleneck, which is why I started looking at CoreData.
I think the one thing to keep in mind here is iCloud Backup, specifically if you are using Core Data on iOS.
Specifically, if you are storing your Core Data sqlite file in the Documents directory, iCloud is going to automatically back it up when the user connects their phone to power (if they have so configured).
The issue with storing blob data in that sqlite file is that this single file is now growing in size with each blob added and I’d think iCloud would then re-backup the entire file in turn, no? This can get prohibitively slow / data intensive when adding a 5KB image to a 500MB database means a new 500MB+5K backup.
Unless Apple has some sort of “row-level” backups strategy when dealing with Core Data sqlite files? They do row-level synchronization for iCloud sync so it’s possible – but I can’t find any reference to this anywhere.
Dylan- You are absolutely correct. In the implementations of this CoreData caching scheme I have explicitly NOT put the cache file in the ~/Documents directory for exactly the reasons you highlight. It is a cache, after all. So the right place to put the file is in ~/Library/Caches, so iOS can reclaim that space if it needs to.
There is a feature added in iOS 5.0.1 that lets an app set a “no backup” bit on any file. That’s another way to tell the system not to back up something.
Basically, the rule of thumb is that if the user created the data, it should go in ~/Documents. If they didn’t and it can be easily recreated or refetched, it should go into ~/Library/Caches.