It appears that Instaview has been hit by the recent API restrictions imposed by Instagram. Comments are currently disabled until I can get this functionality restored, and while the like/unlike and follow/unfollow buttons are still visible, they are not working. I’ve requested these features be restored through the proper channels and am awaiting a reply.
Update: It appears that everything works after all. While looking into this, I found it worked on one computer in my household but not another. After logging out of Instaview/Instagram and logging back in, likes/comments work.
New Look & Feel
If you’re a regular visitor to Instaview, I’m sure you’ve noticed a few things have changed. I’ve been working on this new layout off and on for about a year (more off than on) and hadn’t planned on releasing it until after the new year. Mainly because there are some features that are missing, but also because I hadn’t fully tested everything I had done thus far. It just wasn’t quite ready. As luck (or, un-luck really) would have it, I went to upgrade the software running on the server and completely borked it in the process. It was in a somewhat unstable state even though the site seemed to be up and functional. So, I had to rebuild the server from scratch, which prompted an early deploy of the new look. So, here we are: a new site built on a fresh new server. As a bonus, the site was rebuilt using a new version of CakePHP and appears to be running a little bit leaner (CPU usage is down) and faster (response times seem better).
So, enjoy the new site. Comments are welcome. Just remember that I am still working on getting some of the previous features back. It will just take time for that to happen. There are also new features I’d like to add, so this will be an ongoing process.
Previously this blog was hosted on Posterous and while I have nothing against them, I already had a Tumblr account that I actively used so I decided to move. I know Tumblr has had their share of troubles keeping up with demand, but they have a nice product and I like their interface. Again, as luck would have it, they went down the day after I deployed the new code and it broke the home page for a few hours… such is life.
Well, it’s happened before and it happened again earlier this week. The cron job that keeps track of which photos made the popular page throughout the day mysteriously stopped keeping track. It took some investigating, but I figured out the cause and put in a quick fix to make sure it doesn’t happen again.
Timing Is Everything
The cron job is scheduled to run every minute. The script calls the Instagram API twice, sleeping for 20 seconds between calls. Lately, the calls to the database have been slowing down, probably a combination of the number of records in the database and the query used to exclude duplicate records from making their way into the table. This is causing the script to run long, which in turn is causing cron job collisions.
Lock it Down
The solution was fairly simple and involves obtaining an exclusive lock on a file and only letting the script run if it’s able to obtain that lock. So far, it seems to be working and should prevent the issue from happening again.
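The site’s cron script is PHP, but the locking pattern is the same in any language. Here’s a minimal sketch in Python, with an illustrative lock-file path and function name (not the actual script):

```python
import fcntl
import sys

LOCK_PATH = "/tmp/popular_cron.lock"  # illustrative path

def acquire_lock(path=LOCK_PATH):
    """Try to take an exclusive, non-blocking lock on the file.

    Returns the open file handle (it must stay open to hold the lock),
    or None if a previous run still holds the lock.
    """
    fh = open(path, "w")
    try:
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fh
    except OSError:  # lock is held by a still-running job
        fh.close()
        return None

lock = acquire_lock()
if lock is None:
    sys.exit(0)  # previous run hasn't finished; skip this minute
# ... call the Instagram API, record the popular photos, etc. ...
```

The lock is released automatically when the process exits, so a crashed run can’t wedge the job forever. On systems with the util-linux `flock(1)` tool, the same effect can be had directly in the crontab entry with `flock -n`.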
While this appears to solve the cron issue, it doesn’t address the root of the problem: figuring out why the script is running long and making it more efficient.
So, if you notice a few days of popular images have gone missing in the month of March, this is why. Sorry about that. The really sad news is that, for now, the data is gone. There’s no way to retrieve that information from Instagram (at least, not that I’m aware of at the moment).
Hello there. Welcome to the instaview.me blog. I wanted to set up a place where I can share my thoughts and insights about instaview.me as well as somewhere users can go to check on the status of the site.
One of the cool things about instaview is the retention of popular photos. Instagram doesn’t (publicly) keep or share these photos, so it was one of my goals to keep a record of all of the photos that made the popular page. This is probably one of the more technically challenging aspects of the site. Keeping track of the photos required some kind of persistent storage; in this case, it’s a MySQL database. Since I started tracking the popular photos back in April (April 30th, 2011 to be exact), I’ve accumulated some 1.2 million records. That’s not a lot of data for a database to handle, but it will bring any bottlenecks to the surface fairly quickly.
Let me give you a little history. When the site started out, I was running the site (database and all) on a single 512MB VPS (Virtual Private Server). This means that all the processes running had to do so using less than 512MB of RAM. It’s plenty of room for a small(ish) site with moderate traffic. Around November of 2011, the site started hitting the limits of the server and I began to get emails from monitors about things like CPU usage and disk I/O. It was then that I decided I needed to split the application in two, and fired up another VPS just for the database. I had roughly 800k records in the database and this seemed to make things better, but it was short-lived.
About a month ago, the database server started experiencing high I/O usage. At first, it was around 8k IOP/s (this is basically the number of disk read/write operations per second). I was concerned at first but didn’t notice much of a performance hit to the site, so I only half paid attention to the alerts. I was already working on the latest redesign and figured I’d switch to a NoSQL solution (Mongo), which should help speed up the site. Then, about a week ago, the IOP/s headed through the roof. It quickly went from 8k to 20k, 40k, 80k! Yes, you read that correctly. I was completely I/O bound at the database layer. Of course, by then, the site’s performance fell on the floor.
I quickly migrated the data to Mongo on my development machine and continued working through the redesign. The whole time I kept asking myself if Mongo would really save the site. I was skeptical, so I started looking around the interwebs. After reading several posts and articles about it, I decided to abandon Mongo a couple days before I planned on launching the new design. I chose to tackle the database performance issue and figure out how to make it faster.
Timestamp Columns Are Awesome… Kinda
Using the appropriate data type for your columns is essential to database performance. Timestamp and DateTime types are pretty important for tracking inserts and changes to your data. They index well enough and are useful for sorting results by time. Where they are NOT awesome is when you use them in the selection process and need to match on a year-month-day. This is an expensive task. My queries looked something like this:
SELECT ... FROM popular WHERE DATE(timestamp) = '2011-04-30';
This basically scans the entire table, calls the DATE function on every timestamp value, and compares the result to the string. Using functions in your WHERE clause can cause MySQL to examine more records than it needs, presumably because possible_keys is NULL and therefore the entire table is examined instead of the index. This is bad. This is where I got into trouble. It might be somewhat quick on a server with no real traffic (and small amounts of data), but when you start getting traffic, it becomes the bottleneck, quickly.
How did I solve this? It was pretty easy, actually. When I look up popular photos for a particular day, I’m just looking for records whose timestamp matches a given year, month, and day. So, I created three new smallint columns, indexed them, and modified some queries to do the lookup using those columns instead of the timestamp. There’s no computing on the timestamp with the DATE function; it’s a straight num = num comparison. It’s faster and easier on the database server. To illustrate just how effective this was, take a look at the graph below.
This is what my I/O looked like over the past several weeks on this server. You can see the exact time I cut over to the new way of querying the data. There’s a huge drop-off in I/O operations per second. Here are some more nerdy graphs from the Munin MySQL monitor.
Again, you can see the I/O drop at the database. Also, the buffer pool dropped significantly, as did the slow queries. Semaphores are practically nil and transactions per second have doubled.
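To make the query change concrete, here’s a self-contained before/after sketch. It uses SQLite via Python’s sqlite3 so it runs anywhere (the production database is MySQL), and the table layout, column names, and index name are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE popular (
        photo_id INTEGER,
        ts       TEXT,     -- original timestamp column
        year     INTEGER,  -- new date-part columns (SMALLINT in MySQL)
        month    INTEGER,
        day      INTEGER
    )
""")
conn.execute("CREATE INDEX idx_ymd ON popular (year, month, day)")
conn.execute("INSERT INTO popular VALUES (1, '2011-04-30 12:00:00', 2011, 4, 30)")
conn.execute("INSERT INTO popular VALUES (2, '2011-05-01 09:30:00', 2011, 5, 1)")

# Old style: wrapping the column in a function defeats any index on ts,
# so every row gets the DATE() call.
old = conn.execute(
    "SELECT photo_id FROM popular WHERE DATE(ts) = '2011-04-30'"
).fetchall()

# New style: straight integer comparisons that the (year, month, day)
# index can satisfy directly.
new = conn.execute(
    "SELECT photo_id FROM popular WHERE year=2011 AND month=4 AND day=30"
).fetchall()

# The query planner confirms the index is used for the new query.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT photo_id FROM popular WHERE year=2011 AND month=4 AND day=30"
).fetchall()
```

Both queries return the same rows; the difference is that the new one is a pure index lookup instead of a full-table scan with a per-row function call.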
These changes should make the server happier for a little while longer at least. There’s always room for improvement, but I think I can sleep better at night knowing that the disk I/O is back to a more sane level.