Infinite Scrolling for SEO: Depaginate Your Results

I read a very interesting article by AJ Kohn a few weeks ago about crawl budget. Being a technical person, and a developer in my former life, I am always interested in ways to give GoogleBot better information. One part of that article really jumped out at me, and I wanted to know more about it: depagination.

Mobile is obviously on the rise. We all know it. When clients build a new site, we often recommend a responsive design, especially since that is what Google recommends. I especially like this solution because I am starting to notice that mobile results differ from desktop results. Google told us at one point that this would eventually happen, and for the clients I have that use a separate mobile site (with thin content, too), we are seeing drastically different rankings per device. A responsive design helps make sure all of your content will rank well across devices.

Mashable

But when I saw AJ’s example about depagination, it made me curious. While this isn’t a cure for all people, and isn’t appropriate in all situations, I decided to have this implemented in a new site we’re launching at the end of the year. If you look at the big social sites (Twitter, Facebook, Pinterest, Google+), they all use infinite scroll heavily. Mashable, which publishes a large volume of content, implements this on their home page as well as their main category pages. My issue with this type of implementation was the SEO implications.

Load Infinite Content: The SEO Way

When you look at the requests made by a home page like Mashable’s, you see JavaScript queries happening as the user scrolls to the bottom of the page. This was my biggest concern, as I can’t imagine GoogleBot is smart enough yet to scroll around a page to see what content might pop up. So while it might not be a complete disaster, some content might not get discovered if it’s buried somewhere else on the site. This matters especially on an e-commerce site, where you want to make sure that all of your products show up.

But if you turn off your JavaScript and reload the page, you will notice they have paginated results at the bottom pointing to their different archive pages. This covers both bases: you’ve got a richer UX for your user, and you gracefully degrade to make sure crawlers will find the extra pages of content. So in this situation, crawlers and users without JavaScript get the regular paginated links, like “Previous 1 2 3 4 5 6 Last”.

In my situation (not always appropriate), any page that is NOT page 1 can be given the robots meta tag directive “NOINDEX,FOLLOW”. This will keep paginated results from being indexed, but products will still get crawled. We only want the first page indexed on an infinite scroll, right? And you still get the direct path to every product. The rel=”prev” and rel=”next” directives could be used as well.
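
The rule above can be sketched as a small function that decides which head tags a given archive page should carry. This is only an illustration under stated assumptions: the function name `headTagsForPage`, the example domain, and the `/page/N/` URL pattern are hypothetical and not taken from any particular site.

```javascript
// Hypothetical helper: returns the <head> tags for page `page` out of `total`
// archive pages. `baseUrl` and the "/page/N/" URL pattern are assumptions.
function headTagsForPage(baseUrl, page, total) {
  const urlFor = (n) => (n === 1 ? baseUrl : `${baseUrl}page/${n}/`);
  const tags = [];

  // Only page 1 should be indexed; deeper pages stay crawlable via "follow".
  if (page > 1) {
    tags.push('<meta name="robots" content="noindex,follow">');
    // rel="prev" points back to the previous page in the series.
    tags.push(`<link rel="prev" href="${urlFor(page - 1)}">`);
  }
  // rel="next" points forward, except on the last page.
  if (page < total) {
    tags.push(`<link rel="next" href="${urlFor(page + 1)}">`);
  }
  return tags;
}

console.log(headTagsForPage("https://example.com/blog/", 2, 5).join("\n"));
```

Page 1 gets no robots tag at all in this sketch, so it keeps whatever default indexing behavior the site already has.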

Steps

1. Either have all results on a single page, or have a paginated result set of all products for a JavaScript-disabled version of the inventory/search page.
2. Make sure this inventory listing section degrades gracefully when JavaScript is not available.
3. Hide the styling and HTML tags of the paginated section using JavaScript when it is enabled (NOTE: I will be experimenting with removing it completely when JavaScript is enabled).
4. Add the meta robots tag <meta name="ROBOTS" content="NOINDEX,FOLLOW"> to every page after page 1 to keep those extra pages from being indexed.
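
Hiding the pagination when JavaScript is available might look something like the sketch below. It is a minimal illustration, not the code from any specific site: the helper name `hidePagination` and the `.page-nav` class are assumptions, so swap in whatever wrapper your pagination markup actually uses.

```javascript
// Hypothetical helper: hides every element matching `selector` in `doc`.
// Because it only runs when JavaScript runs, visitors and crawlers without
// JavaScript still see the pagination links.
function hidePagination(doc, selector) {
  const nodes = doc.querySelectorAll(selector);
  for (const node of nodes) {
    node.style.display = "none";
  }
  return nodes.length; // number of elements hidden
}

// In a browser: run once the DOM is ready. ".page-nav" is an assumed class.
if (typeof document !== "undefined") {
  document.addEventListener("DOMContentLoaded", function () {
    hidePagination(document, ".page-nav");
  });
}
```

Passing the document in as a parameter keeps the helper easy to try out with a stand-in object, without needing a full browser.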

Just don’t forget… You must choose whatever approach makes sense for your site (and for your budget). This might not work in all situations, and in some cases it might require a full redesign. Can you imagine if you had just launched 6 months ago and completely changed your mind on how to serve up content? It happens, and standards change, but sometimes the CFO may not agree with your ‘revelations’.

How to Set Up a De-paginated Blog on WordPress

If you are on the WordPress platform, I put together some simple instructions on how to accomplish this on your site. We use this functionality on this site on the main blog page as well as the category pages.

Step 1: Prepare the Pagination for a ‘No JavaScript’ Environment

While search crawlers are increasingly sophisticated, and use JavaScript in their crawling, we should show pagination only in environments where JS is disabled. This is for the user’s sake as well as the crawler’s. Essentially, you want to wrap whatever pagination you have, on the pages where you want to use this feature, in a “<noscript>” tag. Here is an example:


<noscript>
    <div class="page-nav">
        <?php
        global $wp_query;
        $total_pages = $wp_query->max_num_pages;
        if ($total_pages > 1)
        {
            $current_page = max(1, get_query_var('paged'));

            echo paginate_links(array(
                'base'    => get_pagenum_link(1) . '%_%',
                'format'  => '/page/%#%',
                'current' => $current_page,
                'total'   => $total_pages,
            ));
        }
        ?>
    </div>
</noscript>

Step 2: Add paginated query to your theme’s functions.php

There are two functions we need to add here: (1) a function that handles the AJAX query for new posts and (2) a function for modifying the meta robots tag. As mentioned above, we are going to give search engines the directive to not index any pages that are not page 1, but make sure they crawl all the links on those pages. First, the AJAX query method:


function wp_infinitepaginate()
{
    // Sanitize the request values before using them.
    // Note: sanitize_key() lowercases the name, so keep the loop file name lowercase.
    $loopFile = isset($_POST['loop_file']) ? sanitize_key($_POST['loop_file']) : 'loop';
    $paged    = isset($_POST['page_no']) ? absint($_POST['page_no']) : 1;

    # Load the posts
    query_posts(array('post_type' => 'post', 'paged' => $paged, 'post_status' => 'publish' ));
    get_template_part( $loopFile );

    exit;
}
add_action('wp_ajax_infinite_scroll', 'wp_infinitepaginate');           // for logged in user
add_action('wp_ajax_nopriv_infinite_scroll', 'wp_infinitepaginate');    // if user not logged in
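
To make the connection to the browser side concrete: WordPress routes a request to admin-ajax.php by its `action` parameter, so `action=infinite_scroll` is what triggers the `wp_ajax_infinite_scroll` and `wp_ajax_nopriv_infinite_scroll` hooks registered above. Here is a small sketch of how the request body could be assembled. The helper name `buildInfiniteScrollBody` is hypothetical; the parameter names are the ones the PHP function reads from `$_POST`.

```javascript
// Hypothetical helper: builds the POST body expected by wp_infinitepaginate().
function buildInfiniteScrollBody(pageNumber, loopFile) {
  const params = new URLSearchParams();
  params.set("action", "infinite_scroll"); // matches the wp_ajax_* hook suffix
  params.set("page_no", String(pageNumber)); // read as $_POST['page_no']
  params.set("loop_file", loopFile); // read as $_POST['loop_file']
  return params.toString();
}

console.log(buildInfiniteScrollBody(2, "loop"));
// prints "action=infinite_scroll&page_no=2&loop_file=loop"
```

Using `URLSearchParams` rather than string concatenation means any unusual characters in the values get URL-encoded automatically.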

And this method handles the meta robots directive. In this case, and since so many people use it, I have attached the meta robots method to Yoast’s WordPress SEO plugin. If you do not use this plugin, this function will not work!


function rh_meta_robots_update($robotstr)
{
    // Pages 2+ of an archive: keep them out of the index, but follow their links.
    // Returning a fresh value avoids contradictory output such as
    // "index,follow,noindex,follow".
    if (is_paged())
    {
        return 'noindex,follow';
    }
    return $robotstr;
}
if (class_exists('WPSEO_Frontend'))
{
    add_filter('wpseo_robots', 'rh_meta_robots_update', 10, 1);
}

Step 3: Add a new Loop file for AJAX calls

You don’t necessarily have to add a completely new file for this, but I made one for clarity’s sake. For our purposes, I named this file loop.php. If you already have a file with that name, call yours whatever you like, and I will point out later where the code needs to change. Please note, the way you output blog post snippets is probably going to be different. Update this code to whatever fits your current blog. The real purpose here is to loop through the posts returned from the AJAX query above. I am using code from the default WordPress TwentyThirteen theme.


<?php if ( have_posts() ) : ?>
    <header class="archive-header">
        <h1 class="archive-title"><?php printf( __( 'Category Archives: %s', 'twentythirteen' ), single_cat_title( '', false ) ); ?></h1>

        <?php if ( category_description() ) : // Show an optional category description ?>
        <div class="archive-meta"><?php echo category_description(); ?></div>
        <?php endif; ?>
    </header><!-- .archive-header -->

    <?php /* The loop */ ?>
    <?php while ( have_posts() ) : the_post(); ?>
        <?php get_template_part( 'content', get_post_format() ); ?>
    <?php endwhile; ?>

    <?php twentythirteen_paging_nav(); ?>

<?php else : ?>
    <?php get_template_part( 'content', 'none' ); ?>
<?php endif; ?>

Step 4: Add AJAX Loading GIF to the Appropriate Templates

For this step, unless you already have an image you would like to use, just head over to this website to generate a new GIF to display on your site as new posts are loading. Once you have selected the type of loading indicator you want, we need to insert the following code on a few templates. We do it this way so the loading indicator doesn’t show up on pages where it isn’t necessary. We do this on our blog page and the category pages, but you can certainly make an infinite home page, if that works for your site design. I just placed this at the bottom of the two templates I had chosen.

<a id="inifiniteLoader">Loading... <img src="<?php bloginfo('template_directory'); ?>/img/ajax-loader.gif" /></a>

Step 5: Add some JavaScript to your Footer

I added this code to the footer of the page, mostly because there are a few PHP calls that need to be executed. We need to find out how many total pages there are in the paginated result set, and to auto-generate the AJAX URL. Once we get to the end of the pages, we’ll stop querying the database for more pages. Also, please notice that in the AJAX call below there is a query parameter for ‘loop_file’. Remember our loop.php above? If you changed the name of your file, make sure you update the ‘loop_file’ value to the appropriate name.


<script type="text/javascript">
    var count = 2; // next page to request; page 1 is rendered by the server
    // Read the page count from the main query so this works in any template
    var total = <?php global $wp_query; echo (int) $wp_query->max_num_pages; ?>;
    var loading = false; // prevents duplicate requests while one is in flight
    jQuery(window).scroll(function(){
        var loadImg = jQuery('#inifiniteLoader');
        if (loading || !isScrolledIntoView(loadImg)) {
            return;
        }
        if (count > total){
            // No more pages: hide the loading indicator
            loadImg.hide(1000);
            return;
        }
        loadImg.show(1000);
        loadArticle(count);
        count++;
    });
    function loadArticle(pageNumber){
        loading = true;
        jQuery.ajax({
            url: "<?php bloginfo('wpurl') ?>/wp-admin/admin-ajax.php",
            type: 'POST',
            data: "action=infinite_scroll&page_no=" + pageNumber + '&loop_file=loop',
            success: function(html){
                jQuery("#content").append(html);
            },
            complete: function(){
                loading = false;
            }
        });
        return false;
    }
    function isScrolledIntoView(elem)
    {
        var docViewTop = jQuery(window).scrollTop();
        var docViewBottom = docViewTop + jQuery(window).height();

        var elemTop = elem.offset().top;
        var elemBottom = elemTop + elem.height();

        // True only when the element is fully inside the viewport
        return (elemTop >= docViewTop) && (elemBottom <= docViewBottom);
    }
</script>
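
If you want to sanity-check the two pieces of logic in that script without a browser, they can be pulled out into plain functions. This is just a sketch for experimenting: `isFullyInView` and `createPager` are hypothetical names, and the pixel values in the examples are made up.

```javascript
// Pure version of the viewport test: true only when the element lies
// entirely between the top and bottom edges of the visible window.
function isFullyInView(viewTop, viewHeight, elemTop, elemHeight) {
  const viewBottom = viewTop + viewHeight;
  const elemBottom = elemTop + elemHeight;
  return elemTop >= viewTop && elemBottom <= viewBottom;
}

// Hypothetical pager: hands out page numbers 2..total, then null.
function createPager(total) {
  let next = 2; // page 1 is already in the server-rendered HTML
  return function nextPage() {
    if (next > total) {
      return null; // nothing left to load
    }
    return next++;
  };
}

const nextPage = createPager(3);
console.log(nextPage(), nextPage(), nextPage()); // prints "2 3 null"
console.log(isFullyInView(0, 800, 900, 32));     // prints "false"
console.log(isFullyInView(200, 800, 900, 32));   // prints "true"
```

The pager starts at 2 for the same reason `count` does in the footer script: the first page of posts is already on the page when it loads.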

That should about cover it! If you have any questions, have a hard time getting this working, or just have some corrections, please let me know below in the comments.