Mirroring our Medium blog to YLD.io

Medium is an awesome service that allows anyone to get a blog started quickly while providing an easy-to-use and full-featured experience for writing posts. At YLD, we have published posts on Medium since early 2014!

One downside of publishing on our Medium blog, however, is that it is not well-integrated with the rest of our web presence at YLD.io. We wanted visitors to our blog to be able to read our content right on our site, between our regular header and footer, just like any other content on the website.

So we sent a small engineering team on a mission to create a Cloud Function that automatically synchronizes all Medium posts we write to a dedicated blog section on our website. This would allow us to nicely present our content on the site, while still keeping it discoverable on our Medium publication as well. Crucially, we can also retain the frictionless writing experience for our authors that Medium has brought to perfection over the years, instead of reinventing the wheel with yet another WYSIWYG editor.

Fetching the Post Data

First of all: where do we get the posts and their content from? Sure, we could crawl the Medium publication's web pages, but is there a cleaner, less brittle way to obtain the data we need?

And indeed there is — Medium provides an RSS feed for every user and publication. You can view that of the YLD Medium blog, for example, at https://medium.com/feed/yld-blog/. This is where we fetch our latest post data from.

From the rss > channel > item elements in the feed, we can extract each post's title, author, tags, and other data and, most importantly, its content encoded as HTML, with tags such as <p>, <h4>, and <img>. This is what a (shortened) post entry looks like:


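  <item>
    <title><![CDATA[ The YLD Green Team ]]></title>
    ...
    <atom:updated>2019-11-29T15:25:24.120Z</atom:updated>
    <content:encoded><![CDATA[
      ...
      As Kermit the Frog once said, “It’s not easy being green”.... ]]></content:encoded>
  </item>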
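To make the extraction step concrete, here is a minimal TypeScript sketch that pulls a few of these fields out of each rss > channel > item element. It assumes the feed has already been downloaded as a string (for example from https://medium.com/feed/yld-blog/), and it uses regular expressions only to stay dependency-free; a real implementation would be better served by a proper XML parser. The MediumPost type and the function names are hypothetical, and the atom:updated and content:encoded element names are assumed from Medium's feed format.

```typescript
// Hypothetical shape of the data we keep from each RSS <item>.
interface MediumPost {
  title: string;
  updated: string;
  contentHtml: string;
}

// Returns the text inside <tag>...</tag>, unwrapping a CDATA section if present.
// Regex-based for illustration only; assumes well-formed, non-nested elements.
function readElement(itemXml: string, tag: string): string {
  const match = itemXml.match(new RegExp(`<${tag}[^>]*>([\\s\\S]*?)</${tag}>`));
  if (!match) return "";
  const inner = match[1].trim();
  const cdata = inner.match(/^<!\[CDATA\[([\s\S]*?)\]\]>$/);
  return (cdata ? cdata[1] : inner).trim();
}

// Splits the feed into <item> elements and extracts the fields we care about.
function parseMediumFeed(feedXml: string): MediumPost[] {
  const items: string[] = feedXml.match(/<item>[\s\S]*?<\/item>/g) ?? [];
  return items.map((item) => ({
    title: readElement(item, "title"),
    updated: readElement(item, "atom:updated"),
    contentHtml: readElement(item, "content:encoded"),
  }));
}
```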

For our website, we store the blog posts and assets in Contentful, our headless Content Management System (CMS). To find out which blog posts still need to be synchronized, we retrieve the list of posts that already exist there and compare it against the posts in the Medium feed.
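The comparison itself boils down to a set difference. In this sketch, the FeedPost and CmsEntry shapes and the findNewPosts function are hypothetical, and we assume each CMS entry records the URL of the Medium post it was copied from, so that URL can serve as the matching key.

```typescript
// Hypothetical minimal shapes: what we read from the feed and what the CMS already holds.
interface FeedPost {
  title: string;
  mediumUrl: string;
}

interface CmsEntry {
  // Assumption: each CMS entry stores the URL of its source Medium post.
  mediumUrl: string;
}

// Returns the feed posts that have no matching CMS entry yet.
function findNewPosts(feed: FeedPost[], cms: CmsEntry[]): FeedPost[] {
  const known = new Set(cms.map((entry) => entry.mediumUrl));
  return feed.filter((post) => !known.has(post.mediumUrl));
}
```

Building a Set of known keys keeps the check linear in the number of posts instead of comparing every feed post against every CMS entry.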

Converting to Markdown / MDX

We now know which Medium blog posts we want to copy to the CMS. But while Medium gives us the post content as HTML, in the CMS, we need to store the post content in the Markdown rich text format. Advanced content, such as embedded Gists or Instagram posts, is represented using tags from the Markdown extension MDX.

To convert the HTML from the Medium RSS feed to MDX for our CMS, we use Turndown. However, not every piece of incoming HTML is simple enough to be converted with Turndown’s default configuration. Take this very post as an example: it contains embedded GitHub Gist snippets. Let’s take a look at how we can bring such embeds to our custom blog.

MDX and Post Embeds

Our website’s blog post renderer component, which creates a page from a post’s MDX content, lets us customize the implementation of every element generated from the MDX. Just as this lets us set custom padding, font weights, and line heights for <h1> tags, it also lets us define an implementation for a <Gist>, <Tweet>, or <YouTube> tag, or any other tag we come up with.


So, taking Gists as an example, we can provide react-gist as the implementation of any <Gist> tag in the MDX. Now, all we need to do is somehow transform what Medium gives us in place of an embedded Gist into a <Gist id="abcd1234" /> tag.
So what does Medium give us, to begin with?


  <a href="https://medium.com/media/cdef3456/href">
    https://medium.com/media/cdef3456/href
  </a>

Well, that’s weird. And there’s no trace of a Gist or its id in here.
For now, let’s turn this media link into a placeholder tag, <iframecontent:https://medium.com/media/cdef3456/href>, and come back to it later.
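One simplified way to produce such a placeholder is a plain string replacement over the feed HTML, sketched below. This swaps in a regular expression for illustration; the real pipeline presumably does this inside its Turndown configuration, and the pattern only matches the exact anchor markup shown above. MEDIA_LINK and markMediaEmbeds are hypothetical names.

```typescript
// Matches Medium media embed links such as
// <a href="https://medium.com/media/cdef3456/href">https://medium.com/media/cdef3456/href</a>.
// Assumption: embeds always appear as an <a> whose href is a medium.com/media/<id>/href URL.
const MEDIA_LINK =
  /<a href="(https:\/\/medium\.com\/media\/[^"\/]+\/href)">[\s\S]*?<\/a>/g;

// Replaces each media link with an <iframecontent:URL> placeholder to be resolved later.
function markMediaEmbeds(html: string): string {
  return html.replace(MEDIA_LINK, (_match: string, url: string) => `<iframecontent:${url}>`);
}
```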

What if we try to GET that URL?

HTTP/1.1 302 Found
Location: https://gist.github.com/abcd1234
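Reading that Location header programmatically means issuing the GET without following the redirect. The sketch below assumes a server-side fetch, such as the global fetch in Node 18+, which we assume exposes the 3xx status and headers when redirect is set to "manual" (browsers return an opaque redirect response instead). The FetchLike type and resolveMediaUrl function are hypothetical, and the fetch function is passed in so it can be stubbed.

```typescript
// Minimal subset of the Fetch API this sketch relies on, so a stub can stand in for it.
type FetchLike = (
  url: string,
  init: { method: string; redirect: "manual" },
) => Promise<{ status: number; headers: { get(name: string): string | null } }>;

// Issues a GET without following redirects and returns the Location target, if any.
async function resolveMediaUrl(mediaUrl: string, fetchFn: FetchLike): Promise<string | null> {
  const response = await fetchFn(mediaUrl, { method: "GET", redirect: "manual" });
  const isRedirect = response.status >= 300 && response.status < 400;
  return isRedirect ? response.headers.get("location") : null;
}
```

Server-side, this might be called as resolveMediaUrl(url, fetch); the injected parameter only keeps the network out of unit tests.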

Great! Now we can extract the Gist id from the last path segment of the URL and generate the <Gist id="abcd1234" /> tag to be stored in the CMS with the rest of the post’s MDX content.
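Both parts of that step, taking the last path segment as the Gist id and swapping the <iframecontent:URL> placeholders for <Gist> tags, fit in a few lines. This is a sketch under assumptions: gistTagFromUrl and replaceGistPlaceholders are hypothetical names, and the map from media URLs to redirect targets is assumed to have been filled beforehand by resolving each media link. Placeholders that do not resolve to a Gist are left in place so other embed types can be handled separately.

```typescript
// Builds the MDX tag from a Gist URL such as https://gist.github.com/abcd1234,
// using the last non-empty path segment as the id. Returns null for non-Gist URLs.
function gistTagFromUrl(gistUrl: string): string | null {
  const url = new URL(gistUrl);
  if (url.hostname !== "gist.github.com") return null;
  const segments = url.pathname.split("/").filter((segment) => segment.length > 0);
  const id = segments[segments.length - 1];
  return id ? `<Gist id="${id}" />` : null;
}

// Replaces every <iframecontent:URL> placeholder whose redirect target is a Gist.
// `targets` maps each Medium media URL to the URL it redirected to.
function replaceGistPlaceholders(mdx: string, targets: Map<string, string>): string {
  return mdx.replace(/<iframecontent:([^>]+)>/g, (placeholder: string, mediaUrl: string) => {
    const target = targets.get(mediaUrl);
    const tag = target ? gistTagFromUrl(target) : null;
    return tag ?? placeholder;
  });
}
```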

Finishing up

We run the MDX through the MDX compiler to catch invalid MDX early and, after successful validation, push the blog entry to the CMS. The whole process runs in a Lambda function triggered by Zapier whenever it detects an update to the Medium RSS feed.

There are a few other bits and pieces not covered in this post, like downloading and pushing image assets along with the blog post, but that was the gist (ha) of it. Because all our website code is public, you can check out the full source code of this implementation. Some pieces could certainly be improved, but after a lot of iteration, this version has worked for all of our existing blog posts. This is how the final result looks:

I would like to thank the whole YLD.io design and engineering team for their work on making the new blog experience happen and making it this smooth. Special credit goes to Ollie Monk, who designed and wrote the full initial implementation!
