100-petabyte clusters! 60,000 Hive queries a day! Facebook’s latest 1,800-word engineering blog post has one goal: proving to the world’s top programmers that if they want a challenge, they should work for the social network. There’s not much for the layman beyond the fact that Facebook’s data warehouse is 2,500 times bigger than it was in 2008. This is back-end geek porn, and it’s critical to Facebook’s long-term success.
But Facebook has one thing young startups don’t have. Or should I say one billion things. Its massive user base means that what it builds seriously influences the world, and it’s trying to solve engineering problems on the forefront of computer science. At first glance, though, it might just seem like another consumer product. That’s why it needs blog posts like “Under the Hood: Scheduling MapReduce jobs more efficiently with Corona”.
The note details the limits of the Hadoop MapReduce scheduling framework, and how Facebook built its own scheduling framework, Corona, to surpass those limits. Facebook has open-sourced Corona and it’s now on GitHub. The benefits include dropping slot refill times from 10 seconds with MapReduce to just 600 milliseconds, cutting job latency in half, and improving cluster utilization and scheduling fairness. I’m not going to paraphrase the post any further, so if that stuff fascinates you, read it.
Facebook has been publishing engineering blog posts for years, but the Under the Hood series started right about when the company filed to IPO. Old eng blog posts used to be more about the human story of building Facebook’s back-end, but they seem to have gotten more hardcore since the company went public. And that’s smart, because Facebook doesn’t have the financial windfall of a rapidly rising valuation to attract engineers anymore.
Facebook must show it is a riddle, wrapped in a mystery, inside an enigma, because that’s what gets great programmers fired up.