The average Facebook user has never heard of HydraBase, but the souped-up version of Apache HBase, an open-source distributed key-value data store running on top of HDFS, was instrumental in the social network’s move in 2010 to revamp its messages inbox to include Facebook messages, SMS, chat, and email. Since then, the technology has been used to launch other features as well.
How did Facebook adapt the Hive storage format to handle a data warehouse that stores some 300 petabytes and takes in about 600 terabytes per day? RCFile (record-columnar file format) wasn’t enough, so enter ORCFile.
Facebook offered some insight into how it handles the more than 300 petabytes of data it stores for its 1.19 billion monthly active users, providing some details on Presto, an interactive query system it created and is open-sourcing, in a note on the Facebook Engineering page.