Services/Sync/Server/Archived/HereComesEverybody/HBaseNotes

Revision as of 17:03, 8 February 2010 by LesOrchard (talk | contribs)
  • "HBase is the Hadoop database. Use it when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware."
  • "HBase is an open-source, distributed, column-oriented store modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop."
  • http://www.roadtofailure.com/2009/10/29/hbase-vs-cassandra-nosql-battle/
    • "If you need highly available writes with only eventual consistency, then Cassandra is a viable candidate for now. However, many apps are not happy with eventual consistency, and it is still lacking many features. Furthermore, even if writes do not fail, there is still cluster downtime associated with even minor schema changes. HBase is more focused on reads, but can handle very high read and write throughput. It’s much more Data Warehouse ready, in addition to serving millions of requests per second. The HBase integration with MapReduce makes it valuable, and versatile."
  • http://spyced.blogspot.com/2009/03/why-i-like-cassandra.html
    • "Follows the bigtable model, so it's more complicated than it needs to be. (300+kloc vs 50 for Cassandra; many more components). This means it's that much harder for me to troubleshoot. HBase is more bug-free than Cassandra but not so bug-free that troubleshooting would not be required. Does not have any non-java clients. I need CPython support. Sits on top of HDFS, which is optimized for streaming reads, not random accesses. So HBase is fine for batch processing but not so good for online apps."
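
To make the "column-oriented store modeled after Bigtable" description above more concrete, here is a minimal in-memory sketch of the Bigtable-style data model: a sparse, sorted map from (row key, column, timestamp) to value. It is a toy for illustration only and is not the HBase API; the class name `ToyTable`, its methods, and the row and column names in the usage example are all hypothetical.

```python
import bisect


class ToyTable:
    """Toy model of a Bigtable-style map: (row, column, timestamp) -> value.

    Hypothetical illustration only; this is not how HBase is implemented
    and none of these names come from the HBase client API.
    """

    def __init__(self):
        self._rows = {}         # row key -> {column -> [(timestamp, value), ...]}
        self._sorted_keys = []  # row keys kept sorted so range scans are cheap

    def put(self, row, column, value, timestamp):
        """Store one versioned cell; rows and columns are created on demand."""
        if row not in self._rows:
            bisect.insort(self._sorted_keys, row)
            self._rows[row] = {}
        versions = self._rows[row].setdefault(column, [])
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0])  # oldest first, newest last

    def get(self, row, column):
        """Random read: return the newest value of a cell, or None if absent."""
        versions = self._rows.get(row, {}).get(column)
        return versions[-1][1] if versions else None

    def scan(self, start_row, stop_row):
        """Yield (row, {column: newest value}) for start_row <= row < stop_row."""
        lo = bisect.bisect_left(self._sorted_keys, start_row)
        hi = bisect.bisect_left(self._sorted_keys, stop_row)
        for key in self._sorted_keys[lo:hi]:
            yield key, {c: v[-1][1] for c, v in self._rows[key].items()}


# Usage with made-up row keys and "family:qualifier" style column names.
table = ToyTable()
table.put("user:alice", "bookmarks:count", "10", timestamp=1)
table.put("user:alice", "bookmarks:count", "12", timestamp=2)
table.put("user:bob", "tabs:count", "3", timestamp=1)
latest = table.get("user:alice", "bookmarks:count")
```

The table is sparse because each row stores only the columns that were actually written, which is what makes "millions of columns" plausible in the real systems. Keeping row keys sorted is what allows both single-row lookups and contiguous range scans in this sketch.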