Lars George has been involved with HBase since 2007, and became a full HBase committer in 2009. He has spoken at many Hadoop User Group meetings, and at conferences such as ApacheCon, FOSDEM, QCon, and Hadoop World. He also started the Munich OpenHUG meetings. Lars now works for Cloudera as the Director EMEA Services, managing a team of Hadoop solutions architects in and around Europe. He is also the author of O'Reilly's "HBase: The Definitive Guide".
This presentation is for the more skilled HBase practitioner (or those who attended the earlier introductory talk) seeking to understand the inner workings of HBase, and how to make efficient use of its idiosyncrasies. The talk discusses common pitfalls in designing data schemas for applications running on top of HBase. It addresses the initially confusing shortcomings of its API, and explains advanced concepts for working around the lack of transactions and other features familiar from traditional databases. The presentation concludes with useful advice on how to plan for proper HBase cluster sizing, based on experience gained from many real-world installations.
HBase is the Hadoop Database. It adds the sometimes sorely missed random updates, making it possible to build applications on top of Hadoop that can modify and serve data in real time. These types of applications complement the batch-oriented nature of Hadoop, and facilitate new kinds of data processing. The implicit sorting feature of HBase lends itself to serializing otherwise unordered events, for example in the context of sessionization. This talk will introduce the attendee to the basic concepts behind HBase and its programming API. Using real-world examples, it shows how to apply the feature set of HBase to an application design, possibly removing the need for a traditional RDBMS.
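As a rough illustration of how sorted row keys can serialize unordered events for sessionization, the following sketch simulates the idea in plain Python without touching HBase itself. The key layout (user ID, a zero byte, then a fixed-width big-endian timestamp), the helper names `row_key`, `parse_key`, and `sessionize`, and the inactivity gap are all assumptions made for this example, not something prescribed by the talk. Python's `sorted()` on byte strings stands in for the lexicographic row key order in which an HBase scan returns rows.

```python
import struct

def row_key(user_id: str, timestamp_ms: int) -> bytes:
    # Hypothetical key layout: user ID, a zero-byte separator, then a
    # fixed-width big-endian timestamp so byte order matches numeric order.
    # Assumes user IDs never contain a zero byte.
    return user_id.encode("utf-8") + b"\x00" + struct.pack(">Q", timestamp_ms)

def parse_key(key: bytes):
    # The last 8 bytes hold the timestamp; the byte before them is the separator.
    user = key[:-9].decode("utf-8")
    timestamp_ms = struct.unpack(">Q", key[-8:])[0]
    return user, timestamp_ms

def sessionize(keys, gap_ms: int):
    # sorted() stands in for a table scan, which would return rows in
    # lexicographic key order regardless of the order they were written in.
    sessions = []
    prev_user, prev_ts = None, None
    for key in sorted(keys):
        user, ts = parse_key(key)
        # Start a new session on a new user or after a long period of inactivity.
        if user != prev_user or ts - prev_ts > gap_ms:
            sessions.append((user, [ts]))
        else:
            sessions[-1][1].append(ts)
        prev_user, prev_ts = user, ts
    return sessions

# Events arrive in arbitrary order.
events = [("bob", 5000), ("alice", 1000), ("alice", 900000),
          ("alice", 2000), ("bob", 6000)]
keys = [row_key(user, ts) for user, ts in events]

for user, timestamps in sessionize(keys, gap_ms=60000):
    print(user, timestamps)
```

Because the timestamp is encoded at a fixed width, a single pass over the sorted keys sees each user's events grouped together and in time order, which is what makes the session boundaries easy to detect.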