Oleg is a Principal Architect with Hortonworks, responsible for architecting scalable Big Data solutions using various open source technologies available within and outside the Hadoop ecosystem. Before Hortonworks, Oleg was part of SpringSource/VMware, where he was a core engineer working on the Spring Integration framework, leading the Spring Integration Scala DSL and contributing to other projects in the Spring portfolio. He has 17+ years of experience in software engineering across multiple disciplines, including software architecture and design, consulting, business analysis and application development. Oleg has been focusing on professional Java development since 1999. Since 2004 he has been heavily involved in using several open source technologies and platforms across a number of projects around the world, spanning industries such as Telecom, Banking, Law Enforcement, US DoD and others.
As a speaker, Oleg has presented seminars at dozens of conferences worldwide, including SpringOne, JavaOne, JavaZone, Jazoon, Java2Days, Scala Days, UberConf, and others.
Hadoop is entering a new era in which more and more clients are looking not only for a platform to store a lot of data, but also for a platform that can continue to accept a lot of data at very high rates. While it seems like a simple case, the complexity, as always, is in the details. For example: once ingest begins, it should never stop. How does that affect MapReduce jobs and queries? What about parallel ingest, Complex Event Processing (CEP), and more? The conventional means of ingesting and dealing with such data are starting to show their limitations, and alternatives are required. This talk will explore the area of real-time data ingest into Hadoop, present the architectural trade-offs, and demonstrate alternative implementations that strike the appropriate balance across the following common challenges:
- Decentralized writes (multiple data centers and collectors)
- Continuous availability, high reliability
- No loss of data
- Elasticity of introducing more writers
- Bursts in speed per syslog emitter
- Continuous, real-time collection
- Flexible write targets (local FS, HDFS, etc.)