Impala
● What is it?
● How does it work?
● Performance
● Formats
● Architecture
www.semtech-solutions.co.nz
[email protected]
Impala – What is it?
● Ad hoc, real-time query for Hadoop
● Open source
● Developed by Cloudera
● Based on Google's 2010 Dremel paper
● Direct data access via the Impala engine
● Future Hadoop Parquet update will
  – Add columnar binary storage to Hadoop
  – Improve Impala performance
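To make the "columnar binary storage" point concrete, here is a minimal Python sketch of why a column-oriented layout can help an aggregate query. It is only an illustration of the general idea, not of how Parquet or Impala store data; the table, its column names, and the "values read" counters are all hypothetical.

```python
# Hypothetical three-row table; column names are illustrative only.
rows = [
    {"id": 1, "region": "north", "amount": 10.0},
    {"id": 2, "region": "south", "amount": 25.5},
    {"id": 3, "region": "north", "amount": 4.5},
]


def to_columnar(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columnar(rows)

# Row layout: summing one column still walks every field of every row.
values_read_row_layout = sum(len(record) for record in rows)

# Columnar layout: only the "amount" column has to be read.
values_read_columnar = len(columns["amount"])

total_amount = sum(columns["amount"])
```

In this toy case the columnar layout touches 3 values instead of 9 to compute the same sum; the gap grows with the number of columns the query does not need.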
Impala – How does it work?
● Direct data access
● Query planning / coordination on data nodes
● Node-based query engine
● Low latency
● Performance improvement
● Query data on HDFS or HBase
● Uses the same HiveQL syntax (SQL-like)
● Has the Hue GUI
● Allows table joins and aggregation
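As an illustration of the SQL-like join and aggregation mentioned on this slide, the sketch below runs a plain SELECT / JOIN / GROUP BY statement. Since no Impala cluster is assumed here, an in-memory SQLite database from the Python standard library stands in for the engine; the `customers` and `orders` tables and their columns are hypothetical, and the query sticks to standard SQL constructs.

```python
import sqlite3

# Plain SELECT / JOIN / GROUP BY; table and column names are hypothetical.
QUERY = """
SELECT c.region, COUNT(*) AS order_count, SUM(o.amount) AS total_amount
FROM orders o
JOIN customers c ON o.customer_id = c.id
GROUP BY c.region
ORDER BY c.region
"""


def run_demo():
    """Load two small tables into in-memory SQLite (a stand-in, not Impala)
    and return the joined, aggregated rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, region TEXT)")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "north"), (2, "south")],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 10.0), (1, 4.5), (2, 25.5)],
    )
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


result = run_demo()
```

The result holds one row per region with its order count and summed amount.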
Impala – Performance
Impala delivers performance gains
● IO-bound queries (hardware limitations)
  – Min 3 times
● Complex queries (multiple MapReduce stages)
  – Min 7 times
● Cached queries
  – Min 20 times
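Reading the figures on this slide as minimum speedup factors (3 times for IO-bound, 7 times for complex multi-stage, 20 times for cached queries), a quick worked calculation shows what they imply for runtime. The baseline is an assumption: the slide does not say what the gains are measured against, so the sketch simply takes a hypothetical baseline runtime in seconds. The function and dictionary names are illustrative.

```python
# Minimum speedup factors as read from the slide.
MIN_SPEEDUP = {"io_bound": 3, "complex": 7, "cached": 20}


def max_expected_runtime(baseline_seconds, query_class):
    """Upper bound on runtime implied by a minimum speedup factor.

    baseline_seconds is a hypothetical runtime on the system being
    compared against; the slide does not name that system.
    """
    return baseline_seconds / MIN_SPEEDUP[query_class]


# Hypothetical 420-second baseline query.
baseline = 420
bounds = {name: max_expected_runtime(baseline, name) for name in MIN_SPEEDUP}
```

For a 420-second baseline this gives at most 140 s, 60 s, and 21 s respectively.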
Impala – Formats
● Supported formats
  – Text and Sequence files, which can be compressed as Snappy, GZIP or BZIP
● Future support for
  – Avro
  – RCFile
  – LZO text file
  – Parquet
Impala – Architecture
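The earlier points about a node-based query engine with planning and coordination on the data nodes can be pictured as a scatter-gather pattern: each node aggregates the data it holds locally, and a coordinator merges the partial results. The Python sketch below is a simplified illustration under that assumption, not Impala's actual implementation; the node names, data blocks, and function names are hypothetical, and threads stand in for separate machines.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data blocks, keyed by the node that holds them locally.
NODE_BLOCKS = {
    "node-1": [("north", 10.0), ("south", 25.5)],
    "node-2": [("north", 4.5)],
    "node-3": [("south", 1.0), ("east", 3.0)],
}


def run_fragment(block):
    """Work done on one node: sum amounts per region over local data only."""
    partial = {}
    for region, amount in block:
        partial[region] = partial.get(region, 0.0) + amount
    return partial


def coordinate(node_blocks):
    """Coordinator: fan the fragment out to every node, then merge partials."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(run_fragment, node_blocks.values()))
    merged = {}
    for partial in partials:
        for region, subtotal in partial.items():
            merged[region] = merged.get(region, 0.0) + subtotal
    return merged


totals = coordinate(NODE_BLOCKS)
```

Only the small per-region subtotals travel to the coordinator; the raw rows stay on the node that stores them, which is the point of running the engine where the data lives.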
Impala – Requirements
What does Impala need to run?
– CentOS 6.2
– or RHEL (Red Hat Enterprise Linux)
– CDH 4.1 (Cloudera's Distribution Including Apache Hadoop)
– Cloudera Manager (advised)
Contact Us
● Feel free to contact us at
  – www.semtech-solutions.co.nz
  – [email protected]
● We offer IT project consultancy
● We are happy to hear about your problems
● You can just pay for those hours that you need to solve your problems