overview
1) a cluster computing platform designed to be fast and general-purpose
2) provides high-level APIs in Java, Scala, and Python, and rich built-in libraries
3) integrates closely with other big data tools
4) compatible with Apache Hadoop
5) can access any Hadoop data source, as well as stores such as Cassandra
6) provides various high-level tools, such as Spark SQL for structured data processing, MLlib for machine learning, and more
why Spark
1) provides powerful in-memory caching and disk persistence capabilities
2) iterative processing
3) interactive querying
4) streaming
5) graph computations
6) faster batch processing
7) simple (https://databricks.com/spark/about)
8) fast
9) unified engine
10) faster decision making
11) broadly compatible
12) the Spark framework can be deployed on Apache Mesos, on Apache Hadoop via YARN, or with Spark's own standalone cluster manager
13) the Spark framework is polyglot: it can be programmed in several languages (currently Scala, Java, and Python are supported)
14) includes a fully Apache Hive-compatible data warehousing system that can run up to 100x faster than Hive