Open Source Business Intelligence Tools
Alex Meadows, TriLUG, January 2012
Agenda
● Business Intelligence Overview
● Review of OSBI Tools
  ● Data Warehousing
  ● Data Integration
  ● Reporting/OLAP
  ● Visualization
  ● Statistical Analysis/Predictive Analytics
What Is Business Intelligence?
Utilizing technology to identify and analyze trends in data to make better business decisions.
Overlapping Fields
Source: Back In Business, Klimberg, Miori (www.informs.org)
Competing On Analytics
Source: Competing on Analytics; Thomas Davenport, Jeanne Harris
Phases of Growth
The Three Types of Questions
● What happened?
  ● How was performance last week?
● What is currently happening?
  ● How is performance right now?
● What will happen?
  ● What can I do to reach our goals?
Data Warehousing
● Store data outside of the application/normal business environment (e.g. ERP systems)
● Designed specifically for reporting/analytics
● Modeling Styles
  ● 3NF (conventional normalized database modeling)
  ● Data Marts (aka star schemas)
  ● Data Vault (hybrid 3NF/Data Mart)
  ● Anchor Modeling (6NF)
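A minimal sketch of the data mart (star schema) style, assuming SQLite through Python's standard sqlite3 module; the tables dim_product, dim_date and fact_sales and their rows are hypothetical. Measures live in a central fact table, descriptive attributes live in the dimension tables around it, and reports join the two and group by dimension attributes.

```python
import sqlite3

def build_star_schema() -> sqlite3.Connection:
    """Create a tiny, hypothetical sales star schema in an in-memory SQLite DB."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Dimension tables hold descriptive attributes.
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        -- The fact table holds measures plus a foreign key to each dimension.
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            date_id INTEGER REFERENCES dim_date(date_id),
            quantity INTEGER,
            amount REAL
        );
    """)
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware"), (3, "Manual", "Books")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(1, 2011, 12), (2, 2012, 1)])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                     [(1, 1, 3, 30.0), (2, 1, 1, 25.0), (3, 2, 2, 40.0), (1, 2, 5, 50.0)])
    return conn

def sales_by_category_and_year(conn: sqlite3.Connection):
    # Typical star-schema query: join the fact table to its dimensions, then group.
    return conn.execute("""
        SELECT p.category, d.year, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_product p ON p.product_id = f.product_id
        JOIN dim_date d ON d.date_id = f.date_id
        GROUP BY p.category, d.year
        ORDER BY p.category, d.year
    """).fetchall()
```

Calling sales_by_category_and_year(build_star_schema()) yields one total per (category, year) pair; adding a new way to slice the data usually means adding a dimension table rather than reshaping the fact table.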
Data Warehousing
● Relational Databases
  ● MySQL, Postgres, etc.
● Columnar Data Stores
  ● Infobright*, LucidDB, InfiniDB*, etc.
● Hybrid Data Warehouse Databases
  ● Greenplum* (both RDBMS and Columnar)
● NoSQL
  ● Hadoop, CouchDB, MongoDB, etc.
*Hardware and/or software limitations in community editions
RDBMS vs Columnar
Source: http://www.calpont.com/column-oriented-database-bi
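To make the row-versus-column contrast concrete, here is a minimal sketch in plain Python with hypothetical order data; it shows only the two physical layouts, not how any of the databases listed above is implemented. An aggregate over one field has to walk every full record in the row layout, while the columnar layout touches only the one list it needs, which is why analytic queries over a few columns of wide tables tend to favor column stores.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical sample data: each tuple is one row (order_id, region, amount).
ROWS: List[Tuple[int, str, float]] = [
    (1, "East", 120.0),
    (2, "West", 80.0),
    (3, "East", 45.5),
    (4, "North", 200.0),
]
COLUMNS = ("order_id", "region", "amount")

def to_columnar(rows: Sequence[tuple], columns: Sequence[str]) -> Dict[str, list]:
    """Pivot row-oriented records into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(columns)}

def sum_amount_row_store(rows: Sequence[tuple]) -> float:
    # Row layout: every full record is visited even though only "amount" is needed.
    return sum(row[2] for row in rows)

def sum_amount_column_store(store: Dict[str, list]) -> float:
    # Column layout: only the "amount" column is read.
    return sum(store["amount"])
```

Both functions return the same total; the difference lies in how much data each layout forces the query to touch.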
NoSQL?
● Not Only SQL
● Unstructured/semi-structured data
● Huge (multi-terabyte to petabyte+) data sets
Source: http://www.information-management.com/specialreports/20040622/1005301-1.html
Data Integration
● Syncing data across systems
● Includes:
  ● ETL (Extract, Transform, Load)
  ● MDM (Master Data Management)
  ● EAI (Enterprise Application Integration)
  ● EII (Enterprise Information Integration)
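A minimal ETL sketch, assuming a hypothetical CSV export as the source and an in-memory SQLite table (dim_customer) as the target; tools such as Talend and Kettle cover the same three stages with graphical job designers, many connectors and error handling that this sketch leaves out.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: a CSV export from an operational system.
SOURCE_CSV = """customer_id,name,country,signup_date
1, Alice ,us,2011-12-01
2,Bob,US,2012-01-15
3,Carol,,2012-01-20
"""

def extract(text: str) -> list:
    """Extract: read raw records from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(records: list) -> list:
    """Transform: trim whitespace, standardize country codes, fill missing values."""
    cleaned = []
    for r in records:
        cleaned.append((
            int(r["customer_id"]),
            r["name"].strip(),
            r["country"].strip().upper() or "UNKNOWN",
            r["signup_date"],
        ))
    return cleaned

def load(rows: list, conn: sqlite3.Connection) -> None:
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute("""CREATE TABLE IF NOT EXISTS dim_customer (
        customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT, signup_date TEXT)""")
    # INSERT OR REPLACE keeps reruns idempotent for the same customer_id.
    conn.executemany("INSERT OR REPLACE INTO dim_customer VALUES (?, ?, ?, ?)", rows)
    conn.commit()
```

Usage: `conn = sqlite3.connect(":memory:")` followed by `load(transform(extract(SOURCE_CSV)), conn)`. Keeping the stages as separate functions mirrors how ETL tools let each stage be tested and rerun on its own.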
Talend
● Data Management Tool Suite
  ● ETL
  ● MDM
  ● Data Profiling
  ● Data Quality
● Code generator
● Eclipse-based
● Extensible plugin architecture
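Data profiling means summarizing what a column actually contains before cleaning or loading it. The plain-Python sketch below is a hypothetical illustration, not Talend's implementation: for each column of a list of records it counts missing values and distinct values and reports the shortest and longest entries.

```python
from typing import Dict, List, Optional

def profile_column(values: List[Optional[str]]) -> Dict[str, object]:
    """Compute a few basic profiling statistics for one column of text values."""
    # Treat None and whitespace-only strings as missing.
    non_empty = [v for v in values if v is not None and v.strip() != ""]
    lengths = [len(v) for v in non_empty]
    return {
        "count": len(values),
        "missing": len(values) - len(non_empty),
        "distinct": len(set(non_empty)),
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
    }

def profile_table(rows: List[Dict[str, Optional[str]]]) -> Dict[str, Dict[str, object]]:
    """Profile every column that appears in a list of dict-shaped records."""
    columns = sorted({key for row in rows for key in row})
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}
```

A profile like this is typically the input to data quality rules, for example flagging a column whose missing count is unexpectedly high.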
Pentaho K.E.T.T.L.E.
● Kettle Extraction, Transport, Transformation, and Loading Environment
  ● Focus on ETL
  ● Extensible plugin architecture
  ● Engine-based
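The "engine-based" point contrasts with Talend's code generator: rather than emitting source code, an engine reads a transformation definition at runtime and streams rows through the steps it describes. The sketch below illustrates that idea in plain Python with hypothetical step types (filter, upper, constant); it is not Kettle's API, and it simply runs the steps as a single chain of generators.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]
Step = Callable[[Iterator[Row]], Iterator[Row]]

# Hypothetical step types; each factory returns a function from row stream to row stream.
def filter_rows(field: str, value: object) -> Step:
    def step(rows: Iterator[Row]) -> Iterator[Row]:
        return (r for r in rows if r.get(field) == value)
    return step

def uppercase(field: str) -> Step:
    def step(rows: Iterator[Row]) -> Iterator[Row]:
        for r in rows:
            yield {**r, field: str(r[field]).upper()}
    return step

def add_constant(field: str, value: object) -> Step:
    def step(rows: Iterator[Row]) -> Iterator[Row]:
        for r in rows:
            yield {**r, field: value}
    return step

STEP_TYPES: Dict[str, Callable[..., Step]] = {
    "filter": filter_rows,
    "upper": uppercase,
    "constant": add_constant,
}

def run_transformation(definition: List[dict], source: Iterable[Row]) -> List[Row]:
    """A tiny 'engine': build steps from a declarative definition and stream rows through them."""
    rows: Iterator[Row] = iter(source)
    for spec in definition:
        params = {k: v for k, v in spec.items() if k != "type"}
        rows = STEP_TYPES[spec["type"]](**params)(rows)
    return list(rows)
```

Because the transformation is plain data, it can be stored, edited and rerun without recompiling anything, which is the practical difference from a code-generating tool.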
Reporting
● Focus: Historical Analysis
Reporting Options
[Table: feature comparison of BIRT, Pentaho, JasperReports, SQL Power Wabit and Saiku across MDX, “Pivot Table”, Charting, SQL, Other Sources*, Drill Through and Parameterized]
*Flat files, NoSQL, etc.
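Three of the compared features (pivot tables, parameterized reports and drill through) can be sketched with Python's sqlite3 module and a hypothetical sales table. The report binds the user-supplied parameter with a ? placeholder instead of splicing it into the SQL text, the pivot totals amounts by region and quarter, and drill through returns the detail rows behind one pivot cell. This illustrates the concepts only, not how any of the listed tools implements them.

```python
import sqlite3
from collections import defaultdict

def make_sample_db() -> sqlite3.Connection:
    """Hypothetical sales table used by the report functions below."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
        ("East", "Q1", 100.0), ("East", "Q2", 150.0),
        ("West", "Q1", 80.0), ("West", "Q2", 120.0), ("West", "Q2", 30.0),
    ])
    return conn

def sales_report(conn: sqlite3.Connection, min_amount: float) -> list:
    """Parameterized report: the user-supplied value is bound, not pasted into SQL."""
    return conn.execute(
        "SELECT region, quarter, amount FROM sales WHERE amount >= ?", (min_amount,)
    ).fetchall()

def pivot(rows: list) -> dict:
    """Pivot (region, quarter, amount) rows into {region: {quarter: total}}."""
    table = defaultdict(lambda: defaultdict(float))
    for region, quarter, amount in rows:
        table[region][quarter] += amount
    return {region: dict(quarters) for region, quarters in table.items()}

def drill_through(conn: sqlite3.Connection, region: str, quarter: str) -> list:
    """Drill through: return the detail rows behind one pivot cell."""
    return conn.execute(
        "SELECT region, quarter, amount FROM sales WHERE region = ? AND quarter = ? ORDER BY amount",
        (region, quarter),
    ).fetchall()
```

For example, the West/Q2 cell of the pivot shows 150.0, and drilling through it returns the two underlying rows (30.0 and 120.0).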
BIRT Example
Visualization
● Focus: Trending and Present
Pentaho CDE/CDF (Community Dashboard Editor/Framework)
● Dashboard framework and editor built into Pentaho BI Server
● Community developed; uses open web languages (JavaScript, HTML, etc.)
Statistics/Predictive Analytics
● Focus: All relevant data used to predict outcomes
Statistics/Predictive Analytics
● R: statistics oriented
● Weka: machine learning oriented
● RapidMiner: mixed
  ● Originally YALE
  ● Weka and R plugins
  ● Similar to SAS Enterprise Miner
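R, Weka and RapidMiner offer far richer modeling than this, but the basic predictive idea can be sketched in plain Python: fit an ordinary least-squares line to equally spaced historical values and extend it forward. The function names and the sample series are hypothetical, and a straight-line trend is just the simplest possible model, chosen for illustration.

```python
from typing import List, Tuple

def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def forecast(history: List[float], periods_ahead: int) -> List[float]:
    """Extend a linear trend fitted to equally spaced history (x = 0, 1, 2, ...)."""
    xs = [float(i) for i in range(len(history))]
    slope, intercept = fit_line(xs, history)
    start = len(history)
    return [slope * x + intercept for x in range(start, start + periods_ahead)]
```

With weekly values [10, 12, 14, 16], the fitted slope is 2 and forecast(history, 2) continues the trend to 18 and 20.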
BI From Reporting to Statistical Analysis
[Table: feature comparison of Jaspersoft, Pentaho and SpagoBI across ETL, Metadata, Reporting, Dashboards, OLAP***, Statistics and Automated Decisions]
* Utilizes Talend ETL
** Utilizes Weka data mining
*** All use Mondrian for OLAP, with different front ends
Shameless Plug
● RTP Pentaho User Group
  ● On LinkedIn (soon also on Meetup)
  ● Meets quarterly