Ebook: Data visualization tools for users (English)

Published in July 2016 | Categories: Types, Presentations

Each day a vast amount of information is generated in the digital world, and this ebook presents the best tools for visualizing it. BBVA Open4U is launching a series of ebooks containing useful information for enterprising developers. The first focuses on data visualization tools. If you're a developer and you want to get the most out of data tools, this ebook provides three in-depth analyses of the best tools and those most popular with data scientists. It contains everything a developer should have in his or her toolbox, particularly in the field of data visualization.

Tools for data visualization

01 The data scientist’s toolbox
02 Five data visualization tools
03 Get the benefit from data with four webinars

The data scientist’s toolbox

Data Science stands today as a multidisciplinary profession. The
following is intended as a basic guide to some useful resources
available for each facet of these professionals' work.

01. TOOLBOX

Data Science stands today as a
multidisciplinary profession, in which
knowledge from various areas overlaps in a
profile more typical of the Renaissance than
of this super-specialized 21st century.
Given the scarcity of formal training in
this field, data scientists are forced to
gather scattered knowledge and tools
to develop their skills optimally.

The following is intended to
be a basic guide, obviously
not exhaustive, to some useful
resources available for each
facet of these professionals'
work.

TOOLS AND LANGUAGES
• SQL
• SQLite
• sqlite3
• RSQLite
• Toad
• Tora
• RapidMiner
• Knime
• Pentaho
• RODBC
• RJDBC
• pyODBC
• mxODBC
• SQLAlchemy
• pandas
• data.table
• XML
• jsonlite
• json

Data management
Part of the data scientist's work is to capture,
clean up and store information in a format suitable
for processing and analysis.
The most common scenario is accessing a copy of the
data source for a one-time or periodic capture.
You will need to know SQL to access data
stored in relational databases. Each database has a
console for executing SQL queries, although most
people prefer a graphical environment that shows
information about tables, fields and indexes. Two
of the most popular data management tools are
Toad, proprietary software for Microsoft’s platform,
and Tora, which is open-source and cross-platform.
Once the data is extracted, we can store it in plain
text files that we then load into our working
environment for machine learning, or use it
with a tool such as SQLite.

SQLite is a lightweight relational database with no
external dependencies that does not need
to be installed on a server. Moving a database is as
easy as copying a single file. In our case, we can
process the information without
concurrent or multiple access to the source data,
which suits the characteristics of SQLite perfectly.
The languages we use for our algorithms can
connect to SQLite (Python through sqlite3, and
R through RSQLite), so we can choose to import the
data before preprocessing or to do part of the
preprocessing in the database itself, which helps
avoid a number of problems once the number of
records grows large.
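
As a minimal sketch of this workflow, the following Python snippet loads a small plain-text extract into SQLite through the standard sqlite3 module and does part of the preprocessing in the database itself. The table name `sales`, its columns and the sample rows are hypothetical, and an in-memory database is used only so the example leaves no file behind; a real project would pass a file path instead.

```python
import csv
import io
import sqlite3

# Hypothetical plain-text extract; in practice this would be a file on disk.
RAW_CSV = """customer,amount
alice,10.5
bob,
alice,4.5
carol,7.0
"""

def load_and_clean(raw_text):
    """Load CSV text into SQLite and return the rows kept after cleaning in SQL."""
    conn = sqlite3.connect(":memory:")  # pass a file path to persist the database
    conn.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
    reader = csv.DictReader(io.StringIO(raw_text))
    # Empty amounts become NULL so the database can filter them out.
    rows = [
        (r["customer"], float(r["amount"]) if r["amount"] else None)
        for r in reader
    ]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    # Part of the preprocessing happens in the database: drop missing amounts.
    cleaned = conn.execute(
        "SELECT customer, amount FROM sales "
        "WHERE amount IS NOT NULL ORDER BY customer, amount"
    ).fetchall()
    conn.close()
    return cleaned

cleaned_rows = load_and_clean(RAW_CSV)
```

Here `cleaned_rows` holds only the records with a valid amount, so the incomplete row never reaches the analysis environment.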
An alternative to bulk data capture is to use a
tool that covers the full ETL cycle (Extraction,
Transformation and Load), such as RapidMiner, Knime
or Pentaho. With them, we can graphically define
the data acquisition and cleansing cycles using
connectors.
When access to the data source is guaranteed
during preprocessing, we can use an ODBC or JDBC
connection (RODBC and RJDBC in R, and pyODBC,
mxODBC and SQLAlchemy in Python) and benefit
from performing joins (JOIN) and aggregations
(GROUP BY) in the database engine and
then importing only the results.
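
The idea of letting the database engine do the joining and grouping can be sketched as follows. This is only an illustration under stated assumptions: the standard sqlite3 module stands in for an ODBC connection (both follow the Python DB-API cursor interface), and the `customers` and `orders` tables and their contents are hypothetical.

```python
import sqlite3

def totals_by_region(conn):
    """Let the database engine join and aggregate, then import only the result."""
    query = """
        SELECT c.region, COUNT(*) AS n_orders, SUM(o.amount) AS total
        FROM orders AS o
        JOIN customers AS c ON c.id = o.customer_id
        GROUP BY c.region
        ORDER BY c.region
    """
    cursor = conn.cursor()
    cursor.execute(query)
    return cursor.fetchall()

# sqlite3 stands in for an ODBC connection here (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'north'), (2, 'south'), (3, 'north');
    INSERT INTO orders VALUES (1, 10.0), (2, 5.0), (3, 2.5), (1, 1.5);
""")
summary = totals_by_region(conn)
conn.close()
```

Only one summary row per region crosses the connection, rather than every order, which is the benefit of pushing this work to the engine.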
For processing outside the database, pandas (a Python
library) and data.table (an R package) are our first
choice. data.table works around one of R’s
weaknesses, memory management, by performing
vectorized operations and grouping by reference
without temporarily duplicating objects.

A third scenario is accessing
information that is generated in real time and
transmitted in formats like XML or JSON.
These are called incremental learning
projects, and they include
recommendation systems, online advertising
and high-frequency trading.
For this we will use tools like XML or jsonlite
(R packages), or xml and json (Python
modules). With them we capture the
stream, make our predictions,
send them back in the same format, and update
our model once the source system later provides
us with the results actually
observed.
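
The capture, predict, reply and update cycle described above might look like the sketch below, which uses only Python's json module. The message fields (`id`, `x`, `observed`) and the one-feature linear model trained by stochastic gradient descent are assumptions chosen for illustration, not a prescribed design.

```python
import json

class OnlineLinearModel:
    """One-feature linear model updated by stochastic gradient descent."""

    def __init__(self, learning_rate=0.1):
        self.weight = 0.0
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.weight * x + self.bias

    def update(self, x, observed):
        # One gradient step on the squared error for a single observation.
        error = self.predict(x) - observed
        self.weight -= self.learning_rate * error * x
        self.bias -= self.learning_rate * error

def handle_request(model, message):
    """Parse a JSON request, predict, and answer in the same format."""
    request = json.loads(message)
    prediction = model.predict(request["x"])
    return json.dumps({"id": request["id"], "prediction": prediction})

def handle_feedback(model, message):
    """Update the model once the source system reports the observed result."""
    feedback = json.loads(message)
    model.update(feedback["x"], feedback["observed"])

model = OnlineLinearModel()
reply = handle_request(model, '{"id": 1, "x": 2.0}')
handle_feedback(model, '{"id": 1, "x": 2.0, "observed": 4.0}')
reply_after = handle_request(model, '{"id": 2, "x": 2.0}')
```

The second reply differs from the first because the model has already absorbed the observed result, without ever being retrained from scratch.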

Data analysis
Even though the Business Intelligence, Data
Warehousing and Machine Learning fields are all part
of Data Science, the last is the one that
requires the greatest number of specific utilities.
Hence, our toolbox will need to include R and
Python, the programming languages most widely
used in machine learning.

For Python we highlight the scikit-learn suite, which
covers almost all techniques, except perhaps neural
networks. For these there are several interesting
alternatives, such as Caffe and Pylearn2. The latter
is based on Theano, a Python library
that allows symbolic definitions and transparent
use of GPUs.

If we need to modify an R package, we will need C++ and some utilities that allow us to rebuild it:
Rtools, an environment for building R packages on Windows, and devtools, which simplifies the
processes involved in package development.

Some of the most used packages for R:


• Gradient boosting: gbm and xgboost.
• Random forests for classification and regression: randomForest and randomForestSRC.
• Support vector machines: e1071, LiblineaR and kernlab.
• Regularized regression (Ridge, Lasso and ElasticNet): glmnet.
• Generalized additive models: gam.
• Clustering: cluster.

There are also some general purpose tools that will make our life easier in R:


• data.table: Fast reading of text files; creation, modification and deletion of columns by reference; joins on a common key; and grouping and summarizing of data.
• foreach: Execution of parallel processes against a previously registered backend, using utilities such as doMC or doParallel.
• bigmemory: Management of massive matrices in R, sharing information across multiple sessions or parallel analyses.
• caret: Comparison of models, control of data partitions (splitting, bootstrapping, subsampling) and parameter tuning (grid search).
• Matrix: Management of sparse matrices and transformation of categorical variables to binary (one-hot encoding) using the sparse.model.matrix function.
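
To make the idea of a sparse one-hot encoding concrete, here is a small plain-Python sketch. It is not the sparse.model.matrix function from the Matrix package; the helper names `one_hot_sparse` and `to_dense` and the sample column are hypothetical, and the point is only that a sparse representation stores the positions of the ones instead of the full matrix.

```python
def one_hot_sparse(values):
    """Return (categories, entries), where entries lists the (row, column)
    positions holding a 1; every other cell of the matrix is implicitly 0."""
    categories = sorted(set(values))
    column_of = {category: j for j, category in enumerate(categories)}
    entries = [(i, column_of[v]) for i, v in enumerate(values)]
    return categories, entries

def to_dense(entries, n_rows, n_cols):
    """Expand the sparse entries into a full 0/1 matrix (for inspection only)."""
    dense = [[0] * n_cols for _ in range(n_rows)]
    for i, j in entries:
        dense[i][j] = 1
    return dense

colors = ["red", "green", "red", "blue"]  # hypothetical categorical column
categories, entries = one_hot_sparse(colors)
dense = to_dense(entries, len(colors), len(categories))
```

With many categories the dense matrix grows as rows times categories, while the sparse form always stores exactly one entry per row.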

Distributed environments deserve a special mention. If we have worked with data from a large institution or
company, we will probably have experience with the so-called Hadoop ecosystem. Hadoop combines a
distributed file system (HDFS) with a programming model (MapReduce) that makes it possible to process
information in parallel.
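
The MapReduce model can be illustrated with a word count run in a single Python process. This sketch does not use Hadoop at all; the function names and the two input splits are hypothetical, and on a real cluster the map and reduce calls would run in parallel on different nodes.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group the emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

splits = ["big data big tools", "data tools for data"]  # hypothetical splits
mapped = [pair for split in splits for pair in map_phase(split)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

Because each split is mapped independently and each key is reduced independently, both phases can be distributed without changing the result.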

Among the machine learning tools compatible with Hadoop we find:

• Vowpal Wabbit: Online learning methods based on gradient descent.
• Mahout: A suite of algorithms, including recommendation systems, clustering, logistic regression and random forests.
• h2o: Perhaps the tool currently growing fastest, with a large number of parallelizable algorithms. It can be run from a graphical environment or from R or Python.

The data scientist should also keep abreast of the
generational shift from Hadoop to Spark.
Spark has several advantages over Hadoop for
processing information and executing
algorithms. The main one is speed: it can be up to 100
times faster because, unlike Hadoop, it works in memory and only writes to disk
when necessary.

Spark can run independently or
coexist as a component of Hadoop,
which allows the migration to be planned in a non-traumatic way. You can, for example,
use HBase as a database, although
Cassandra is emerging as a
storage solution thanks to its redundancy
and scalability.

Visualization
Finally, a brief word on the
presentation of results.
The most popular tools for R are
lattice and ggplot2,
and for Python, Matplotlib. But if we
need professional presentations
embedded in web environments, the
best choice is D3.js.
Among the integrated Business
Intelligence environments with a clear
focus on presentation, we should
highlight the well-known Tableau and,
as alternatives for graphical
data exploration, Birst and Necto.

Five data visualization tools that you should not miss

We present some of the best data visualization tools that you
can use in your business to take full advantage of the large
amount of information created every day in the digital world.

02. DATA VISUALIZATION TOOLS

VISUALIZATION TOOLS INDEX
• Google Fusion Tables
• CartoDB
• Tableau Public
• iCharts
• Smart Data Report

The digital universe is reaching new
thresholds. The amount of data
generated by both private users and
companies is growing at a rapid pace.
In fact, according to a study by IDC and
EMC, the digital universe is doubling
in size every two years, and by 2020 it
will have reached 44 zettabytes of
information, which is to say 44
trillion gigabytes of structured and
unstructured data.
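
The arithmetic behind these figures is easy to check. The sketch below assumes decimal (SI) units, where a zettabyte is 10^21 bytes and a gigabyte is 10^9 bytes, and takes the two-year doubling period literally; the function names are hypothetical.

```python
ZETTABYTE = 10**21  # bytes, decimal (SI) definition
GIGABYTE = 10**9    # bytes, decimal (SI) definition

def zettabytes_to_gigabytes(zettabytes):
    """Convert a whole number of zettabytes into gigabytes."""
    return zettabytes * ZETTABYTE // GIGABYTE

def size_years_earlier(size, years, doubling_period=2):
    """Size implied 'years' earlier if the volume doubles every doubling_period years."""
    return size / 2 ** (years / doubling_period)

gigabytes_2020 = zettabytes_to_gigabytes(44)
zettabytes_2018 = size_years_earlier(44, years=2)
```

Under these assumptions 44 zettabytes is 44 × 10^12 gigabytes, that is, 44 trillion, and the same doubling rate implies half that volume two years earlier.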
Creating or visiting a
website, participating in a blog, gaining
followers, posting comments,
sending a tweet or just surfing the internet
produces a whole range of data that, if
exploited properly, can be of great value
to companies.

The big challenge, however, is to make sense of all
that data: to capture, link and
analyze it and extract its true value, so that the
information can be presented in an attractive, clear,
concise and understandable manner that facilitates
decision making in your business. Visually exploring and
analyzing customer data can also help you
discover new ways to reach customers, improve
segmentation, personalize offers for
products or services, and generate innovative ideas,
among many other possibilities that can
help maintain the engagement between
your brand and your users over time.

Where to start
The first steps in data visualization may be
intimidating. Fortunately, as data
grows, so do the tools that help us get the most
out of it. Here we present the five tools that we
consider the best, based on the capabilities they
provide and the level of experience required.

Google Fusion Tables
It’s an excellent tool for beginners or for those
who don’t know how to program. For more
advanced users there is an API for
producing graphics or maps from data.
One of the advantages of this application is the
diversity of data representations it offers. It also
offers a relatively fast way to create graphics and
maps, including GIS functions to analyze data by
geographic area.
The Guardian frequently uses this tool to
produce detailed maps very quickly.

CartoDB
This is an open source service aimed at any user,
regardless of technical level, with a friendly
interface. It lets you create a variety of interactive
maps, choosing from a catalog of options (which
includes Google Maps) or adding your own
customized maps.
The most interesting feature of this tool is that it lets
you access Twitter data to see how users react to
a brand, a particular marketing campaign or an event.
A good example is the map
tracking tweets that was created last year for the
launch of Beyoncé’s latest album. It clearly shows
the places where the release had the most impact. This
is a great source of visual information for marketing
professionals and businesses.
It should also be highlighted that it has an active
group of developers who provide extensive
documentation and examples. In addition, the open
nature of its API makes it possible to keep creating new
integrations and to extend the capabilities of the
tool with new libraries.

Tableau Public
With Tableau Public you can easily create
interactive maps, bar and pie charts, etc. One of its
advantages is that, as with Google Fusion Tables, you
can import tables from Excel to make your work easier.
In a matter of minutes you can generate an
interactive graphic, embed it in your website and
share it. For example, the news portal Global
Post used it to create a series of charts about the best
countries in Africa in which to do business.
The recently released version 8.2 also includes
the new OpenStreetMap tool, which makes it possible to
produce very detailed maps from local data such as
cafes or shops. Tableau Public is a free tool,
although it also has a premium version.

iCharts
You can get started in the world of data
visualization with the service offered by iCharts,
which has a free version (Basic) and two premium
options (Platinum and Enterprise). With this tool you
can create visualizations in just a few steps,
importing Excel and Google Drive documents or
adding data manually.
The tool lets you share your
graphics privately with your collaborators, as well as
edit and update them with new data
through its cloud computing service. You can even
share them with your clients through emails,
newsletters or social networks.
Among the companies using this service is
the prestigious consulting firm IDC, which
uses iCharts to provide visual representations of relevant
data included in its reports.

Smart Data Report
Finally, we also recommend Smart Data Report,
which is not as powerful a tool as the previous ones
but has the advantage of being an affordable data
solution for entrepreneurs and small businesses
whose workers don’t have much spare time.
Among other services, this website offers free data
analysis and the option of receiving reports by email,
without having to create them yourself. Once the
service has your report ready, it generates
HTML code that you can embed in your
corporate website or in your articles.

Get the maximum benefit from data with these four webinars

Mapping data, visualizing it in geospatial apps and applying
machine learning. We put our knowledge into practice with the
help of these video tutorials.

03. WEBINARS

Mapping data
CartoDB explains how to convert location data into knowledge for your business. In this tutorial you can learn
how to analyze, visualize and build data apps using the CartoDB tool.

Machine Learning
With summer just around the corner, Andrés González, solutions manager for Big Data and Data Prediction at
Clever Task, shows us how to make forecasts from data in a very specific area: the tourism sector.

Geospatial apps
And if you want to learn to create geospatial apps and data, you can't miss this tutorial, also by CartoDB, which
explains how to make the most of an API (in this case the one opened by BBVA for the
InnovaChallenge competition) to create apps and visualizations.

Good examples of visualization
Finally, to round off this selection, Alberto Cairo, professor of data visualization at the University of Miami,
teaches us good practices in data visualization. It's good to learn from our own mistakes and from the
successes of others.

BBVA is not responsible for the opinions expressed herein.

www.bbvaopen4u.com
