BIG Data Management and Visualization

Published on January 2017

Big Data, Data Management & Visualization
Ashish Sharma, Director & Co-Founder, BRIDGEi2i Analytics Solutions
[email protected]
January 2012

© 2012 BRIDGEi2i Analytics Solutions Pvt. Ltd. All rights reserved

Agenda
1. BIG DATA
2. BUSINESS INTELLIGENCE
3. DATA MANAGEMENT
4. VISUALIZATION

Competing on Analytics with Big Data is big news

What is BIG Data
In the world of social media and news:
• The entire US allocated research infrastructure is 12 PB of disk and 22 PB of tape!
• Microsoft’s Bing search engine uses 150 PB of spinning disk
• Biggest scientific projects will generate only 10-20 TB / day of data, while Twitter alone produces 28 PB of new data a day and Bing processes 2 PB / day
• 200 MILLION new tweets a day
• 1 BILLION new Facebook items a day: average person adds 3 items to Facebook every single day

What is BIG Data
• Entire New York Times 1945-2005 = 18M articles = 2.9 billion words
• 5 BILLION words added to Twitter each DAY (almost twice the total volume of the Times in the last 60 years)
• Estimated 49.5 trillion words ever printed in books over last 600 yrs
• Twitter alone will reach that size in just three years with its current rate of tripling post volume each year
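The comparisons above are simple arithmetic and can be checked with a short sketch. The three constants are the figures quoted on the slide. The growth model is an assumption made here, not something the slide states: yearly volume is the daily rate times 365, tripled once per year. The function names are hypothetical.

```python
# Figures quoted on the slide
NYT_WORDS_1945_2005 = 2.9e9      # words in 18M New York Times articles
TWITTER_WORDS_PER_DAY = 5e9      # words added to Twitter each day
BOOK_WORDS_600_YEARS = 49.5e12   # estimated words ever printed in books


def twitter_day_vs_times():
    """How many 'New York Times 1945-2005' archives fit in one Twitter day."""
    return TWITTER_WORDS_PER_DAY / NYT_WORDS_1945_2005


def years_to_reach(target_words, words_per_day, yearly_growth=3.0,
                   grow_first_year=True):
    """Count whole years until cumulative volume reaches target_words.

    Assumption: volume is multiplied by yearly_growth once per year.
    grow_first_year controls whether the first year is already tripled.
    """
    yearly = words_per_day * 365
    total = 0.0
    years = 0
    while total < target_words:
        if years > 0 or grow_first_year:
            yearly *= yearly_growth
        total += yearly
        years += 1
    return years


if __name__ == "__main__":
    print(round(twitter_day_vs_times(), 2))
    print(years_to_reach(BOOK_WORDS_600_YEARS, TWITTER_WORDS_PER_DAY))
    print(years_to_reach(BOOK_WORDS_600_YEARS, TWITTER_WORDS_PER_DAY,
                         grow_first_year=False))
```

Under these assumptions the ratio is about 1.7, and the answer is three years if the first year already runs at the tripled rate, or four if today's rate holds for the first year.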

What makes it BIG Data
It is not a single number but a set of parameters:
• VOLUME
• VELOCITY
• VARIETY
• VALUE

Data types: Social Data (social, blog), Machine-Generated Data (smart meter), Video and Images, Documents

Why is BIG Data Important
• US health care: increase industry value per year by $300 B
• Manufacturing: decrease development and assembly costs by 50%
• Global personal location data: increase service provider revenue by $100 B
• Europe public sector administration: increase industry value per year by €250 B
• US retail: increase net margin by 60+%

Sector | Today’s Challenge | New Data | What’s Possible
Healthcare | Expensive office visits | Remote patient monitoring | Preventive care, reduced hospitalization
Manufacturing | In-person support | Product sensors | Automated diagnosis, support
Location-Based Services | Based on home zip code | Real time location data | Geo-advertising, traffic, local search
Public Sector | Standardized services | Citizen surveys | Tailored services, cost reductions
Retail | One size fits all marketing | Social media | Sentiment analysis, segmentation

Catalina Marketing: Building loyalty one customer at a time
Coupon redemption rate:
• No targeting: 1%
• Basic targeting (e.g., offer dog food coupon to customer buying dog food): 6-10%
• Using predictive models to find latent correlations: 25%

• Marketing to a segment of one
– 195 million US loyalty program members
– Every coupon printed is unique to the individual customer
– Customized based on three years' worth of purchase history
• Identifies items that shoppers are likely to buy in future visits
• 25% increase in coupon redemption rates

What needs to be done & how
• Tapping into diverse data sets
• Finding and monetizing unknown relationships
• Data driven business decisions

ACQUIRE → ORGANIZE & DISTILL → ANALYZE → DECIDE

In the US alone, 150K additional people with advanced analytical talent and 1.5 M data-savvy managers are required to take full advantage of Big Data

Analytics as Competitive Advantage
• Research to identify “real world” applications of data and analytics in business
– Summarize the business challenge: name of company, function, time, sources of example
– What data was used
– Insights
– How it created value for business

• One-page PowerPoint output (font size 12+)
– Use additional pages if required to show sample dashboards / outputs

• No two examples should be the same

What is Business Intelligence

How is it used today …
[Figure: flow from customer and business transactions to the business analyst]
• Transactional Database (customer transactions, business transactions; 2011 Sales)
– Simple Query: Item: ‘Shoes’, Cost: ‘$34’, Cust: ‘James’
• Data Warehouse (SALES)
– Complex Query: Sales & Profit for Shoes & Belts, Year >= 2005 (years 2005 to 2010)
• BI Reports & Dashboards, used by the Business Analyst
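The difference between the simple and the complex query can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table name, columns, and rows are hypothetical, "cost" is treated as the sale amount, and a single in-memory table stands in for both the transactional database and the warehouse, which in practice are separate systems.

```python
import sqlite3


def build_db():
    """Create an in-memory table with a few made-up sales rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales "
        "(year INTEGER, item TEXT, cust TEXT, cost REAL, profit REAL)"
    )
    rows = [  # hypothetical data, for illustration only
        (2011, "Shoes", "James", 34.0, 6.0),
        (2010, "Belts", "Maria", 15.0, 3.0),
        (2006, "Shoes", "Ravi", 30.0, 5.0),
        (2004, "Shoes", "Lena", 28.0, 4.0),
        (2009, "Hats", "Omar", 12.0, 2.0),
    ]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def simple_query(conn):
    """Transactional lookup: one customer's purchase."""
    return conn.execute(
        "SELECT item, cost, cust FROM sales WHERE cust = ? AND year = ?",
        ("James", 2011),
    ).fetchall()


def complex_query(conn):
    """Warehouse-style aggregate: sales and profit for Shoes and Belts since 2005."""
    return conn.execute(
        """
        SELECT year, item, SUM(cost) AS sales, SUM(profit) AS profit
        FROM sales
        WHERE item IN ('Shoes', 'Belts') AND year >= 2005
        GROUP BY year, item
        ORDER BY year DESC, item
        """
    ).fetchall()


if __name__ == "__main__":
    db = build_db()
    print(simple_query(db))
    print(complex_query(db))
```

The simple query touches one record. The complex query filters, groups, and aggregates across years, which is the kind of workload a data warehouse is organized for.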

BI & DW will evolve to meet BIG Data challenges
Will Need to Integrate in BIG DATA world

ANALYTICS

So what is Business reporting & analysis
Why?
• To report findings from different transactional, performance & financial data stored by businesses
• Best interpretation of data based on the right business metrics and optimal slicing and dicing of data
• Integrate reporting platforms, remove redundancies across reports
• Automate all repeatable reports
• Provide drill-down analysis built into automated reporting tools

Monthly Operations Metrics Analysis

Website Click Stream Analysis

Spend Analysis

Help business track performance based on historical data

Visualization – an example

http://news.yahoo.com/s/yblog_thelookout/watch-200-years-of-history-in-5-minutes

Principles of Visualization
1. Who is my audience
• What are they keen to know, what do they already know, what do they believe, what metrics do they understand

2. What am I trying to communicate
3. How do I expect the message to be used: informative, actionable etc.
4. What type of Dashboard am I creating

What type of Dashboard am I creating

Chart Chooser

Information Discrimination
• Find the crux of what needs to be presented
– What is the story, what metrics articulate the full story, what should one do to improve

• Ask a better question
– Not all questions are important; separate what is good to know from what would drive action if someone knew it

• Have a hypothesis, analyze, but only show what matters

What is the right metric

What do you think of this?

Chart summarizing sales performance of a business

What do you think of this?
Very common charts to show how Company G is doing compared to competition

Resources
• http://www.juiceanalytics.com/white-papers-guides-and-more/#registration
• http://www-958.ibm.com/software/data/cognos/manyeyes
• http://www.perceptualedge.com/library.php

Visualization Assignment
• Identify a data set of your choice. Present a visualization of that data that you think is interesting. Key parameters of evaluation:
– Interesting insight
– Quality of visual output
– Ease of interpretation

• You could use data sets from (but do not use any existing visual outputs) http://www-958.ibm.com/software/data/cognos/manyeyes/datasets?q=

Understand Data & Define Objective
• Understand Data (“INFORMATION”)
• Data Collection Approach – Actual / Imputed Values
• Structure, Type, Granularity / Level of Data
  – Credit Bureau Attributes
  – Demographic Data (Infobase)
  – Commercial Data (D&B)
  – Macro Economic Data – DRI-WEFA
  – Market Research Data
• Period / Population for which data available / would be collected (VERY IMPORTANT)
  – Keep the “END IMPLEMENTATION / USE” in mind
• Confirm “Operational Definition” of Attributes (esp. Dependent Attribute)
  – Dictionary / Discussions
  – Financial Ratios / Response / Net Response / Profitability Calculation
• Data Integrity Check for Relationship between various attributes
  – Check for Logical Relationships
  – Frequency / Recency
  – Response / Quote / Conversion
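A data integrity check for logical relationships might look like the sketch below. The rules and field names are assumptions chosen for illustration: a conversion requires a quote, a quote requires a response, and recency should be present exactly when frequency is positive. The real rules depend on the operational definitions confirmed with the data owner.

```python
def check_record(rec):
    """Return a list of logical inconsistencies found in one record.

    rec is a dict with hypothetical keys:
    responded, quoted, converted (bool), frequency (int),
    recency_days (int or None).
    """
    problems = []
    if rec["converted"] and not rec["quoted"]:
        problems.append("converted without a quote")
    if rec["quoted"] and not rec["responded"]:
        problems.append("quoted without a response")
    if rec["frequency"] == 0 and rec["recency_days"] is not None:
        problems.append("recency present but frequency is zero")
    if rec["frequency"] > 0 and rec["recency_days"] is None:
        problems.append("frequency positive but recency missing")
    return problems


def find_violations(records):
    """Map record id to its problems, keeping only records that have any."""
    report = {}
    for rec in records:
        problems = check_record(rec)
        if problems:
            report[rec["id"]] = problems
    return report


if __name__ == "__main__":
    sample = [  # made-up records
        {"id": 1, "responded": True, "quoted": True, "converted": True,
         "frequency": 2, "recency_days": 30},
        {"id": 2, "responded": False, "quoted": False, "converted": True,
         "frequency": 0, "recency_days": 12},
    ]
    print(find_violations(sample))
```

Records that fail a check are better investigated with the data owner than silently dropped, since a violation often points to a misunderstood attribute definition.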

Data Preparation
• Data Extraction / Reading into SAS
• Numeric Attributes:
  – Plot Univariate Graphs (PROC UNIVARIATE) for every attribute
  – “Identify” & Cap Outliers (MAX(MIN( , ), )) (Don’t use a 99.XX percentile approach)
  – Identify Treatment for Missing Values (Differentiate from “ALL MISSING”)
    » Business understanding / Data Collection Methodology based Imputation
    » Mean Imputation (To Minimize Distortion in end model)
    » Median Imputation (for Discrete Attributes)
    » Regression Based Imputation (Can be explored)
    » MCMC Based Imputation (Vijay’s GB Project)
• Character Attributes:
  – Frequency Plots (PROC FREQ) (For “Numeric Valued” Continuous Character Attributes convert to numeric and then use the approach for Numeric Attributes)
  – Treatment of Missing Values
  – Identify Appropriate ways of creating Numeric Attributes from them:
    » Ordered Values (Income Deciles): Level (1,2,3 …or 12000, 15000, 25000 etc)
    » Meaningful Bucketing (Education, Type of Car, SIC Code, Credit Score): Create Dummies for various Categories (example: Gender, Marital Status etc)
    » (Can Use CHAID to identify some useful combinations, will discuss in Modeling Session)
• Confirm No Missing Data, All “Information” in Numeric Form (PROC MEANS nmiss option)
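The slide refers to SAS procedures. As a language-neutral illustration, the core steps can be sketched in plain Python. Everything here is an assumption for demonstration: the function names are hypothetical, missing values are represented as None, and the capping bounds are supplied by the analyst after looking at the univariate plots, not derived from a percentile.

```python
from statistics import mean, median


def cap(value, lower, upper):
    """Cap an outlier, mirroring the MAX(MIN(x, upper), lower) pattern."""
    return max(min(value, upper), lower)


def impute(values, how="mean"):
    """Fill missing values (None) with the mean or median of observed ones."""
    observed = [v for v in values if v is not None]
    if not observed:
        # An attribute that is ALL MISSING needs separate treatment.
        raise ValueError("attribute is all missing")
    fill = mean(observed) if how == "mean" else median(observed)
    return [fill if v is None else v for v in values]


def dummies(values, categories):
    """Turn a character attribute into 0/1 indicator attributes."""
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


def count_missing(columns):
    """Count missing values per attribute, similar in spirit to an nmiss check."""
    return {name: sum(v is None for v in col) for name, col in columns.items()}


if __name__ == "__main__":
    income = [12000, None, 25000, 900000]          # made-up values
    income = [None if v is None else cap(v, 0, 100000) for v in income]
    income = impute(income, how="mean")
    gender = dummies(["M", "F", "F"], ["M", "F"])
    print(income, gender, count_missing({"income": income}))
```

Capping before imputing keeps an extreme value from distorting the mean that is used as the fill value.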

Summary Day 2
• Big Data & technological advancements are creating opportunities for businesses to turn analytics into a competitive advantage
• Business Intelligence is fast evolving to provide insights, integrating data storage, reporting & analytics to provide real-time, relevant answers
• Visualization: crisp & insightful visuals are key to getting executives to take decisions based on analytics
• At the core of any analytics exercise is “Data Management”: how the data is treated determines how much information can be drawn from it

Deliverables
• Assignment 1: One “Unique” example (within the class) of Impact from Analytics in business world
• Assignment 2: Visualization of an interesting insight using any data of your choice

THANK YOU

E-Mail – [email protected]
Linkedin – http://www.linkedin.com/company/bridgei2i-analytics-solutions
Facebook – http://www.facebook.com/pages/BRIDGEi2i-Analytics-Solutions/127891620624459
Twitter – @BRIDGEi2i
Web – www.bridgei2i.com
