Lots of data is being collected
and warehoused
• Web data, e-commerce
• purchases at department/
grocery stores
• Bank/Credit Card
transactions
• Social Network
How much data?
Google processes 20 PB a day (2008)
Wayback Machine has 3 PB + 100 TB/month
(3/2009)
Facebook has 2.5 PB of user data + 15 TB/day
(4/2009)
eBay has 6.5 PB of user data + 50 TB/day
(5/2009)
CERN’s Large Hadron Collider (LHC) generates
15 PB a year
“640K ought to be enough for anybody.”
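To put the growth rates above in perspective, here is a rough back-of-the-envelope sketch. It assumes 1 PB = 1000 TB and that the quoted daily growth stays constant, neither of which the slide states; the function name is illustrative only.

```python
# Figures quoted on the slide (2009). 1 PB is taken as 1000 TB (assumption).
TB_PER_PB = 1000


def days_to_double(stored_pb, growth_tb_per_day):
    """Days until the stored volume doubles, assuming constant daily growth."""
    return stored_pb * TB_PER_PB / growth_tb_per_day


facebook_days = days_to_double(2.5, 15)  # 2.5 PB of user data + 15 TB/day
ebay_days = days_to_double(6.5, 50)      # 6.5 PB of user data + 50 TB/day

print(f"Facebook: about {facebook_days:.0f} days to double")
print(f"eBay: about {ebay_days:.0f} days to double")
```

Under these assumptions, both stores would double in well under a year.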
The Earthscope
• The Earthscope is the world's largest science project.
Designed to track North America's geological evolution,
this observatory records data over 3.8 million square
miles, amassing 67 terabytes of data. It analyzes
seismic slips in the San Andreas fault, sure, but also the
plume of magma underneath Yellowstone and much,
much more.
(http://www.msnbc.msn.com/id/44363598/ns/technology_and_science-future_of_technology/#.TmetOdQ--uI)
Types of Data
Relational Data (Tables/Transaction/Legacy Data)
Text Data (Web)
Semi-structured Data (XML)
Graph Data
• Social Network, Semantic Web (RDF), …
Streaming Data
• You can only scan the data once
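The single-scan constraint means any statistic has to be maintained incrementally. One common way to do this, sketched here as an illustration rather than as the method the slides have in mind, is Welford's running mean and variance, which never stores the stream. The class and function names are hypothetical.

```python
class RunningStats:
    """One-pass mean and population variance (Welford's method)."""

    def __init__(self):
        self.n = 0        # number of items seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self.m2 / self.n if self.n else 0.0


def summarize(stream):
    """Consume the stream exactly once and return its running statistics."""
    stats = RunningStats()
    for x in stream:
        stats.update(x)
    return stats


# An iterator can only be traversed once, like a real data stream.
stats = summarize(iter([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
print(stats.n, stats.mean, stats.variance)
```

Memory use stays constant no matter how long the stream runs, which is the point of the one-scan restriction.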
Big Data Analysis Example
Big data can generate significant financial value across sectors
Who is collecting all of this data?
Government Agencies
(Hey, I didn’t say which government!)
Big Pharmaceutical Companies
Who is collecting all this data?
Consumer Products Companies
Big Box Stores
Who is collecting what?
Credit Card Companies
What data are they getting?
Airline ticket
Restaurant check
Grocery Bill
Hotel Bill
Why are they collecting all this
data?
Target Marketing
To send you catalogs for
exactly the merchandise
you typically purchase.
To suggest medications that
precisely match your
medical history.
To “push” television
channels to your set instead
of your “pulling” them in.
To send advertisements on
those channels just for
you!
Targeted Information
To know what you need
before you even know
you need it based on past
purchasing habits!
To notify you of your
expiring driver’s license or
credit cards or last refill
on a Rx, etc.
To give you turn-by-turn
directions to a shelter in
case of emergency.
What to do with these data?
Aggregation and Statistics
• Data warehouse and OLAP
Indexing, Searching, and Querying
• Keyword based search
• Pattern matching (XML/RDF)
Knowledge discovery
• Data Mining
• Statistical Modeling
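Two of the tasks listed above can be illustrated at toy scale: an OLAP-style roll-up (aggregation) and a keyword index (searching). This is only a sketch; the records, documents, and function names are made up for illustration and are not from the slides.

```python
from collections import defaultdict

# Hypothetical transaction records: (store, category, amount).
sales = [
    ("north", "grocery", 12.50),
    ("north", "apparel", 40.00),
    ("south", "grocery", 7.25),
    ("south", "grocery", 3.75),
]


def roll_up(records, key_index):
    """Sum the amount column grouped by one dimension (store or category)."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key_index]] += record[2]
    return dict(totals)


def build_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


by_store = roll_up(sales, 0)
by_category = roll_up(sales, 1)

# Hypothetical documents for the keyword search.
docs = {1: "big data analytics", 2: "data warehouse and OLAP", 3: "keyword search"}
index = build_index(docs)
hits = sorted(index["data"])

print(by_store, by_category, hits)
```

Real systems run the same two ideas over distributed storage, but the grouping and lookup logic is the same.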
Where Is This “Big Data” Coming From?
• 12+ TBs of tweet data every day
• 25+ TBs of log data every day
• ? TBs of data every day
• 2+ billion people on the Web by end 2011
• 30 billion RFID tags today (1.3B in 2005)
• 4.6 billion camera phones worldwide
• 100s of millions of GPS-enabled devices sold annually
• 76 million smart meters in 2009… 200M by 2014
With Big Data, We’ve Moved into a New Era of Analytics
• Volume: 12+ terabytes of Tweets created daily.
• Velocity: 5+ million trade events per second.
• Variety: 100’s of different types of data.
• Veracity: Only 1 in 3 decision makers trust their information.
The number of organizations who see analytics as a competitive advantage is growing.
[Chart: 2010, 2011, 2012; values shown: 70%, 57%, 63%; labels: “business initiative”, “BUSINESS IMPERATIVE”]
Four Characteristics of Big Data
• Cost efficiently processing the growing Volume: 50x from 2010 to 2020 (35 ZB)
• Responding to the increasing Velocity: 30 billion RFID sensors and counting
• Collectively analyzing the broadening Variety: 80% of the world’s data is unstructured
• Establishing the Veracity of big data sources: 1 in 3 business leaders don’t trust the information they use to make decisions
The 5 Key Big Data Use Cases
• Big Data Exploration: Find, visualize, understand all big data to improve decision making
• Enhanced 360° View of the Customer: Extend existing customer views (MDM, CRM, etc.) by incorporating additional internal and external information sources
• Security/Intelligence Extension: Lower risk, detect fraud and monitor cyber security in real-time
• Operations Analysis: Analyze a variety of machine data for improved business results
• Data Warehouse Augmentation: Integrate big data and data warehouse capabilities to increase operational efficiency
Big Data Exploration: Needs
Find, visualize, understand all big data
to improve decision making
Struggling to manage
and extract value from
the growing 3 V’s of
data in the enterprise;
Need to unify
information across
federated sources
Inability to relate “raw”
data collected from
system logs, sensors,
clickstreams, etc., with
customer and line-of-business data managed
in enterprise systems
Risk of exposing
unsecure personally
identifiable information
(PII) and/or privileged
data due to lack of
information awareness
Big Data Exploration: Value & Diagram
[Diagram labels: Relational Data, File Systems, Content Management, Email]
Find, Visualize & Understand
all big data to improve
business knowledge
• Greater efficiencies in business
processes
• New insights from combining and
analyzing data types in new ways
• Develop new business models
with resulting increased market
presence and revenue
Enhanced 360° View of the Customer: Needs
Extend existing customer views (MDM, CRM,
etc) by incorporating additional internal and
external information sources
Need a deeper
understanding of
customer sentiment
from both internal and
external sources
Desire to increase
customer loyalty
and satisfaction
by understanding
what meaningful
actions are
needed
Challenged getting the
right information to the
right people to provide
customers what they need
to solve problems, cross-sell & up-sell
Security/Intelligence Extension: Needs
Security/Intelligence Extension enhances
traditional security solutions by analyzing all
types and sources of under-leveraged data
Operations Analysis: Needs
Analyze a variety of machine
data for improved business results
Business Challenges:
• Complexity and rapid growth of machine data
• Difficult to capture even a small fraction of machine data for better
decision making
• Inability to analyze machine data and combine it with
enterprise data for a full view analysis
Benefits:
• Gain real-time visibility into operations,
customer experience, transactions and
behavior
• Proactively plan to increase operational
efficiency
• Identify and investigate anomalies
• Monitor end-to-end infrastructure to
proactively avoid service degradation
or outages
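As one illustration of "identify and investigate anomalies" in machine data, the sketch below flags readings that sit far from the mean, measured in standard deviations (a z-score test). The latency values, the threshold of 2.0, and the function name are assumptions chosen for the example; the slides do not prescribe a particular detection method.

```python
from statistics import mean, pstdev


def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs whose z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all readings identical, nothing stands out
    return [
        (i, v) for i, v in enumerate(values)
        if abs(v - mu) / sigma > threshold
    ]


# Hypothetical request latencies in milliseconds from a server log.
latencies_ms = [102, 98, 101, 99, 100, 103, 97, 100, 350, 101]
anomalies = find_anomalies(latencies_ms)
print(anomalies)
```

A simple test like this is sensitive to the outliers themselves, since they inflate the standard deviation, so production monitoring often uses more robust statistics such as the median.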