Objective
• Share a contemporary understanding of Big Data.
• Create awareness and spark interest in exploring new avenues in Big Data trends and technologies.
• Present Big Data initiatives at Thomson Reuters.
Content
The rise of the Bytes
Astonishing facts and figures
World Data forecast
Broad classification of Big Data
Characteristics of Big Data – The 3 Vs of Big Data
Challenges of Big Data and next-generation tools
Big Data’s impact on Thomson Reuters
The rise of the Bytes …
1000^8 bytes = 1 YB (Yottabyte)
1000^7 bytes = 1 ZB (Zettabyte)
1000^6 bytes = 1 EB (Exabyte)
1000^5 bytes = 1 PB (Petabyte)
1000^4 bytes = 1 TB (Terabyte)
1000^3 bytes = 1 GB (Gigabyte)
1000^2 bytes = 1 MB (Megabyte)
1000 bytes = 1 KB (Kilobyte)
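The unit ladder above can be sketched as a small conversion helper. This is an illustrative sketch only: the function names `to_bytes` and `humanize` are assumptions made for this example, and it uses the decimal (factor of 1000) convention shown on the slide.

```python
# Decimal byte units, smallest to largest; each step is a factor of 1000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def to_bytes(value, unit):
    """Convert a value in the given unit to bytes (1 KB = 1000 bytes)."""
    return value * 1000 ** UNITS.index(unit)


def humanize(num_bytes):
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop when the value fits this unit, or when no larger unit exists.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000


one_yottabyte = to_bytes(1, "YB")
label = humanize(1000 ** 6)
```

For example, `to_bytes(1, "YB")` equals `1000 ** 8`, matching the first line of the ladder.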
Astonishing facts and figures …
Eric Schmidt, Chairman of Google, said:
“From the dawn of humanity to 2003, the human race produced 5 exabytes of data (1 exabyte = 1000^6 bytes), and now we create that much every 2 days.”
World Data forecast.
• In 2010, the estimated amount of digital data in the world was 1.2 ZB.
• In 2013, web data reached 4 ZB.
• Data volume in 2020 will be 44 times greater than in 2009.
• Data volume is doubling every 1.2 years.
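A small sketch of the arithmetic behind forecasts of this kind, assuming simple exponential growth. That growth model is an assumption; the slide does not say how its figures were derived. The function names are made up for this example.

```python
import math


def growth_factor(years, doubling_time_years):
    """Multiplier on data volume after `years`, given a fixed doubling time."""
    return 2 ** (years / doubling_time_years)


def implied_doubling_time(factor, years):
    """Doubling time (in years) implied by growing `factor`-fold over `years`."""
    return years * math.log(2) / math.log(factor)


# One doubling period doubles the volume by definition.
after_one_period = growth_factor(1.2, 1.2)

# Doubling time implied by a 44-fold increase over the 11 years 2009 to 2020.
doubling_2009_2020 = implied_doubling_time(44, 11)
```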
Big Data: Broad classification
• Structured data:
– Fits into tables; stored in an RDBMS
– Accounts for 20% of the world's data
• Semi-structured data:
• Unstructured data:
– 80% of the world's data is semi-structured or unstructured
Big Data: Characteristics
• The 3 Vs of Big Data
• Volume: Huge volumes of data are being generated by different sources.
• Velocity: The speed at which data arrives in real time from different sources.
• Variety: The different forms of data.
– Machine-generated data: sensors, machines, satellites, weather data
– User-generated data: social media sites (Facebook, Twitter)
– Operational data: stock market, application logs
Big Data: Significant data producers
• NYSE trading produces 1 TB of data per day.
• 571 new websites are created every minute.
• Google processes 20 petabytes of data per day.
• 100 terabytes of data are uploaded to Facebook daily.
• Aadhaar card for India:
– UIDs for an Indian population of 1.5 billion
– 5 MB per resident
– 30 TB of I/O every day
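The Aadhaar figures above imply a total storage footprint that can be checked with a few lines of arithmetic. This sketch uses only the two numbers on the slide (1.5 billion residents, 5 MB each) and assumes decimal units; it ignores replication, indexes, and any other overhead.

```python
MB = 1000 ** 2  # decimal megabyte, in bytes
PB = 1000 ** 5  # decimal petabyte, in bytes

residents = 1_500_000_000       # population figure from the slide
per_resident_bytes = 5 * MB     # 5 MB per resident, from the slide

total_bytes = residents * per_resident_bytes
total_pb = total_bytes / PB

print(f"Total UID data: {total_pb} PB")  # prints "Total UID data: 7.5 PB"
```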
Big Data: Challenges
• Handle the variety of data.
• Store huge volumes of data that exist in different forms.
• Process and analyze this huge volume of data.
– E.g., decoding the human genome with a traditional RDBMS approach takes 10 years.
What next?
• Next-generation data tools and techniques, such as Hadoop and NoSQL databases, are needed to handle Big Data.
What we intend…
Linked Data - RDF
• RDF (Resource Description Framework) is a
standard model for data interchange on the Web
• It’s the foundation upon which the web of
semantic data is built
• Organized into triples [Subject, Predicate, Object]
[Diagram: Subject --Predicate--> Object]
• A “predicate” defines the relationship between the “subject”
and “object” nodes
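The triple model described above can be sketched with plain Python data structures. This is a minimal illustration, not an RDF library: the `Triple` type, the `objects_of` helper, the `http://example.org/` namespace, and every resource name in it are hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject --predicate--> object."""
    subject: str
    predicate: str
    obj: str


EX = "http://example.org/"  # hypothetical namespace, for illustration only

# A tiny graph is just a collection of triples whose parts are URIs.
graph = [
    Triple(EX + "article1", EX + "hasAuthor", EX + "alice"),
    Triple(EX + "article1", EX + "mentions", EX + "acme"),
    Triple(EX + "alice", EX + "worksFor", EX + "acme"),
]


def objects_of(graph, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [t.obj for t in graph
            if t.subject == subject and t.predicate == predicate]


authors = objects_of(graph, EX + "article1", EX + "hasAuthor")
```

Because every statement has the same three-part shape, data from different sources can be merged by simply combining their triple collections.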
RDF Example
RDF/XML: an XML-based syntax for expressing triples using URIs
Inferred relationships…
Subject = Dan,
Predicate = is_from,
Object = England
This relationship is not stated explicitly; it is inferred from the other two: new knowledge.
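A minimal sketch of how such an inference could work. The two stated triples here, (Dan, born_in, London) and (London, part_of, England), are hypothetical stand-ins, since the slide's original pair of relationships is not reproduced in the text; the rule itself is also an assumption made for illustration.

```python
def infer_is_from(triples):
    """Apply one hand-written rule:
    (x born_in y) and (y part_of z)  =>  (x is_from z).
    Returns only triples that were not already stated.
    """
    inferred = set()
    for s, p, o in triples:
        if p != "born_in":
            continue
        for s2, p2, o2 in triples:
            # Chain the two statements through the shared node (o == s2).
            if p2 == "part_of" and s2 == o:
                inferred.add((s, "is_from", o2))
    return inferred - set(triples)


# Hypothetical stated facts; the is_from triple is absent from this set.
facts = {
    ("Dan", "born_in", "London"),
    ("London", "part_of", "England"),
}

new_facts = infer_is_from(facts)
```

Here `new_facts` contains the single triple `("Dan", "is_from", "England")`, which was never stored but follows from the two stated facts.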