Big Data and Its Analytics for CA

Published on May 2017

Big Data and its Analytics: A Challenge or Boon for Governance
Ravikumar Ramachandran

My Profile
• CISA, CISM, CGEIT, CRISC, SSCP, CAP, CISSP-ISSAP, CFE, CIA, CRMA, PMP, CEH, ECSA, CHFI, FCMA
• COBIT 5 (F), ISO 27001:2013 Lead Auditor
• More than 22 years of industry experience
• Last 12 years as CRO and CISO
• Research and Review Committee, ISACA
• e-journal editor of the Mumbai Chapter and CGEIT Coordinator
• Presently with Hewlett-Packard

References
• Big Data, Big Analytics (Michael Minelli, Michele Chambers, Ambiga Dhiraj)
• Big Data Analytics: Turning Big Data into Big Money (Frank J. Ohlhorst)
• Big Data Now: Current Perspectives from O’Reilly Media (O’Reilly Media)
• Privacy and Big Data (Terence Craig and Mary E. Ludloff)
• Ethics of Big Data (Kord Davis with Doug Patterson)
• Big Data: A Revolution That Will Transform How We Live, Work, and Think (Viktor Mayer-Schönberger and Kenneth Cukier)
• Big Data: The Next Frontier for Innovation, Competition, and Productivity (McKinsey Global Institute, June 2011)
• Big Data Analytics: From Strategic Planning to Enterprise Integration with Tools, Techniques, NoSQL, and Graph (David Loshin)

Disclaimer & Author’s Note
• The views expressed belong to the author and not to the employer or any of the professional associations
• This presentation is meant for the members of the Institute of Chartered Accountants of India
• The author is sharing his own independent views; wherever references have been made to other works, due credit is given to the respective authors

Seizing the future...

• “As for the future, your task is not to foresee it, but to enable it.” - Antoine de Saint-Exupéry, French aviator and author

What is Big Data
• Extremely large data sets
• Too large to be managed by conventional database software tools
• A relative, not an absolute, size
• The threshold rises as technology advances
• Varies by sector

What is Big Data
• “Every two days now we create as much information as we did from the dawn of civilization up until 2003. That’s something like five exabytes of data.” - Former Google CEO Eric Schmidt

What is Big Data
• 1000 Bytes = 1 Kilobyte
• 1000 Kilobytes = 1 Megabyte
• 1000 Megabytes = 1 Gigabyte
• 1000 Gigabytes = 1 Terabyte
• 1000 Terabytes = 1 Petabyte
• 1000 Petabytes = 1 Exabyte
• 1000 Exabytes = 1 Zettabyte
• 1000 Zettabytes = 1 Yottabyte, and then (informal names) Brontobyte, Geopbyte!!
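The ladder above uses decimal (SI) prefixes, where each step is a factor of 1,000. A minimal Python sketch of the conversion, assuming those decimal units; the helper names `to_bytes` and `convert` are illustrative and not from any library:

```python
# Decimal (SI) byte units: each step up the ladder is a factor of 1000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def to_bytes(value, unit):
    """Convert a quantity in an SI byte unit to bytes (1 KB = 1000 bytes)."""
    return value * 1000 ** UNITS.index(unit)


def convert(value, from_unit, to_unit):
    """Convert a quantity between two SI byte units."""
    return to_bytes(value, from_unit) / 1000 ** UNITS.index(to_unit)


# One zettabyte expressed in terabytes: a billion TB
print(convert(1, "ZB", "TB"))  # → 1000000000.0
```

Operating systems often report binary units instead (1 KiB = 1,024 bytes), so the same capacity can look smaller there.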

Human Brain (Scientific American)
• Storage capacity: about 2.5 petabytes (roughly 2.5 million gigabytes)
• Enough to hold 3 million hours of TV shows
• The TV would have to run for more than 300 years!!
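A quick arithmetic check of the slide's figures, assuming decimal units and 365-day years:

```python
HOURS_PER_YEAR = 24 * 365  # 365-day years, ignoring leap days

brain_capacity_bytes = 2.5e15                    # 2.5 petabytes (decimal)
brain_capacity_gb = brain_capacity_bytes / 1e9   # 2,500,000 GB

tv_hours = 3_000_000                    # 3 million hours of TV
tv_years = tv_hours / HOURS_PER_YEAR    # about 342 years of non-stop viewing

print(f"{brain_capacity_gb:,.0f} GB, {tv_years:.0f} years")  # → 2,500,000 GB, 342 years
```

So "more than 300 years" holds, and 2.5 petabytes works out to 2.5 million decimal gigabytes.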

Internet-World’s largest library
• Estimated in yottabytes today
• Downloading it would take about 11 trillion years, even over the fastest internet connection
• Estimated at 5 lakh TB (500,000 TB) in 2003
• In 10 years, it expanded 20 lakh (2 million) times!!
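The 2003 figure and the growth multiple are consistent with the "yottabytes" estimate. A small check, assuming decimal units (1 YB = 10^12 TB) and that the 20-lakh expansion spans exactly 10 years:

```python
LAKH = 100_000  # Indian numbering: 1 lakh = 100,000

size_2003_tb = 5 * LAKH        # 5 lakh TB = 500,000 TB
growth = 20 * LAKH             # expanded 20 lakh (2,000,000) times in 10 years
size_2013_tb = size_2003_tb * growth
size_2013_yb = size_2013_tb / 1e12   # 1 YB = 10^12 TB in decimal units

annual_growth = growth ** (1 / 10)   # implied average growth factor per year

print(size_2013_yb, round(annual_growth, 2))  # → 1.0 4.27
```

In other words, the internet would have had to grow a little over fourfold every year to reach one yottabyte.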

Internet-World’s largest library
• “The Internet emphasizes the depth of our ignorance because our knowledge can only be finite, while our ignorance must necessarily be infinite.” - Sir Karl Popper, Conjectures and Refutations: The Growth of Scientific Knowledge (2002)

IDC’s Digital Universe Study
• “Between 2009 and 2020, digital data will grow 44-fold to 35 zettabytes per year”

IDC’s Prediction
• Volume of digital content:
  - 2012: 2.7 billion terabytes (48% more than 2011)
  - 2015: 8 billion terabytes
• Digital content doubles every 18 months
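The doubling time implied by any two data points follows from assuming steady exponential growth. A minimal sketch, with a hypothetical helper name; applied to the IDC figures above, it gives a doubling time of a little under two years:

```python
import math


def doubling_time(start, end, years):
    """Years needed to double, assuming steady exponential growth from start to end."""
    return years * math.log(2) / math.log(end / start)


# IDC figures from the slide, in billions of terabytes (zettabytes)
t = doubling_time(2.7, 8.0, 2015 - 2012)
print(f"Implied doubling time: {t:.2f} years (about {t * 12:.0f} months)")
```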

Economist
• Humans created 150 exabytes of information in the year 2005
• In 2011: more than 1,200 exabytes!!

Gartner’s prediction
• More than 90% of the world’s data has been created in the last two years
• About 80% of enterprise data will be in the form of unstructured data

The arrival of Analytics
• Big Data: Big Opportunity
• NASA, National Oceanic and Atmospheric Administration
• Pharmaceutical companies, energy companies
• Big Data and today’s business

Dimensions of Big Data
• Volume: the size of the whole data set, not just a sample
• Variety: structured and unstructured data
  - Structured: any data that can be entered in a predefined data field
  - Unstructured: audio, video, images, geospatial data, clickstreams and log files
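To make the Variety distinction concrete, here is a minimal Python sketch: a structured record fits predefined fields, while a log line (listed above as unstructured) has to be parsed before it can be analyzed. The `Sale` record and the log format are hypothetical:

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sale:
    """Structured data: every value sits in a predefined, typed field."""
    invoice_id: str
    amount: float
    region: str


structured = Sale(invoice_id="INV-001", amount=2500.0, region="West")

# Unstructured text, e.g. a raw web-server log line (hypothetical format)
raw_log = "2017-05-01 10:15:02 GET /products/42 200 user=anita"

LOG_PATTERN = re.compile(
    r"(?P<date>\S+) (?P<time>\S+) (?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3})"
)


def parse_log(line: str) -> Optional[dict]:
    """Impose structure on a log line; returns None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


print(structured.amount, parse_log(raw_log))
```

The structured record is ready for analysis as soon as it is stored; the log line only becomes analyzable after a parsing step like `parse_log`, which is where much of the effort in unstructured-data analytics goes.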

Dimensions of Big Data
• Velocity: the speed at which data is created, accumulated, ingested and processed
• Enables real-time decision making
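Velocity means results are computed as data arrives rather than after it has all been stored. One common building block is a sliding-window aggregate. A minimal sketch, assuming in-order event timestamps in seconds; the class is illustrative and not from any particular streaming framework:

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events seen in the last `window` seconds of a stream."""

    def __init__(self, window):
        self.window = window
        self.timestamps = deque()

    def add(self, ts):
        """Record an event; assumes timestamps arrive in increasing order."""
        self.timestamps.append(ts)
        self._evict(ts)

    def count(self, now):
        """Number of events within (now - window, now]."""
        self._evict(now)
        return len(self.timestamps)

    def _evict(self, now):
        # Discard events that have slid out of the window
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()


clicks = SlidingWindowCounter(window=60)
for ts in [0, 10, 20, 70]:
    clicks.add(ts)
print(clicks.count(70))  # → 2
```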

Big Data Synergies
• Traditional Business Intelligence
• Data Mining
• Statistical applications
• Predictive analysis
• Data Modeling

Getting the Big of Big Data
• Transformation Capabilities
• Big Data is too big an opportunity
• Best Integration
• Storage Technologies

Open Source
• Hadoop: its suitability
• Limitations: prerequisites, hardware requirements

Business Takeaway
• Businesses cannot wait for complete, structured data before taking decisions
• They need to take decisions on unstructured data as well
• However, not all unstructured data is useful
• Business houses that ignore unstructured data are doomed

Factors enabling Big Data
• Internet and the digitization of opinions and behaviour
• Mobile computing
• Social networking
• Moore’s Law and the cloud

Key factors driving Big Data-1
• Increasing data volumes being captured and stored
• 2011 IDC Digital Universe Study: “In 2011, the amount of information created and replicated will surpass 1.8 zettabytes…growing by a factor of 9 in just 5 years…”
• The scale of this growth surpasses traditional technologies and configuration setups

Key factors driving Big Data-2
• Rapid acceleration of data growth
• 2012 IDC Digital Universe Study: “From 2005 to 2020, the digital universe will grow by a factor of 300, from 130 exabytes to 40,000 exabytes…”
• From now until 2020, the digital universe will double about every two years
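The overall factor, the yearly growth rate and the doubling period are three views of the same exponential curve. A minimal sketch deriving all three from the IDC endpoints, assuming smooth compound growth between 2005 and 2020 (the helper name is hypothetical):

```python
import math


def growth_summary(start, end, years):
    """Overall growth factor, compound annual growth rate and implied doubling time."""
    factor = end / start
    cagr = factor ** (1 / years) - 1
    doubling_years = math.log(2) / math.log(1 + cagr)
    return factor, cagr, doubling_years


# IDC 2012 figures: 130 EB in 2005 growing to 40,000 EB in 2020
factor, cagr, doubling_years = growth_summary(130, 40_000, 2020 - 2005)
print(f"x{factor:.0f}, {cagr:.1%} per year, doubling every {doubling_years:.1f} years")
```

This gives a factor of about 308 (the study rounds to 300), growth of roughly 46.5% a year and a doubling time of about 1.8 years, in line with "about every two years".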

Key factors driving Big Data-3
• Increased data volumes pushed into the network
• According to Cisco’s annual Visual Networking Index Forecast, “By 2016, annual global IP traffic is forecasted to be 1.3 zettabytes”
• Driven by the increasing number of smartphones, tablets and other internet devices
• Increased bandwidth and the proliferation of Wi-Fi availability

Key factors driving Big Data-4
• Growing variation in types of data assets for analysis
• Data scientists increasingly exploit unstructured datasets, not just structured ones
• Acquired from a wide variety of sources
• Formats include text, images, audio and video content
• Existing structured data management needs to be enhanced to accommodate these

Key factors driving Big Data-5
• Alternate and unsynchronized methods of data delivery
• A structured environment provides clear methods of data delivery and exchange
• File transfers through tape and disk storage systems
• Unstructured data arriving from Twitter and government websites
• Pressure for rapid acquisition, absorption and analysis

Key factors driving Big Data-6
• Rising demand for real-time integration of analytical results
• Increasing number of consumers of analytical results
• Businesses require real-time results on consumer behaviour

Data Explosion
• Data doubles every two years

Malthusian Theory of Population
• Thomas Robert Malthus, author of “An Essay on the Principle of Population” (1798)
• Food production increases in arithmetic progression (A.P.) over each 25-year period
• Population increases in geometric progression (G.P.), doubling every 25 years
• Restraint on reproduction
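The gap between the two progressions widens quickly. A short sketch, assuming both food and population start at 1 unit and that each step is one 25-year period; after 200 years it reproduces Malthus's own comparison of 256 to 9:

```python
def malthus_table(periods, start=1):
    """Food grows by a fixed increment per 25-year period (A.P.);
    population doubles each period (G.P.)."""
    rows = []
    for n in range(periods + 1):
        food = start + n * start        # arithmetic progression: 1, 2, 3, 4, ...
        population = start * 2 ** n     # geometric progression: 1, 2, 4, 8, ...
        rows.append((n * 25, food, population))
    return rows


for years, food, population in malthus_table(8):
    print(f"{years:>3} years: food {food:>3}, population {population:>3}")
```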

Malthusian Theory of Data Explosion (Imaginary)
• Population increases in G.P. (25 years)
• Data explodes, doubling every 2 years (approx. 1,024 times in 20 years)
• Do not use mobile devices
• Restraint on internet use
• Do not visit social sites
• Reproduction is allowed
• But no DATA reproduction!!
• All economists to become data scientists

Evolution of Big Data
• Farnam Jahanian, Assistant Director for Computer and Information Science and Engineering at the National Science Foundation (NSF), defines data as “a transformative new currency for science, engineering, education and commerce”

Evolution of Big Data
• “Big Data is characterized not only by the enormous volume of data but also by the diversity and heterogeneity of the data and the velocity of its generation”

Implications of Big Data-Farnam
• Creation of new products and services
• Acceleration of the pace of discovery in every science and engineering discipline
• Solving the nation’s challenges, from medicine to cyber security

Data Explosion & Knowledge Management
• Data multiplies every two years

• Proprietary knowledge gets diluted

IP & Inventions
[Chart: 2%]

IP & Inventions
[Chart: 1%]

Going Forward
• Chief Innovation Officer (CIO)!!

• Chief Discovery Officer (CDO)!!

Balance Sheet
• Financial Management
• Management Accounting
• Strategic Financial Management
• Financial Risk Management... and so on...
• ...exciting new disciplines follow...

Big Data Technology
• Hadoop
• Open-source software framework for processing huge datasets on a distributed system
• Its development was inspired by Google’s MapReduce and Google File System papers
• Allows you to query both structured and unstructured data

Hadoop
• Stores any kind of data in its native format
• Stores petabytes of data inexpensively
• Assures availability
• Runs on a cluster of servers, each with its own CPU and disk storage

Components of Hadoop
• Hadoop Distributed File System (HDFS)
• Storage system for a Hadoop cluster
• HDFS breaks the data into pieces (blocks)
• Distributes the pieces among the servers in the cluster
• Each server stores a small segment of the data set
• Each piece of data is replicated on more than one server
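A toy sketch of the two ideas above: splitting a file into blocks and keeping each block on several servers. The block size here is tiny for illustration; HDFS defaults are much larger (128 MB blocks from Hadoop 2 onwards) with a replication factor of 3. The function names are hypothetical and not part of any Hadoop API, and the round-robin placement is a simplification:

```python
def split_into_blocks(data, block_size):
    """Break the data into fixed-size pieces (the last piece may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, servers, replication=3):
    """Assign each block to `replication` distinct servers, round-robin.

    Simplification: real HDFS uses a rack-aware placement policy."""
    if replication > len(servers):
        raise ValueError("replication factor exceeds number of servers")
    placement = {}
    for block_id in range(num_blocks):
        first = block_id % len(servers)
        placement[block_id] = [servers[(first + k) % len(servers)] for k in range(replication)]
    return placement


blocks = split_into_blocks(b"big data on a small cluster", block_size=8)
layout = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"])
for block_id, nodes in layout.items():
    print(block_id, blocks[block_id], nodes)
```

Because every block lives on three different servers, losing any single server still leaves two good copies of each block.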

Components of Hadoop
• MapReduce
• Each server does its part of the analytical job
• Reports its results for collation into a comprehensive answer
• MapReduce is the agent that distributes the work and collects the results
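The classic illustration is a word count. This single-process Python sketch mimics the map, shuffle and reduce steps; in Hadoop the map and reduce functions would run on many servers in parallel, and the names here are illustrative rather than Hadoop's own API:

```python
from collections import defaultdict


def map_phase(split):
    """Map: each server emits (word, 1) for every word in its own split."""
    for word in split.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group the intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collate each key's values into the final answer."""
    return {key: sum(values) for key, values in groups.items()}


splits = ["Big data big analytics", "big opportunity"]  # one split per server
intermediate = [pair for split in splits for pair in map_phase(split)]
print(reduce_phase(shuffle(intermediate)))
```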

Hadoop
• HDFS continually monitors the data stored in the cluster
• In case of a hardware or software failure, it retrieves the data from a known good replica
• MapReduce monitors the progress of each server
• If a server slows down or fails to return an answer...

Hadoop
• MapReduce automatically starts another instance of the task on a server that holds a copy of the data
• Together, HDFS and MapReduce deliver fast and reliable processing
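A simplified sketch of the recovery idea: if the server working on a block fails, rerun the task on another server that holds a replica. Real Hadoop also launches speculative duplicate tasks in parallel for slow servers; this sequential retry, the cluster state and the `count_records` task are hypothetical illustrations:

```python
def run_with_failover(block_id, replicas, run_task):
    """Run a task against the first replica server that answers successfully."""
    errors = {}
    for server in replicas[block_id]:
        try:
            return server, run_task(server, block_id)
        except Exception as exc:  # treat any failure as a dead or stalled server
            errors[server] = exc
    raise RuntimeError(f"all replicas failed for block {block_id}: {errors}")


# Hypothetical cluster state: block 0 is stored on three servers, node1 is down
replicas = {0: ["node1", "node2", "node3"]}


def count_records(server, block_id):
    if server == "node1":
        raise TimeoutError(f"{server} did not respond")
    return f"block {block_id} processed on {server}"


print(run_with_failover(0, replicas, count_records))
```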

Hadoop Users
• As of early 2013, Facebook was recognized as having the largest Hadoop cluster in the world
• Other prominent users:
  - Google
  - Yahoo
  - IBM

New Approach of Data processing
• Data needs to be stored in a system whose hardware is infinitely scalable
• Storage and network cannot be a bottleneck
• Data must be processed into business intelligence (BI) where it resides
• Move the code to the data, not the other way around
• Data sits in one place; never move it around

Challenges in Protection of Big Data
• Big Data: risk of permanent loss
  - Data from monitoring devices
  - Surveillance cameras
  - Generated at high frequency and in real time
• Uniqueness: no deduplication possible
• Large files: need huge CPU processing power
• No good backup solution available

Challenges in Protection of Big Data
• Not handled well by RDBMS
• NoSQL: a new DBMS evolution
• HIPAA and PCI compliance challenges
• Very risky in the medical industry

SQL/NoSQL
• SQL Databases
• Predefined schema
• Standard definition and interface language (SQL)
• Tight consistency
• Well-defined semantics

SQL/NoSQL
• NoSQL Database
• No predefined schema
• Per-product definition and interface language
• Getting an answer quickly is more important than getting a fully consistent answer
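The schema difference can be shown with Python's built-in `sqlite3` module for the SQL side. The document side is only simulated with JSON strings in a list, standing in for a real NoSQL product; the customer names and fields are hypothetical:

```python
import json
import sqlite3

# SQL: the schema is declared up front and enforced on every write
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)")
conn.execute("INSERT INTO customers (name, city) VALUES (?, ?)", ("Asha", "Mumbai"))

schema_error = None
try:
    # Rejected: 'email' is not part of the predefined schema
    conn.execute("INSERT INTO customers (name, email) VALUES (?, ?)", ("Ravi", "ravi@example.com"))
except sqlite3.OperationalError as exc:
    schema_error = str(exc)

# NoSQL document style: each record carries its own structure
documents = [
    {"name": "Asha", "city": "Mumbai"},
    {"name": "Ravi", "email": "ravi@example.com", "tags": ["vip"]},
]
stored = [json.dumps(doc) for doc in documents]  # no schema check on write

print(schema_error)
print(stored[1])
```

The flexibility cuts both ways: the document store accepts the new field instantly, but nothing stops inconsistent records from accumulating.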

Challenges in Protection of Big Data
• CIA triad: focus on access control
• Balance security with performance:
  - High levels of encryption
  - Complex security technology
  - Additional security layers
• Liability

Way forward….
• Destroy data that is not legally required to be kept (e.g., logs)
• Classify data
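A minimal sketch of how the two steps work together: classification determines a retention period, and anything past it is flagged for destruction unless a legal hold applies. The classifications, retention periods and `Record` type are hypothetical and illustrative only; unclassified data is never auto-deleted here:

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Record:
    name: str
    classification: str       # e.g. "log", "financial"
    created: date
    legal_hold: bool = False  # True if law or regulation requires keeping it


# Hypothetical retention periods per classification
RETENTION = {
    "log": timedelta(days=90),
    "financial": timedelta(days=8 * 365),
}


def records_to_destroy(records, today):
    """Select records past their retention period and not legally required."""
    return [
        r for r in records
        if not r.legal_hold
        and today - r.created > RETENTION.get(r.classification, timedelta.max)
    ]


inventory = [
    Record("web-access.log", "log", date(2017, 1, 1)),
    Record("audit.log", "log", date(2017, 1, 1), legal_hold=True),
    Record("ledger-2015", "financial", date(2015, 1, 1)),
]
print([r.name for r in records_to_destroy(inventory, today=date(2017, 5, 1))])  # → ['web-access.log']
```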

Protection measures
• Control access on a need-to-know basis
• Secure the data at rest
• Keep the cryptographic keys on a separate hardened server
• Ensure that security does not impede performance
• Pick the right encryption scheme
• Use a flexible security solution that adapts to changing requirements
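The first measure, need-to-know access control, amounts to a deny-by-default check. A minimal sketch with a hypothetical role-to-data mapping; in practice this would sit in a directory service or policy engine rather than in application code:

```python
# Hypothetical need-to-know policy: each role lists only the data classes it needs
NEED_TO_KNOW = {
    "fraud_analyst": {"transactions", "customer_profile"},
    "marketing_analyst": {"campaign_stats"},
}


def can_access(role, data_class):
    """Deny by default; allow only data classes explicitly granted to the role."""
    return data_class in NEED_TO_KNOW.get(role, set())


print(can_access("fraud_analyst", "transactions"))      # True
print(can_access("marketing_analyst", "transactions"))  # False
print(can_access("intern", "campaign_stats"))           # False: unknown role
```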

Big Data & IP
• Inventions, literary and artistic works
• Symbols, images, designs
• What to protect
• Prioritize protection
• Labeling and locking
• Security awareness
• Holistic approach

Governance Measures
• Strategic Alignment
  - Identify business priorities
  - Define the problems to be solved
  - Time frame
  - Measurable and achievable outcomes

Strategic alignment
• Demonstration of value: whether these technologies add value to real business problems
• Operationalization: how to migrate big data projects into the production environment in a controlled and managed way

Governance Measures
• Management Sponsorship
  - Management support for fact-based decision making
  - Identify champions for consumption of analytics
  - Ensure benefits realization from various reports and statistical models

Integration of Big Data Analytics
• Standard processes for soliciting input from business users
• Clear evaluation criteria for acceptability and adoption
• Massive data scalability
• Data reuse
• Oversight and governance
• Mainstreaming accepted technologies

Governance Measures
• Analytical Human Capital
  - Mobilize resources for analytics
  - Hire the right talent and retain it
  - Increasing demand for analysts skilled in mathematics, business and technology

Key Governance Role
• Ensure the business uses analytics effectively to make better decisions
• Ensure investment is made in the right type of analytics
• Ensure investment goes into the right people, processes and technology

Data Governance
• Alert: identify data issues that might have a negative business impact
• Triage: prioritize those issues in relation to the corresponding business value drivers
• Remediate: data owners take proper action when alerted to those issues
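The Alert, Triage, Remediate cycle maps naturally onto three small functions. A minimal sketch, assuming hypothetical data-quality rules, business-value weights and owner names:

```python
from dataclasses import dataclass


@dataclass
class Issue:
    dataset: str
    rule: str
    failures: int
    business_value: int  # hypothetical weight of the affected value driver, 1 (low) to 5 (high)
    owner: str


def alert(rows, dataset, owner, business_value):
    """Alert: flag rows that break simple data-quality rules."""
    checks = {
        "missing customer_id": lambda r: not r.get("customer_id"),
        "negative amount": lambda r: r.get("amount", 0) < 0,
    }
    issues = []
    for rule, is_bad in checks.items():
        failures = sum(1 for r in rows if is_bad(r))
        if failures:
            issues.append(Issue(dataset, rule, failures, business_value, owner))
    return issues


def triage(issues):
    """Triage: highest business value first, then the most failing rows."""
    return sorted(issues, key=lambda i: (i.business_value, i.failures), reverse=True)


def remediate(issues):
    """Remediate: tell each data owner what to fix (here, just build the messages)."""
    return [f"To {i.owner}: fix '{i.rule}' in {i.dataset} ({i.failures} rows)" for i in issues]


sales = [{"customer_id": "C1", "amount": 100}, {"customer_id": "", "amount": -5}, {"amount": 20}]
hr = [{"customer_id": None, "amount": 0}]
issues = alert(sales, "sales", "sales_ops", 5) + alert(hr, "hr", "hr_admin", 2)
for message in remediate(triage(issues)):
    print(message)
```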

McKinsey study
• Approximately 140,000 to 190,000 unfilled positions for experts with deep analytical skills in the U.S. by 2018
• Shortage of 1.5 million managers and analysts with the ability to understand and make decisions using Big Data

Rise of Data Scientist
• A new designation
• The Data Scientist

Yesterday’s skills
• Business + Mathematics = Consulting profession
• Uses heuristics and persuasive arguments in the boardroom

Yesterday’s skills
• Business + Technology = IT profession
• Automates algorithmic tasks, improving productivity and efficiency

Yesterday’s skills
• Mathematics + Technology = Software development
• Addresses a wide range of business problems

Tomorrow's Skills
• Business + Mathematics + Technology +Behavioral Science = Decision Science

Tomorrow’s Skills (Big Data, Big Analytics –Michael Minelli et al)

Privacy Landscape-Businesses
• Increased need to leverage personal information for competitive advantage
• Huge investment in data sources and data analytics

Privacy Landscape-Criminals
• Rise in identity theft
• Sophisticated technology to exploit data security vulnerabilities

Privacy Landscape-Consumers
• Increased awareness and concern about personal information:
  - Collection
  - Use
  - Disclosure

Privacy Landscape-Legislators
• Responding to consumer concern by restricting the use of personal information (PI)
• Significant impact on, and restrictions for, business

Seven Global Privacy Principles
• Notice: inform individuals of the purpose for which information is collected
• Choice: offer individuals the opportunity to choose or opt out
• Consent: only disclose information to third parties consistently with the above principles
• Security: take responsibility for the confidentiality, integrity and availability (CIA) of PI

Seven Global Privacy Principles-Cont’d
• Data Integrity: assure the reliability of PI
• Access: give individuals access to the PI held about them
• Accountability: a firm must be accountable for following these principles, backed by a compliance mechanism

Other Regulations
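• HIPAA (Health Insurance Portability and Accountability Act)
• GLB (Gramm-Leach-Bliley Act)
• FTC (Federal Trade Commission) rules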

Different approach
• Privacy may be the wrong focus
• “Data privacy is the thing you do to keep from getting sued, data ethics is the thing you do to make your relationship with your customers positive.” - James Stogdill, O’Reilly Radar

James Powell, CTO, Thomson Reuters, 2011, O’Reilly Strata Data Conference

Conclusion
• Availability of Big Data
• Low-cost hardware
• New information management and analytic software
  - Enormous opportunity
  - Efficiency, productivity, profitability

Concluding Remarks
• “There are known knowns, there are known unknowns, but there are also unknown unknowns”-Former U.S. Secretary of Defense, Donald Rumsfeld

Concluding Remarks…..
• “I love that quote…When I think about these three things in our daily life, they fall into these three outcomes for me.. The known unknowns more fall into the category of analysis throwing………the thing I love is the last part, if you could figure this thing out, we could have saved Afghanistan from big problems” –Google’s Avinash Kaushik in his presentation at Strata 2012, “A Big Data Imperative, Driving Big Action”

Thanks for your precious time!
Ravikumar
