
http://www.analytics-magazine.org

DRIVING BETTER BUSINESS DECISIONS

JULY/AUGUST 2013

ADVENT OF THE DATA SCIENTIST

WHY IT’S THE ‘SEXIEST JOB OF THE 21ST CENTURY’

ALSO INSIDE:
• Analytics & BPM: ‘SMAC’ delivers one-two punch
• Predictive Analytics: Harnessing the power of big data
• Forecasting & Optimization: Perfect technology for complex sourcing
• Executive Edge: FICO chief analytics officer Andrew Jennings on what makes a good data scientist

INSIDE STORY

Big bang theory of analytics
FICO recently published an eye-popping infographic called “The Analytics Big Bang” that, according to an accompanying press release, “traces predictive analytics from the dawn of the computer age in the 1940s through the present day, and cites compelling evidence indicating that the analytics industry is at an inflection point.” The compelling evidence includes these nuggets:

• Sales of analytics software grew from $11 billion to $35 billion between 2000 and 2012.
• The number of data scientist job posts jumped 15,000 percent from 2011 to 2012.
• 2.5 quintillion bytes of big data are created each day, enabling analytics to become more insightful, precise and predictive than at any point in history.

“Predictive analytics is becoming the defining technology of the early 21st century,” says Andrew Jennings, FICO’s chief analytics officer and head of FICO Labs, which produced the infographic. “You can trace the evolution over the past few decades, but we’ve now reached a tipping point where the convergence of big data, cloud computing and analytic technology is leading to massive innovation and
market disruption. We foresee predictive analytics being used to solve previously unsolvable problems, and bringing enormous value to businesses, governments and people.”

The explosive growth in the demand for analytics and data scientists has created an interesting problem for managers like Jennings: What makes a good data scientist, and how do you find one? Jennings addresses the question in his Executive Edge column in this issue of Analytics magazine. He details four key skills and traits to look for when building an analytics team and notes that, “It’s a great time to be a data scientist, but a tricky time to hire one.” ❙

– PETER HORNER, EDITOR, [email protected]

CONTENTS

FEATURES
30 ANALYTICS & BPM
By Malcolm Ross
‘SMAC’ delivers a much-needed combination punch for peak customer experience.

38 PREDICTIVE ANALYTICS
By Eric Siegel
Harnessing the power of big data and the priceless collection of experience within it.

44 FORECASTING & OPTIMIZATION
By Arne Andersson
The development of optimization: the perfect technology for complex sourcing.

50 UTILITIES DUST OFF FORECASTING PLAYBOOK
By Tao Hong and Alyssa Farrell
Smart grid data brings challenges and opportunities for the power industry.

58 FORENSIC ANALYTICS
By Priti Ravi
Combating a growing pandemic of corporate crime, from identity theft to insider trading.

64 ANALYTIC COMMUNICATION
By Evan S. Levine
Fundamental principles include clarity, transparency, integrity and humility.



REGISTER FOR A FREE SUBSCRIPTION: http://analytics.informs.org

INFORMS BOARD OF DIRECTORS
President: Anne G. Robinson, Verizon Wireless
President-Elect: Stephen M. Robinson, University of Wisconsin-Madison
Past President: Terry Harrison, Penn State University
Secretary: Brian Denton, University of Michigan
Treasurer: Nicholas G. Hall, Ohio State University
Vice President-Meetings: William “Bill” Klimack, Chevron
Vice President-Publications: Eric Johnson, Dartmouth College
Vice President-Sections and Societies: Paul Messinger, University of Alberta
Vice President-Information Technology: Bjarni Kristjansson, Maximal Software
Vice President-Practice Activities: Jack Levis, UPS
Vice President-International Activities: Jionghua “Judy” Jin, Univ. of Michigan
Vice President-Membership and Professional Recognition: Ozlem Ergun, Georgia Tech
Vice President-Education: Joel Sokol, Georgia Tech
Vice President-Marketing, Communications and Outreach: E. Andrew “Andy” Boyd, University of Houston
Vice President-Chapters/Fora: Olga Raskina, Con-way Freight

INFORMS OFFICES
www.informs.org • Tel: 1-800-4INFORMS
Executive Director: Melissa Moore
Meetings Director: Teresa V. Cryan
Marketing Director: Gary Bennett
Communications Director: Barry List
Headquarters: INFORMS (Maryland), 5521 Research Park Drive, Suite 200, Catonsville, MD 21228
Tel.: 443.757.3500 • E-mail: [email protected]


DEPARTMENTS
2 Inside Story
8 Executive Edge
14 Analyze This!
20 Viewpoint
26 Forum
72 INFORMS audio, video presentations
76 Five-Minute Analyst
80 Thinking Analytically
Analytics (ISSN 1938-1697) is published six times a year by the Institute for Operations Research and the Management Sciences (INFORMS), the largest membership society in the world dedicated to the analytics profession. For a free subscription, register at http://analytics.informs.org. Address other correspondence to the editor, Peter Horner, [email protected]. The opinions expressed in Analytics are those of the authors, and do not necessarily reflect the opinions of INFORMS, its officers, Lionheart Publishing Inc. or the editorial staff of Analytics. Analytics copyright ©2013 by the Institute for Operations Research and the Management Sciences. All rights reserved.

ANALYTICS EDITORIAL AND ADVERTISING
Lionheart Publishing Inc., 506 Roswell Street, Suite 220, Marietta, GA 30060 USA
Tel.: 770.431.0867 • Fax: 770.432.6969
President & Advertising Sales: John Llewellyn, [email protected], Tel.: 770.431.0867, ext. 209
Editor: Peter R. Horner, [email protected], Tel.: 770.587.3172
Art Director: Lindsay Sport, [email protected], Tel.: 770.431.0867, ext. 223
Advertising Sales: Sharon Baker, [email protected], Tel.: 813.852.9942

EXECUTIVE EDGE

What makes a good data scientist?

BY ANDREW JENNINGS
Four things to look for when building an analytics team. At the heart of a successful deployment is still human intelligence. Hiring the right people is crucial.
Companies in every industry from retail to banking are leveraging big data to improve the customer experience and enhance their bottom lines. Big data – high volume, high velocity (real time) and high variety (structured and unstructured) data – is transforming the way we live and conduct business across all industries and all aspects of daily life. This has created a talent gap for qualified data scientists. And this is not purely a Silicon Valley tech phenomenon. Gartner estimates that big data will generate six million new U.S. jobs in the next three years, including non-technical roles (see CNNMoney).

Last October a Harvard Business Review article called data scientist “the sexiest job of the 21st century,” and Indeed.com reported that job postings for analytic scientists jumped 15,000 percent between the summer of 2011 and the summer of 2012. McKinsey & Company predicted a 50 percent to 60 percent shortfall in analytic scientists in the United States by 2018. Gartner echoed this sentiment, predicting that only one-third of 4.4 million global big data jobs will be filled by 2015.

Prior to 2000, the analytics function, outside of a few places like retail banking, was relegated to the finance or IT department. Now, many companies are
hiring autonomous analytics teams that work across departments.

There is no magic to leveraging big data in pursuit of solutions to business problems. Yes, there is the technology – sophisticated predictive analytics, for example – but at the heart of a successful deployment is still human intelligence. Hiring the right people is crucial. So what makes a good data scientist? What qualities should a company look for when recruiting and interviewing candidates?

I’ve been with FICO for 20 years, and the company itself has been hiring data scientists (by any name) since 1956. We’ve hired some of the best – and probably a few who should never have been let near a data set. Here’s what we believe you should do when building your own analytics team.

1. Find people who are focused on solving problems, not just boosting model performance curves.

Math skills are important, but the point of leveraging big data analytics is solving business problems. It’s coming up with answers to challenges that will actually be useful in the real world. It means answering specific questions in ways that will be helpful to the bottom line. For example, key questions would include: What decision are we looking to improve? How

will we measure the improvement? How do we make that decision today? What are the deployment constraints? And so on. These are all practical questions before one gets to the data and the statistical techniques, which are generally the things that attract all the media attention.

One example of a big data challenge that seems to resonate universally and helps highlight the importance of these questions is customer attrition. Most businesses are focused on retaining their best customers. Aside from thorny questions like what does “best” mean, there are other important questions such as, how far ahead of the potential attrition event does the prediction need to be made? In other words, how does one construct the problem to allow time between the prediction indicating an attrition risk and the delivery of some action, and for that action to have enough time to be effective? These are business context questions that need to be answered long before a data analyst can be effective.

2. Make sure they can talk with people who don’t hold Ph.D.s.

Data scientists are not simply good problem solvers; they are also good at helping to identify the right problems to solve and framing the questions in such a way as to yield meaningful answers. The challenges whose solutions have the most value to


an organization are not easy to solve, and they often require a non-mathematical mindset. How can we make changes for the better? Where do we even start?

Some data scientists are abstract thinkers who are technical and academic. And then there is a rare breed: those data scientists who can think, conceptualize and communicate to a business audience. Given some of the key questions above, ideally you want an individual who is business-savvy and well-versed enough in the larger strategy that he or she can have a discussion with the business user. If you could choose only one person, this profile would be the perfect package, but these individuals are hard to find. In a team context, balancing best-in-class technical skills against strong communicators who can translate highly technical information into language that a business user can understand is a trade-off worth making. Also, going in reverse, those same people need to be able to translate a business need into an analytics investigation.

Ideally, even if the back-office analytics folks won’t speak to clients, you want them to want to, because that indicates that they’re thinking of things from a client perspective, not just a technology perspective. There are some data scientists who will never want to move beyond an R&D role, and for these folks, communications may seem less important – but then again, don’t you want them to be able to justify their work, explain its benefits and author white papers?

3. Put more emphasis on skills and mindset than degrees.

Clearly, a strong background in numerical science is a necessity. Not all candidates need to hold a Ph.D. in mathematics or operations research; they may be electrical engineers or sociologists. I have become less concerned about those specifics and far more attuned to the mindset. Good data scientists are not only technically sound, with attention to detail, but they are also inquisitive and open-minded; they question everything that they find. They ask tough questions of the data and equally of the veracity of the conclusions. Big




data doesn’t guarantee the right answer. People still need to think about getting to the right answer.

Increasingly, the effective data scientist needs to be able to automate. This means that they need to be comfortable with writing scripts and code to make their work efficient, mixing and matching tools, and have the ability to absorb new techniques.

From a long-term career perspective, one of the big opportunities is that data science can lead in any number of directions. Some end up in sales, finance or executive management. Others start off in more traditional corporate roles and slowly gravitate toward jobs that are more heavily steeped in predictive analytics. A broad skill set always comes in handy, and there will ultimately be a range of opportunities where an analytic mindset can be applied effectively. Being inquisitive goes a long way.

For those looking to transition to a data analytics role from, for example, a financial or economics background, basic programming skills are important. Being able to manipulate data and think logically

will impress hiring managers. They will likely want to see a demonstrated ability to learn a programming language and link various concepts via code. There is obviously a need for individuals who are well-versed in big data programming frameworks such as Hadoop and statistical programming languages such as R.

4. Use your current analysts to sniff out the real data scientists from the pretenders.

As more and more candidates start self-identifying as data scientists, sorting through them all has become more challenging. When screening and interviewing data scientists, having one or more of your own – someone who really knows what he or she is doing – involved in the process is an obvious step. This is particularly important for those hiring managers with a traditional business background who may not know the right questions to ask.

Some candidates will of course overrepresent their background and experience. They may claim to have run a full analytics process but really only have been involved in part of it. You don’t want to hire someone



who says they’re a modeling superstar but in fact has been specializing in data cleansing. In all the buzz, analytics has become a broadly and loosely used term. Candidates may know the lingo, but if they are not familiar with how a whole analytic project is put together, the knowledge gaps may be too great to overcome.

A word of caution: The best analytic teams will embrace diversity of experience and skills. Just as in any other hiring situation, you always need to guard

against hiring people who “look just like the people you already have.”

We’re entering a new age of analytic competition. It’s a great time to be a data scientist, but a tricky time to hire one. Every candidate will claim mad math skills – your job is to appraise those while also looking for the problem-solvers, the communicators and the skills that will make your data scientists a more valuable part of your whole organization. ❙
Andrew Jennings is the chief analytics officer of FICO. He is a member of INFORMS.


ANALYZE THIS!

Silicon Valley’s ‘serial entrepreneurs’

BY VIJAY MEHROTRA
Analytics are increasingly central to the lore, and the lure, of Silicon Valley.
A dozen or so years after the bursting of the Internet bubble, Silicon Valley is once again in the spotlight as a symbol of its times. The press coverage of Napster founder and former Facebook President Sean Parker’s $10 million wedding has brought cheers from some, jeers from others, and a series of spirited rejoinders from its perpetrator [1]. George Packer’s recent article in the New Yorker [2] shines a somewhat harsh light on the region (the money quote: “… after decades in which the country has become less and less equal, Silicon Valley is one of the most unequal places in America”). And as Somini Sengupta recently reported in the New York Times [3], despite the best efforts of various governments around the world to lure technically talented young people to their shores with visas and funding opportunities, the Silicon Valley dream of quick wealth and enduring fame continues to exert an extremely powerful pull on their imaginations (one aspiring Indian entrepreneur describes the region as “the N.B.A. of the start-up world”).

Analytics are increasingly central to the lore, and the lure, of Silicon Valley. A great deal of leading-edge research on data analysis and modeling continues

to happen on the Stanford campus, and many analytic innovations have roots that can be traced back to this research. More recently, as the launching pad for Internet search engines, online social networks and mobile application development, an increasing number of Silicon Valley companies have featured intelligent use of huge volumes of data as part of their “value proposition.” In turn, the development of Hadoop and the much-ballyhooed “Big Data” revolution have largely happened in response to the explosion of data resulting from the needs of search, social and mobile platforms.

One recent Saturday night, I took a trip down to Silicon Valley from my home in Oakland. While the drive took less than an hour, the cultural distance is startling: the New York Times has referred to Oakland as a “rust belt town” [4], and the city’s role in today’s technologically enabled global economy is primarily as a prominent physical node (because of its large container port).

I drop in to a sports bar to see an old friend, an analytics professional who has been working for more than a decade in the world of online advertising. He brings me up to speed on recent developments: the Hadoop-enabled platform that his group has been building and using for a couple of years lets them utilize more of the data that their network captures and

enables them to control decisions on a much more granular level than their previous optimization platform. He also expresses some frustrations: “Senior management depends on our algorithms to drive revenue, and there are a lot of people tasked to make sure the targets are actually met. But they don’t understand – and don’t really try to – what the models are actually doing, so every time we seem to be heading for a quarterly revenue shortfall there’s some kind of fire drill where a lot of silly ideas get thrown around by people who don’t know what they are talking about, because they are afraid of looking dumb in front of executives that are demanding answers from them.” As I’m leaving, he points out that most of those people have MBAs, “so keep doing what you are doing with that consulting class.” [5]

At my next destination, over drinks at a pleasant birthday party, I meet a supply chain manager who admits to an obsession with relentlessly squeezing out costs. My attempts to engage him in a discussion of the broader impact of global




supply chains, including the impact of the recent tragedy in Bangladesh [6], are moderately successful, and he quickly mentions regular supplier audits, best practices and various other programs that his company promotes on their website. He clearly knows the right things to say, but for some reason, I still leave this conversation feeling like he’s a lot more focused on his bonus plan than his supplier scorecards.

Finally, I end up at a somewhat upscale dinner with an interesting collection of technology professionals, most of whom I’m just meeting for the first time. I quickly notice that several of them are wearing attractive, elegantly designed gadgets to track blood sugar levels, heart rates, blood pressure and other health-related data. This observation triggers a friendly debate about which glucose meter is the best and most technologically advanced (several of us are from India, which makes us three times more likely than white Americans to be diagnosed as diabetics [7]). As for me, since being diagnosed with type 2 diabetes several years ago, I basically use the only one for which my insurance company is willing to provide test strips.

Our host listens to this discussion with a faintly amused look. From previous conversations, I know that he is working hard on a health-related start-up company. He and his colleagues, most with computer science and engineering backgrounds, have been scouring a number of publicly available databases searching for



correlations and opportunities. They have also been furiously reading trade publications and research literature, one of them even taking a class on endocrinology, to develop the background needed to generate better hypotheses to investigate. The guys heading this thing up are what Silicon Valley folks call “serial entrepreneurs,” and they have successfully sold a couple of companies already. I’m not sure what they will end up doing with this venture – neither do they, frankly – but I’m pleased to see that they are focused on using data to improve the state of our public health, and I wouldn’t bet against them doing something significant.

A quarter century after arriving there as a young and naïve graduate student, my own feelings about Silicon Valley are decidedly mixed. It is certainly no accident that I no longer live or work in the tech industry echo chamber [8], and in talking to the supply chain executive and listening to the jousting about whose glucose monitor was the most techno-chic, I was reminded of some of the reasons. But it is also no accident that I still live nearby, thereby keeping my ringside seat at the circus, and that I look for reasons

to visit there often. The vast majority of the young people who continue to flock in droves to Silicon Valley are not actually going to change the world much. And yet collectively its denizens have had an astonishing impact on our world – and on the world of applied analytics – and there is seemingly no end in sight. Anyway, I’ll be heading back down again next month. Who knows what I might find down there next time? ❙
Vijay Mehrotra ([email protected]) is an associate professor in the Department of Analytics and Technology at the University of San Francisco’s School of Management. He is also an experienced analytics consultant and entrepreneur, an angel investor in several successful analytics companies and a longtime member of INFORMS.

REFERENCES
1. See, for example, http://news.cnet.com/83011023_3-57589288-93/sean-parker-on-his-weddingredwoods-and-death-threats/
2. http://www.newyorker.com/reporting/2013/05/27/130527fa_fact_packer
3. http://www.nytimes.com/2013/06/06/technology/wishing-you-and-your-start-up-werehere.html
4. http://www.nytimes.com/2012/08/05/magazine/oakland-occupy-movement.html?pagewanted=all
5. For more about my MBA course, see http://analytics-magazine.org/may-june-2013/798analyze-this-course-puts-students-in-theanalytics-game
6. See, for example, http://www.ft.com/cms/s/0/5bd48c1a-b7e2-11e2-9f1a-00144feabdc0.html#axzz2WUz50Bbi


7. http://forecast.diabetes.org/news/indianethnicity-tied-higher-diabetes-risk
8. See, for example, http://bits.blogs.nytimes.com/2013/06/02/disruptions-the-echo-chamberof-silicon-valley/


VIEWPOINT

O.R. vs. analytics … and now data science?

BY BRIAN KELLER
The results of data science are not just competitive advantages; results of data science are the products of the company. The data is the product.
In a 2010 survey [1], members of the Institute for Operations Research and the Management Sciences (INFORMS) were asked to compare operations research (O.R.) and analytics. Thirty percent of the respondents stated, “O.R. is a subset of analytics,” 29 percent stated, “analytics is a subset of O.R.,” and 28 percent stated, “advanced analytics is the intersection of O.R. and analytics.” The remaining 13 percent were split between “analytics and O.R. are separate fields” (7 percent) and “analytics is the same as O.R.” (6 percent). The emergence of data science only adds to the confusion. Is data science just another clever marketing term popularized by the math illuminati?

INFORMS has developed working definitions of both O.R. and analytics through surveys of INFORMS members and Analytics magazine readers. O.R. is the “application of advanced analytical methods to help make better decisions.” Analytics is the “scientific process of transforming data into insight for better decision-making.”

DATA SCIENCE: AN EMERGING FIELD

Data science is an emerging field with no standard definition yet. An early description can be found

in “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics” [2]. I think of data science as an interdisciplinary field combining mathematics, statistics and computer science to create products based on data. The delivery of data products is the key idea. More on that later.

Indeed, the definitions for each sound similar. Differences begin to emerge when looking at O.R., analytics and data science in terms of the focus of the discipline and types of techniques applied.

Operations research tends to focus on the solution of a specific problem using a defined set of methods and techniques [3]. Classic examples of O.R. include facility location problems, scheduling and deciding how many lines should be opened at a service center, which are all problem-solution focused. Techniques tend to be model-driven, in which analysts select a reasonable model, fit the model parameters to the data and analyze results. Based on survey data in “ASP: The Art and Science of Practice” [3], the top O.R. quantitative skills are optimization, decision analysis and simulation.

Analytics tends to go beyond solving a single problem and focuses on overall business impact [3]. Classic examples of analytics include business intelligence to summarize

operations and customer segmentation for improved marketing and sales. The same survey identified the top analytics quantitative skills as statistics, data visualization, data management and data mining.

Data science tends to focus on data as a product. For example, Amazon records your searches, correlates them with other users and offers you suggestions on what you might like to buy. Those suggestions are data products that personalize the world’s biggest market, which drives sales. Google Now presents the results of your search before you even think to search for the information. Google Now is a data product that increases use of Google services, which delivers added revenue to Google.

Amazon product recommendations and Google Now may sound like an analytic, which focuses quantitative effort on a broader business impact. However, the results of data science are not just competitive advantages; results of data science are the products of the company. The data is the product.
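To make the “data product” idea concrete, the following is a minimal sketch of an item-based co-occurrence recommender in the spirit of the Amazon example above. Everything here – the basket data, the scoring and the function names – is invented for illustration and says nothing about Amazon’s actual system.

```python
from collections import Counter
from itertools import permutations

# Hypothetical purchase histories: one set of product IDs per customer.
baskets = [
    {"router", "cable", "switch"},
    {"router", "cable"},
    {"laptop", "router", "dock"},
    {"laptop", "dock"},
]

# Count how often each ordered pair of products was bought together.
co_counts = Counter()
for basket in baskets:
    for a, b in permutations(basket, 2):
        co_counts[(a, b)] += 1

def recommend(product, k=3):
    """Return up to k products most often co-purchased with `product`."""
    scored = [(other, n) for (p, other), n in co_counts.items() if p == product]
    return [other for other, _ in sorted(scored, key=lambda t: -t[1])[:k]]

print(recommend("router"))  # e.g. ['cable', 'switch', 'laptop']
```

The point is not the few lines of counting; it is that the output – the suggestion list itself – is the deliverable, which is what distinguishes a data product from an internal analysis.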



Creating data products requires a strong sense of creativity and diverse perspectives of thought. As such, data scientists hail from a variety of academic backgrounds including O.R., statistics, computer science, engineering, biology and physics. The common themes across data scientists are creativity, curiosity to ask bigger questions, and skills in data analysis and programming.

Data science often relies on combining multiple types of data together for analysis. Some data may be company proprietary; other data is available from one of the many public data sets available on the Web. These data sets often are too large to analyze using desktop tools, have missing or erroneous data, vary in structure across data sets, and may be lacking structure entirely (e.g., free-form text in maintenance repair logs). The combination of data size and structure adds an additional challenge on top of data analysis – the data itself becomes part of the problem.

LEVERAGING DIVERSE SKILLS

Because so much of the effort of data science work falls on parsing, cleaning and managing the data, data scientists often must leverage diverse software development skills. One project may use Python for data acquisition and parsing, R for exploratory analysis, Hadoop for data storage and MapReduce via



Java for production analytics, with results delivered through Ruby on Rails. Analytics practitioners share in many of the data management challenges of data scientists, although usually at a smaller scale. In contrast, O.R. applications tend to focus on problem solution, and O.R. analysts usually use fewer tools during a project.

Visualization is key to the success of data science projects since the information must be consumable to users. Who would want to use Google Now if it presented results in a table with p-values? Similarly, analytics practitioners value data visualization, whereas visualization is much less important to O.R. practitioners [3].

Analysis techniques may also differ with the large amounts of data collected. O.R. and analytics approaches generally assume a model and then fit the model to the data. The large amounts of data collected in many data science projects enable an alternative, model-free, data-driven approach. For example, automated language translation algorithms were predominantly manual, rule-driven approaches until an increase in storage and compute power enabled storage and processing of large amounts of bilingual text corpora, from which statistical models could infer the translation rules from the data.
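Since so much of the effort described under “Leveraging Diverse Skills” goes to parsing and cleaning rather than modeling, a minimal sketch may help make that concrete. The log format, field names and units below are entirely hypothetical.

```python
import csv
import io

# Hypothetical maintenance-log export: mixed units, missing values and
# free-form text -- the kind of messy input discussed above.
raw = """unit_id,temp,note
T-101,71.6F,vibration high
T-102,,sensor offline
T-103,22.4C,ok
"""

def to_celsius(field):
    """Normalize a reading like '71.6F' or '22.4C'; return None if missing."""
    field = field.strip()
    if not field:
        return None
    value, unit = float(field[:-1]), field[-1].upper()
    return (value - 32) * 5 / 9 if unit == "F" else value

cleaned = [
    {"unit": rec["unit_id"], "temp_c": to_celsius(rec["temp"]), "note": rec["note"].strip()}
    for rec in csv.DictReader(io.StringIO(raw))
]
print(cleaned)  # only now can exploratory analysis or modeling begin
```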

DuoLingo [4], a free language-learning website, has created a data product based on a data-driven approach. As users progress through lessons, they help translate websites and documents. In other lessons, users vote on the correctness of translations. Statistical models based on user skill choose the best translations of documents, which others have submitted to be translated for a fee.

O.R., analytics and data science are closely related – all apply math to gain insights – and the fuzzy descriptions of the three disciplines above have boundaries as porous as the borders of countries in the European Union. However, just as a person in Germany is most likely a German (although he or she could be French or Italian), an O.R./analytics/data science practitioner will most likely fit the description outlined in this article. ❙
Brian Keller ([email protected]), Ph.D., is a data science practitioner and lead associate at Booz Allen Hamilton. He is a member of INFORMS.

REFERENCES
1. Matthew Liberatore and Wenhong Luo, “INFORMS and the Analytics Movement: The View of the Membership,” Interfaces, Vol. 41, No. 6, November-December 2011, pp. 578-589.
2. W. S. Cleveland, “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics,” ISI Review, Vol. 69, pp. 21-26, 2001.
3. Matthew Liberatore and Wenhong Luo, “ASP: The Art and Science of Practice,” Interfaces, Vol. 43, No. 2, pp. 194-197, March/April 2013.
4. www.duolingo.com



FORUM

Oil & gas producers need to tame the gusher … of data

BY WARREN WILSON
Data grew more important as the industry grew, reserves became harder to exploit and drilling technology evolved.
Throughout the century-and-a-half since the dawn of the commercial petroleum industry, oilmen have always hoped for the gusher – the big find that would spew enough oil to make them rich. There have always been far more “dry holes” than gushers, however, and the proportion only gets worse as oil and gas become harder to find and more difficult to produce.

One thing the industry has in abundance today, however, is data. And just like some of the biggest oil discoveries, the data gusher offers huge promise if it can be tamed. However, the exploration and production (E&P) industry, and the IT companies that support it, have a lot of work ahead to derive maximum value from their growing troves of data.

The first oil wells didn’t rely on data at all. They were sited where oil seeped from the ground naturally, and the challenge – not simple, but not data-dependent – was figuring out how to dig or drill down to the source. Data grew more important as the industry grew, reserves became harder to exploit and drilling technology evolved. Drillers began keeping paper records of what occurred during each work shift or “tour.” They tracked basic metrics such as the number of

feet drilled per hour or day, obstacles encountered, injuries sustained. Over the years recordkeeping has become steadily more thorough and sophisticated. Today the E&P industry has vastly better tools for every aspect of oilfield operations, including: three-dimensional maps of subsurface geological structures and hydrocarbon reservoirs; graphs or “logs” of the wells’ downhole conditions (temperature, pressure, porosity, permeability, etc.); and records of injuries and environmental incidents. New equipment is often fitted with sensors that produce steady streams of data about temperature, vibration and other parameters that indicate whether the asset is operating as it should, or whether it needs service, repair or replacement.

Yet E&P companies still find themselves in the same position as their predecessors and their counterparts in other industries: They are awash in data, but they don’t have all the insight they need. For example, they have copious real-time data from individual wells, but they do not have a good handle on the dynamics of complex unconventional reservoirs. Similarly, they have access to vastly better analytical tools than ever before, but few companies can claim to have fully optimized field operations, production or asset management.
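The article names no specific technique, but one minimal way to turn such sensor streams into a service-or-replace signal is a rolling z-score check; the readings, window size and threshold below are invented for illustration.

```python
from collections import deque
from statistics import mean, stdev

def flag_anomalies(readings, window=20, threshold=3.0):
    """Yield (index, value) for readings lying more than `threshold`
    standard deviations from the mean of the preceding window."""
    recent = deque(maxlen=window)
    for i, x in enumerate(readings):
        if len(recent) == window:
            mu, sigma = mean(recent), stdev(recent)
            if sigma > 0 and abs(x - mu) > threshold * sigma:
                yield i, x  # candidate for service, repair or replacement
        recent.append(x)

# Hypothetical vibration readings with one obvious spike at the end.
stream = [1.0, 1.1, 0.9, 1.0] * 10 + [4.2]
print(list(flag_anomalies(stream)))  # [(40, 4.2)]
```

Real condition-monitoring systems are far more sophisticated, but the sketch shows the basic shape: a stream of readings in, a short list of assets needing attention out.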

ANALYTICS TOOLS MUST EVOLVE

E&P companies’ drilling programs have always relied primarily on historical data that describe a given region and its history. Extrapolation and interpolation suggest where to drill next – if you drill between two producing wells, your odds of success are relatively good. Outside known reservoir boundaries, the odds fall off dramatically; so-called “wildcat” wells face a much greater risk of producing no return on many millions of dollars invested. That has always been the biggest gamble in oil exploration – even when armed with the best historical, descriptive data available, you still have to spend large sums up front to drill the well before finding out if your data is any good.

Today the industry’s data and analytics needs are changing, and so are the tools at its disposal. Having found and developed most of the world’s “easy” oil and gas reserves, E&P companies are venturing into even more remote locations and extracting hydrocarbons from unconventional sources. They are tapping into shale rock so impermeable that it won’t give up its gas and oil without first being “fractured” with water, sand and chemicals injected under high pressure. They are tapping sand deposits that contain oil so viscous that it won’t flow without first being diluted with solvents or softened by steam.


These unconventional sources pose unique challenges that require greater precision and real-time analytics – for example, to keep drill bits positioned precisely where they need to be within the shale “pay zone” and to control the placement, composition and pressure of fracking fluids to yield optimal results.

At the same time, E&P companies increasingly need predictive analytics tools, for a variety of purposes. They need to better understand how current production methods will affect long-term yield. They need more accurate predictions of asset behavior to improve continuity in drilling and production while minimizing the costs of spare equipment or service crews. Software vendors are increasingly offering analytics tools that address such needs.

But prediction is just one step forward. The next is so-called “prescriptive” analytics that go beyond merely predicting future behavior, to recommend the best course of action to achieve a desired result. Such capabilities are quite rare today and at a very early stage of development. But they promise to bring new

levels of performance, not just in E&P but in many other industries as well, because they track the results of their recommendations and feed those results back into the prescriptive algorithms to produce (in theory, at least) better and better recommendations over time.

COMPLEX NEEDS DRIVE DEMAND FOR NEW SOLUTIONS

Despite all the potential benefits of advanced analytics in oilfield operations, their adoption is still at a very early stage. That is true of most industries, simply because the technologies themselves are new. Software vendors today are evangelizing their predictive capabilities; only a tiny number yet offer “prescriptive” analytics. Indeed, the term itself is not yet widely known.

An additional factor in IT adoption in E&P is the industry’s insularity. It is a world unto itself, an industry that operates largely outside of everyday view and speaks a specialized language that few outsiders understand. It is an industry driven by geologists and engineers who tend to regard IT as a mere support tool for existing operations. The idea that IT, and particularly analytical software, can provide strategic and competitive advantage is not widely held. But until it is, E&P companies will not find or produce as much oil and gas

as they could, nor will they manage their operations as efficiently and safely as they could.

E&P companies owe it to themselves, and to their shareholders, to broaden their traditional view of IT and consider the strategic advantage it can provide. IT vendors, for their part, must state their value propositions in terms that E&P engineers can understand. They must explain, in plain language, how IT can help E&P companies find and extract more oil. ❙

Warren Wilson ([email protected]) leads Ovum’s Energy team, focusing primarily on IT for upstream Oil & Gas. He has been an analyst for 14 years. He joined Ovum in 2006 when Ovum acquired his former employer, Summit Strategies. At Summit his primary area of responsibility was mobile business applications. On joining Ovum, his research focus shifted to business management applications such as enterprise resource planning, supply chain management, and analytics. Before becoming an IT analyst, Wilson had been a reporter and editor for U.S. newspapers including the Casper (Wyoming) Star-Tribune, where he covered oil & gas and other energy industries. He majored in geology at Carleton College in Northfield, Minn., and later worked in the oilfield as a roughneck and well logger.


SOCIAL, MOBILE, ANALYTICS & CLOUD

Analytics & BPM
‘SMAC’ delivers a much-needed combination punch for peak customer experience.

BY MALCOLM ROSS
In some circles, business process management (BPM) has developed an unfortunate reputation, particularly among general business media. From the media’s perspective, the negative
reputation is implicit in the words themselves: “process” implies rigidity, and “management” implies slowness. Business that’s rigid and slow is antithetical to success in the age of social, mobile and cloud technologies – and it

doesn’t address the importance of analytics at all. BPM as a phrase contains everything that may sound bad about old-fashioned enterprise software.

The truth is, it is precisely the advent of social, mobile and cloud that makes business process management, when combined with sophisticated data analytics, the cornerstone of success in this new age. The vocabulary of IT is increasingly embracing social, mobile, analytics and cloud – or its catchier acronym, “SMAC.” All of these words have one thing in common. They are all ways to engage customers, whether internal or external, to do several important things:

• to create meaningful interpretations of information,
• to adapt quickly and nimbly to even small changes in the business environment,
• to improve the customer experience, and most importantly,
• to do it all more quickly than ever before.

SYSTEMS OF RECORD VERSUS SYSTEMS OF ENGAGEMENT

Understanding how analytics works to support intelligent business process and provide peak customer experience means taking a hard look at the two camps of enterprise data-related software.

The industry analyst firm Forrester and Forrester senior analyst Clay Richardson draw a distinction between “systems of record” and “systems of engagement.” ERP software, big data repositories and other similar technologies comprise what Richardson refers to as systems of record. They record data according to organizational function, and they offer a fairly static representation of a company’s performance – essentially a snapshot, or in the best case, a dashboard.

Systems of engagement sit in front of systems of record. They provide the rules for creating customer-facing responses to the data stored in those systems of record. That includes rules for interacting with customers in the mobile, social and cloud environment. The rules themselves are based on an analysis of the information in the systems of record.

Almost by definition, business process management can be seen as a system of engagement. It’s how workers engage with the people and data they need to make informed business choices, quickly, easily, measurably and routinely. Some BPM systems, it can be argued, still have one foot in the systems of record camp, simply because they record business rules and processes. At the high end, these systems may even include specific functionality that allows companies to look at their business as a
collection of records – essentially a snapshot of information related to particular work transactions. It’s when SMAC enters the picture, though, that the distinction between the two systems emerges. The introduction of methods of engagement such as social, mobile, analytics and cloud puts a picture frame around Forrester’s contention that BPM is much more than a system of record, and is, in fact, a sophisticated system of engagement.

The benefit of this type of analytically supported system of engagement is that it creates routines (business processes) for the best possible customer experience. It is a means to that end, not an end itself. With SMAC, systems of engagement offer businesses a way to make sense of all the means by which workers connect to their business data, their colleagues’ brainpower and their customers. Any organization that uses BPM to understand data, create business rules that can be adapted almost on the fly, and apply those rules through various media such as social and mobile technology, is ultimately creating the best possible customer experience it can offer. Everything else is just a delivery mechanism.

In particular, social networking – whether accessed through mobile networks or through the cloud – offers a truly meaningful customer experience only when filtered through the lens of analytically supported business processes. Social media without the context of work is nothing more than water cooler talk. It’s the overlap of work with social engagement (call it “worksocial”) that really creates a compelling case for business success. And implicit in the notion of work is the analytical component.

The worksocial model applied to data analysis and BPM delivers a modern and more consistent customer experience, increasing organizational productivity and operational efficiency for a more agile business. Real-time analytics let people make informed decisions and take actions from anywhere, at any time. Organizations can quickly respond to new market opportunities, and enterprise social software finally has a real business purpose.

Now that we’ve covered some background theory of how SMAC relates to analytics and business process, let’s look at what it means in the real world.

ANALYTICS AND PROCESS IN ACTION – PRACTICAL APPLICATIONS

For analytics to be actionable, analysis must work in conjunction with process. Business intelligence and analytics yield intelligent business process management. By having more real-time intelligence and analytics directly feeding automated processes via mobile, social and cloud mechanisms, an organization can see trends, issue actions and measure the results through reports delivered by enterprise social media.

The application of SMAC to business intelligence and process essentially creates a definition of what a peak customer experience looks like to any one company,

all the way up to what someone should do when actually interacting with the customer. Again, keep in mind that both internal and external customers benefit. Following are only a few examples of tangible benefits:

Field service management and operational efficiency. One of the largest wind turbine companies in the United States and the world, EDP Renewables has 28 separate, geographically dispersed wind farms, each populated with massive 300-foot-tall turbine generators.



The company’s wind farms generate 3.3 gigawatts of green wind energy – more power than created by the Hoover Dam. The energy market fluctuates greatly. The margin on a watt of energy produced versus a watt sold is constantly in flux, depending on the weather and usage patterns. Identifying areas where the market is very good, such as where demand for climate control is high, might dictate prioritization of that equipment for maintenance.

EDP uses this type of real-time streaming data analytics to help prioritize how the company manages and maintains its wind turbine assets. Weather patterns and related North American weather events are streamed into the organization’s business process management application as big data. By analyzing that information in context with data on turbine maintenance issues, EDP can anticipate weather patterns to identify the potential energy output – and the potential price of that energy – for particular farms. This is essential for prioritizing turbine fixes to maximize the profitability of the company.
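As a rough sketch of that prioritization logic – not EDP’s actual model – one could rank open repair tickets by the revenue each turbine’s downtime puts at risk, using forecast prices. Every number and name below is hypothetical.

```python
# Hypothetical open repair tickets:
# (turbine, expected MWh lost per day while down, forecast price in $/MWh).
tickets = [
    ("farm_A/T07", 18.0, 42.0),
    ("farm_B/T12", 25.0, 31.0),
    ("farm_C/T03", 10.0, 88.0),  # heat-wave forecast drives a price spike
]

def revenue_at_risk(ticket):
    _, mwh_lost_per_day, price = ticket
    return mwh_lost_per_day * price

# Dispatch crews to the turbines whose downtime costs the most first.
for turbine, mwh, price in sorted(tickets, key=revenue_at_risk, reverse=True):
    print(f"{turbine}: ${mwh * price:,.0f}/day at risk")
```

Even this toy version captures the article’s point: the weather and price forecast, not just the fault list, determines which fix comes first.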

With thousands of wind turbine components from a variety of vendors, EDP also needs to be able to analyze the relative quality of the turbines and compare their performance over time against the cost to repair and maintain these pieces of equipment. They are constantly reviewing their vendors in this analysis. By correlating factors such as time between repairs from one vendor to another, EDP can make appropriate assessments on the prioritization and maintenance of its equipment.

Quality control. A leading global beverage provider uses business analytics in mobile applications to assist in inspecting individual stores for quality control to optimize the customer experience. With tens of thousands of stores all over the world, the company has dramatically accelerated the inspection process and the return of inspection reports that identify areas for improvement. Inspectors in the field are able to perform complete inspections using an iPad, with immediate tabulation of individual store issues, as well as regional and supplier trends. In any given location, field inspectors can examine the store for factors such as equipment and service quality, customer experience, signage and cleanliness. The real-time data crunching that produces scoring data enables


inspectors to then sit with store managers and discuss action plans, all in a single inspection visit. The company calculates that this process acceleration saves it more than 30,000 hours annually in inspection times, directly improving the quality of the customer experience more rapidly and consistently.

Customer satisfaction. Online retail giant Amazon does not source every product it sells. When you purchase something from Amazon, that product may actually be sourced from a department store, specialty merchant or other retail organization. When a customer receives an incorrect shipment or an incorrect charge, Amazon refers to this error as a “price purchase variance.” As the intermediary, Amazon is responsible for resolving that variance. Because of the sheer quantity of transactions conducted through its website, Amazon has considerable familiarity and expertise in handling these types of exceptions. In the past, Amazon would in some cases have simply absorbed the difference in cost or the cost of correcting the transaction problem. Price purchase variance settlements were easily costing the company millions of dollars a year.

Amazon began developing a process environment that would take in the massive volume of exception handling

and apply business rules to it, to automate decision logic. They can see how the variances are coming in, look at the past performance of other vendors, determine the vendors’ track record of fulfilling products and determine whether there is a history of that type of transaction problem. Using statistical analysis of price purchase variance frequency, Amazon now applies business rules to automatically push the issue back to the sourcing vendor and request resolution – whether auto approval, make good or other tasks for the vendors. Through the same system, they can initiate communication with the customers to assure them that they are aware of the problem and are working to resolve it.

Social engagement for collaboration. Social media for business involves both external and internal aspects. In consumer social platforms (Twitter, Facebook, LinkedIn, etc.), there clearly can be a business integration element, including brand management and customer relationship management. There’s also an internal component to social media for business. This aspect focuses on developing a project-based social collaboration framework for the open exchange of data and knowledge within the enterprise. In some cases, there may also be a hybrid of external and internal

elements. Processes can be designed to read external social media feeds, identify exceptions or trends, and bring those things into an automated environment to set up response mechanisms (particularly important for brand and reputation management).

By applying social media in the context of work – the “worksocial” concept mentioned earlier in this article – organizations can openly share both collaborations and conversations related to the analysis at hand. Whether it is an individual case of turbine maintenance, store inspection, invoice exception or any other process, someone else involved in the same type of task can see that information and openly share related data or process recommendations and strategies. This improves business decision-making and the customer experience as well.

CONCLUSION

Peak customer experience has long been in need of a healthy SMAC. Social, mobile, analytics and cloud capability fuels the ways in which business processes can

be automated. Organizations can connect their workers to the job at hand, letting them all make use of whatever information they need to deliver the best result to customers. And SMAC can improve the quality of the user experience at the speed of the Internet, bringing together many users, using many devices, across many different networks, in many different places everywhere, all at once. ❙
Malcolm Ross is VP of Product Marketing for Appian. He can be reached at [email protected].


COMPETITIVE EDGE

Predictive analytics
Harnessing the power of big data.

BY ERIC SIEGEL
Every day’s a struggle. I’ve faced some tough challenges such as which surgery to get, how to invest for my business and even how to deal with identity theft. With so much stuff coming at me from all angles, daily prosperity relies on spam filters, Internet search engines, and personalized music and movie recommendations. My mailbox wonders why companies still don’t know me well enough to send less junk mail.
These predicaments matter. They can make or break your day, year or life. But what do they all have in common? These challenges – and many others like them – are best addressed with prediction. Will the patient’s outcome from surgery be positive? Will the credit applicant turn out to be a fraudster? Will the investment fail? Will the customer respond if mailed a brochure?

There’s another angle. Beyond benefiting you and me as individuals, prediction

bestows power upon an organization: Big business secures a competitive stronghold by predicting the future destiny and value of individual assets. For example, in the mid-1990s, Chase Bank witnessed a windfall predicting mortgage outcome. By driving millions of transactional decisions with predictions about the future payment behavior of homeowners, Chase bolstered mortgage portfolio management, curtailing risk and boosting profit.

INTRODUCING ... THE CLAIRVOYANT COMPUTER

Making such predictions poses a tough challenge. Each prediction depends on multiple factors: the various characteristics known about each patient, each homeowner and each e-mail that may be spam. How shall we attack the intricate problem of putting all these pieces together for each prediction?

The solution is machine learning; computers automatically discovering patterns and developing new knowledge by furiously feeding on modern society’s greatest and most potent unnatural resource: data.

[Figure 1: The learning process.]

Data can seem like such dry, uninteresting stuff. It’s a vast, endless regiment of recorded facts and figures. It’s the unsalted, flavorless residue deposited en masse as businesses churn away. But the truth is that today’s big data embodies a priceless collection of experience from which to learn. Every medical procedure, credit application, Facebook post, movie recommendation, fraudulent act, spammy e-mail and purchase of any kind is encoded as data and warehoused. This veritable Big Bang delivers a plethora of examples so great in number only a computer could manage to learn from them.

This learning process discovers and builds on insightful gems such as:
• Early retirement decreases your life expectancy.
• Online daters more consistently rated as attractive receive less interest.
• Vegetarians miss fewer flights.
• Local crime increases after public sporting events.

Machine learning develops predictive capabilities with a form of number-crunching, a trial-and-error learning process that builds upon statistics and computer science. In commercial, industrial and government applications – in the real-world usage of machine learning to predict – it’s known as:

Predictive analytics – Technology that learns from experience (data) to predict the future behavior of individuals in order to drive better decisions.

APPLIED PREDICTION

“The powerhouse organizations of the Internet era, which include Google and Amazon ... have business models that hinge on predictive models based on machine learning.”
– Professor Vasant Dhar, Stern School of Business, NYU

Every important thing a person does is valuable to predict, including: consume, work, love, procreate, vote, mess up, commit a crime and die. Here are some examples:
• Prediction drives the coupons you get at the grocery cash register. U.K. grocery giant Tesco predicts which discounts will be redeemed in order to target more than 100 million personalized coupons annually at cash registers across 13 countries. This increased coupon redemption rates by a factor of 3.6 over previous methods.
• Predicting mouse clicks pays off massively: websites predict which ad you’ll click in order to instantly choose which ad to show, driving millions in new-found revenue.
• Netflix awarded $1 million to a team of scientists who best improved their recommendation system’s ability to predict which movies you will like.
• Obama was re-elected in 2012 with the help of voter prediction. The campaign predicted which voters would be positively persuaded by campaign contact, and which would be inadvertently influenced
to vote adversely. Acting on these predictions was shown to successfully convince more voters to choose Obama than traditional campaign targeting.
• The leading career-focused social network, LinkedIn, predicts your job skills.
• Online dating leaders Match.com, OkCupid and eHarmony predict which prospect on your screen would be the best bet at your side.
• Target predicts customer pregnancy in order to market relevant products accordingly. Nothing foretells consumer need like predicting the birth of a new consumer.
• Student essay grade prediction has been developed for automatic grading. The system grades as accurately as human graders.
• Wireless carriers predict how likely it is you will cancel and defect to a competitor – possibly before you have even conceived a plan to do so – based on factors such as
dropped calls, your phone usage, billing information and whether your contacts have already defected. (A minimal sketch of such a churn model appears after this list.)
• Wikipedia predicts which of its editors, who work for free to keep this priceless asset alive, are going to discontinue their valuable service.
• Allstate Insurance tripled the accuracy of predicting bodily injury liability from car crashes based on the characteristics of the insured vehicle. This could be worth an estimated $40 million annually to the company.
• At Stanford University, a machine learned to diagnose breast cancer better than human doctors by discovering an innovative method that considers a greater number of factors in a tissue sample.
• Researchers predict your risk of death in surgery based on aspects of you and your condition in order to inform medical decisions.
• Crime-predicting computers help decide who belongs in prison. To assist with parole and sentencing decisions, officials in states such as Oregon and Pennsylvania consult prognostic machines that assess the risk a convict will offend again.
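To make the wireless churn example concrete, here is a minimal sketch – not any carrier’s production system – of how such a predictive model could be trained with scikit-learn. The synthetic data generator and feature names are hypothetical stand-ins for the factors listed above.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
# Hypothetical per-subscriber features, mirroring the factors listed above.
X = np.column_stack([
    rng.poisson(2, n),           # dropped calls last month
    rng.gamma(2.0, 150.0, n),    # minutes of use
    rng.normal(60.0, 20.0, n),   # monthly bill ($)
    rng.binomial(1, 0.1, n),     # 1 if a contact has already defected
])
# Synthetic ground truth: churn odds rise with dropped calls and defected contacts.
logit = -3.0 + 0.5 * X[:, 0] + 2.0 * X[:, 3]
y = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier().fit(X_train, y_train)
churn_risk = model.predict_proba(X_test)[:, 1]   # scores that drive retention offers
```

The resulting scores rank subscribers by churn risk; acting on them – deciding whom to call with a retention offer – is the decision-driving step the definition above describes.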

Organizations of all kinds benefit by applying predictive analytics, since there’s ample room for operational improvement; organizations are intrinsically inefficient and wasteful on a grand scale. Marketing casts a wide net; “junk mail” is marketing money wasted and trees felled to print unread brochures. An estimated 80 percent of all e-mail is spam. Risky debtors are given too much credit. Applications for government benefits are backlogged and delayed.

With predictive analytics, millions of decisions a day determine whom to call, mail, approve, test, diagnose, warn, investigate, incarcerate, set up on a date and medicate. By answering this mountain of smaller questions, predictive analytics combats financial risk, fortifies healthcare, conquers spam, toughens crime-fighting, boosts sales and may in fact answer the biggest question of all: How can we improve the effectiveness of all these massive functions across business, government, healthcare, non-profit and law enforcement work? ❙
Eric Siegel, Ph.D., is the founder of Predictive Analytics World (www.pawcon.com) and the author of “Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die,” from which this article was adapted with permission of the publishers, Wiley. Upcoming Predictive Analytics World conferences will be held in Boston, San Francisco, Chicago, Washington, D.C., Berlin and London. For more information about predictive analytics, see the Predictive Analytics Guide. ©2013 Eric Siegel. All rights reserved.


SUPPLY CHAIN MANAGEMENT

Forecasting and optimization
Beyond supply chain planning: the development of optimization in complex sourcing.

BY ARNE ANDERSSON
Optimization is the perfect technology for sourcing since it deals with selecting the best element that meets specified criteria from some set of available alternatives, i.e., “finding the best deal.” Its use is becoming more widespread in industry as data handling, processing power and solvers have improved to the extent that there is no other way to handle the levels of data and complexity in sourcing projects that are run today.

What constitutes the “best deal” is a discussion in its own right, but the person or organization doing the buying will determine it. If it is a commodity with a clearly defined specification, it could well be a question of the lowest price. If, however, it is a service that is being sourced, “softer” criteria will more likely need to be met, so price is only one factor that will be considered. As more criteria are introduced, the complexity increases, and it is the ability to handle complexity that has
seen a dramatic change in the way large organizations approach sourcing. Fifty years ago the telephone and a notepad were the tools available for sourcing, so the levels of complexity were relatively low. Spreadsheets and e-mail dramatically increased the levels of complexity that could be handled, but even these techniques pale into insignificance compared with the levels of complexity that are handled by online sourcing platforms today. A typical “buying event” today will have thousands of items, and tens of thousands of offers from hundreds of suppliers. Even the simplest event will have potentially millions of combinations of goods and suppliers, so optimization is the only way to analyze this level of data.

HANDLING COMPLEXITY

The ability to handle large amounts of data has also seen sourcing change from a one-way process, where suppliers are asked to make offers for individual items based on the sourcing companies’ criteria, to a process of collecting an array of information from suppliers and analyzing the information collected in order to find the best solution. This combinatorial approach allows suppliers to express their strengths by creating their own groups of items so they can make their most competitive offers. Trade Extensions, for example, carried out the first online
combinatorial auction in February 2001 when it worked with Volvo on a packaging tender, which involved 600 items and 15 suppliers and had a value of approximately $15 million. To show how far the levels of complexity have increased, a U.S. bank recently used the platform to allocate a spend of $1 billion, sourcing the materials to produce and deliver two billion items of direct mail after collecting 400,000 bids from more than 100 suppliers for 65,000 items. This level of complexity is commonplace nowadays, and many recent projects take the complexity to another level by integrating sourcing and planning.

MOVING BEYOND SOURCING

Companies that have become familiar with using the technology for bid collection and analysis now realize that the software can be configured to solve any constraint-driven challenge. For example, one of our customers is using the platform to define the manufacturing process of its products stage by stage. This customer has numerous manufacturing sites and even more assets at its disposal. In this case, assets are manufacturing equipment owned and operated by external suppliers. Each asset has been qualified by the company to perform a certain operation for each product, so the challenge
is to optimize the manufacturing process to ensure that each product goes through the correct number of processes required to produce the finished product, using only qualified assets, while taking into account the various costs – raw material, production, transport, warehousing, inventory, etc.

Figure 1: Asset optimization based on monthly forecasts for each product and market, taking into account operational constraints: raw material costs (different sites use different raw materials for a product, and the precise mix can vary between sites); production costs (external suppliers have contracted prices for each operation and product); supply chain costs (freight, warehousing/inventory/capital, taxes and duty); and initiative costs (the cost of changing or moving production and the qualification cost for approving an asset to perform an additional operation).

It’s a simple concept and a classic optimization challenge, and it is made more complex by introducing further constraints. For example, it is possible to increase the number of
operations individual assets can perform on different products, but this qualification process costs time and money, and there is a qualification budget that cannot be exceeded. To identify the most appropriate assets to use, the manufacturer optimizes its production based on monthly demand forecasts for each product and market. It is an incredibly complex system in terms of data, but optimization transforms the data into tangible information that the
business uses to determine its day-to-day operations. And because the data is continually updated, it essentially creates a dynamic model of the supply chain on which further analysis can be carried out. For example: What happens if there is a natural disaster that completely closes site Y? What happens if there is a 15 percent wage increase in China? What happens if Supplier X goes bust? If the data is handled in the correct way, there are no limits on the “what if?” questions companies can ask, so they can see the impact of any proposed changes before implementation.

FLEXIBILITY CREATES COMPLEXITY

The flexibility that is provided to organizations in terms of analysis creates its own problems, and a large proportion of the research that we are carrying out at the moment deals with improving our optimization software, both in terms of capability and user friendliness. There are many factors to consider. First of all, one must always show respect for complexity. The type of mathematical problems that need to be tackled are known in the scientific world as NP-complete. Simply put, it is impossible to give any guarantees for how long they will take to solve. Therefore, we have carefully developed our skills and experiences of how to properly handle hard
optimization problems in practice. In our experience, with a proper re-formulation or relaxation of the hardest problems, there is basically always a working solution available, with or without tweaking. As an example, consider the following business rule: “For each product, we want two suppliers, but no individual supplier should be awarded less than 20 percent.” This seems like a quite natural rule. Let us re-formulate it slightly: “For each product, no supplier is awarded more than 80 percent, and the total number of suppliers is at most two.” Are the two rules identical? No; there are some subtle differences. For example, if there is only one available supplier for a product, the first rule would create an infeasible problem, while the second could still be handled. But, more importantly, the difference in execution time on a solver may be very large when these rules are combined with other rules. And, by helping clients re-formulate rules as above, we can bring significant assistance in tackling the most challenging problem instances.
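To make the two formulations concrete, here is a minimal sketch using the open-source PuLP modeler – not Trade Extensions’ platform – with hypothetical suppliers, prices and a single product. Rule 1 is left commented out as the strict alternative; note that whether the 80 percent cap in Rule 2 can be satisfied when only one supplier exists depends on how demand coverage is modeled, exactly the kind of subtlety described above.

```python
import pulp

suppliers = ["A", "B", "C"]
price = {"A": 10.0, "B": 11.0, "C": 12.5}   # hypothetical unit prices
demand = 100                                 # units of a single product

prob = pulp.LpProblem("allocation", pulp.LpMinimize)
# share[s]: fraction of demand awarded to supplier s; used[s]: 1 if s gets any award.
share = {s: pulp.LpVariable(f"share_{s}", 0, 1) for s in suppliers}
used = {s: pulp.LpVariable(f"used_{s}", cat="Binary") for s in suppliers}

prob += pulp.lpSum(price[s] * demand * share[s] for s in suppliers)  # total cost
prob += pulp.lpSum(share.values()) == 1      # the full demand must be awarded
for s in suppliers:
    prob += share[s] <= used[s]              # an award implies the supplier is used

# Rule 1 (as first stated): exactly two suppliers, each awarded at least 20 percent.
# Infeasible whenever only one supplier can serve the product.
# prob += pulp.lpSum(used.values()) == 2
# for s in suppliers:
#     prob += share[s] >= 0.2 * used[s]

# Rule 2 (the re-formulation): no supplier above 80 percent, at most two suppliers.
for s in suppliers:
    prob += share[s] <= 0.8
prob += pulp.lpSum(used.values()) <= 2

prob.solve()
print(pulp.LpStatus[prob.status], {s: share[s].value() for s in suppliers})
```

With these prices the relaxed rule awards 80 percent to the cheapest supplier and 20 percent to the next cheapest; swapping in the commented-out strict rule gives the same split here, but the two only diverge on harder instances, as the article notes.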
Figure 2: Optimization used in conjunction with large-scale data and effective reporting is transforming sourcing and moving into areas beyond supply chain planning and asset optimization.

Another example where much care is needed relates to numeric precision. It is not uncommon that very large numbers are mixed with very small numbers in the same sourcing/optimization project (e.g., when a retailer sources products where volumes differ by several orders of magnitude between different product categories). However, the small numbers are just as significant as the large numbers and they cannot be ignored.

We also have to remember that we are working with people in sourcing departments and not computer scientists from academic institutions, and often users will create impossible or illogical queries to solve. Therefore, helping users to identify conflicting rules and constraints is of great importance. Not only may we face conflicting rules, but sometimes it may be very hard to understand why a particular solution is the optimal one. For example, we may ask ourselves why
one particular supplier is not included in the optimal solution, and access to good automatic explanations is of vital importance. Such an explanation could be, “Supplier X not allocated because of Rule Y,” or “Not allocated because price is too high,” etc. Alongside the challenge of formulating the correct query is the practical problem of computing power. Because the queries are NP-complete problems and it is impossible to predict how long they will take to solve, they can tie up a significant amount of computer resources. The Trade Extensions platform solves this by dynamically allocating computer
resources over the cloud. While it’s commonplace for “the cloud” to be used for data storage, using it for data processing is still quite rare, yet it allows an unlimited number of complex queries to be solved simultaneously.

CONCLUSION

Optimization is transforming sourcing, and its influence on other areas of business is only going to increase. Data handling, equation definition, solvers and reporting are improving all the time, so the number of people and organizations able to access these incredibly sophisticated tools will grow, and optimization applications will only be limited by individuals’ creativity. ❙
Arne Andersson co-founded Trade Extensions (www.tradeextensions.com) in June 2000. He is one of the world’s leading experts in designing algorithms for optimization, and he has published more than 50 articles in major scientific journals. In 2012 he became a member of the Royal Swedish Academy of Engineering Sciences. Previously, Andersson was a professor of computer science at Uppsala University, the oldest university in Sweden (founded in 1477) and one of the highest ranked universities in Europe.


LOAD FORECASTING

Utilities dust off the forecasting playbook
Smart grid data brings challenges and opportunities for power industry.

BY TAO HONG AND ALYSSA FARRELL
The age-old business of forecasting is once again a hot topic of conversation at utilities. As the business needs shift to a more proactive management style, analytics that give insight into the future – whether customer adoption of electric vehicles (EVs) over the next five years or tomorrow’s wind power generation – are in demand. For load forecasters specifically, the scrutiny is intensifying. Previously,
utilities didn’t get many questions about the accuracy of their load forecasts during the regulatory rate case approval process. But now, new rate cases are harder and harder to approve. In this environment, utilities need more defensible forecasts to secure regulatory approval. Under pressure to demonstrate a return on smart grid investments, utilities are using the data they collect from smart meters and other smart grid devices to
better understand customers, design demand response (DR) programs, make buying and selling decisions on the energy market, and increase the reliability of the grid. Forecasting plays a key role in each of these areas, from modeling future load growth to predicting the impact of DR.

Figure 1: Ten years of hourly electric load of a U.S. utility at the corporate level. As millions of smart meters are being installed, utilities will see more and more hourly or even sub-hourly load series at the household level. The data brings both challenges and opportunities to the utility industry.

Forecasting is also becoming more critical to the operations of a utility because of the increasing penetration of distributed energy resources, EVs and energy-efficient appliances. Previously, when forecasting electricity demand, utilities didn’t have to worry about electric vehicles, solar panels on rooftops or wind farms because these technologies were not present in significant enough numbers to have any real effect. Now, however, they’re increasing in prevalence and therefore increasing the challenge of accurately forecasting electricity demand.

Advanced Metering Infrastructure (AMI) is the primary technology that offers forecasters more timely and granular data for load analysis and forecasting. With AMI, the utility has two-way communication with the meter (electricity, water or gas), and it gets readings back in an automated fashion in real time, which
means that all the data about energy consumption, down to the meter level, can be more granular than ever before.

AN EXPANDING ROLE FOR UTILITY FORECASTERS

For the vast majority of the electricity grid, energy consumption is mainly driven by weather, human activities and the interactions among those variables. In the past, if utilities could predict temperature and properly model seasonal behaviors, they would arrive at a pretty decent forecast. Now, utilities with renewable generation resources may need to forecast cloud cover or wind speed. For example, as cloud cover increases, solar photovoltaic output goes down. This means the net demand on the remaining system will increase under the same loading condition. The opposite is true for wind. As wind speed increases in a region, the output from wind farms increases and net demand on the system
is reduced. Unfortunately, making predictions about cloud cover and wind speed and direction is significantly more challenging than predicting temperature. The high volatility of wind and solar makes today’s load forecasting much more complicated than before.

Figure 2: One week of solar generation (kW) at five-minute intervals. There is no solar generation at night. During the daytime, solar generation can be very volatile and difficult to predict. The utility industry needs advanced forecasting and optimization techniques to operate the power grid under reliability, economic and environmental constraints.

In addition, EV charging is quite difficult to model. If EV owners regularly charged their batteries in the evening hours, that would be a predictable load to forecast, but human behavior is erratic. We come home early some days, stay late, go out for dinner, work from home, etc. The volatility in demand that is introduced by these new technologies is putting new pressures on utility forecasters.
BRIDGING THE CULTURAL AND TECHNICAL DIVIDE

As a fundamental problem in the utility industry, forecasting finds its applications across several departments of a utility, such as planning, operations, marketing and customer services. Many utility forecasting teams are siloed, sitting in different departments. Some utilities have an analytics center of excellence that serves multiple business needs. When they are centralized, these resources communicate better with each other and build collaborative forecasts that tend to have higher overall accuracy. If they are siloed, the consistency and quality of the data is sometimes sporadic.
Siloed forecasting teams may use different data, customized tools and have access to less computing power than if they were centralized. Just like organizational differences, the business pressures faced by each utility are also unique. Large utilities tend to feel the pain of renewable and distributed energy resources more than smaller utilities. For example, several utilities in California provide power to urban areas that have high penetrations of renewable energy. On the other hand, municipals and co-ops also care about improving their forecasting processes because many of them deployed smart meters even before the larger investor-owned utilities. Because municipal utilities and cooperatives are city- or member-owned, they have the incentive to understand their customers better so that they can more accurately contract the right amount of power to meet demand. When they do this well, they can pass on the savings directly to their customers. Old Dominion Electric Cooperative (ODEC) credits advanced forecasting capabilities with enabling four rate decreases in just one year [1].

UTILITY FORECASTING KEYS TO SUCCESS

Because forecasts are having an increasingly significant impact on business decisions, it is important to highlight several keys to success. One of the authors (Tao Hong) discussed three skills of the ideal energy forecaster in his blog [2]. First, forecasters need to maintain a close relationship with the business. The forecast provides no value unless people on the business side know how to use it. In addition, forecasters need broad analytical skills to understand basic statistics, and they need technical skills to master the tool set available to them. Finally, but most importantly, forecasters need to be honest and true to their forecasting methodology and not allow themselves to be swayed by internal politics. Forecasting results should be data-driven, not tweaked to meet some personal agenda.

To improve energy forecasting in the future, each utility needs a centralized forecasting team that provides analytical services for most of the business units across the utility. A team consists of people with diverse backgrounds, including electrical engineers, economists, statisticians, meteorologists, social scientists, operations research specialists, information management specialists, software programmers and business liaisons. Ideally, these people with their diverse skill
sets all have access to quality data and are not technology constrained, so they can perform complex calculations across many models in a very short time frame. They have rigorous, traceable forecasts and comprehensive documentation. By working closely with the business side, the liaisons within the forecasting team help the business units improve data-driven decision-making. ❙
Tao Hong is the head of Energy Forecasting at SAS, where he oversees research and development, consulting, education, marketing and sales of SAS solutions for energy forecasting. He is the author of the blog Energy Forecasting (http://blog.drhongtao.com). He is also the chair of the IEEE Working Group on Energy Forecasting, general chair of the Global Energy Forecasting Competition and an adjunct instructor at the Institute of Advanced Analytics at North Carolina State University.

Alyssa Farrell leads global industry marketing for SAS’ business within the energy sector, including utilities, oil and gas. She also has responsibility for SAS’ Sustainability Solutions and works with customers around the world to understand best practices and solutions for managing their business with environmental responsibility in mind. She participates in the Green Tech Council of the North Carolina Technology Association.


COMBATING CORPORATE FRAUD

Forensic Analytics
Adapting to a growing pandemic.

BY PRITI RAVI
Corporate fraud has manifested itself in diverse forms repeatedly around the world. From identity theft and insider trading to more sophisticated e-crimes and misrepresentation of financial information, the spectrum of fraudulent activities is huge and hence a tough challenge to overcome. Today, even as corporations ramp up fraud-detection efforts, the incidence of various frauds has been on the rise.

Consider these recent instances:
• Medicare made about $35 million worth of payouts to an organized group of more than 50 people who allegedly stole personal information, including Social Security numbers, of 2,900 Medicare patients and billed Medicare for unperformed medical services using phantom clinics.
• Epsilon, the world’s largest provider of permission-based e-mail marketing, announced that millions of individual e-mail addresses were exposed in an attack on its servers, affecting a large number of brands on whose behalf Epsilon sends marketing e-mails to customers.
• CC Avenue, an Indian firm that validates payments made over certain e-commerce websites, faced charge-backs from a number of customers for e-transactions that they apparently did not make when a Web service provider posed as both the seller and the buyer by using credit card information he had pilfered to make purchases from his own website.
Figure 1: Frauds in the Indian banking sector. Source: RBI Annual Report 2010.

While such experiences may have induced a sense of urgency in many organizations to establish a basic framework for fraud management, without a sophisticated and intelligent monitoring and fraud detection system in place, most corporations are still struggling to cope with fraud detection as a discipline.

Corporate fraud is a multi-industry global phenomenon. One in five companies in Western Europe highlighted a significant increase in fraud in 2010. A similar pattern was observed in Latin America and the Middle East and Africa [1]. On the other side of the world, Asia is reported to have the highest number of employees who do not know what to classify as misconduct [2], clearly deterring the expansion plans of Western firms
into the emerging markets. Further, the 2010 annual report of the Reserve Bank of India indicates a near doubling in the two-yearly average cost of fraud incurred between 2007-2008 and 2009-2010 (Figure 1).

FRAUD DETECTION TECHNIQUES NEED TO EVOLVE

With corporate frauds estimated to cost 5 percent of global revenue every year [3], fraud detection has been on the radar of companies over the last few years. However, certain myths regarding fraud detection have also become pervasive, and they are often cited as grounds for not deviating from traditional fraud detection. The first step toward managing frauds is to debunk such myths.
Figure 2: Corporate fraud is widespread. Source: Kroll Global Fraud Report 2010.

Myth 1: Fraud detection needs investments only in risk-prone disciplines such as financial accounting.

Some of the biggest scams that shook the world – the Enron scandal, WorldCom bankruptcy and the Barings PLC collapse – are certainly financial in nature. However, lesser-known, non-financial schemes such as supply chain frauds and e-crimes have subjected various firms to huge financial losses, making them equally grave areas of concern.

Myth 2: Internal audits and whistleblower policies are adequate fraud detection techniques.

Approximately 50 percent of firms surveyed by KPMG in 2010 indicate that they rely on internal audits to detect fraud, while about 25 percent indicate that they rely on tips/whistleblowers. However, the use of more sophisticated techniques such as data analytics can help detect frauds faster. For instance, the use of Link Analysis, a technique that identifies the connections and network of a
fraudster, could have helped detect the 50-person Medicare fraud described earlier. Similarly, a Web server survey [4] could have helped brands minimize their losses from the Epsilon data theft, while an anomaly-based machine-learning system could have helped detect the credit card fraud faced by CC Avenue.

Myth 3: Fraud prevention through efficient security measures renders fraud detection unnecessary.

Although prevention techniques such as holographs on banknotes, Internet security systems for credit card transactions and subscriber identity module cards for mobile phones, or predictive analytic techniques (such as profiling potential fraudsters), are leading methods used to contain fraud, fraudsters are increasingly adaptive. Analytics-based fraud detection techniques can identify frauds that have passed through the prevention system.
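As an illustration of the Link Analysis technique cited under Myth 2, the sketch below – using the open-source networkx library, with invented claim records – connects claims that share identifiers and flags unusually large clusters, the kind of network a fraud ring such as the Medicare group would form.

```python
import networkx as nx

# Hypothetical claims, each carrying identifiers that fraudsters often reuse.
claims = {
    "claim1": {"addr": "12 Oak St", "phone": "555-0101"},
    "claim2": {"addr": "12 Oak St", "phone": "555-0199"},
    "claim3": {"addr": "9 Elm Ave", "phone": "555-0101"},
    "claim4": {"addr": "40 Pine Rd", "phone": "555-0777"},
}

G = nx.Graph()
for cid, attrs in claims.items():
    G.add_node(cid)
    for value in attrs.values():
        G.add_edge(cid, value)  # bipartite edge: claim <-> shared identifier

# Claims connected through shared identifiers form suspicious clusters.
for cluster in nx.connected_components(G):
    ids = sorted(n for n in cluster if n.startswith("claim"))
    if len(ids) >= 3:
        print("possible ring:", ids)
```

Here claim1, claim2 and claim3 are tied together by a shared address and a shared phone number, so they surface as one cluster for investigators to examine.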

USING FORENSIC ANALYTICS TO DETECT CORPORATE CRIMES

The process of fraud management goes beyond fraud detection, as illustrated in Figure 3. In a typical fraud management system, once a fraud is detected, suitable alarms are raised that are then scrutinized to confirm the incidence of a fraud before any further action for resolution is warranted.

Forensic analytics, on the other hand, encapsulates a diverse set of techniques used to identify data-based anomalies and to use such outlier trends to detect/predict the occurrence of frauds. Although a subset of the analytics discipline, forensic analytics differs from general analytics in the following ways:
• Forensic analytics is extremely data heavy – it needs to learn from every fraudulent and regular (non-fraudulent) activity and hence cannot use a sample of data as general analytics does.

Figure 3: Process of fraud management.
• Forensic analytics requires human intervention – the cost of a misclassified fraud and the investigation therein is extremely high in most industries, and hence an alarm raised by forensic analytics is usually subjected to further human scrutiny and resolution.

FORENSIC ANALYTIC TECHNIQUES

Although statistical techniques for forensic analytics are varied, they have a common theme: comparing observed values with expected values. The expected values, in turn, could be derived using multiple techniques – from simple numerical summaries (graphical summaries) to more sophisticated behavior profiling or anomaly-based modeling techniques that yield suspicion scores. Statistical tools for fraud detection can be either supervised or unsupervised. Supervised methods use both fraudulent and non-fraudulent data records to construct models, while unsupervised methods flag outlier (potentially fraudulent) records that can then be analyzed more closely.
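A minimal sketch of the unsupervised, anomaly-based route described above: a model learns what “expected” records look like, scores each record’s deviation, and the most suspicious records are raised as alarms for the human scrutiny that forensic analytics requires. The synthetic invoice data and the alarm threshold are hypothetical.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
# Synthetic invoice records: amount ($), days to payment, line-item count.
normal = np.column_stack([
    rng.gamma(2.0, 500.0, 2000),   # amounts
    rng.normal(30.0, 5.0, 2000),   # payment delays
    rng.poisson(5, 2000),          # line items
])
planted = np.array([[250000.0, 1.0, 1.0], [180000.0, 0.0, 1.0]])  # planted anomalies
records = np.vstack([normal, planted])

# Learn what "expected" looks like from the data, then score deviation.
model = IsolationForest(contamination=0.01, random_state=0).fit(records)
scores = model.score_samples(records)   # lower score = more anomalous

# Raise the most suspicious records as alarms for human review.
alarms = np.argsort(scores)[:20]
print("record indices flagged for scrutiny:", alarms)
```

A supervised alternative would instead train a classifier on records already labeled fraudulent or legitimate, which is why, as noted above, forensic analytics must retain every historical case rather than a sample.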
CONCLUSION

With an increase in the number of fraudulent activities in the recent past, a robust fraud management system is increasingly being seen as a must-have across the globe and across industries. Forensic analytics offers a collective set of techniques to make data-driven decisions to combat fraud. Ranging from simple rule-based techniques to complex self-learning and predictive algorithms such as neural networks, forensic analytics can be used for both prevention and detection of various types of frauds. It is a complex and adaptive approach, which could well become the norm in fraud management in the coming decade. ❙
Priti Ravi is a senior manager with Mu Sigma, specializing in providing analytics-driven advisory services to some of the largest retail, pharmaceutical and technology clients spread across the United States. She has more than eight years of experience in the corporate sector. Ravi completed the Post Graduate Programme in Management from the Indian School of Business, specializing in marketing and finance.

REFERENCES
1. Ernst and Young 11th Global Fraud Survey, 2009-10.
2. CEB’s Compliance and Ethics Leadership Council, 2009.
3. ACFE 2010 Global Fraud Survey.
4. A Web server survey is a service provided by vendors with access to host names, domain names and first-page content of websites, and can check for the occurrence of a brand’s trademarks or commonly used phrases.
5. RBI Annual Report, 2010.
6. Kroll Global Fraud Report, 2010.
7. Deloitte Airline Fraud Report, 2010.


SOFT SKILLS

Fundamental principles of analytic communication
BY EVAN S. LEVINE

Analytic ideas and findings are often surprising, subtle and technically complex. These qualities can make them challenging to communicate, regardless of the audience. On the other hand, analysts have a great deal of freedom over the manner in which they communicate ideas and findings – some overarching, general principles can help analysts make decisions in this regard. These sorts of principles are useful because communication advice for analysts is fragmented, primarily by medium. We use data visualization books to help us build plots, slide construction guides to help us build presentations, and Web manuals to help us build websites. The advice specific to these media is very useful, but establishing overarching principles helps analysts make decisions regarding how to organize communication materials by keeping a small set of objectives in mind.

Four principles apply to all analytic communication, regardless of audience or medium. These principles are: clarity, transparency, integrity and humility. (A similar list of principles for excellence in analytic communication appears in Markel, 2012 [1].) Whenever you are faced with a design decision for a communication product, return to these principles and they will guide you to a good solution.
An alternative frame for these principles is to think of them as fundamental objectives (as the term is used in decision analysis) for the analytic communication process. Some alternatives will solely impact one of the objectives; for example, sometimes an analyst can improve the clarity of a plot by changing the color that the lines are drawn with. On the other hand, sometimes alternatives will involve tradeoffs between the objectives; those decisions are generally more difficult, and which alternative is preferred can depend on the audience or the medium. Let’s discuss each of the principles in more depth.

CLARITY

Clarity is the expression of an analytic concept in a manner that is simple, direct, efficient and effective. Extraneous lines, words, colors and other markings are minimized so the key idea can be placed front and center. At the same time, the concept must be expressed in a manner that is understandable and not oversimplified; minimization should not be taken so far that the finding disappears or key data features are lost. Consider how experts on analytic communication in various media make recommendations that maximize clarity:
• In data visualization, clarity is exemplified by Tufte’s first two
principles in the theory of data graphics (Tufte 2001 [2]) – “above all else show the data” and “maximize the data-ink ratio.” In other words, when making a data visualization, don’t add tick marks, gridlines or decorative elements unless they actually convey information. At the same time, don’t eliminate any markings that impart information about the data.
• Iliinsky and Steele (2011 [3]), in their book on data visualization, are expressing the desire for clarity when they recommend “function first, suave second.” (Personally, I would put suave even lower on the list.)
• In his guide to slide presentations, Reynolds describes the Zen concept of simplicity as one of his guiding principles (Reynolds 2008 [4]). Reynolds’ simplicity is similar to what I’ve called clarity, as evidenced by his advice that “simplicity can be obtained through the careful reduction of the nonessential,” as long as it also “gets to the essence of an issue.”
• In their classic book on style in writing, Strunk and White (1959 [5]) stress the importance of clarity by emphasizing the repercussions when an author fails to achieve it: “Muddiness is not merely a destroyer of prose, it is also a destroyer of life, of hope: death on the highway caused by a badly worded road sign, heartbreak among lovers caused by a misplaced phrase in a well-intentioned letter, anguish of a traveler expecting to be met at a railroad station and not being met because of a slipshod telegram. Think of the tragedies that are rooted in ambiguity, and be clear!”
• One of the rules of journalism is not to “bury the lead (intro).” The writer should put the most important finding at the front of the story, stated directly. If the important finding is placed deeper in the story, the audience is more likely to miss it.

In summary, when communicating analytic ideas and findings, clarity means that you should maximize efficiency, whether measured through words, lines or colors, while still conveying your thoughts forcefully, definitively and, most importantly, understandably. (Scientists can think about clarity as maximizing the signal-to-noise ratio of communication.)

TRANSPARENCY

Transparent analytic communication explains to the audience the method by which the findings were derived, accessibly and to an appropriate depth. In addition to presenting the methodology, part of transparency is ensuring that the
audience understands the assumptions that underlie the analysis. This is important because if the assumptions are violated (whether through further study, natural change or some other means), or even if the audience just doesn’t accept the assumptions, there will be implications for the findings. Transparency is most appropriately applied to the entirety of an analytic communication package, as opposed, for example, to a single plot inside of a technical document. Some examples of transparency in action include:
• A journal article that describes the methodology behind an interesting result. In many fields, the author of an article is required to publish his or her methods to the level of detail required for another researcher to replicate every step.
• A financial analyst making a presentation who discloses his data sources and the techniques by which he processed them.
• An analyst who publishes on the Web the raw documents and code used in his new text mining method.
• A scientist speaking to the general public who, given the limited time allotted for her presentation, refers the audience to a freely available white paper for those who would like more technical detail.
Why is it important for analytic communication to be transparent? Shouldn’t an analyst only care that the findings are communicated correctly? First of all, one of the benefits of doing analysis is that there is a logical line of reasoning that leads to a finding; analysts don’t need to rely on the assertion of findings without support. This gives analytics a competitive advantage versus other types of arguments, such as “gut-based” reasoning or subject matter expertise, and that advantage should only be squandered for very good reason. In other words, transparency builds the audience’s confidence in the findings.

Secondly, part of our responsibility as analysts is to expose our line of reasoning to questions and comments. Sometimes this feedback reveals errors, oversights or forgotten assumptions. These corrections should be welcomed, because in the long run they result in better analysis. In most cases, however, an analyst will have a ready answer to a question or a comment because he or she has spent more time thinking about the data and findings than the audience. Answering questions directly also increases the audience’s confidence in the findings.

Finally, transparency helps to spur on other analysis. This occurs because revealing the methodology behind a finding can give a member of the audience an idea to
solve a problem they’ve encountered or, even if transparency doesn’t spark an immediate idea, it can add another tool to the analytic toolbox of the audience members. The audience can also more easily recognize other opportunities to apply analytics, sometimes bringing business and collaborative opportunities to the analyst. The transparency communication objective demonstrates a benefit of keeping the line of reasoning as simple as possible – simple methodologies are easier to explain, and if you can’t explain the methodology in a way that the audience will understand, few people will believe the findings.

INTEGRITY

Analytic communication with integrity conveys defensible and sound conclusions driven by the data, while properly representing the limitations and uncertainty in the findings. As analysts, our first responsibility is to ensure that we are communicating the data and findings accurately and honestly. However, it can be tempting to
exaggerate the implications of the data, because, in all likelihood, no one will look at your data as thoroughly as you will. Analysts engage in exaggeration for many different reasons, whether to further their career, to please the audience or simply to make a stronger argument than the data support. It is important to understand that this temptation is common; as analysts, we spend inordinate amounts of time and energy focused on our work, and to be tempted by a larger reward is entirely natural. However, it is counterproductive to engage in this kind of overreach, whatever the reasoning behind it. In the long run this kind of behavior will have a negative effect on your career, particularly in the opinion of other analysts. Additionally, as analysts, we study real phenomena, and our techniques are designed to reveal real insight regarding these phenomena. Even if your colleagues can’t tell that you’ve gone too far, eventually the phenomena you are studying will show the truth.

In addition to communicating the data and findings accurately, the presentation of limitations and uncertainty in analytic communication is integral to integrity. This information allows the audience to use the findings responsibly – if limitations and uncertainty are not presented or are minimized, the audience is likely to apply the findings in regimes beyond those in which they are valid. It also
facilitates comparisons between different analysts’ findings. Integrity is connected to the concept of “epistemological modesty,” a complicated-sounding phrase that describes a simple idea. Roughly, analysts who demonstrate epistemological modesty do not overstate the findings and certainty of their work because they recognize that the real world is quite complex and often difficult to understand and model. Analysis can break down in very surprising ways even if you’ve carefully accounted for the known sources of uncertainty. Keep this in mind when communicating findings.

A good example of the concept of integrity in action can be found in data visualization. When making plots, it is easy to exaggerate the trend you are trying to demonstrate by adjusting the axes in improper ways or by showing only selected subsets of the data. (This behavior is common in situations where the analysis was carried out to support a predetermined position – it’s often seen in politics.) Tufte (2001 [2]) expresses integrity by arguing that “graphical excellence begins with telling the truth about the data.” The analyst should present the data in such a way that the audience leaves with an accurate and complete impression.

HUMILITY

By humility in analytic communication, I mean that we should strive to remove

the analyst from the message. In writing, Strunk and White (1959, [5]) recommend that authors, “Place yourself in the background. Write in a way that draws the reader’s attention to the sense and substance of the writing, rather than to the mood and temper of the author.” In analytic communication, too often the audience takes away the idea that the analyst is some kind of super-genius, that analytical work is inaccessible, or that they could never carry out their own analyses. These perceptions are detrimental to the future of our profession; analytics is a young field and in order to grow we need to attract people and business by making ourselves and our work as accessible as possible. Furthermore, the data and the conclusions drawn from it should speak for themselves – if you find yourself needing to rely on your authority as an analyst, that’s a sign that you may be overreaching. We can communicate with humility by not encouraging a “cult of personality” around the analyst. For example, you can talk about mistakes that you made in the initial pass through the analysis or ways you feel the findings are difficult for you to understand. Discussing these sorts of things won’t hurt the audience’s opinion of you; in fact, it will actually improve it, because they will find you more relatable. Furthermore, they’ll also think you are smart, in the way that we think great teachers are smart
– good analytic communication requires a great deal of intelligence!

CONCLUSION

These basic principles can help guide our decision-making when it comes to communicating analytics. However, I don’t mean to imply that there is one right answer to communication decisions. Even with the constraints imposed by the principles, there is still plenty of room for individual style, unique voices and elegant solutions. ❙
Evan S. Levine ([email protected]) is the lead for analytics at the New York City Police Department’s Counterterrorism Bureau. Previously, he served as chief scientist at the Department of Homeland Security’s Office of Risk Management and Analysis. Levine is the author of the textbook “Applying Analytics: A Practical Introduction,” forthcoming from Taylor & Francis. This article is an excerpt from the book and is reprinted with permission. The views expressed in the article are those of the author and do not necessarily represent those of his employer. Levine is a member of INFORMS.

REFERENCES
1. Markel, Mike, 2012, “Technical Communication,” Bedford/St. Martin’s, Boston, Mass.
2. Tufte, Edward R., 2001, “The Visual Display of Quantitative Information,” Graphics Press, Cheshire, Conn.
3. Iliinsky, Noah, and Steele, Julie, 2011, “Designing Data Visualizations,” O’Reilly Media, Sebastopol, Calif.
4. Reynolds, Garr, 2008, “Presentation Zen: Simple Ideas on Presentation Design and Delivery,” New Riders, Berkeley, Calif.
5. Strunk, William, and White, E.B., 1959, “The Elements of Style,” Macmillan Company, New York, N.Y.


LEARNING RESOURCES

INFORMS’ Library of Audio and Video Presentations
SCIENCE OF BETTER: PODCASTS

Gain insights from experts on how math, analytics and operations research affect organizations like yours in these 20-30 minute podcasts conducted by INFORMS Director of Communications Barry List. Visit www.scienceofbetter.org/podcast.

Brian Keller, Booz Allen Hamilton – Hadoop Anyone? Recorded May 24, 2013
Don Kleinmuntz, Strata Decision Technology – Healthcare Analytics: Hospitals and Obamacare. Recorded May 2, 2013
Carrie Beam, Carrie Beam Consulting – Soft Skills for Lone Wolves. Recorded April 19, 2013
Atanu Basu, Ayata – 5 Pillars of Prescriptive Analytics. Recorded April 5, 2013
Stephen Budiansky, Author – Blackett: WWII’s Indispensable Man. Recorded March 21, 2013
Matthew Liberatore and Wenhong Luo, Villanova University – Bright Contrast in Roles of OR and Analytics. Recorded March 12, 2013
Brett R. Gordon, Columbia University, and Wesley R. Hartmann – Advertising & Presidential Campaigns. Recorded February 22, 2013
Michael Gualtieri, Forrester Research – Forrester on Predictive Analytics Solutions. Recorded February 7, 2013
Maksim Tsvetovat, Deepmile Networks & George Mason University – Sentiment Analytics: What Does the Blogosphere Think? Recorded January 25, 2013

Ralph Keeney, Duke University – Brainstorming with Ralph Keeney. Recorded January 10, 2013
Gary Lilien, Penn State University – Marketing Analytics: A Must for Retailers and Manufacturers. Recorded November 28, 2012
Michael Schroeck, IBM – Big Data: Extracting the Value. Recorded November 11, 2012
Arnold Barnett, MIT’s Sloan School – Terror Goes to Ground. Recorded October 24, 2012
Allan Lichtman, American University, and Sheldon Jacobson, University of Illinois – Forecasting the U.S. Presidential Election. Recorded September 21, 2012
Michael Fry, University of Cincinnati, and Jeffrey Ohlmann, University of Iowa – More than Moneyball. Recorded July 6, 2012
Chrysanthos Dellarocas, Boston University – The Pay per Click Paradox. Recorded June 8, 2012

Gary Cokins, SAS consultant – Mystery of Dying Industry Giants. Recorded May 24, 2012
Wally Hopp and Roman Kapuscinski – Does American Manufacturing Have a Future? Recorded May 11, 2012
U.S. Army Major Rob Dees – Measure of a Soldier. Recorded April 27, 2012
Theresa Kushner, Senior Director of Customer Intelligence at Cisco – Marketing Analytics at Cisco. Recorded March 30, 2012
Sheldon Jacobson, University of Illinois Urbana-Champaign – March Madness O.R. Style. Recorded March 16, 2012
Renee Adams, University of New South Wales, and Patricia Funk, Universitat Pompeu Fabra and Barcelona Graduate School of Economics – Beyond the Glass Ceiling. Recorded March 2, 2012

Plus Archival Podcasts back to 2009

INFORMS VIDEO LEARNING CENTER

View these free, on-demand presentations, complete with slides, from INFORMS renowned meetings and conferences. Visit https://www.informs.org/Apply-Operations-Research-and-Analytics/INFORMS-Video-Learning-Center

INFORMS ANALYTICS CONFERENCE 2013

Keynote Presentations
Betting the Company: The Role of Analytics by Jerry Allyne, Boeing Commercial Airplanes
Achieving Social Success: How Data and Analytics Guide the Social Business Journey by Sandy Carter, IBM

2013 Edelman Award Presentations
Dutch Delta Commissioners: Economically Efficient Standards to Protect the Netherlands against Flooding
Operations Research Transforms Baosteel’s Operations
Optimizing Chevron’s Refineries
Dell’s Channel Transformation – Leveraging Operations Research to Unleash Potential across the Value Chain
Kroger Uses Simulation-Optimization to Improve Pharmacy Inventory Management
McKesson: A Holistic Supply Chain Management Solution

Analytics Process Presentations
Effective Use of Business Analytics by Kathy Lange, SAS
Hospital Business Analytics in an Era of Healthcare Reform by Don Kleinmuntz, Strata Decision Technology
Stop Sacred Cows before they Stop Analytics! by Jake Breeden, Breeden Ideas
Optimizing as if People Matter by Steve Sashihara, Princeton Consultants
Integrated Analytics in Transportation and Logistics by Ted Gifford, Schneider National

INFORMS ANALYTICS CONFERENCE 2012

2012 Edelman Award Presentations
Chris Goossens, TNT; Hein Fleuren, Tilburg University; Davide Chiotti, TNT; Marco Hendriks, TNT – Supply Chain-Wide Optimization at TNT Express
Frederic Deschamps, Carlson; Pelin Pekgun, JDA Software; Suresh Acharya, JDA Software; Kathleen Mallery, Carlson – Carlson Rezidor Hotel Group Maximizes Revenue through Improved Demand Management and Price Optimization
Greg Burel, CDC; Eva K. Lee, Georgia Institute of Technology – Centers for Disease Control and Prevention: Advancing Public Health and Medical Preparedness with Operations Research
Sofia Archontaki, Danaos; Takis Varelas, Danaos; Iraklis Lazakis, Danaos; Evangelos Chatzis, Danaos – Operations Research in Ship Management: Maximizing Fleet-Wide Revenue Routing at Danaos

Suresh Subramanian, HP; Prasanna Dhore, HP; Girish Srinivasan, HP; David Hill, HP – Hewlett-Packard: Transformation of HP’s Business Model through Advanced Analytics and Operations Research
Karl Kempf, Intel; Feryal Erhun, Stanford University; Robert Bruck, Intel – Optimizing Capital Investment Decisions at Intel Corporation
Diego Klabjan, Northwestern University; Thomas Olavson, Google; Blake Johnson, Stanford University; Daniel Graham, Teradata; Michael Zeller, CEO, Zementis, Inc. – Innovation and Big Data: Panel Discussion

Plus Archival Videos of Other Great Talks
INFORMS Annual Meeting 2012
INFORMS Annual Meeting 2011
INFORMS Analytics Conference 2011
INFORMS Annual Meeting 2010
INFORMS Practice Conference 2010
INFORMS Annual Meeting 2009

FIVE-MINUTE ANALYST

Carnival Game
BY HARRISON SCHRAMM
I was captivated by this recent headline: “Man loses life savings, wins giant banana” [1]. After disbelief subsided, I wondered – as is my habit – what five minutes of analysis might tell us. In the news story, a man is determined to win a video game console by playing a carnival game where balls are tossed into tubs. He did not win; in fact, he lost around $2,600. There are (at least) three subproblems to consider.

1. How much to wager: the player’s point of view. Given that the game console was worth approximately $100 and not unique to the carnival, we quickly conclude that if the object was to obtain a game console, the best course of action may be to not go to the carnival at all. In order to assess whether or not to play, we would have to know our probability of success at the carnival game. While this is something that we do not know, we may make a few basic assumptions. Suppose that the wholesale value of the prize is $100. If the game costs $5 to play, we can be almost certain that an individual “on the street” has less than a 5 percent chance of winning on a single try – at $5 per play against a $100 prize, a 5 percent win rate is exactly break-even, so anything higher means the “house” would never make a profit. In order to make a reasonable profit, the real odds of the game are probably lower, in the neighborhood of 1 percent. We assure ourselves, saying that we are better at tubs than the average person. If this were really the case, we should play the game some predetermined number of times, with predetermined maximum losses (10 tries at a cost of $50 seems reasonable) and
W W W. I N F O R M S . O R G

BY HARRISON SCHRAMM

76

|

A N A LY T I C S - M A G A Z I N E . O R G

simply walk away when we reach our predetermined threshold. Human beings are notoriously bad at walking away. The reason is because we (mis-) Figure 1: A giant Rasta count the money banana. spent as “invested” instead of properly “lost.” The more we play and are unsuccessful, the more evidence we gather that we are bad at the game. For example, if we have tossed the ring twice and missed both times, I would have a point estimate of 0 percent probability of success, but could construct a 95 percent confidence interval of my true probability of success as the interval 0-75 percent. If we have tossed the ring 20 times and missed, our confidence has now shrunk to 12 percent. 2. How much to allow wagered: The carnival’s point of view. The house
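The break-even arithmetic above fits in a few lines of code. The following is a minimal sketch; the $5 cost, $100 prize value and 1 percent win rate are the column’s assumptions, not measured quantities.

import math  # not needed here, but handy for the follow-up below

# Break-even analysis for the carnival game, using the column's
# assumed numbers ($5 per play, $100 prize, ~1 percent true win rate).
COST_PER_PLAY = 5.00   # dollars per toss
PRIZE_VALUE = 100.00   # approximate wholesale value of the console

# The house breaks even when p * PRIZE_VALUE equals COST_PER_PLAY,
# so any player win probability above cost/prize loses the house money.
break_even_p = COST_PER_PLAY / PRIZE_VALUE
print(f"Break-even win probability: {break_even_p:.0%}")   # 5%

# The player's expected value per toss at the assumed 1 percent rate.
p_assumed = 0.01
ev_per_toss = p_assumed * PRIZE_VALUE - COST_PER_PLAY
print(f"Expected value per toss: ${ev_per_toss:+.2f}")     # -$4.00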
2. How much to allow wagered: the carnival’s point of view. The house could simply allow a player to lose as much money as he wishes; the longer the game goes on, the more money the carnival makes. There are two reasons why this “greedy” approach – in both the mathematical and the usual sense – may not be optimal:

a. If the player becomes very upset, he could bring unwanted things outside of the wager into play. In this case it was law enforcement, but we might imagine a rougher cousin.

b. If we take too much of a player’s money at once, he or she will never come back. If we space out the losses over time, and allow the player to win every so often, he or she may lose more in the long run.

Figure 2: Exceptionally optimistic (95 percent upper bound) estimate of success as a function of the number of unsuccessful trials. The longer we play without success, the more certain we should be that we aren’t very good at carnival games! This chart was computed by solving 1 – (1 – p)^N = B for p, i.e., p = 1 – e^(ln(1 – B)/N).
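The caption’s formula takes only a few lines to reproduce. A minimal sketch follows; computing it directly gives roughly 78 percent after two misses and 14 percent after 20, close to the rounded figures quoted in the column.

import math

def upper_bound_p(n_misses: int, confidence: float = 0.95) -> float:
    """Solve 1 - (1 - p)**n = confidence for p: the largest win
    probability still consistent with n straight misses."""
    return 1.0 - math.exp(math.log(1.0 - confidence) / n_misses)

for n in (1, 2, 5, 10, 20):
    print(f"{n:>2} misses -> p <= {upper_bound_p(n):.1%}")
# e.g., 2 misses -> p <= 77.6%; 20 misses -> p <= 13.9%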

Both points reflect that a carnival is different from a casino. For one thing, an individual losing $2,600 at a casino is not newsworthy! People find the idea of large losses more palatable in a gambling house than at a carnival. For a carnival, it would seem that the maximum amount to take from a person would be some small multiple of the value of the top prize – in this case $200 to $300 (and throw in the prize to boot).
3. How to lose all of your money at the carnival (or anywhere else). Reason that the probability of winning the prize is non-zero (it is), and use the properties of the geometric random variable to conclude that, given enough tries, you will certainly win the prize. Armed with this (mis-)information, increase the stakes “double or nothing” until your eventual win.

The reason this does not work is that the player has not considered how much he will lose before his eventual win. This result is true in general [2]. Suppose our contestant lost $300, then lost the rest in a few rounds of “double or nothing” – in this case, three. For “double or nothing” to be a viable gambling strategy, the cumulative probability of winning has to grow faster than you are hemorrhaging money, and your losses grow exponentially, like 2^N. At the carnival, for “double or nothing” to work, the player needs to be relatively confident that he will be successful on at least one toss before losing it all. To have a 50 percent chance of succeeding at least once in three tries, the player would need to be approximately 20 percent certain of succeeding on any one try. Not comforting, given that his point estimate of winning when he began playing “double or nothing” was zero. “Double or nothing” is a very good way to lose all of your money very fast. (A short sketch of this arithmetic follows.)

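A minimal sketch of the double-or-nothing arithmetic; the three rounds and the $300 starting point follow the column’s example.

# Double-or-nothing arithmetic, following the column's example of a
# player who is down $300 and tries to clear it in three doublings.
rounds = 3

# Single-toss win probability needed for a 50 percent chance of at
# least one success in `rounds` tries: solve 1 - (1 - p)**rounds = 0.5.
p_needed = 1.0 - 0.5 ** (1.0 / rounds)
print(f"Win probability needed per toss: {p_needed:.1%}")   # ~20.6%

# Meanwhile the amount at risk doubles every round -- losses grow
# like 2**N, which is what makes the strategy so dangerous.
at_risk = 300.0
for n in range(1, rounds + 1):
    at_risk *= 2
    print(f"After doubling round {n}: ${at_risk:,.0f} on the table")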
Bonus footnote: When I was young, there was a pizza establishment in my hometown that had a video game where, for 50 cents, you could win prizes, the “grand prize” being lunch. I had a high school classmate who, through several weeks of effort, learned the game well enough that he could win lunch every day. He got lunch for 50 cents a day for about a week before the proprietors grew tired of it and removed the game! ❙
Harrison Schramm ([email protected]) is an operations research professional in the Washington, D.C., area, and he is a member of INFORMS.

REFERENCES
1. http://newsfeed.time.com/2013/05/01/man-loses-life-savings-playing-carnival-game-wins-giant-banana/
2. A very nice explanation involving Casanova can be found on p. 333 of “Probability and Random Processes” by Grimmett and Stirzaker, Oxford University Press, 2001.


THINKING ANALYTICALLY

Self-driving car
BY JOHN TOCZEK
Self-driving cars are cars that can drive themselves without a human behind the wheel. This technology should be available in the not-too-distant future, and new algorithms will need to be developed to route these cars so that they deliver their passengers to their desired destinations efficiently.

Figure 1 shows 10 people in need of transportation. Each person’s current location (the pick-up point) is indicated by the person icon, and his or her desired destination (the drop-off point) is indicated by the building icon. The purple arrow indicates the path from the pick-up point to the drop-off point.

Figure 1: How far must the car travel to get everyone where they want to be?

Your job is to order the passengers so that they are picked up in an order that minimizes the total distance travelled by the self-driving car. The car can start at any pick-up point, it may carry only one person at a time, and it does not need to return to its starting point after the last person is dropped off. There is only one self-driving car available for use. Use the Pythagorean theorem to calculate the distance between cells; for example, the distance between the person nearest the lower left corner and her drop-off point is 3.162 km.

QUESTION: What is the minimum distance the car must travel in order to transport all of the passengers from their pick-up points to their drop-off points?

Send your answer to [email protected] by Sept. 15. The winner, chosen randomly from correct answers, will receive a Magic 8 Ball. Past questions can be found at puzzlor.com. ❙
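For readers who would rather attack the puzzle by code than by hand, the distance bookkeeping is straightforward. A minimal sketch follows; the coordinates are placeholders, since the real pick-up and drop-off cells must be read off Figure 1.

import math

def dist(a: tuple, b: tuple) -> float:
    """Straight-line (Pythagorean) distance between two grid cells."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

# A 3-by-1 cell offset gives sqrt(3**2 + 1**2) = 3.162 km, the
# figure quoted for the passenger nearest the lower left corner.
print(round(dist((0, 0), (3, 1)), 3))  # 3.162

def route_length(order, pickups, dropoffs):
    """Total distance for a pick-up order: the car starts at the first
    pick-up, carries each passenger to the drop-off, then drives to
    the next pick-up. It never returns to its starting point."""
    total, pos = 0.0, pickups[order[0]]
    for i in order:
        total += dist(pos, pickups[i]) + dist(pickups[i], dropoffs[i])
        pos = dropoffs[i]
    return total

# With only 10 passengers, brute force is feasible once the pick-up
# list P and drop-off list D have been read off the figure:
# from itertools import permutations
# best = min(permutations(range(10)), key=lambda o: route_length(o, P, D))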


John Toczek is the director of Decision Support and Analytics for ARAMARK Corporation in the Global Risk Management group. He earned his bachelor of science degree in chemical engineering from Drexel University (1996) and his master’s degree in operations research from Virginia Commonwealth University (2005). He is a member of INFORMS.

