Mark Twain popularized the saying, “There are three kinds of lies: lies, damned lies and statistics.” Statistics is now helping slice through the data deluge and powering the field known variously as big data, data science or analytics.
The goal of Analytics = [Better outcomes, smarter decisions, actionable insights, relevant information]. How you get there varies.
Predictive analytics allows us to turn data into valuable, actionable information. Three basic cornerstones of predictive analytics are:
Predictive Modeling
Decision Analysis and Optimization
Transaction Profiling
Predictive Modeling
Predictive modeling identifies and mathematically represents underlying relationships in historical data in order to explain the data and make predictions, forecasts or classifications about future events.
Predictive models typically analyze current and historical data on individuals to produce easily understood metrics such as scores. These scores rank-order individuals by likely future performance, e.g., their likelihood of making credit payments on time, or of responding to a particular offer for services.
Predictive models can also detect the likelihood of a transaction being fraudulent (Risk Detection). Predictive models are frequently operationalized in mission-critical transactional systems and drive decisions and actions in near real time.
A number of analytic methodologies underlie solutions in this area, including:
Applications of both linear and nonlinear mathematical programming algorithms, in which one objective is optimized within a set of constraints.
Advanced “neural” systems, which learn complex patterns from large data sets to predict the probability that a new individual will exhibit certain behaviors of business interest.
Statistical techniques for analysis and pattern detection within large data sets.
Predictive models summarize large quantities of data to amplify its value.
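As a rough illustration of how a model can turn individual-level data into rank-ordered scores, the sketch below applies a simple logistic scoring function. The feature names, weights and applicants are hypothetical and chosen only for illustration; a real model would estimate its weights from historical data.

```python
import math

# Hypothetical weights; in practice these would be fitted to historical data.
WEIGHTS = {"late_payments": -0.8, "years_of_history": 0.15, "utilization": -1.5}
INTERCEPT = 1.0


def on_time_probability(features):
    """Logistic model: estimated probability of paying on time."""
    z = INTERCEPT + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank_by_score(individuals):
    """Return (name, score) pairs ordered from most to least likely to pay on time."""
    scored = [(name, on_time_probability(f)) for name, f in individuals.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical applicants used only to exercise the scoring function.
applicants = {
    "A": {"late_payments": 0, "years_of_history": 10, "utilization": 0.2},
    "B": {"late_payments": 3, "years_of_history": 2, "utilization": 0.9},
    "C": {"late_payments": 1, "years_of_history": 5, "utilization": 0.5},
}
ranking = rank_by_score(applicants)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

The score itself is just a probability between 0 and 1; what matters operationally is the ordering, which lets a business treat the highest- and lowest-ranked individuals differently.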
Decision Analysis and Optimization
Decision analysis refers to the broad quantitative field that deals with modeling, analyzing and optimizing decisions made by individuals, groups and organizations.
Whereas predictive models analyze multiple aspects of individual behavior to forecast future behavior, decision analysis analyzes multiple aspects of a given decision to identify the most effective action to take to reach a desired result.
Integrated approaches to decision analysis incorporate the development of a decision model that mathematically maps the entire decision structure; proprietary optimization technology that identifies the most effective strategies, given both the performance objective and constraints; the development of designed testing required for active, continuous learning; and the robust extrapolation of an optimized strategy to a wider set of scenarios than historically encountered.
Optimization capabilities also include a proprietary mathematical modeling and programming language, an easy-to-use development and visualization environment, and a state-of-the-art set of optimization algorithms.
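To make the idea of optimizing one objective within a set of constraints concrete, here is a minimal sketch that picks one action per customer to maximize total expected profit under a spending budget. The customers, actions, profit figures and budget are all hypothetical, and the exhaustive search is a stand-in that only works for tiny problems; it is not the proprietary optimization technology described above, and real systems would typically use a mathematical programming solver.

```python
from itertools import product

# Hypothetical per-customer options: action -> (expected_profit, cost).
OPTIONS = {
    "cust_1": {"no_offer": (0.0, 0.0), "discount": (12.0, 5.0), "upgrade": (20.0, 12.0)},
    "cust_2": {"no_offer": (0.0, 0.0), "discount": (6.0, 5.0), "upgrade": (9.0, 12.0)},
    "cust_3": {"no_offer": (0.0, 0.0), "discount": (10.0, 5.0), "upgrade": (25.0, 12.0)},
}


def best_strategy(options, budget):
    """Try every action assignment; keep the most profitable one within budget."""
    customers = list(options)
    best_profit, best_plan = float("-inf"), None
    for actions in product(*(options[c] for c in customers)):
        profit = sum(options[c][a][0] for c, a in zip(customers, actions))
        cost = sum(options[c][a][1] for c, a in zip(customers, actions))
        if cost <= budget and profit > best_profit:
            best_profit, best_plan = profit, dict(zip(customers, actions))
    return best_plan, best_profit


plan, profit = best_strategy(OPTIONS, budget=22.0)
print(plan, profit)
```

Note how the constraint changes the answer: without a budget, every customer would simply get the most profitable action, but with one, the search has to trade off which customers are worth the more expensive offer.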
Transaction Profiling
Transaction profiling is a technique used to extract meaningful information and reduce the complexity of transaction data used in modeling. Many solutions operate using transactional data, such as credit card purchase transactions, or other types of data that change over time.
In its raw form, this data is very difficult to use in predictive models for several reasons. First, an isolated transaction contains very little information about the behavior of the individual who generated the transaction. In addition, transaction patterns change rapidly over time. Finally, this type of data can often be highly complex.
To overcome these issues, a set of proprietary techniques is used to transform raw transactional data into a mathematical representation that reveals latent information and makes the data more usable by predictive models. This profiling technology accumulates data across multiple transactions of many types to create and update profiles of transaction patterns. These profiles enable the neural network models to efficiently and effectively make accurate assessments of, for example, fraud risk and credit risk within real-time transaction streams.
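The actual profiling techniques are proprietary, so the following is only a generic sketch of the underlying idea: keep a small running summary per account, compare each new transaction against it, then fold the transaction in. The exponentially decayed average, the `decay` value and the account data are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """Running summary of one account's transaction behavior."""
    avg_amount: float = 0.0
    count: int = 0

    def update(self, amount, decay=0.9):
        """Fold a new transaction into the profile with exponential decay."""
        if self.count == 0:
            self.avg_amount = amount
        else:
            self.avg_amount = decay * self.avg_amount + (1 - decay) * amount
        self.count += 1

    def deviation_ratio(self, amount):
        """How large a new amount is relative to the profiled average."""
        return amount / self.avg_amount if self.avg_amount > 0 else float("inf")


profiles = {}


def process(account_id, amount):
    """Score a transaction against the account's profile, then update the profile."""
    profile = profiles.setdefault(account_id, Profile())
    # With no history yet, treat the first transaction as unremarkable.
    ratio = profile.deviation_ratio(amount) if profile.count else 1.0
    profile.update(amount)
    return ratio


# Hypothetical transaction stream: three typical purchases, then an outlier.
history = [("acct_1", 20.0), ("acct_1", 25.0), ("acct_1", 22.0), ("acct_1", 500.0)]
ratios = [process(acct, amt) for acct, amt in history]
print([round(r, 2) for r in ratios])
```

A single transaction says little on its own, but against the profile the last purchase stands out sharply. A downstream model could take a ratio like this as one of its inputs rather than the raw amount.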
Increasingly, teams are pushing the envelope of how to use information retrieval, machine learning, computational linguistics, matrix and graph algorithms, unsupervised clustering & data mining to solve predictive problems.
Who Are Some Predictive Analytics Providers?
focus is on generation of new data, insight/foresight
exploring data, finding insights
expect uncertainty and probability and pattern rather than specific data
computational and probabilistic techniques
Vendors who provide this capability include:
Marketing services market: Fair Isaac, Acxiom, Epsilon, Equifax, Experian, Harte-Hanks, InfoUSA, KnowledgeBase, Merkle and TargetBase, among others. These vendors compete with traditional advertising agencies and companies’ own internal information technology and analytics departments.
Origination market: Fair Isaac, Experian, Equifax and CGI, among others.
Customer management market: Fair Isaac and Experian, among others.
Fraud solutions market: Fair Isaac, Actimize (a division of NICE Systems), ID Analytics, Experian, Detica (a division of BAE), SAS and ACI Worldwide (a division of Transaction Systems Architects) in the banking market; IBM and ViPS in the healthcare segment; and SAS, Infoglide Software Corporation, NetMap Analytics and Magnify in the property and casualty and workers’ compensation insurance market.
Collections and recovery solutions market: Fair Isaac, CGI, Experian and various boutique firms for software and ASP servicing; in-house scoring and computer science departments, along with the three major U.S. credit reporting agencies and Experian-Scorex, for scoring and optimization projects.
Insurance and healthcare solutions market: Fair Isaac, Emdeon, Ingenix, ViPS, MedStat, Detica (a division of BAE), SAS, Verisk Analytics and IBM.
These vendors are classified into a variety of market categories:
business process management and business rules management providers;
providers of credit reports and credit scores;
providers of automated application processing services;
neural network developers and artificial intelligence system builders;
third-party professional services and consulting organizations;
providers of account/workflow management software; and
software companies supplying modeling, rules, or analytic development tools.
Example of Predictive Analytics: Coupons in Grocery Stores
Each Saturday, you head to Kroger (a grocery store) and fill up your cart. The cashier scans your items, then hands you a coupon for $1.00 off your favorite brand of ice cream. With hundreds of thousands of grocery items on the shelves, how does Kroger know what you’re most likely to buy? Using predictive analytics and data from loyalty cards, computers crunch terabytes of your historical purchases in real time to figure out that your favorite ice cream was the one item missing from your shopping basket that week. Further, the computer matches your past purchase history to ongoing promotions in the store. So with your bill, you receive a coupon for the item you are most likely to buy next time.
Example of Predictive Analytics: “MoneyBall” with Oakland A’s
During the late 1990s, the New York Yankees were the most acclaimed team in Major League Baseball. Small-market teams like the Oakland Athletics (Oakland A’s) had to change the way they did business. The A’s were not a wealthy team; in fact, they ranked 12th out of 14 in payroll.
A core strategy question in sports is how to compete with rich teams: how do you spot and acquire low-cost, undervalued talent that acts as a “force multiplier”?
In 1999 Billy Beane (general manager of the Oakland Athletics) found a novel use of data mining. Beane hired a statistics grad to analyze baseball statistics advocated by baseball guru Bill James. As a result, Beane was able to hire excellent players undervalued by the market. A year after Beane took over, the A’s ranked 2nd!
While the Yankees paid their star players tens of millions, the A’s managed to be successful with a low payroll. How did they do it? When signing players, they didn’t just look at basic productivity values such as RBIs, home runs, and earned-run averages. Instead, they analyzed hundreds of detailed statistics from every player and every game, attempting to predict future performance and production. Some statistics were even obtained from game footage by using video recognition techniques. This allowed the team to sign great players who may have been lesser known but were equally productive on the field.
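The idea of finding productive but underpriced players can be sketched in a few lines. The example below uses on-base percentage (OBP), computed with its standard formula, purely as one illustrative statistic; the text above does not say exactly which statistics the A’s relied on, and the players, stat lines and salaries are hypothetical.

```python
def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sacrifice_flies):
    """Standard OBP: (H + BB + HBP) / (AB + BB + HBP + SF)."""
    denominator = at_bats + walks + hit_by_pitch + sacrifice_flies
    return (hits + walks + hit_by_pitch) / denominator if denominator else 0.0


# Hypothetical players: season counting stats and salary in millions of dollars.
players = [
    {"name": "Star", "H": 180, "BB": 60, "HBP": 5, "AB": 600, "SF": 5, "salary": 12.0},
    {"name": "Bargain", "H": 140, "BB": 90, "HBP": 5, "AB": 500, "SF": 5, "salary": 1.0},
    {"name": "Average", "H": 130, "BB": 40, "HBP": 2, "AB": 520, "SF": 4, "salary": 3.0},
]


def rank_by_value(roster):
    """Order players by OBP per million dollars of salary, best value first."""
    rows = []
    for p in roster:
        obp = on_base_percentage(p["H"], p["BB"], p["HBP"], p["AB"], p["SF"])
        rows.append((p["name"], obp, obp / p["salary"]))
    return sorted(rows, key=lambda row: row[2], reverse=True)


value_ranking = rank_by_value(players)
for name, obp, value in value_ranking:
    print(f"{name}: OBP={obp:.3f}, OBP per $1M={value:.3f}")
```

Here the hypothetical "Bargain" player has fewer hits than the "Star" but draws far more walks, so he reaches base at a higher rate for a fraction of the salary. That is the kind of gap between market price and measured contribution the approach looks for.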
The Oakland A’s started a trend, and predictive analytics began to penetrate the world of baseball. The application of predictive analytics to a wide variety of sports is now standard practice. It’s important to note that baseball statistics are not new. Leveraging stats to make hiring decisions is.
According to historical record, Dodgers General Manager Branch Rickey hired the first baseball statistician in 1947, after which the use of statistical analysis in baseball grew. But the practice took a major leap forward in 1977 when Bill James began self-publishing works about a new discipline he called Sabermetrics.
Sabermetrics applies statistical analysis to baseball records to make determinations about player performance. James called sabermetrics “the search for objective knowledge about baseball”. Sabermetricians have questioned some basic assumptions about how talent and player contributions are judged, and created quite a stir in doing so. But over time, many sabermetric ideas have found wide acceptance.