Archive for the ‘Predictive Analytics’ Category

Building a One-Way ANOVA R Extension in SAP Predictive Analytics

One of the first statistical tools many new analysts use is the Analysis of Variance (ANOVA), a collection of statistical methods used to decompose and understand the causes of variation within a set of data. One-way ANOVA is perhaps the most basic of these methods and a staple of most statistical software. While SAP has included many predictive algorithms in its SAP Predictive Analytics Expert application, it is missing many of the common descriptive algorithms data scientists use to better understand their data. Luckily, it is relatively easy to build custom R extensions to accommodate any descriptive statistical needs.

What is One-Way ANOVA?

One-Way Analysis of Variance is used to compare the means of three or more samples to evaluate whether all groups share the same mean (in effect, whether there is no difference in the monitored statistic between the groups). Of course, there is natural variation in any data, so the observed means of the groups will vary slightly, but the age-old question persists: is the difference statistically significant?

One-Way ANOVA is an omnibus test: if the null hypothesis (all means are equal) is rejected, it offers no additional information about which groups differ from each other, only that at least one of them differs enough to reject the hypothesis that they are all the same.
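
As a quick illustration, here is a minimal sketch in base R of the calculation such an extension wraps; the groups and values below are simulated purely for illustration:

    # Simulated data: one numeric response measured across three groups
    set.seed(42)
    df <- data.frame(
      group = factor(rep(c("A", "B", "C"), each = 20)),
      value = c(rnorm(20, mean = 10), rnorm(20, mean = 10.5), rnorm(20, mean = 12))
    )
    fit <- aov(value ~ group, data = df)
    summary(fit)    # F statistic and p-value for the omnibus test
    TukeyHSD(fit)   # post-hoc: which pairs of groups actually differ

Because the omnibus test alone cannot say which groups differ, a post-hoc procedure such as Tukey’s HSD is the usual follow-up when the null hypothesis is rejected.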

For additional background on One-Way ANOVA, see the relevant Wikipedia article.

To download full PDF and Continue Reading…

About Hillary Bliss
Hillary is a Senior Manager – Data & Analytics at Protiviti and specializes in data warehouse design, ETL development, statistical analysis, and predictive modeling. She works with clients and vendors to integrate business analysis and predictive modeling solutions into the organizational data warehouse and business intelligence environments based on their specific operational and strategic business needs. She has a master’s degree in statistics and an MBA from Georgia Tech.

Categories: Predictive Analytics

50 Business Problems I’ve Addressed with Predictive Analytics, Data Science, and Advanced Analytics

I was reading Vincent Granville’s recent blog post and thought I might add a few of the problems I’ve addressed in my career. While these are not all data science, they do fall within advanced analytics.

  1. Estimating hybrid yields and crop characteristics across multiple geographies, soil types, climates, and ecosystems based on performance of limited field trials. Estimating the same for heretofore uncrossed inbreds.
  2. Accurately forecasting the demand for promotional items driving the market basket 18 months into the future in order to accommodate an extended supply chain.
  3. Estimating the brand capital associated with consumer brands in the marketplace, e.g., what a brand is worth in terms of both goodwill on the balance sheet and an organization’s ability to leverage that brand capital through marketing to deliver sales.
  4. Network optimization for supply chains, including optimizing supply chain routing in near real time.
  5. Optimal scheduling of ship dates for seasonal goods based upon stochastic analysis of probable events along the chain in order to assure supply without clogging the pipe.
  6. Call center optimization.
  7. Forecasting the demand for retail store associates across a large retail chain and optimizing the schedule based upon that forecast.
  8. Predicting what goods are most likely to be purchased during a hurricane warning.
  9. Predicting the annual sales of a prospective retail site based upon demographic, market, competitor and other data.
  10. Optimizing locations for retail site selections using game theory and accounting for cannibalization and impact on key competitors.
  11. Optimal routing for transportation fleets given uncertainty.
  12. Pricing optimization, including setting pricing strategies for everyday pricing.
  13. Estimating the impact of pricing changes on demand for high-affinity items. Understanding the expected impact of promotional pricing on overall revenues based on affinity items and additional trips.
  14. Estimating the impact of operational activities on competitors and the likelihood of cannibalization.
  15. Estimating sell through dates for seasonal goods and understanding the risk profile for lost sales opportunity and clearance/write-off.
  16. Identifying potentially fraudulent transactions at the register and in the back office regarding cash and check deposits.
  17. Identifying potentially fraudulent workers’ comp claims.
  18. Identifying primary causes associated with employee injury accidents and prioritizing limited resources to prevent and mitigate lost productivity, workers’ comp expenses, and long-term liability associated with mishandled claims.
  19. Identifying potential tax savings due to missed tax benefits across multiple tax jurisdictions.
  20. Analyzing production data to identify the root causes of quality issues and production slowdowns in manufacturing environments.
  21. Analyzing clinical trial data for safety and efficacy of treatment protocols.
  22. Analyzing system performance logs to understand bottlenecks in production and analytical lab computing environments.
  23. Analyzing consumer behavior to understand the impact of marketing message theme, channel preference, pricing sensitivity, seasonal good purchase cycle, brand affinities, product affinities, loyalty engagement, net promoter score, customer satisfaction, lifetime value, purchase driver, style preferences, color preferences, size preferences and other brand specific factors.
  24. Integrating consumer behavior data with attitudinal and demographic data to make cohort-level inferences regarding behavior.
  25. Response and uplift modeling to understand the impact of direct marketing efforts in a test and learn environment.
  26. Establishing value of information models to map response/uplift to the financial benefit they bring to the organization.
  27. Establishing a champion/challenger approach to model deployment and consistently measuring the impact the model brings to the business.
  28. Leveraging machine learning to assess when a model needs to be re-scored, refit, remodeled, or replaced.
  29. Integrating insurer, practitioner, and population data, formulary status, and negotiated pricing levels for branded prescription medicines to analyze the impact on long term sales and profits.
  30. Applying artificial intelligence techniques to assist in optimum model selection across predictive analytics solutions.
  31. Forecasting sales and returns in the publishing industry; this requires forecasting at the distributor and retailer level and understanding revenue recognition, the probability of returns, and the publisher’s liability associated with returns.
  32. Integrating omni-channel data in order to model customer response to brand treatments across multiple touch points.
  33. Estimating the most likely customer segment for cash baskets in a retail environment with a high percentage of cash transactions.
  34. Optimizing the retail supply chain for demand driven pull.
  35. Individual store assortment planning for large chain retailers based upon customer behavioral profiles.
  36. Dealer performance estimation and visualization (GIS) in the automotive industry.
  37. Omni-channel retail performance marketing, delivering uplift modeling in a champion/challenger environment with integration into a marketing automation system.
  38. Estimating the customer response to postpaid plan upgrade offers (propensity/uplift) by micro audience and offer theme.
  39. Marketing mix modeling at the individual store level for a larger retailer.
  40. Leveraging early IoT data sources to improve forecasting results.
  41. Analysis of disparate data sources on Hadoop to understand the impact on quality of management decisions.
  42. Automation and application of artificial intelligence techniques in the champion and challenger process for on-going predictive model performance management.
  43. Stochastic optimization of project outcomes (based on data-science-generated predictors) for the purpose of project portfolio management.
  44. Preparing stochastic financial projections based upon known cause-and-effect and predictive relationships (tactics driving KPIs driving financial performance) for complex businesses.
  45. Estimating future gross margin contribution across a large research and development portfolio for new technology introduction, new product introduction, and continuous product improvements. Application of those estimates to identify key success drivers. Stochastic modeling of the key drivers and estimation models to allocate and optimize annual R&D budgets.
  46. Optimizing balance sheet and off-balance sheet labor costs in a manufacturing environment based on key production predictors including outside economic variables, internal scheduling constraints and forecast demand.
  47. Estimating future resource utilization based on forecasts, historical forecast accuracy, sales pipeline, and existing contract terms.
  48. Predicting customer churn and developing optimal retention policies to prevent it.
  49. Identifying cross-sell and up-sell opportunities in a business-to-business environment.
  50. Identifying the root cause of suboptimal brand performance across a portfolio of brands and determining optimal corrective action to improve financial performance.

About Patrick McDonald

Patrick McDonald is an Associate Director with Protiviti focused on advising clients in the Retail, Manufacturing and Telecommunications industries on analytical solutions. Over a 20-year career in advanced analytics, Patrick completed tours in a Big Four firm and leading analytics technology and software companies.

Categories: Predictive Analytics

How Can I Increase the Value of My Marketing Investments Using Predictive Analytics? – A Real-Life Use Case

In my last blog, I discussed how predictive analytics can increase your marketing bang for the buck by giving you clear insights into where to spend your marketing dollars. In this entry, I’ll give you a real-world example of how a retail department store chain with multiple product categories used analytics and customer segmentation to decide who to target for a store mailer.

To determine their target audience, the retailer wanted to gain a better understanding of their customer segments and where money was being spent across those segments. The first step was determining which customers shop which categories. Working together, we mapped over 2 million customers, identified spend by product category, and then clustered customers by product category and category spend.

[Figure: Predictive Product Category]
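
To make the clustering step concrete, here is a minimal sketch in R of how customers might be grouped by category spend. The categories and data below are simulated stand-ins, not the client’s actual data, and k-means is one reasonable choice of clustering method:

    # Simulated customer-by-category spend matrix (hypothetical categories)
    set.seed(1)
    spend <- matrix(rexp(1000 * 5, rate = 1 / 100), ncol = 5,
                    dimnames = list(NULL, c("apparel", "home", "beauty", "toys", "electronics")))
    # Scale so high-spend categories don't dominate the distance metric
    km <- kmeans(scale(spend), centers = 4, nstart = 25)
    table(km$cluster)                                       # segment sizes
    aggregate(as.data.frame(spend),
              by = list(segment = km$cluster), FUN = mean)  # average spend per segment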

To download full PDF and Continue Reading…

About Patrick McDonald

Patrick McDonald is an Associate Director with Protiviti focused on advising clients in the Retail, Manufacturing and Telecommunications industries on analytical solutions. Over a 20-year career in advanced analytics, Patrick completed tours in a Big Four firm and leading analytics technology and software companies.

Categories: Predictive Analytics

How Can I Increase the Value of My Marketing Investments Using Predictive Analytics?

If your marketing strategies cost more than they earn, they obviously aren’t good long-term marketing strategies. One of the most useful tools at your fingertips for ensuring and increasing the value of your marketing investments is predictive analytics: specifically, using predictive analytics to anticipate an individual customer’s needs and wants. Predictive modeling can provide profound insights into customer preferences and trends, allowing you to tailor your strategies around the customer. This is customer experience optimization. Customer experience is a major revenue driver!

If you understand which questions you’re trying to answer or which issues you’re trying to resolve from a business perspective, you can build models that help you understand a customer’s response to a particular treatment, allowing you to address those key business questions and engage customers more personally.

Some key questions or issues you might want to begin with are:

  • Not enough customers
  • Customers not buying enough
  • Engaging the wrong customer
  • Haven’t found the right customer
  • What new markets can we engage, and how?

To download full PDF and Continue Reading…

About Patrick McDonald

Patrick McDonald is an Associate Director with Protiviti focused on advising clients in the Retail, Manufacturing and Telecommunications industries on analytical solutions. Over a 20-year career in advanced analytics, Patrick completed tours in a Big Four firm and leading analytics technology and software companies.

Categories: Predictive Analytics

Using People Analytics to Increase Employee Loyalty

The ability to attract and retain a loyal employee base and to understand the root causes of employee disengagement and disloyalty is a key strategic objective for every organization, big or small. If you want to improve employee productivity and/or decrease the cost of attracting and retaining employees, you need to move along the analytics maturity curve and start leveraging People Analytics.

What is People Analytics?

Put simply, people analytics is a predictive, data-driven approach to managing people at work: analytics centered on your employees. It is used to address people-related issues such as talent acquisition, performance evaluations, leadership positioning, hiring and promotion, job and team design, and employee compensation.

Increasing Employee Loyalty Using People Analytics

People analytics helps you merge employee data, company data, and market data to predict and interpret the behaviors of valuable employees, and to surface operations-level insights, giving you a competitive edge in developing your retention strategies.

To download full PDF and Continue Reading…

About John Harris

John Harris, Senior Manager – Predictive Modeling and Advanced Analytics, has over 16 years of industry experience applying strategic thinking and an advanced analytical skill set to optimize resources, improve processes, and develop quantitative models that turn data into decision-aid information for all levels of leadership. Airline and energy utility employers have attempted to patent his deliverables related to predictive and optimization modeling.

Categories: Predictive Analytics

HANA Sentiment Analysis with SAP Predictive Analytics

One of the new features I’m most excited about with the new SAP Predictive Analytics 2.4 update is the HANA Sentiment Analysis module that’s been added for HANA online mode.

The HANA Text Analytics engine has been available for several years, but has remained somewhat inaccessible due to the relatively complex interface required to use it.

I’ve written in the past about how much I love the SAP HANA (and SAP Data Services) text analytics engine, which I used to analyze social media data surrounding the Sharknado movie, and I also co-authored an SAP Press E-Book about the various text analytics tools available within the SAP ecosystem. I’m especially excited because this new component makes this powerful tool available to business users for the first time. For full details on how SAP text analytics works, how to create and use full-text indexes within SAP HANA, and a long list of business-related applications for text analysis, check out the SAP Press E-Book Text Analytics with SAP, released in October 2015.

HANA Sentiment Analysis Module

With Expert Analytics within Predictive Analytics 2.4, SAP has opened up a portion of this functionality to business users within the Expert Analytics tool. The HANA Sentiment Analysis module is found under the Data Preparation function category.

To download full PDF and Continue Reading…

About Hillary Bliss
Hillary Bliss is a Senior ETL Consultant at Decision First Technologies and specializes in data warehouse design, ETL development, statistical analysis, and predictive modeling. She works with clients and vendors to integrate business analysis and predictive modeling solutions into the organizational data warehouse and business intelligence environments based on their specific operational and strategic business needs. She has a master’s degree in statistics and an MBA from Georgia Tech.

Categories: HANA, Predictive Analytics

What’s New in SAP Predictive Analytics 2.4?

Released just before Thanksgiving 2015, SAP’s latest enhancements to the Predictive Analytics suite introduce some exciting new features and algorithms.

Automated Analytics

The first major enhancement to the Automated Analytics side in quite some time enables Automated users with HANA to take full advantage of HANA’s power for the first time. I don’t have any screenshots to share for this feature, but essentially it allows Automated Analytics to leverage the APL (Automated Predictive Library) within HANA to perform all model training on the HANA server. Previously, HANA was used simply as a data source, and the data was fully transferred either to the local machine (desktop version) or to the Automated Analytics Engine server for processing.

It will be interesting to see how this affects licensing for the Predictive Analytics products going forward. For example, if an organization only intends to use data on HANA for predictive activities, it may no longer need to license the Engine component and could perform predictive analysis on big datasets through the HANA connection, provided all the desired functionality is available within HANA.

Expert Analytics

With this update, SAP has introduced two new HANA components to the Expert Analytics platform and one new custom component option.

To download full PDF and Continue Reading…

About Hillary Bliss
Hillary Bliss is a Senior ETL Consultant at Decision First Technologies and specializes in data warehouse design, ETL development, statistical analysis, and predictive modeling. She works with clients and vendors to integrate business analysis and predictive modeling solutions into the organizational data warehouse and business intelligence environments based on their specific operational and strategic business needs. She has a master’s degree in statistics and an MBA from Georgia Tech.

Categories: HANA, Predictive Analytics

Predictive Model Maintenance

The piece of predictive analytics I want to focus on in this blog entry is predictive model maintenance. It is not anyone’s favorite topic, but it is something that needs to be done, and in fact needs to be considered before you even build your models.

You have to make sure that the predictive models you build are still performing—a month, six months, a year later—the way you need them to in order to make those important business decisions. Model maintenance provides an opportunity to evaluate model accuracy and make updates to the model if it is no longer accurate.

But let’s begin with planning for model maintenance. What does that look like?

Building Your Models

First, develop your models with maintenance in mind. Pick an algorithm that is fairly straightforward and easy to update. This means you may have to compromise a little between having a model that is as accurate as possible and having a model that is reasonably uncomplicated and easier to implement. Don’t choose the most complicated model just because it is one-tenth of a percent more accurate if it will be much harder to maintain. It is better to make those trade-offs in the beginning; a simpler model can really speed up your time to market for model implementation and cut down on maintenance time. Also, table-drive as much as you can (store parameters and weights in tables rather than hard-coding them), as this will make the archival process and maintenance much easier.
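
As a small sketch of what table-driving can look like in R, scoring weights can live in a table the scoring job reads at run time; the variables and values below are hypothetical:

    # Hypothetical: keep scoring weights in a table rather than hard-coding them
    weights <- data.frame(
      variable      = c("(Intercept)", "tenure", "spend"),
      coefficient   = c(-1.2, 0.03, 0.0008),
      model_version = "v2"
    )
    # A refresh then only updates rows in this table; the scoring code is untouched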

When developing your models, make sure the whole team is involved. Often, those who build the models are not the ones who implement and maintain them. It is important to keep communication open between teams to make future maintenance a smooth process. You want the folks who know your tools, systems, and environments to advise during the model development process. You also want to make sure that model owners are kept up to date on any changes to the data collection process so they can adjust the models accordingly.

Lastly, save all of your model-building materials! There is a huge time and knowledge loss if the original builder leaves and takes the code and build knowledge along. You almost always have to do a rebuild when this happens.

Model Reviews and Ongoing Maintenance

So when is the best time for model maintenance? Ideally, schedule regular reviews and refreshes of your models. You can review all of your models at once on a regular basis, or set a schedule for each model. Your data has to stay reliable, or your model performance will decrease.

Models should also be evaluated for maintenance after:

  • Changes to any model inputs
  • User interface changes
  • Data quality changes
  • Environment changes
  • Macroeconomic shifts
  • Competitor influences
  • Population changes
  • Customer profile changes
  • Product offering changes

Once you’ve tested and reviewed your models, you have to determine what kind of maintenance you need to do. There are two types of model maintenance:

Refresh/refit: This is an instance where you are using the same variables the model used before, but perhaps you only need to tweak the weights of those variables. You are not re-evaluating everything in the model, but simply making minimal changes to a particular aspect of it. This type of maintenance requires very few system changes and allows you to leverage your existing implementation strategy.

Rebuild: This is a situation where you want to reconsider your model inputs and/or algorithm. Obviously, this is a much higher level of effort than a refresh, and could be as much effort as the initial model build. You may even have to change the implementation. But if your model’s performance has degraded significantly, this may be the best option.
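
To make the distinction concrete, here is a minimal sketch in R with hypothetical data frames and variable names: a refresh/refit keeps the same formula and simply re-estimates the weights on newer data, while a rebuild revisits the inputs or the algorithm itself.

    # Refresh/refit: same variables, new weights, estimated on fresh data
    model_v1 <- glm(churn ~ tenure + spend + region, family = binomial, data = train_old)
    model_v2 <- update(model_v1, data = train_new)   # same formula, refit
    # Rebuild: reconsider inputs and/or algorithm entirely, e.g.
    # model_v3 <- glm(churn ~ tenure + visits + channel, family = binomial, data = train_new)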

You have to gauge your effort level based on what is needed to bring that model back to its highest performance level.

Once you’ve made your model changes, you need to test the new models and communicate the new model and its changes to your users. If you have stored historical scores from your models, be sure to tag them with a model version number so you know which scores came from the original version and which came from the new version—with a major model enhancement, scores may change drastically or even take on an entirely different meaning.

Conclusion

Again, model maintenance is not anyone’s favorite activity, but it is something that you really must do and, in fact, plan for before you even build your model. You really can’t afford to let your models degrade. Your customers, conditions, and environments are changing every day. Your models need to reflect those changes so your business can keep up.

About Hillary Bliss
Hillary Bliss is a Senior ETL Consultant at Decision First Technologies and specializes in data warehouse design, ETL development, statistical analysis, and predictive modeling. She works with clients and vendors to integrate business analysis and predictive modeling solutions into the organizational data warehouse and business intelligence environments based on their specific operational and strategic business needs. She has a master’s degree in statistics and an MBA from Georgia Tech.

Categories: Predictive Analytics

Deciding How to Implement Your Predictive Models

Implementation is the piece of your analytics strategy that makes your models available to everyone. Your models aren’t valuable until they can be accessed wherever they are needed, whether that is in a business intelligence environment or within an application. If no one can get to the information, it’s useless!

There are two ways to implement your predictive models, and there are pros and cons to each. It is helpful to view the pros and cons of each method relative to your needs, rather than relative to the other method.

Batch Scoring

Using this method, data is scored in the modeling application and then written back to the database via a JDBC/ODBC connection or flat-file transfer. This is the best option for something like a direct mail drive.
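
Below is a minimal batch-scoring sketch in R; the ODBC DSN ("dw"), the table names, and the previously fitted model (fit) are all hypothetical placeholders:

    library(DBI)
    con <- dbConnect(odbc::odbc(), dsn = "dw")          # hypothetical DSN
    features <- dbReadTable(con, "CUSTOMER_FEATURES")   # hypothetical table
    scores <- data.frame(
      customer_id   = features$customer_id,
      score         = predict(fit, newdata = features, type = "response"),
      model_version = "v2",                             # tag scores with the model version
      scored_at     = Sys.time()
    )
    dbWriteTable(con, "CUSTOMER_SCORES", scores, append = TRUE)
    dbDisconnect(con)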

Pros

  • Generally easier and faster to implement, with minimal testing required
  • Model score vintage/versioning/stability is easier to control
  • Model updates require no programming changes
  • Scores can be accessed by multiple applications

Cons

  • Lag in obtaining model scores, not an instantaneous response

Real-time Scoring

Using this method, you export the model scoring algorithm to other applications; the form of the scoring equation depends on the algorithm. This is best for something like an online price quote based on customer data.
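
As one concrete case, for a logistic regression the scoring equation is just the fitted coefficients, which the consuming application re-implements; a hedged sketch, again assuming a hypothetical fitted model (fit):

    # Extract the coefficients that define the scoring equation
    b <- coef(fit)
    # The consuming application then computes, for inputs x1, x2, ...:
    #   score = 1 / (1 + exp(-(b[1] + b[2]*x1 + b[3]*x2 + ...)))
    # Tree ensembles or neural networks have no such compact equation,
    # which is why the real-time integration effort depends on the algorithm.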

Pros

  • Fresh model scores immediately available

Cons

  • Requires significant programming/integration effort/testing
  • Updating model requires re-development/testing
  • May be difficult to get consistent scoring snapshots/history
  • Requires a bit more statistical expertise

Even though the choice may look simple based only on the pros and cons, ask yourself the following when deciding how to implement your model:

  • How will the model be used and what are the user requirements?
    • Is there a need for an instantaneous response? For example, an online insurance quote would require real-time scoring, while a monthly update to a customer database would do well with batch scoring.
  • What areas/applications will need to use the scores?
    • How spread out does the information need to be? Will the data be in your sales system, customer system, and operational system? All of the above? Or perhaps just one, like your marketing system?
  • What data is required to calculate scores and where is this data available?
    • Do you need fifty pieces of data from fifty different departments in order to calculate, or just one or two pieces of information?
  • How often is the model going to be updated (refreshed or rebuilt)?
    • If you will need to rebuild your model every month, you are not going to want to extract that algorithm and put it into different software.

Who should be involved in the implementation process?

Model implementation is really a team effort between the data scientists and/or model developers and the systems team who make the systems “talk” to each other. So it’s really important to take a holistic approach when building and deploying your models. Everyone is a potential stakeholder, so ensure you have the right people involved from the beginning of the process.

Wrap Up

Deciding which method to use isn’t a matter of one method being better than the other, but of one method being more suited to a particular output, to the type of information you’re working with, or to the skills of the people on your team who will be using the data. Asking yourself the above questions, as well as considering future model maintenance and potential model degradation, is what drives your methodology.

Good thing I’ll be discussing model maintenance in my next blog!

About Hillary Bliss
Hillary Bliss is a Senior ETL Consultant at Decision First Technologies and specializes in data warehouse design, ETL development, statistical analysis, and predictive modeling. She works with clients and vendors to integrate business analysis and predictive modeling solutions into the organizational data warehouse and business intelligence environments based on their specific operational and strategic business needs. She has a master’s degree in statistics and an MBA from Georgia Tech.

Categories: Predictive Analytics

SAP Predictive Analytics Custom R Component for Correlation Plot

One great use of SAP Predictive Analytics Expert Analytics R Custom Components is to create an easily run process for a particular calculation, chart, or visualization that you want to perform often. One example is correlation analysis, which many data scientists perform as a pre-processing and inspection step when building a predictive model.

Expert Analytics does have some built-in data analysis exhibits that are automatically generated whenever a predictive workflow is run (some examples are below), but they are only produced for numeric data fields.

[Figures: two examples of the automatically generated data analysis exhibits]
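
As a rough idea of the kind of inspection such a component might wrap, here is a minimal sketch in base R using the built-in mtcars dataset:

    # Pairwise correlations among numeric predictors
    num_cols <- sapply(mtcars, is.numeric)
    cmat <- cor(mtcars[, num_cols], use = "pairwise.complete.obs")
    print(round(cmat, 2))
    # Flag highly correlated pairs that could cause multicollinearity
    high <- which(abs(cmat) > 0.8 & upper.tri(cmat), arr.ind = TRUE)
    data.frame(var1 = rownames(cmat)[high[, 1]],
               var2 = colnames(cmat)[high[, 2]],
               corr = round(cmat[high], 2))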

To download full PDF and Continue Reading…

About Hillary Bliss
Hillary Bliss is a Senior ETL Consultant at Decision First Technologies and specializes in data warehouse design, ETL development, statistical analysis, and predictive modeling. She works with clients and vendors to integrate business analysis and predictive modeling solutions into the organizational data warehouse and business intelligence environments based on their specific operational and strategic business needs. She has a master’s degree in statistics and an MBA from Georgia Tech.

Categories: Predictive Analytics