Critical Success Factors For Quantum Traffic Optimization Projects

This is a copy of a post that originally appeared on the Inside Quantum Technology blog.

The Strategic Brief:

A review of traffic optimization efforts since 2017 reveals common factors leading to successful outcomes. Australia’s NSW state transport department just announced the southern hemisphere’s first optimization project. However, the project has not yet included some of the key skills needed.

[Image: Bus with atomic symbols]

Until now, the northern hemisphere has monopolized projects exploiting quantum computing to optimize traffic management and navigation. That is about to change. Australia’s New South Wales state government recently announced the southern hemisphere’s first traffic project. The government-led partnership will attempt to solve traffic issues for a city where movement is complicated by a large harbor and rivers dividing north/south flow. It is not clear, however, whether the government has gathered all the partners necessary for the task.

Examining projects that have tackled traffic optimization since 2017 reveals several critical success factors for Quantum Computing efforts:

  • plentiful data on supply availability
  • plentiful data on demand location and timing
  • data engineering skills to clean and validate source data
  • analytics and data science skills for analyzing data
  • modeling skills for path optimization
  • quantum developer skills and algorithm expertise for building the quantum application
  • classical developer skills to interface to user-facing applications

Volkswagen Led the Way in Traffic Optimization Projects

In “Quantum Computing Strategies: 2019”, Lawrence Gasman & Peter Morgan wrote about the Volkswagen Group (VW) as one of the first automakers to work intensively with quantum computing technology.

Volkswagen’s CIO Martin Hofmann justified the early leap into Quantum Computing during a CEBIT talk: “… the learning curve in this field is very long. And then there’s simply the risk that you’re stuck behind the wave when it rises. Know-how in this field has to be built now and employees ought to get into this subject area now so that the group does not lag behind later. We feel sure that the know-how we build today will give us a competitive edge.”

In 2017, Volkswagen led a test project in Asia, using raw data from Beijing’s plentiful connected taxis. Partnering with the taxi service delivered significant quantities of data on traffic flow and transport supply – taxis. Using a hybrid system combining classical and quantum computers, the project predicted the demand for taxis up to an hour in advance of need. Drivers could plan to arrive at locations within the city where demand (passengers) would otherwise be underserved.

By 2019, Volkswagen was leading projects in Europe and felt ready to run a pilot managing bus traffic carrying thousands of passengers for the Web-Summit technology conference in Lisbon, Portugal. The pilot’s partnership with CARRIS public bus services provided a ready source of specific data on supply, demand, and trip time supporting navigation optimization for the conference buses.

These early efforts are already being used by others in the Quantum Computing industry. During the Web-Summit conference, Volkswagen’s earlier research acted as the basis for Masayuki Ohzeki’s efforts. Ohzeki, from Tohoku University, created an algorithm calculating the ideal escape routes in the event of a tsunami – one needing a quantum computer to run in real time. Ohzeki’s algorithm sources smartphone GPS data from the millions of people affected to determine each individual’s exact position and calculate how he or she can reach the nearest safe place. No simple task when you take into account the acute traffic situation and the possible movements of people fleeing. Bountiful, clean data is critical for success.

Fulfilling Needs Through Partnerships

Of interest across these projects are the partnerships Volkswagen used. Quantum annealing is especially well-suited for these projects as it can natively solve optimization problems. No surprise to see D-Wave as a constant platform and partner for Volkswagen’s projects. (Note: these partnerships are not exclusive, with D-Wave also working with other auto companies including Toyota, Ford, and Denso.) The Volkswagen applications for Lisbon were developed jointly with the software specialists Hexad. The analytics-focused PTV Group provided the movement flow analysis within its city model. For the Barcelona project, Volkswagen added data science specialists Teralytics and telecommunications service provider Orange to deliver quantities of data on people’s locations and movements.
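
Quantum annealers accept optimization problems in QUBO (quadratic unconstrained binary optimization) form. As a minimal sketch of what such a formulation looks like, here is a toy route-assignment problem with invented coefficients, solved by brute force standing in for an annealer; the variables, rewards, and penalties are purely illustrative and not drawn from any of the projects above:

```python
from itertools import product

# Toy QUBO with hypothetical coefficients; real traffic models involve
# thousands of variables and run on annealing hardware.
# Binary variable x_i = 1 means route i is used. Diagonal terms reward
# using a route; off-diagonal terms penalize congestion between routes
# that share road segments.
Q = {
    (0, 0): -1.0, (1, 1): -1.0, (2, 2): -1.0,  # per-route reward
    (0, 1): 2.0,  (1, 2): 2.0,                 # congestion penalties
}

def qubo_energy(bits, Q):
    """Energy of one candidate assignment: sum of Q[i,j] * x_i * x_j."""
    return sum(coeff * bits[i] * bits[j] for (i, j), coeff in Q.items())

# Brute-force enumeration stands in for the annealer at this tiny scale.
best = min(product((0, 1), repeat=3), key=lambda b: qubo_energy(b, Q))
print(best, qubo_energy(best, Q))  # (1, 0, 1) -2.0
```

The annealer’s job is to find the lowest-energy assignment when the search space is far too large to enumerate classically.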

Southern Hemisphere’s First Announced Project

In the southern hemisphere’s first project, the NSW government transport department’s partnership with Q-CTRL calls for a proof of concept. The department can supply plentiful raw data on transport supply across buses, ferries and trains, and Q-CTRL brings a mature choice of Python-based quantum development stacks and experience to the mix.

However, based on an analysis of similar projects to date, some critical skills appear missing from the announced partnership. While the department delivers real-time supply data, optimization requires knowledge of potential demand to answer questions like: Where are passengers gathering? Where will they gather? When will demand exceed supply? The addition of a mobile phone telco such as Optus or Telstra could supply real-time passenger demand data to the partnership. Similarly, detailed traffic-flow analysis and city modeling require specific data science and engineering expertise to convert algorithm outputs into useful instructions for drivers and captains.

The Impact of Unclean Data

Unclean data can have a significant negative impact on quantum applications. For example, an Inside Quantum Technology client who focuses on stock futures runs a data validation step before each model iteration predicting imminent winners and losers. They shared how a period of inaccuracies in related equity data stretched the data validation stage into a six-hour process. Unclean data not only delays results; the longer run times increase compute costs as well.

As Jeffrey Cohen, the President of Chicago Quantum, noted: “Understanding the data that’s available, how clean it is, and how you might gather and use it – that’s almost always the long pole in the tent.”

Slowdown is the New Outage (SINTO)

This is a copy of a post that originally appeared on the AppDynamics blog.

The Strategic Brief:

With ‘Orange Is The New Black’ (OITNB) wrapping its final season, let’s reclaim the title formula ‘x is the new y’ with SINTO. This post explores tracing, monitoring, observability and business awareness. By understanding the difference in these four methods, you’ll be ready to drive agile applications, gain funding for lowering technical debt, and focus on customer retention.

[Image: Sunset photo]

Common application outage sources have been addressed by implementing Agile, DevOps and CI/CD processes. The resulting increase in system uptime allows site reliability engineers (SREs) to move their focus onto tuning performance, and for good reason. While outage-driven news headlines can cause stock prices to plummet short term, the performance-driven reputation loss is a slow burn for longer-term customer loss.

Whether accessed via web browsers, smart phones or Internet of Things devices, slowdowns drive customers to abandon shopping carts and consider competitors. Slowdowns lead to reputation loss for enterprises—a loss that may even flow to an engineer’s career. If you were considering hiring an SRE, how much weight would you give to the company’s reputation for poor or unpredictable customer experiences?

As high blood pressure is a silent killer of humans, slowdown is the silent killer of reputations.

Slowdowns vs Outages

Consider the significant differences between outages and slowdowns:

Slowdowns are commonly the result of resource constraints. Either you don’t have enough of a resource, or you’re using it poorly and causing contention. If you have too many network transactions on a narrow bandwidth, or if system memory is filled with unnecessary locked pages, a slowdown can result. In a prior life managing hospital data centers, I saw invalid HL7 messages generate recurring error records in message queues, choking inter-hospital communications. Nurses had to run between laboratories and wards with results because the needless error messages slowed delivery of the genuine laboratory results. We know outages lose customers, but when there are no outages, what will drive customer loss?

Slowdown is the new outage. #slowdownisthenewoutage #SINTO

Insight vs Observability

DevOps methodologies came with a minimum requirement for monitoring application performance in production.

In turn, SRE comes with the requirement for observability—the capacity to reach into the code and answer an unpredictable question.

While observability supports diagnosis, insight is needed for resolution. SRE implementations create a team of engineers delivering a platform of products and processes for developers to use, ensuring the highest availability. In addition, SRE moves the focus from reaction to proaction, generating a requirement for spotting the initial predictors of slowdown. This creates the need for a way to observe what code is doing while running in production. Observable metrics need context to become actionable insight.

AIOps delivers the ML-driven automatic baselines and contextual correlation to allow SRE teams to engage preemptively (which in turn improves business outcomes, as Gartner’s AIOps paper reports). Once a predictor anomaly is triggered, the SRE team can respond by updating a SQL query, coding a new function call, or scaling up resources to prevent the slowdown from escalating into a threat to the business. Post-response, the SRE team can then pass the details back to the application owners for longer-term resolutions.

While DTrace or manual breakpoints may be great for single applications on single machines, they will “often fall short while debugging distributed systems as a whole,” notes Cindy Sridharan in Distributed Systems Observability. When trying to diagnose a complete customer experience relying on multiple business transactions in distributed multi-cloud production applications, observability falls short of insight. The good news is that if you have implemented monitoring as part of your DevOps rollout, the APM used to react to outages can be expanded to observe and diagnose slowdowns.

Finding Insight on Top of Observability

Neither monitoring nor observability is an end unto itself. For slowdown detection, we must see the broader picture of the total user experience. We must be able to take a step back from our usual I-shaped technical silos and apply T-shaped skills to seek insight into the causes of slowdowns.

Supporting observability can overload applications with additional code that creates metrics for APM to capture. Observability only requires that the individual metrics be present within the code, without correlating them into the overall customer experience.

Delivering insight requires several key functions:

  • Baselines identifying normal performance
  • Segmented metrics of customer business transactions to identify weak points
  • Levers to isolate code portions within the production environment
  • Common trusted metric sources that span technology silos
  • Overhead minimization when performance is normal
  • Noise filtering via ML-trained filters for anomaly detection
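
The first of those functions, baselining, can be illustrated with a deliberately simple sketch: flag a sample as anomalous when it strays several standard deviations from a trailing window. The window size, threshold, and readings below are all hypothetical; an APM product’s ML-driven baselining is far more sophisticated than this:

```python
from collections import deque
from statistics import mean, stdev

def make_baseline_detector(window=60, k=3.0):
    """Flag a metric sample as anomalous when it falls outside
    mean +/- k * stddev of the trailing window (a crude stand-in
    for ML baselining in an APM product)."""
    history = deque(maxlen=window)
    def check(sample):
        anomalous = (
            len(history) >= 10          # need enough data for a baseline
            and stdev(history) > 0
            and abs(sample - mean(history)) > k * stdev(history)
        )
        history.append(sample)
        return anomalous
    return check

check = make_baseline_detector()
readings = [100, 102, 98, 101, 99, 100, 103, 97, 101, 100, 250]
flags = [check(r) for r in readings]
print(flags)  # only the final spike (250) is flagged
```

A fixed threshold would have needed an operator to guess the 250 boundary in advance; the baseline learns “normal” from the stream itself.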

Creating observability within each application individually incurs technical debt, while an SRE-supporting APM solution can deliver observability across multiple applications. Moving to a DevOps or SRE model is problematic when you lack an understanding of how to observe and gain insight from metrics. Read more on how APM applies to DevOps.

Remember, it is the metric you don’t watch that bites you.

5 Critical Metrics When Deciding What To Automate In AIOps

This is a copy of a post that originally appeared on the Forbes blog.

The Strategic Brief:

What are the best ways to apply AIOps in your IT environment? Here are five key metrics to consider.

[Image: Flagpole photo]

We automate for three benefits: to improve responsiveness, remove drudgery, and deliver consistent results. But automation has consequences, too. As you automate you’re potentially creating technical debt. The automated procedure must be kept up to date whenever you update the systems it automates. If it impacts, say, the network and you change your networking vendor, you’ll have to update the automation and the scripts around it. That’s why it’s important to assess what you need (and don’t need) to automate.

You may wish you could create an all-encompassing automation platform. However, automating reactions to production anomalies may include some major resolution tasks, like a rebuild or recovery of a database. Based on my consulting work, I’ve developed five criteria that I use when working with clients to help them decide what to automate in their IT environments.

Five Criteria for Assessing What to Automate in AIOps

1) Frequency

Will it take longer to implement the automation than to respond manually to events?

The straw that broke the camel’s back applies frequently to IT anomalies. A first step in an automation assessment is to identify how often the triggering event or anomaly has occurred, or may occur. There’s no point in automating the reaction to a one-off event. On the other hand, even though this may be the first time the anomaly has reached a crisis point, it may have occurred before.

When an issue finally comes to your attention—when something breaks—it’s often just the final straw in a series of events, like when a system overloads after coming close many times in prior weeks or months. A query language built into your performance monitor is a powerful support feature, as it allows you to quickly search for times when you came close to an anomaly in the past. Once you know what metrics lead up to the anomaly, you can query to find out how often the event occurs.
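
As a hedged illustration of that kind of query, the sketch below counts near-miss events in a hypothetical metric history; the thresholds, dates, and values are invented for the example:

```python
# Hypothetical metric history: (date, cpu_saturation_percent) samples
# exported from a performance monitor's query interface.
history = [
    ("2019-06-01", 72), ("2019-06-08", 88), ("2019-06-15", 91),
    ("2019-06-22", 86), ("2019-06-29", 94), ("2019-07-06", 99),
]

CRISIS, NEAR_MISS = 95, 85  # illustrative thresholds

# How often did we come close before the crisis finally landed?
near_misses = [d for d, v in history if NEAR_MISS <= v < CRISIS]
crises = [d for d, v in history if v >= CRISIS]

print(len(near_misses), len(crises))  # 4 near misses before 1 crisis
```

Four near misses ahead of a single crisis suggests a recurring anomaly worth automating against, not a one-off.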

2) Impact

Are you automating the solution to a major issue? If the anomaly has an insignificant impact on your overall enterprise, incurring the technical debt of an automated response isn’t the answer. And if the problem is just a temporary slowdown and the response you would automate has high risk, then automation isn’t a go either.

So ask yourself: What’s the cost to the business?

Conversely, if you’re dealing with a dinosaur-extinction type of impact—one that, say, could cost the business millions of dollars in lost sales—you’ll definitely need to automate a response so that your customers never take the hit. In fact, the anomaly will be fixed before your customers are even aware of it. That’s where tracking business transactions will enable you to correlate the business impact with the organizational value.

3) Coverage

Coverage describes the proportion of real-world process that can actually be automated. If the automated task requires a manual step in the middle, such as unplugging a cable or having to contact your cloud provider, automating other parts of the procedure may not improve reaction times at all.

But if you’re sure the automation will cover the entire solution—I’m thinking of simple things here like boosting network bandwidth—then obviously automation is both easy and the right way to go. Scoring this metric should be binary: either the process can be fully automated, or it can’t be automated at all.

4) Probability

The probability of successful automation measures the accuracy of the reactive procedure. There are two sides to this metric: the uniqueness of the trigger, and the certainty of the reaction’s outcome. The triggering anomaly must be unique enough to identify that the reactive procedure is definitely the best way to address the event. Accurate root cause analysis (RCA) is critical and one of the significant benefits of applying machine learning or AIOps. However, an accurate RCA is only part of the solution, as the automated reactive procedure must predictably generate the same results in the same way each time.

5) Latency

One of the benefits of automation is improved responsiveness, and there’s a correlation between the value of automation and latency—the time an automated reaction will take to complete. Low-impact reactions, such as those that boost network bandwidth or increase the server or container pool, are perfect for automatic reactions. With these reactions the anomaly is often resolved before a human can even type in the necessary commands, and you avoid operator errors that can occur in manual responses.

Reactions that may take multiple hours to complete require caution. Do you really want to automatically start a multi-hour database rebuild or recovery, knowing that it will impact the production environment while it runs? You can still automate the commands to avoid operator error, but when the latency is long, you may wish to put an authorization step into the automated reaction.

If an anomaly is happening often, and the automated reaction will resolve the anomaly faster than you can type, automate it!

The AIOps Features That Matter Most

When I work with clients, we assign a score to each of the key metrics. With some clients I have applied weightings to each metric to help balance business value against opportunity cost and technical debt. Totaling these scores not only helps us decide if something should be automated, but also with prioritizing the creation of reactive procedures based on business needs.
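
A minimal sketch of that scoring approach follows; the weights, the 0–5 scores, and the two candidate tasks are invented for illustration rather than taken from any client engagement:

```python
# Illustrative weights for the five criteria (frequency, impact,
# coverage, probability, latency); real weightings vary by client.
WEIGHTS = {"frequency": 2, "impact": 3, "coverage": 2,
           "probability": 2, "latency": 1}

def automation_score(scores, weights=WEIGHTS):
    """Criteria scored 0-5, except coverage, which is binary (0 or 1):
    a process that can't be fully automated scores zero overall."""
    if scores["coverage"] == 0:
        return 0
    return sum(weights[k] * scores[k] for k in weights)

bandwidth_boost = {"frequency": 4, "impact": 3, "coverage": 1,
                   "probability": 5, "latency": 5}
db_rebuild = {"frequency": 1, "impact": 5, "coverage": 0,
              "probability": 3, "latency": 1}

print(automation_score(bandwidth_boost))  # 34: a strong candidate
print(automation_score(db_rebuild))       # 0: needs a manual step
```

Totaling the weighted scores lets you rank candidate automations by business value rather than by whichever anomaly is loudest this week.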

For effective business applications, you’ll need an application performance management (APM) solution with these required AIOps features:

  • Machine learning-driven anomaly detection and root cause analysis
  • Automated responses
  • Third-party integration capability

Your APM solution should also allow you to select automation procedures with built-in query languages and business transaction awareness. The ultimate goal here is to balance your efforts between automating the most valuable metrics, and freeing up your time to move from reactive to preemptive architecture and infrastructure reviews.

Successfully Deploying AIOps, Part 3: The AIOps Apprenticeship

This is a copy of a post that originally appeared on the AppDynamics blog.

The Strategic Brief:

By augmenting operations teams, AIOps enables organizations to preemptively ensure that applications, architectures and infrastructures are ready for rapid digital transformation.

Part one of our series on deploying AIOps identified how an anomaly breaks into two broad areas: problem time and solution time. Part two described the first deployment phase, which focuses on reducing problem time. With trust in the AIOps systems growing, we’re now ready for part three: taking on solution time by automating actions.

[Image: French clock. © 2019 Marco Coulter]

Applying AIOps to Mean Time to Fix (MTTFix)

The power of AIOps comes from continuous enhancement of machine learning powered by improved algorithms and training data, combined with the decreasing cost of processing power. A measured example was Google’s project for accurately reading street address numbers from its street imagery—a necessity in countries where address numbers don’t run sequentially but rather are based on the age of the buildings. Humans examining photos of street numbers have an accuracy of 98%. Back in 2011, the available algorithms and training data produced a trained model with 91% accuracy. By 2013, improvements and retraining boosted this number to 97.5%. Not bad, though humans still had the edge. In 2015, the latest ML models passed human capability at 98.1%. This potential for continuous enhancement makes AIOps a significant benefit for operational response times.

You Already Trust AI/ML with Your Life

If you’ve flown commercially in the past decade, you’ve trusted the autopilot for part of that flight. At some major airports, even the landings are automated, though taxiing is still left to pilots. Despite already trusting AI/ML to this extent, enterprises need more time to trust AI/ML in newer fields such as AIOps. Let’s discuss how to build that trust.

Apprenticeships allow new employees to learn from experienced workers and avoid making dangerous mistakes. They’ve been used for ages in multiple professions; even police departments have a new academy graduate ride along with a veteran officer. In machine learning, ML frameworks need to see meaningful quantities of data in order to train themselves and create nested neural networks that form classification models. By treating automation in AIOps like an apprenticeship, you can build trust and gradually weave AIOps into a production environment.

By this stage, you should already be reducing problem time by deploying AIOps, which delivers significant benefits before adding automation to the mix. These advantages include the ability to train the model with live data, as well as observe the outcomes of baselining. This is the first step towards building trust in AIOps.

Stage One: AIOps-Guided Operations Response

With AIOps in place, operators can address anomalies immediately. At this stage, operations teams are still reviewing anomaly alerts to ensure their validity. Operations is also parsing the root cause(s) identified by AIOps to select the correct issue to address. While remediation is manual at this stage, you should already have a method of tracking common remediations.

In stage one, your operations teams oversee the AIOps system and simultaneously collect data to help determine where auto-remediation is acceptable and necessary.

Stage Two: Automate Low Risk

Automated computer operations began around 1964 with IBM’s OS/360 operating system, which allowed operators to combine multiple individual commands into a single script, thus automating multiple manual steps into a single command. Initially, the goal was to identify specific, recurring manual tasks and figure out how to automate them. While this approach delivered a short-term benefit, building isolated, automated processes incurred technical debt, both for future updates and eventual integration across multiple domains. Ultimately it became clear that a platform approach to automation could reduce potential tech debt.

Automation in the modern enterprise should be tackled like a microservices architecture: Use a single domain’s management tool to automate small actions, and make these services available to complex, cross-domain remediations. This approach allows your investment in automation to align with the lifespan of the single domain. If your infrastructure moves VMs to containers, the automated services you created for networking or storage are still valid.

You will not automate every single task. Selecting what to automate can be tricky, so when deciding whether to fully automate an anomaly resolution, use these five questions to identify the potential value:

  • Frequency: Does the anomaly resolution occur often enough to warrant automation?
  • Impact: Are you automating the solution to a major issue?
  • Coverage: What proportion of the real-world process can be automated?
  • Probability: Does the process always produce the desired result, or can it be impacted by environmentals?
  • Latency: Will automating the task achieve a faster resolution?

Existing standard operating procedures (SOPs) are a great place to start. With SOPs, you’ve already decided how you want a task performed, have documented the process, and likely have some form of automation (scripts, etc.) in place. Another early focus is to address resource constraints by adding front-end web servers when traffic is high, or by increasing network bandwidth. Growing available resources is low risk compared to restarting applications. While bandwidth expansion may impact your budget, it’s unlikely to break your apps. And by automating resource constraint remediations, you’re adding a rapid response capability to operations.

In stage two, you augment your operations teams with automated tasks that can be triggered in response to AIOps-identified anomalies.

Stage Three: Connect Visibility to Action (Trust!)

As you start to use automated root cause analysis (RCA), it’s critical to understand the probability concept of machine learning. Surprisingly, for a classical computer technology, ML does not output a binary, 0 or 1 result, but rather produces statistical likelihoods or probabilities of the outcome. The reason this outcome sometimes looks definitive is that a coder or “builder” (the latter if you’re AWS’s Andy Jassy) has decided an acceptable probability will be chosen as the definitive result. But under the covers of ML, there is always a percentage likelihood. The nature of ML means that RCA sometimes will result in a selection of a few probable causes. Over time, the system will train itself on more data and probabilities and grow more accurate, leading to single outcomes where the root cause is clear.
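
A small sketch of that thresholding decision follows; the candidate causes, probabilities, and 0.7 confidence threshold are all hypothetical, chosen only to show how a percentage likelihood becomes a “definitive” result:

```python
# Sketch of how a coder turns ML likelihoods into a definitive root
# cause: return the top class only when it clears a confidence
# threshold, otherwise surface the ranked shortlist.
def select_root_cause(likelihoods, threshold=0.7):
    """likelihoods: dict of candidate cause -> probability.
    Returns a single cause, or the ranked shortlist when no
    candidate is confident enough yet."""
    cause, p = max(likelihoods.items(), key=lambda kv: kv[1])
    if p >= threshold:
        return cause
    return sorted(likelihoods, key=likelihoods.get, reverse=True)

early = {"memory_leak": 0.45, "slow_sql": 0.35, "gc_pause": 0.20}
later = {"memory_leak": 0.88, "slow_sql": 0.09, "gc_pause": 0.03}

print(select_root_cause(early))  # ranked shortlist: model still unsure
print(select_root_cause(later))  # memory_leak: confidence now clears 0.7
```

The early snapshot mirrors the “selection of a few probable causes” described above; with more data, the probabilities sharpen and the shortlist collapses to one answer.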

Once trust in RCA is established (stage one), and remediation actions are automated (stage two), it’s time to remove the manual operator from the middle. The low-risk remediations identified in stage two can now be connected to the specific root cause as a fully automated action.

The benefits of automated operations are often listed as cost reduction, productivity, availability, reliability and performance. While all of these apply, there’s also the significant benefit of expertise time. “The main upshot of automation is more free time to spend on improving other parts of the infrastructure,” according to Google’s SRE project. The less time your experts spend in MTTR steps, the more time they can spend on preemption rather than reaction.

Similar to DevOps, AIOps will require a new mindset. After a successful AIOps deployment, your team will be ready to transition from its existing siloed capabilities. Each team member’s current specialization(s) will need to be accompanied with broader skills in other operational silos.

AIOps augments each operations team, including ITOps, DevOps and SRE. By giving each team ample time to move into preemptive mode, AIOps ensures that applications, architectures and infrastructures are ready for the rapid transformations required by today’s business.

Successfully Deploying AIOps, Part 2: Automating Problem Time

This is a copy of a post that originally appeared on the AppDynamics blog.

The Strategic Brief:

Built-in AI/ML—such as in AppDynamics APM—delivers value by activating the cognitive engine of AIOps to address anomalies.

[Image: Asian clock. © 2017 Marco Coulter]

In part one of our Successfully Deploying AIOps series, we identified how an anomaly breaks into two broad areas: problem time and solution time. The first phase in deploying AIOps focuses on reducing problem time, with some benefit in solution time as well. This simply requires turning on machine learning within an AIOps-powered APM solution. Existing operations processes will still be defining, selecting and implementing anomaly rectifications. When you automate problem time, solution time commences much sooner, significantly reducing an anomaly’s impact.

AIOps: Not Just for Production

Anomalies in test and quality assurance (QA) environments cost the enterprise time and resources. AIOps can deliver significant benefits here. Applying the anomaly resolution processes seen in production will assist developers navigating the deployment cycle.

Test and QA environments are expected to identify problems before production deployment. Agile and DevOps approaches have introduced rapid, automated building and testing of applications. Though mean time to resolution (MTTR) is commonly not measured in test and QA environments (which aren’t as critical as those supporting customers), the benefits to time and resources still pay off.

Beginning your deployment in test and QA environments allows a lower-risk, yet still valuable, introduction to AIOps. These pre-production environments have less business impact, as they are not visited by customers. Understanding performance changes between application updates is critical to successful deployment. Remember, as the test and QA environments will not have the production workload available, it’s best to recreate simulated workloads through synthetic testing.

With trust in AIOps built from first applying AIOps to mean time to detect (MTTD), mean time to know (MTTK) and mean time to verify (MTTV) in your test and QA environments, your next step will be to apply these benefits to production. Let’s analyze where you’ll find these initial benefits.

Apply AI/ML to Detection (MTTD)

An anomaly deviates from what is expected or normal. Detecting an anomaly requires a definition of “normal” and a monitoring of live, streaming metrics to see when they become abnormal. A crashing application is clearly an anomaly, as is one that responds poorly or inconsistently after an update.

With legacy monitoring tools, defining “normal” was no easy task. Manually setting thresholds required operations or SRE professionals to guesstimate thresholds for all metrics measured by applications, frameworks, containers, databases, operating systems, virtual machines, hypervisors and underlying storage.

AIOps removes the stress of threshold-setting by letting machine learning baseline your environment. AI/ML applies mathematical algorithms to different data features seeking correlations. With AppDynamics, for example, you simply run APM for a week. AppDynamics observes your application over time and creates baselines, with ML observing existing behavioral metrics and defining a range of normal behavior with time-based and contextual correlation. Time-based correlation removes alerts related to the normal flow of business—for example, the login spike that occurs each morning as the workday begins; or the Black Friday or Guanggun Jie traffic spikes driven by cultural events. Contextual correlation pairs metrics that track together, enabling anomaly identification and alerts later when the metrics don’t track together.
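
Time-based correlation can be illustrated with a deliberately tiny sketch: learn a separate “normal” for each hour of day so the morning login spike is not flagged. The sample data and the fixed 50% tolerance are invented for the example; real baselining uses ML models, not a hand-picked tolerance:

```python
from collections import defaultdict
from statistics import mean

# Toy training data: (hour_of_day, login_count) observations.
samples = [(9, 500), (9, 520), (9, 480),   # morning login spike is normal
           (3, 40),  (3, 50),  (3, 45)]    # quiet overnight hours

baseline = defaultdict(list)
for hour, logins in samples:
    baseline[hour].append(logins)

def is_anomalous(hour, value, tolerance=0.5):
    """Anomalous if the value deviates more than 50% from that
    hour's learned mean (stand-in for an ML baseline)."""
    normal = mean(baseline[hour])
    return abs(value - normal) > tolerance * normal

print(is_anomalous(9, 510))  # False: matches the usual 9am spike
print(is_anomalous(3, 510))  # True: the same volume at 3am is abnormal
```

The identical metric value is normal at 9am and anomalous at 3am, which is exactly the alert noise a single static threshold cannot avoid.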

AIOps will define “normal” by letting built-in ML watch the application and automatically create a baseline. So again, install APM and let it run. If you have specific KPIs, you can add these on top of the automatic baselines as health rules. With baselines defining normal, AIOps will watch metric streams in real time, with the model tuned to identify anomalies in real time, too.

Apply AI/ML to Root Cause Analysis (MTTK)

The first step to legacy root cause analysis (RCA) is to recreate the timeline: When did the anomaly begin, and what significant events occurred afterward? You could search manually through error logs to uncover the time of the first error. This can be misleading, however, as sometimes the first error is an outcome, not a cause (e.g., a crash caused by a memory overrun is the result of a memory leak running for a period of time before the crash).

In the midst of an anomaly, multiple signifiers often will indicate fault. Logs will show screeds of errors caused by stress introduced by the fault, but fail to identify the underlying defect. The operational challenge is unpacking the layers of resultant faults to identify root cause. By pinpointing this cause, we can move onto identifying the required fix or reconfiguration to resolve the issue.

AIOps creates this anomaly timeline automatically. It observes data streams in real time and uses historical and contextual correlation to identify the anomaly’s origin, as well as any important state changes during the anomaly. Even with a complete timeline, it’s still a challenge to reduce the overall noise level. AIOps addresses this by correlating across domains to filter out symptoms from possible causes.

There’s a good reason why AIOps’ RCA output may not always identify a single cause. Trained AI/ML models do not always produce a zero or one outcome, but rather work in a world of probabilities or likelihoods. The output of a self-taught ML algorithm will be a percentage likelihood that the resulting classification is accurate. As more data is fed to the algorithm, these outcome percentages may change if new data makes a specific output classification more likely. Early snapshots may indicate a priority list of probable causes that later refine down to a single cause, as more data runs through the ML models.
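This refinement of likelihoods can be illustrated with a toy scoring model. The causes, symptoms and probabilities below are invented for illustration; a real AIOps engine learns them from data:

```python
# Toy likelihood model: each candidate cause has a probability of
# producing each symptom (numbers invented for illustration).
symptom_likelihood = {
    "memory_leak":   {"gc_pause": 0.9,  "oom_kill": 0.8,  "slow_api": 0.6},
    "bad_deploy":    {"gc_pause": 0.1,  "oom_kill": 0.1,  "slow_api": 0.7},
    "network_fault": {"gc_pause": 0.05, "oom_kill": 0.02, "slow_api": 0.8},
}

def rank_causes(symptoms):
    """Bayesian-style update: start from uniform priors, multiply in the
    likelihood of each observed symptom, then normalize to percentages."""
    scores = {cause: 1.0 for cause in symptom_likelihood}
    for s in symptoms:
        for cause in scores:
            scores[cause] *= symptom_likelihood[cause].get(s, 0.01)
    total = sum(scores.values())
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    return [(cause, score / total) for cause, score in ranked]

# Early snapshot: one symptom gives only a loose priority list...
print(rank_causes(["slow_api"]))
# ...more data later makes one classification far more likely.
print(rank_causes(["slow_api", "gc_pause", "oom_kill"]))
```

With only one symptom observed, the three causes score within a few points of each other; once all three symptoms arrive, the model concentrates nearly all probability on one cause.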

RCA is one area where AI/ML delivers the most value, and the time spent on RCA is the mean time to know (MTTK). While operations is working on RCA, the anomaly is still impacting customers. The pressure to conclude RCA quickly is why war rooms get filled with every possible I-shaped professional (a deep expert in a particular silo of skills) in order to eliminate the noise and get to the signal.

Apply AI/ML to Verification

Mean time to verify (MTTV) is the remaining MTTR portion automated in phase one of an AIOps rollout. An anomaly concludes when the environment returns to normal, or even to a new normal. The same ML mechanisms used for detection will minimize MTTV, as baselines already provide the definition of normal you’re seeking to regain. ML models monitoring live ETL streams of metrics from all sources provide rapid identification when the status returns to normal and the anomaly is over.
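The return-to-normal check itself is simple once a baseline exists. A minimal sketch, assuming a fixed normal band and requiring a sustained run of in-band samples before declaring the anomaly over:

```python
def verify_recovery(stream, low, high, required=5):
    """Scan a metric stream and return the index at which the anomaly can
    be declared over: the start of the first run of `required` consecutive
    in-band samples. Returns None if the metric never re-stabilizes."""
    run_start, run_len = None, 0
    for i, value in enumerate(stream):
        if low <= value <= high:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len >= required:
                return run_start
        else:
            run_len = 0  # a single out-of-band sample resets the run
    return None

# Latency samples (ms): spike, partial recovery, then sustained normal.
latency = [220, 540, 610, 230, 580, 210, 205, 199, 215, 208, 202]
print(verify_recovery(latency, low=150, high=250, required=5))
```

Requiring a sustained run rather than a single good sample avoids declaring victory during a brief lull in the anomaly.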

Later in your rollout when AIOps is powering fully automated responses, this rapid observation and response is critical, as anomalies are resolved without human intervention. Part three of this series will discuss connecting this visibility and insight to action.

Successfully Deploying AIOps, Part 1: Deconstructing MTTR

This is a copy of an original post on the AppDynamics blog here.

The Strategic Brief:

Quantifying the value of successful AIOps deployment requires tracking subsidiary metrics within the industry default of mean time to resolution (MTTR). This post breaks out the metrics that form MTTR and divides them into two categories: problem and solution.

Somewhere between waking up today and reading this blog post, AI/ML has done something for you. Maybe Netflix suggested a show, or DuckDuckGo recommended a website. Perhaps it was your photos application asking you to confirm the tag of a specific friend in your latest photo. In short, AI/ML is already embedded into our lives.

The sheer quantity of metrics across development, operations and infrastructure makes these disciplines a perfect partner for machine learning. Given this general acceptance of AI/ML, it is surprising that, according to Gartner, organizations are lagging in implementing machine learning for operations automation.

The level of responsibility you will assign to AIOps and automation comes from two factors:

  • The level of business risk in the automated action
  • The observed success of AI/ML matching real world experiences

The good news is this is not new territory; there is a tried-and-true path for automating operations that can easily be adjusted for AIOps.

It Feels Like Operations is the Last to Know

The primary goal of the operations team is to keep business applications functional for enterprise customers or users. They design, “rack and stack,” monitor performance, and support infrastructure, operating systems, cloud providers and more. But their ability to focus on this prime directive is undermined by application anomalies that consume time and resources, reducing team bandwidth for preemptive work.

An anomaly deviates from what is expected or normal. A crashing application is clearly an anomaly, yet so too is one that was updated and now responds poorly or inconsistently. Detecting an anomaly requires a definition of “normal,” accompanied by monitoring of live streaming metrics to spot when the environment exhibits abnormal behaviour.

The majority of enterprises are alerted to an anomaly by users or non-IT teams before IT detects the problem, according to a recent AppDynamics survey of 6,000 global IT leaders. This disappointing outcome can be traced to three trends:

  • Exponential growth of uncorrelated log and metric data triggered by DevOps and Continuous Integration and Continuous Delivery (CI/CD) in the process of automating the build and deployment of applications.
  • Exploding application architecture complexity with service architectures, multi-cloud, serverless, isolation of system logic and system state—all adding dynamic qualities defying static or human visualization.
  • Siloed IT operations and operational data within infrastructure teams.

Complexity and data growth overload development, operations and SRE professionals with data rather than insight, while siloed data prevents each team from seeing the full application anomaly picture.

Enterprises adopted agile development methods in the early 2000s to wash away the time and expense of waterfall approaches. This focus on speed came with technical debt and lower reliability. In the mid-2000s, manual builds and testing were identified as the impediment, leading first to DevOps and later to CI/CD.

DevOps allowed development to survive agile and extreme approaches by transforming development—and particularly by automating testing and deployment—while leaving production operations basically unchanged. The operator’s role in maintaining highly available and consistent applications still consisted of waiting for someone or something to tell them a problem existed, after which they would manually push through a solution. Standard operating procedures (SOPs) were introduced to prevent the operator from accidentally making a situation worse for recurring repairs. There were pockets of successful automation (e.g., tuning the network) but mostly the entire response was still reactive. AIOps is now stepping up to allow operations to survive in this complex environment, as DevOps did for the agile transformation.

Reacting to Anomalies

DevOps automation removed a portion of production issues. But in the real world there’s always the unpredictable SQL query, API call, or even the forklift driving through the network cable. The good news is that the lean manufacturing approach that inspired DevOps can be applied to incident management.

To understand how to deploy AIOps, we need to break down the “assembly line” used to address an anomaly. The time spent reacting to an anomaly can be broken into two key areas: problem time and solution time.

Problem time: The period when the anomaly has not yet been addressed.

Anomaly management begins with time spent detecting a problem. The AppDynamics survey found that 58% of enterprises still find out about performance issues or full outages from their users. Calls arrive and service tickets get created, triggering professionals to examine whether there really is a problem or just user error. Once an anomaly is accepted as real, the next step generally is to create a war room (physical or Slack channel), enabling all the stakeholders to begin root cause analysis (RCA). This analysis requires visibility into the current and historical system to answer questions like:

  • How do we recreate the timeline?
  • When did things last work normally, and when did the anomaly begin?
  • How are the application and underlying systems currently structured?
  • What has changed since then?
  • Are all the errors in the logs the result of one or multiple problems?
  • What can we correlate?
  • Who is impacted?
  • Which change is most likely to have caused this event?

Answering these questions leads to the root cause. During this investigative work, the anomaly is still active and users are still impacted. While the war room is working tirelessly, no action to actually rectify the anomaly has begun.

Solution time: The time spent resolving the issues and verifying return-to-normal state.

With the root cause and impact identified, incident management finally crosses over to spending time on the actual solution. The questions in this phase are:

  • What will fix the issue?
  • Where are these changes to be made?
  • Who will make them?
  • How will we record them?
  • What side effects could there be?
  • When will we do this?
  • How will we know it is fixed?
  • Was it fixed?

Solution time is where we solve the incident rather than merely understanding it. Mean time to resolution (MTTR) is the key metric we use to measure the operational response to application anomalies. After deploying the fix and verifying return-to-normal state, we get to go home and sleep.

Deconstructing MTTR

MTTR originated in the hardware world as “mean time to repair”—the full time from error detection to hardware replacement and reinstatement into full service (e.g., swapping out a hard drive and rebuilding the data stored on it). In the software world, MTTR is the time from software running abnormally (an anomaly) to the time when the software has been verified as functioning normally.

Measuring the value of AIOps requires breaking MTTR into subset components. Different phases in deploying AIOps will improve different portions of MTTR. Tracking these subdivisions before and after deployment allows the value of AIOps to be justified throughout.
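As an illustration of that tracking, the sub-metrics can be derived from incident timestamps. The timestamp field names below are hypothetical; the idea is simply that each stage of the anomaly response gets its own measured duration:

```python
from datetime import datetime

def mttr_breakdown(incidents):
    """Average each MTTR sub-metric (in minutes) across incidents.
    Hypothetical timestamp fields: anomaly start, detection, root cause
    known (MTTK ends), fix deployed, and verified normal (MTTV ends)."""
    stages = [("detect", "started", "detected"),
              ("know", "detected", "cause_known"),     # MTTK
              ("fix", "cause_known", "fix_deployed"),
              ("verify", "fix_deployed", "verified")]  # MTTV
    out = {}
    for name, a, b in stages:
        deltas = [(i[b] - i[a]).total_seconds() / 60 for i in incidents]
        out[name] = sum(deltas) / len(deltas)
    out["mttr"] = sum(out[name] for name, _, _ in stages)
    return out

ts = lambda h, m: datetime(2019, 5, 1, h, m)
incident = {"started": ts(9, 0), "detected": ts(9, 40),
            "cause_known": ts(11, 10), "fix_deployed": ts(11, 40),
            "verified": ts(11, 55)}
print(mttr_breakdown([incident]))
```

Recording these stage boundaries before deploying AIOps gives you the baseline; the phases of your rollout should then show measurable drops in the specific sub-metrics they target.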

With this understanding and measurement of existing processes, the strategic adoption of AIOps can begin, which we discuss in part two of this series.

Is Your IT Workforce Ready For AIOps?

This is a copy of an original post on the Forbes blog here.

The Strategic Brief:

AIOps will change the way organizations operate.

In the AIOps-enabled enterprise, where artificial intelligence and machine learning automate tasks to augment technology operations teams, businesses undergo a monumental shift that enables them to be more proactive, predictive and ultimately preemptive.

Along the journey to the AIOps-enabled enterprise, the skills needed in your ITOps, DevOps, and site reliability engineering (SRE) teams will also evolve, requiring skills in customization, integration, automation, and governance. Most organizations aren’t ready for this seismic shift, however. A recent survey of 6,000 IT professionals shows the vast majority of global enterprises have yet to start an AIOps strategy.

The Current State of Operations

Let’s examine how we got to today’s IT operations organization. Specifically, I mean the people monitoring and managing the production environment, whether or not they have “operations” in their title.

In the last decade, the drive to agile and DevOps solutions moved operations towards development, creating the new skill set requirement of release engineering (RelEng), which is responsible for automating application deployment and providing structure for the software development lifecycle (SDLC). This required connecting the dots across domains (server, network, database, frameworks, code dependencies, and so on), and began changing development and operations from I-shaped professionals (deeply skilled in one area) to T-shaped professionals (skilled in one area but also knowledgeable in other domains).

You may notice that RelEng focuses on SDLC tasks such as automating builds, tests and QA—essentially automating all the work of deploying an application into production. In the past, DevOps failed to pay equal attention to the operations effort needed during the production lifespan of an application. AIOps addresses this DevOps weakness by applying AI/ML to anomaly detection, root cause analysis, resolution and verification, and by driving automation of anomaly resolutions. This means the AIOps enterprise will require a different skill set from ITOps, DevOps and SRE professionals.

The New Skill Profile for AIOps

AIOps is reducing the shelf life of two operational skills: the Sherlock Holmes-esque investigative skill that is the heart of root cause analysis, and the experience-based knowledge that lives within an individual. Instead, AIOps will identify or short-list the root cause, and resolvable actions will be captured and automated where warranted. When a clear root cause is found and a matching automated resolution in place, AIOps will be able to address the issue without human interaction.

Similar to cloud services, AIOps will require skills in customization, integration, automation, and governance. While team members with specialist skills will still have value, AIOps will encourage learning and collaboration with other disciplines, and allow you to measure how IT capability and growth are helping to achieve a goal. This represents a shift from the I-shaped and T-shaped specialist to a full-fledged versatilist.

The AIOps professional is a cross-domain expert who uses domain-specific skills to control a progressively widening scope of coverage, and who is equally at ease communicating the technical and business impacts of an issue.

Capability Levels Track Transition to AIOps

To align your team with the AIOps profile, define an alternate career path for them. IT professionals may see their careers tied to a siloed technology certification, and consider time spent learning other domains as coming at the expense of their specialization. You can delineate an alternate path by assessing their current skills, setting goals for the level your enterprise requires, and then building training and incentive programs to transition them into the new skill set.

A simple, six-level scale (based loosely on Bloom’s taxonomy used in education to assess learning effectiveness) can be used for assessment and goal-setting. Each domain’s skills can be measured against the individual’s capability.

The Six Levels Of IT Capability

  1. Awareness: The most basic level; professionals are aware that the technology or practice is in use somewhere in your enterprise.
  2. Understanding: The ability to understand where the technology or practice is used in the enterprise, and which team to contact if anything needs to be done with it.
  3. Applying: Performing basic tasks to manage the technology or practice, with a standard operating procedure (SOP) providing guidance.
  4. Analyzing: Knowing how to view related measures in an application performance monitoring (APM) solution and describe the cross-domain integration present for the technology or practice.
  5. Automating: Defining, creating and deploying automated processes for the technology or practice, allowing automatic resolution of anomalies by AIOps.
  6. Architecture: Designing and enacting an architecture for new implementations of the technology or practice. There may be vendor or institutional certifications available at this level.

The above capability scale can be applied across specialized technologies and more general practice and soft-skill areas. The technologies you assess, which will depend on what is used in your enterprise, may include: AWS, Azure, containers, microservices, Kubernetes, databases, network, infrastructure hardware, embedded frameworks, cloud service providers, APM tools, management tools, and more.
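A minimal sketch of such an assessment, with invented people, domains and target levels, that surfaces the largest training gaps first:

```python
# Hypothetical assessment: current capability level (1-6) per domain,
# alongside the target level the enterprise requires for AIOps.
team = {
    "alice": {"Kubernetes": 5, "APM tools": 3, "Databases": 2},
    "bob":   {"Kubernetes": 2, "APM tools": 5, "Databases": 4},
}
targets = {"Kubernetes": 4, "APM tools": 4, "Databases": 3}

def training_gaps(team, targets):
    """List (person, domain, gap) entries where a team member sits below
    the target capability level, largest gaps first."""
    gaps = [(person, domain, targets[domain] - level)
            for person, skills in team.items()
            for domain, level in skills.items()
            if level < targets[domain]]
    return sorted(gaps, key=lambda g: -g[2])

print(training_gaps(team, targets))
```

The output becomes the raw material for the training and incentive programs: each entry is a concrete goal to set with that individual.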

In addition, you will need to add categories for non-technical areas including:

  • Sharing: to incent the capture of knowledge for automation
  • Security: while this may appear as a technology, security is also a process and a behaviour that overlaps with governance
  • Programming: assessing the ability to create automation scripts and actions, including knowledge of language and usage of APIs
  • Governance: understanding where the technology sits within industry regulation and best practices

You can deploy AIOps without waiting for your skills transition to complete, as the technology provides significant benefits immediately. To realize the full value of AIOps, it’s essential to move your existing teams to a new skills profile. This transition can occur during your AIOps deployment. By using capability levels, goals and incentives, you’ll gain a clear path for growth, allowing teams to help your AIOps deployment succeed.

Augmenting operations teams via AIOps frees up time for team members. This time can be used to extend capabilities across domains and into the business, transforming professionals’ skills to fit the new AIOps profile. Just as the business organization evolved to support citizen technologists and citizen data scientists, IT must evolve to support citizen business evangelists and automation strategists.

Nine Essential Skillsets for Competitive Digital Transformation

This is a copy of an original post on the AppDynamics blog here. I added further reading at the end for your enjoyment.

The Strategic Brief:

If you’re reading this, there’s a good chance you’re an Agent of Transformation ready to change the world. As your enterprise pivots towards AIOps, your team must accumulate the right skills to embrace digital transformation while innovating at scale.

Street Art in Cartagena, Colombia
Large and midsize enterprises successful at competitive transformation have one characteristic in common: careful team-building around both soft and technical skills. Let’s examine how you should think about your digital transformation team (even though it may not be called that). Since there are many books on building agile teams, squads and dojos, this post will focus on the soft skill mix that a majority of IT executives say is the roadblock to successful competitive digital transformation.

Application creation is facing accelerating waves of change. The World Economic Forum asserts we are entering the fourth industrial revolution, even as the third chugs along. Surviving concurrent revolutions requires our digital transformation approach to be as agile as our development methodology. Your transformation must result in a digitally competitive enterprise. The skills needed can be broken into three categories, each with three sub-categories.


Skills to Survive

Consider the bare minimum set of skills required for DevOps projects to avoid failure. These fall into three general subcategories: organizational, business and technology.


Organizational people line up the dominos for other participants to knock over. They ensure decisions are made and the work gets done as expected. These are skills or titles that DevOps practitioners will be well familiar with, including Scrum Master, Project Manager, Squad leader, and Technical Architect. Without these skills, effort tends to run overtime and wanders away from original goals.


Business people bring the reality check from the real world. They ensure that technical success will have business relevance, and that the business is ready for transformed business models and processes. Look for titles like Product Owner, Business Systems Expert, and Business Line Owner. As more digital natives enter your enterprise, expect a higher level of digital awareness and creativity from those bringing your business skills into the team.


Technology people build the complex clock and keep it ticking. Here you seek technology-specific skills such as TensorFlow, Kubernetes, or JavaScript that are needed by the specific architecture. On top of these siloed skills, look for general process experience as in DevOps, quality assurance, security, infrastructure, or integration.

These three groups are the essentials—the survival skills—for digital processes to exist and thus are the minimum set needed for digital transformation. Any enterprise going through this transformation has these engineering and organizational skillsets, in some shape or form, in its transformation teams. However, once your business transformation introduces artificial intelligence as part of the architecture, you will need to think differently about the skills needed for success.

Skills for Machine Learning

The machine learning (ML) statistical revolution is changing the world. To embrace this change, enterprises must engage ML in two main ways: as a black box encapsulated within a vendor’s product; or custom-built for competitive advantage.

Application Performance Management (APM) is a good example of the black box approach, where AIOps or Cognitive Services are delivered by your vendor and the skills listed under machine learning are not required.

When encapsulated, the needed skills are housed within the software vendor rather than in your organization, and the vendor will select the optimal algorithms and training frameworks for each type of data and specific use case. For targeted solutions like DevOps, the encapsulated approach is best.

However, you may be surprised by some of the skills required for your business to build out a data science team and gain competitive advantage from machine learning. Research from Accenture and MIT broke the skills surrounding artificial intelligence into three categories: trainers, explainers and sustainers. (The Jobs That Artificial Intelligence Will Create)


Trainers are what we see commonly in AI today. They match models and frameworks to specific tasks, and identify and label training data. Trainers help models look beyond the literal into areas such as how to mimic human behavior, whether in speech or driving reactions. In London, a team is trying to teach chatbots about irony and sarcasm so they can interact with humans more effectively.


As AI gets more advanced, the layers of neural networks creating answers will exceed simple explanations. Explainers will provide non-technical explanations of how the AI algorithms interpret inputs and how conclusions are reached. This will be essential to attain compliance, or to address legal concerns about bias in the machine. If you create AI to approve mortgages, for instance, how will you establish the AI is not inflicting bias based on gender or creed? The explainer will play a necessary role.


Someone needs to ensure the AI systems are operating as designed ethically, financially and effectively. The sustainers will monitor for and react to unintended outcomes from the “black box.” If the AI is selecting inventory and setting prices, a sustainer will ensure there is no resulting price-gouging on consumer necessities—thus avoiding customer revolt.

The machine learning marketplace is the opposite of the gig economy. In the gig economy, skills are a commodity, like driving a vehicle. You can swap cars and still be a skilled driver. In contrast, the needed skills for ML may change with every new type of data. When your competitive digital transformation seeks customer facial recognition as shoppers walk into the store, you will likely apply TensorFlow and hire for those skills. Next, the business may want to recommend adjacent products to a customer. The optimal algorithm will be a decision tree, and now you’ll need to hire for that skill. Later you may need email text inference, which requires skills in text tokenizing and stemming before the email data can be fed into TensorFlow. You end up using different languages and frameworks for each new use case. Even within a single use case, the optimal algorithm may change over time as particular frameworks improve for specific tasks.
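To make the email example concrete, here is a deliberately naive tokenize-and-stem pass in pure Python. A real pipeline would use a proper stemmer (such as the Porter algorithm) before feeding vectors to TensorFlow; the suffix list here is invented purely for illustration:

```python
import re

SUFFIXES = ["ing", "edly", "ed", "es", "s"]  # crude, for illustration only

def tokenize(text):
    """Lowercase the text and split on runs of non-letters."""
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]

def stem(token):
    """Naive suffix stripping, keeping at least a three-letter root.
    Real pipelines use a proper stemmer before the model sees the data."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

email = "Customer reported failing payments; retries timed out."
print([stem(t) for t in tokenize(email)])
```

Even this toy version shows why the skill set shifts per use case: text preparation is its own craft, entirely separate from choosing or tuning the downstream model.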

For the technical hire, you should qualify on aptitude rather than skills. Find the right person, then train them. The apprenticeship approach of giving workers time to learn shows you value your people, which enhances loyalty. You either accept apprenticeship as a cost, or you will need to hire an army of individuals. With AI/ML, you will initially hire the trainers that select and code models. As you do, consider who will grow into the explainers and sustainers.

Regardless of whether your transformation includes machine learning, there are additional skills you’ll need to attain competitive business transformation.

Skills to Compete

Now we are getting into a different mind space altogether. Inclusiveness and variety are now stated goals for leading competitive companies. News headlines offer multiple examples where applications failed embarrassingly due to the lack of variety, digital awareness and experience in the transformation team. Even an automatic soap dispenser can have bias if it delivers foam to light-skinned hands but not into the hands of people of color. In this real-world example, the dispenser registered light reflected off Caucasian skin, but the Fitzpatrick scale tells us a stronger light is needed to trigger the sensor for people of color. A broader team or testing regimen would have identified the problem before release. Similarly, Amazon immediately cancelled a machine learning project once aware of the inherent bias of its trained model. Amazon, hoping to better prioritise future applicants, trained an ML model with resumes from previously successful candidates. Unfortunately, the trained model kept selecting males, because the successful resumes of the past decade had been predominantly male.

For competitive digital transformation, add these three new groups of skills to your requirements:


Firstly, look at your overall culture and diversity. Without considering culture, you may easily leave your reputation in tatters as in the examples above. Seek out variety in gender. Combine millennials with baby boomers and mix digital natives with digital immigrants. Even variation in birthplace and societal culture creates the variety of viewpoints needed to ward off potential bias. Hearing different voices will help identify gaps in testing criteria and in training data sets.


The second set of skills leads to “digital dexterity.” Remember, you want the benefits of digital transformation to be experienced by the largest number of people across your organization. This effort involves evangelizing the changes to the entire organization through training and communication. Ensure that all those using technology feel completely comfortable and skilled with the technology. Identify an ambassador to the executive team, someone outside the regular reporting structure. Look for a person on the fast path to leadership—maybe recently out of college—and assign them a mentor from the executive level. This ambassador will communicate important achievements and crucial requirements when needed. Also, look for an internal VC. Sometimes the executive sponsor of the transformation is not the same person as the budgetary sponsor. Ensure someone has the skills to build a VC-like pitch for continued funding.


Today’s app-driven world makes User Experience (UX) and Customer Experience (CX) critical. These terms are not equivalent: UX focuses on human interaction with the technology itself, while CX goes beyond the application to the full interaction a human has with your organization. Are people walking in a door, or onto a factory floor, or calling via phone to reach your digitally transformed technology? What happens after they exit the website or application? Owning these experiences is as critical to successful competitive digital transformation as understanding the experiences offered by your competitors. It’s essential to correlate user and customer experience to application performance and business impact.

The best way to understand the strengths of your team for competitive digital transformation is to create a simple table of skills mentioned above as rows, and team candidates as columns. As you build out the team, check off the skills. In essence, any skill not provided by the team will need to be provided by you as the Agent of Transformation.
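That checklist is easy to mechanize. A small sketch, with hypothetical candidates and skills, that reports the rows no one on the team checks off:

```python
# Hypothetical skills-vs-candidates table: rows are required skills,
# columns are the team members who bring them.
required = ["Scrum Master", "Product Owner", "Kubernetes",
            "TensorFlow", "UX/CX", "Security"]
candidates = {
    "dana": {"Scrum Master", "Security"},
    "eli":  {"Kubernetes", "TensorFlow"},
    "femi": {"Product Owner"},
}

def uncovered_skills(required, candidates):
    """Return the skills no candidate covers: the gaps the Agent of
    Transformation must fill personally or hire for."""
    covered = set().union(*candidates.values())
    return [skill for skill in required if skill not in covered]

print(uncovered_skills(required, candidates))  # the unchecked rows
```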

Further Reading:

Turning Digital Transformation into Digital Dexterity

This is a copy of an original post on the AppDynamics blog here.

The Strategic Brief:

In a disruptive business world, digitizing the traditional workplace is not enough. Digital dexterity gives you power to make lasting, impactful change.

The goal of digital dexterity is to build a flexible, agile workplace and workforce invested in the success of the organization. This dexterity allows the enterprise to treat employees like consumers—researching their challenges, goals and desired technologies—and then allowing the employees to exploit existing and emerging technologies for better business outcomes. This post advises line-of-business and product owners, already acting as agents of transformation inside their enterprises, on extending the metamorphosis into dexterity.

© 2003 Marco Coulter


The Road vs. The Mountaintop

The journey begins with digital transformation, a road leading to multiple destinations. It is not a singular goal, but rather a way of life. Even digital-first enterprises continue to transform as they experiment with different business models or expand into new markets.

Enterprises executing digital transformations share three common goals:

  • Making analog tasks digital
  • Seeking new ways to solve old problems
  • Making the business better

While all three goals are important, the technical challenges of digital transformation often end up overshadowing the goal of improving the business. Transformation must leave the business not just different, but better. The transformed enterprise needs to be more agile in both application development and business. Digital transformation needs to result in a new company with digital ’savvy’, an understanding of the power of the data being collected, and the flexible and informed mindset required for digital dexterity.

Dexterity vs. Transformation

The real goal of digital transformation is to shorten the time required to transform business processes. How quickly can you spot a new or altered opportunity? Is the business digitally savvy enough to comprehend the possibilities of new technologies like blockchain, internet of things, and edge computing? Is your business now digitally dextrous?

Digitization without exploiting resultant data is a negative technical investment.

The next step in this journey—data extraction and data-driven decision making—mines the real value of going digital. The significant power of digital over analog is the ease of accumulating and assessing data, including data on each customer click, each cloud system executed upon, each line of code, and even each stage in a business process.

Often the first projects in digital transformation take long elapsed periods of time. IT will need to rebuild itself first, taking many steps to respond faster to evolving needs. IT will also need to upgrade traditional waterfall models into agile development lifecycles with continuous integration and continuous delivery. Departments can restructure to create DevOps teams that reduce the time from coding to deployment.

In the middle of long transformation projects, it is worth stepping back and asking anew: Why are we doing this? Digital transformation has been around so long, it may feel like it’s past the use-by date. Though some enterprises birth as digital-first, many are still struggling with basic analog-digital transformation. In the rush to deal with technology of multi-channel digitization, the goal often is missed.

(For more on digital transformation in specific industry verticals, read the AppDynamics blog posts on insurance, retail banking and construction.)

Digital Dexterity

Once your digital processes are generating data, the next step is to ensure you can exploit the wisdom of that data.

Achieving digital dexterity requires a new culture on both the business and technical sides. The technology team not only needs the technical skills to transform, but also the diplomatic skills to boost the organization’s digital dexterity. Amongst the “best coders on the planet” that you hire, you will want to seed the best communicators and evangelists as well. The business team will initially need your support in understanding what can be exploited with technology; the technical team will need to communicate using business terms. Similarly, these teams need to be presented with clear correlations from their application deliverables to business outcomes. Developing a multichannel awareness may be a new thing for your salesforce.

The real measure of dexterity is the enterprise’s ability to empower technical staff to make business decisions, and business staff to drive technical choices.

Challenges You Will Meet


Gartner’s 2018 CIO Survey reveals that CIOs believe corporate culture is one of the biggest barriers to digital transformation, followed by resources and talent. Those three elements make up 82% of digital business impediments, the survey says.

Consider expanding DevOps into BizDevOps. For this, you will need a nervous system connected to all parts of your enterprise to define common goals for both the business and technical teams, both of which need a common, shared view of data to allow differently trained participants to discuss and identify solutions.

Build a common vision and strategy across your business and technology leaders. Collaborative learning across team and knowledge structures is an effective way to help employees become dextrous.

Embracing diversity is a key action that adds a variety of viewpoints for spotting new opportunities. Make sure your strategy considers the employee experience (also a good time to preclude bias for gender, disabilities, etc.). Consider if the approach makes the employee more business-literate and more empowered to exploit new business processes.

Application owners need to continuously search out ways to improve employee effectiveness. The applications we develop should always listen to, interpret, and learn from their users. In the same way smart speakers were extremely stupid initially but self-improved over time, the enterprise application should consider user activity and create more efficient workflows for the user.
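As a sketch of what that self-improvement could look like, assuming a hypothetical in-app event log (the feature names and threshold here are illustrative, not from any real product), an application might count feature usage and surface a shortcut suggestion once a threshold is crossed:

```python
from collections import Counter

def suggest_shortcut(events, threshold=3):
    """Suggest promoting the most-used feature once it crosses a usage threshold."""
    if not events:
        return None
    feature, uses = Counter(events).most_common(1)[0]
    if uses >= threshold:
        return f"You use '{feature}' often. Add it to your home screen?"
    return None

# Hypothetical event log: one entry per feature the user invokes.
events = ["reports", "export", "reports", "search", "reports", "reports"]
print(suggest_shortcut(events))
```

The point is not the three lines of counting; it is that the application, not the user, does the work of noticing the pattern.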

Technical Delay

As part of digital transformation, enterprises build out business intelligence frameworks, creating data lakes and gaining a rearview-mirror view of their business. Executives may even bring on data scientists to create models to predict the coming quarter. Each of these actions has value but excludes one key timeframe: today. Right now.

Why Aim for Dexterity?

Every company today is experiencing disruption. In fact, more companies experience disruption than act as disruptors. Right now, there’s a startup somewhere that will eventually flip to a business model that challenges yours. It might be a small change, or a permanent change in the marketplace. Your job is to prepare your enterprise by making sure your employees are empowered with self-serve, consumer-like technologies, and that they’re aware of the possibilities of change.

A dextrous enterprise can easily respond to market movements and disruptions. New businesses can be created with less struggle once it’s easier to connect departments and businesses. Employees with common awareness of the business—and the technology supporting the business—can readily identify, define and exploit new revenue opportunities. The holy grail alignment of IT and business will come through having all parties look at the same data to enable data-driven decisions.

Remember, the dextrous enterprise provides a consumer-like experience for its employees.

Transformation must leave the business not just improved, but better at surviving disruptions. The transformed enterprise is more agile in both development and business. It is able to rapidly integrate and partner with external businesses when the opportunity or need arises, and connect disparate business processes into a new buyer’s journey when a disruptor changes the marketplace. Digital transformation needs to deliver a new company that understands the power of collected data and the flexibility to harness the latest technology.

Digital dexterity is people using digital technologies to think, act and organize themselves in new and productive ways.

For more use cases supporting digital dexterity, read how customers are using Business iQ for AppDynamics.

An Interface Refresh Can Revitalize Existing Features

Refreshing an interface feels new, even when it does nothing that actually is new.

The Strategic Brief:
Applications are not always brand new. We all use many applications that have been in use for years, perhaps decades. For mobile applications, refreshing the interface is a requirement rather than a nice-to-have. If you are lucky, it will be delivered by updates to the underlying OS at little development cost (e.g. notifications in iOS 10). If unlucky, matching your interface to the esthetic of an OS may require you to redesign your app from the ground up. If done well, a redesign can be more than refreshing for customers; it can be reengaging.

Refreshing an application interface can reenergize your users
In kicking the tires of iOS 10, there are clear changes in experience for existing features. A noticeable one is notifications. Notifications still do what they always did – an application calls an API to let the owner know a piece of information by displaying it on the lock screen. It is still notifying… but it looks like a new feature.
I reacted viscerally and immediately to the new notification style. It feels like a new feature. It makes me want to pay attention to the notifications more often and more deeply. I am re-engaged with notifications. It feels new, though it does nothing that actually is new.
See the before and after shots from Politico below.

iOS 9 Notifications


iOS 10 Notifications

The design, coding, testing and quality assurance around refreshing an interface costs money. It can even cost more than adding a new feature. If the user interface code is not isolated within the application code, a refresh can involve changes to a significant number of modules.

Public Betas are a two-edged sword

In the above example, Apple invested even further by offering a public beta as well. A beta means your feature receives significant testing before general availability. A public beta also introduces risk. If early reactions are negative, the release’s reputation is sullied. This uncertainty is mitigated by the ability to address problems before the product is released. Coordinating and supporting all the people involved in a public beta is an additional expense for a refresh.

Is refreshing an interface a good or bad idea?

Like everything in technology, the answer is… “it depends”. If you are lucky, you may get a free refresh. In iOS 10, your code calls the same API but the underlying operating system gives the result a refreshed appearance. The display looks different, as in the Notifications example above. (1) Sometimes, the operating system or framework will force a refresh on you. If it changes its navigation esthetic, you may have to redesign your app from the ground up to match. Users expect consistency in their experience.

Making happier customers

If your goal is to improve the user experience, you will need usage details from your customers. If you have enough specifics to understand how your customers use the feature today, you stand a good chance of reworking the interface to optimize the common tasks. If you are not tracking usage, then an interface refresh is more likely to be an exercise in how your developers think the feature should be used.

Refreshing an interface may elongate an aging product’s lifespan

A young application grows by adding more features. Eventually the application matures and may not need more features. To keep the revenue alive, product managers will still want further releases. The good news is that every year the industry identifies new and more efficient navigation techniques. Adding these into an aging product is a valuable way to revitalize the product, and give yourself an additional version to release.

1. Though a little cheeky, you could try to claim this as an application refresh.

Why aren’t they using my new feature?



The Strategic Brief:

When users are not using a feature that was popular in early testing, consider whether you explained the value of the feature as well as the function. Even when the use of a feature may be easily self-taught, the benefit or purpose may not be so obvious. This is a necessary lesson for manual writers, and even for those writing simple help pages. Consider asking your writers and editors to bring in samples of good and bad writing. Practising on other people’s work can remove the personal feelings from reviewing written work.

A recent question from a product manager asked why users were not using a new feature. In beta testing, users enthused about the feature. Now it was in the field, only the beta testers seemed to be using it. My response was a simple question, “Did you tell them why they should be using it?” Lack of awareness is the most common reason not to use a beneficial feature.

For years, I used Canon’s point and shoot cameras, but one of my pet peeves was their manuals. The point and shoot camera is intended for the amateur photographer. Taking a good photo can be complex and Canon builds features into the camera to allow for those special opportunities. Below is their attempt to describe the fish-eye effect.


Canon Manual Example

In the Advanced Guide part of the User Manual, in a section titled “Shooting with a Fish-Eye Lens Effect (Fish-Eye Effect)”, the description of the feature reads… wait for it… “Shoot with the distorting effect of a fish-eye lens”. Well, thanks for that! They do include a sample photo of a dog’s nose using the effect. Most of us are amateurs who never went to photography school and never used a fish-eye lens. Hmm, is this a special feature for shooting dogs’ noses? Canon described the feature, but failed to describe why they had bothered to put it in the camera. (See the end of this article for my attempt at a rewrite.) Now let’s take a look at how the Nikon S1 manual describes their Creative Modes.


Nikon Manual Example

Still quite brief – but what a difference! Nikon describes the feature, breaking it out into what you should do and what the camera will do (especially for Night landscape). Then they add a brief overview of why the feature exists. That extra sentence in each description helps you learn why to use a feature. Nikon’s manual helps you be a better photographer.

Technology produces a lot of user guides, technical manuals, and quick start guides. Some are excellent, like the actionable guidance found in most IBM ‘Red’ books. Some are terrible, merely listing the available settings with no guidance as to why the developers thought that feature was worth including in the product. Yes, even for software costing hundreds of thousands of dollars.

One habit I picked up as a public speaker was rewriting other people’s speeches. Today when listening to speakers, I try to rephrase statements to improve the ‘punch’ and clarify the goal of the statement. This is a good habit to get yourself and your team into. Get your writers to bring in an example of good and bad manuals. Discuss why they see them as good or bad. Then get them to rewrite a portion of the bad one. As a quick example, here is my rewritten description for the Canon example above:

Shooting with a Fish-Eye Lens Effect (Fish-Eye Effect). This mode simulates the style of a wide-angle lens known as a fisheye lens. The center of the photo is distorted to appear closer to the camera and the edges made to appear more distant. Use this mode to create the impression of being as close as possible to the subject in the photo. In the sample photo of a dog, the full head appears, while the snout gains attention by appearing out of proportion.

Feel free to improve on my rewrite in the comment section.

Does the digitally savvy enterprise still need a CIO?

Saying you do not need a CIO because everyone uses technology is like saying you do not need a CFO because everyone has a bank account.


The Strategic Brief:

The digitally savvy company will adopt competencies from ‘born-digital’ successes like Amazon, ‘disruptive’ successes like Apple, and ‘cross-over’ successes like GE. It will challenge prior investment strategies and be ready to shut down traditional business streams in order to create new digital products and business models. Becoming digitally savvy requires a pilot. Even with a company of millennial, digitally aware staff, technology needs an advocate. The challenge is not normally the new hires, but making the C-suite and executive teams digitally aware. A company needs someone with the executive profile to take the company from its current state to a digitally savvy state. The CIO can and should be the catalyst that increases the rate of travel towards the digitally savvy destination.


The Digital Generation

The ‘senior’ generation in 2016 grew up in a world where technologist was a specialist role. I repeatedly learn this while volunteering in one-on-one teaching of computer skills to the elderly at NYC’s public libraries. “I was a nurse or builder, and I did not need to learn Excel. That was IT’s job.” is often heard. These senior students are often starting from “what is a browser?” and “how do I use a mouse?”. During their successful careers, they did not have to learn how to use technology. Technology was someone else’s job.

The generation reaching executive management today has used technology since childhood. Technology is self-serve. This generation googles for answers before contacting technical support, and prefers to bring its own devices to the workplace. If technology is self-serve and everywhere, then why does the IT department still exist? Saying you do not need a team focused on technology because the whole world uses technology is like saying you do not need a CFO because everyone has a bank account. In any digital organization, you still need someone focused on technology. Yes, their role needs to be very different. The role of the IT department is no longer just supporting technology or computerizing processes; it is now about weaving digital savvy into the products and services you deliver.


©2016 Marco Coulter

savvy |ˈsavē| – shrewdness and practical knowledge; the ability to make good judgments.

Digitally Savvy

The ‘digitally savvy’ company is aware of the disruptive nature of the digital age. Digital commerce and digital marketing significantly changed the way you sell, but did not necessarily change your company and offerings. Digital savvy is the next stage, requiring leaders to target digital products and promote digital-aware business model changes. The journey begins with assessing ‘born-digital’ companies (Google, Facebook, Amazon) and deciding which competencies you must adopt from them. This should reach through to internal investment strategies, embracing approaches exploited by start-up companies and VC investors. Alongside this sits the continuous monitoring of changing materials and technology. Can miniaturized sensors enable deeper data analysis for your customers, as they did for Babolat? Like Babolat, the digitally savvy company creates a program of work for how digital changes its products and services. Simply defining a goal of ‘we want to be digital’ is not enough. The end point is that you change what you make and sell, as well as how you sell it. For most companies, this may require board-member education sessions, supporting C-suite education, and then education down several management levels. Digital savvy requires the ability to move your culture forward at the pace at which the market is moving.

Signs of the Wrong Road

Do the changes feel incremental? Polishing or incrementally extending today’s model is a warning flag that you are not approaching digital savvy correctly. Consider a magazine deciding to go digital. One approach would be creating a PDF version of the paper magazine and offering that to readers. This ‘polishes’ the existing product, but does not change the price-per-issue business model. The other direction would be to use the magazine’s content to build an interactive community website, changing to a click-based advertising business model, providing a more dynamic reader experience by updating stories right up until publication, and a better advertiser experience by providing data back to advertisers.

Are you over-reliant on past experience? Humans rely on pattern recognition. Pattern recognition can be a strength when reading this sentence. In reading, you are matching the letters to mental templates of the alphabet. In leadership, pattern recognition can be powerful in assessing risk based on prior experience – e.g. “we need data backups in case of a failure of our cloud provider”. But it quickly becomes a weakness when experienced managers are over reliant on past behaviors – e.g. “there is no point trying to hit quota in the first quarter as we always miss”. Acceptance of the current state as normal can be an impediment to success and even lead your company to irrelevance.

Is digital part of your strategic plan? Note, I am not asking if digital is part of your IT plan, but part of the overall company strategy. This should extend all the way to your internal investment portfolio. Traditionally, a company’s internal portfolio invests in new products once a proven business case is put together. Quick evaluation of potential is followed by detailed examination of the market, and finally the potential product or service qualifies as an item on the strategic portfolio. Success is assumed because of the research performed before investment. That traditional investment portfolio will produce traditional results. To match the disruption coming from start-up competitors, a large company needs to adopt the competencies of VC and angel investors. The disruptive internal portfolio has up to a dozen projects receiving investment, with an expectation that only about three of them will deliver breakthrough successes. Higher risk, for higher gain. To gain the shrewdness of ‘digital savvy’, digital must be part of company strategy, plans, and policy.

Are you ready to shut down or curtail some traditional portions of your business? There is enormous gravity to existing business models, pricing, and distribution methods. They are familiar and comfortable to you as well as your customers. Yet if you consider the successfully transformative digital companies, you will notice they transform the business model as well as product and service delivery. Expect to make wholesale changes. Expect it to be painful. Make sure you plan to alleviate that pain as much as possible for staff and for customers. Your customers want you to succeed. They want to have a great experience. They will help you through the change, if you make it possible for them to do so.

Personalization Becomes Participation (In the Words of Monty Python)

Customers are individuals, and participate more deeply in response to a meaningful level of personalization.

The Strategic Brief:

If you are stepping up to responsibility for Customer Experience (CX) in your business, you may be considering the title of Chief Experience Officer, or CEO. However, your CEO may question others having the same acronym as themselves, so the industry uses CXO. There is a better choice – Chief Participation Officer. Customers are individuals, and respond more deeply to a meaningful level of personalization. The real result of a focus on customer experience is more than connection or engagement or permission to market. Participation with you as a business is the true success criterion for customer experience management.


Technology promises a personal customer experience for the hoi polloi.

“You’re all individuals!” declared Brian to his followers in the Monty Python film. The devotees responded in unison, “Yes, we’re all individuals!”. Lovely irony. Yet when I use an Automatic Teller Machine, I do not get to feel much like an individual. I receive a set of default options for my withdrawal; the same as everyone on the planet. What if the bank tracked my regular behavior and offered that amount and combination of notes as the first option to me? Decades have passed since the ATM was introduced in the late 1960s. Meanwhile, the difference in customer experience between the first ATM and a current one is basically the addition of a color touchscreen.
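The bank’s side of that idea is almost trivially small. A minimal sketch, assuming the bank already holds a hypothetical history of a customer’s past withdrawal amounts:

```python
from collections import Counter

def default_withdrawal(history, fallback=100):
    """Offer the customer's most frequent past withdrawal as the first option."""
    if not history:
        return fallback  # new customer: fall back to the generic default
    amount, _ = Counter(history).most_common(1)[0]
    return amount

# A customer who usually takes out $60 sees $60 as the first option.
print(default_withdrawal([60, 60, 100, 60, 20]))  # 60
```

The same counting could extend to the preferred combination of notes; the hard part is plumbing the history to the machine, not the logic.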
The personalized experience used to be exclusive to the extremely wealthy. Downton Abbey gives tantalizing glimpses of such lifestyles. If wealthy enough, you could define how each aspect of the day would unfold for you: fresh flowers present before you awoke; medium-poached eggs over a crispy bagel served to your bed; bath temperature set to your preference; clothes laid out; shoes shone; rose petals thrown on the steps as you exit towards your chauffeur-driven transport. (1)

We never really got the individual experience.

The web was meant to get personal. Computer technology has always asked for your personal data and promised an individual experience in return. Does it feel that individual? On a modern website, the extent of personalization is often limited to showing a username and maybe some basic configuration settings. Commonly, you do not even get to re-organize menu items based on the options you commonly use. Amazon is well-known for its recommendations based on your purchases. Have you had the experience of seeing recommendations after you have made the purchase? You already have it; why show you recommendations that could only trigger buyer’s remorse? (2)
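The fix is simple in principle. A sketch, with hypothetical catalog and purchase-history lists standing in for whatever a real recommender produces:

```python
def recommendations(candidates, purchased):
    """Drop items the customer already owns before showing recommendations."""
    owned = set(purchased)
    return [item for item in candidates if item not in owned]

print(recommendations(["camera", "tripod", "lens"], ["camera"]))
# ['tripod', 'lens']
```

That a filter this small is so often missing suggests the recommendations serve the advertiser first and the customer second.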
Smartphones moved technology into a personal scale. The device is both individual to a single user, and powerful enough to support a unique user-based experience. By allowing applications, mobile technology delivers access to information that can be customized to your specific interests and needs. Yet most applications fail to support meaningful personalization, let alone support disability access.

Developers misunderstand the value of exploiting the data that their customers share with them. While it is reasonable to exploit the data for your company’s benefit, you also need to reward your customers with an exceptional customer experience.
The side benefit of personalization is that it requires a deeper understanding of the overall customer experience. What is really going on in the interaction? Why is the consumer there? What do you get? What do they get? Such detailed examination will often allow removal of many less important aspects of the interaction for both parties. You need to reach the experiential point where the customer participates with you. They can easily identify why they are using your offering, and are prepared to share that information through online reviews, feedback sessions with product teams, and discussion forums. They are not just engaged; they participate.

Participation Scorecard.

Try this scorecard to see how you deliver on key aspects of personal customer experience. This is not a complete list, but it is intended to give an instant status check of your current participation health. See how you score out of ten, with each item worth one point.

  • You measure and set goals for the participation level from your customers – how many stars will they give your company this month? (be careful to watch the ratio of how often you ask against how often they use your services)
  • You measure and set goals for the participation value to your business – how many stars would the company give to these interactions with this consumer? (answering the is it worth it question)
  • Your application design process brings customers into the partnership (not just indirectly via product management)
  • Your delivered services have responsive design – they function and feel the same whether mobile app, web, face-to-face, etc.
  • Your design values feature discoverability, feedback, proper mapping, appropriate use of constraints, and the power to undo one’s operations above prettiness or trendiness – though beauty is sometimes an effective delivery system
  • You keep information consistent across locations – if I just paid my bill via the web, don’t show it as outstanding on my mobile app
  • You ensure applications reflect how customer decisions affect the outcome – e.g. ordering a different item changes the expected delivery time
  • You get your code to do the work instead of your customer by automatically tuning the experience based on how they use your application – e.g. ‘we notice you use the share price page quite often, can we move this to your home page? Yes/no’
  • You make the data you have on your customer exportable so they can easily see and analyze what you know about them – e.g. Facebook’s ad preferences; Uber’s customer rating
  • You empower privacy for your customers by always using opt-in approaches and never a default checkbox – building a trusting relationship with the consumer so they will share information with you freely. NOTE: This is where the personalization will pay off. As users see how you use data to make their life better, and save them time or money; they will be prepared to share more details with you.
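The checklist above can be turned into the instant status check directly. A sketch, with a hypothetical self-assessment filled in (the answers are illustrative, not a real score):

```python
CHECKLIST = [
    "Goals set for customer participation level",
    "Goals set for participation value to the business",
    "Customers in the application design process",
    "Responsive design across channels",
    "Discoverability and undo valued over prettiness",
    "Information consistent across locations",
    "Customer decisions reflected in outcomes",
    "Experience auto-tuned from usage",
    "Customer data exportable",
    "Opt-in privacy throughout",
]

def participation_score(answers):
    """Score out of ten: one point per checklist item answered True."""
    return sum(1 for answer in answers if answer)

# Hypothetical self-assessment, one boolean per checklist item.
answers = [True, True, False, True, False, True, False, False, True, True]
print(f"{participation_score(answers)}/{len(CHECKLIST)}")  # 6/10
```

Run it quarterly and track the trend rather than the absolute number; the point of the scorecard is noticing movement.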

Update: Some suggest Uber as an exemplary personalized experience. Not so. Uber is the ‘Santa Claus’ experience – knowing if you have been naughty or nice. Uber puts coal in the driver’s stocking if they have been naughty (and does this to the customer as well). Though Uber is more personalized than the default taxi experience, it fails by not personalizing the interface. If I never use the delivery service, the UberRUSH option should eventually move itself off the main menu. That would be a personalized experience. Also, for the sight-challenged, how about a method to show the driver’s name and number plate in a huge font as the car arrives?


  1. Ok, I have never been that wealthy, but I imagine that is what would happen.
  2. Hmm. Could it be that their advertising revenue is more important than your experience as a customer?

Delight is the Success Criterion for Application Code

The Strategic Brief:

The hardest decision in coding applications is to admit your users are NOT delighted – and that you have to rebuild. It takes discipline to move application development from functional to delightful. It can only happen when coders clearly understand the purpose of the code. Connecting coders to customers is hard work. It takes deeper effort from product managers and marketing teams, though the payoff is worth it.

Public speaking is supposed to be one of the most stressful events in life. Luckily, I am pretty comfortable in front of audiences. As a musician in bands, I was lucky enough to perform before thousands. It was ok because I was part of a band. My inner critic could interpret success or failure as being caused by other members of the band! 😉 (It was the bassist’s fault, or the singer never won the audience). Then a friend asked me to read my poetry in a public forum before an audience of about 30 people. Yikes! The poetry was my own words and thoughts, performed by me. Solo. Whether the audience applauded or threw vegetables, it was solely down to me. They were judging me personally. Yikes!!

One of the things I love about code is that code does not judge. Code either compiles or fails. It runs or it does not. Code does not care what clothes I have on. There is no human value judgement involved with coding. Except that is not true.

Code has no value or quality on its own.

Until code has a user, it is the bad poetry that never leaves your high school diary. Using code makes it real – the user gives code purpose. The value judgement in code is what it does for a user. Airbnb makes finding a bed in a foreign town easier. The binary nature of code function is decided by whether a user can find a bed or not. The quality of code is found in the experience of the user.

Compare coding an application to making a movie. Director Jon Favreau describes the movie editing process: “The first [compilation of the film from the editor] you view is terrible! Each edit makes the film less terrible. Then somewhere in the process it starts to be good … and maybe even great.” Like a movie, the application code begins as terrible. Edits make it less terrible, and eventually the code runs. It is functional. HERE IS WHERE FAILURE HIDES. If you get to running, functional code and think the job is done, then, sorry, you failed. The movie director is not even done once they get to a good edit. Movie directors then perform test screenings to see how audiences react. Based on audience reactions, reshoots are performed, final edits are made, and then the film is complete. Running code gets you to the first alpha test. Here you should be looking for user response as well as bugs.

Sometimes delight is baked in: you were lucky enough to identify the delightful aspect for your customer during a sprint or early mockup phase. Sometimes delight is serendipity: in the coding process you find something beautiful to reveal to customers. Sometimes you get to functional and delight is still not there. The hardest choice to make is to admit your users are NOT delighted – that you got it wrong – and that you have to rebuild (or reshoot, in movie terms). If coders cannot be users of the code themselves, you must connect them to customers, not just product managers. Code must be seen in use to understand delight in its use.

Applications have one distinct difference from movies. Movies get one distribution, or maybe an additional director’s cut. Applications get multiple versions. On the plus side, this allows you to address challenges over time. However, be careful this does not turn into dependence on the adage of ‘we will delight them in the next release’.

EDIT: Readers asked what happened with the poetry reading. The audience applauded, but I realized I was better with processors than poetry. 😉

One App to Rule Them All (Why Messaging Matters)


Strategic Brief:

Are you processing credit card transactions on a website or mobile application? Maybe you offer a hosting service, or build business websites. Get ready, the nature of the web business transaction is changing. You should already have a strategy for accepting transactions from players like Apple’s Messages, WeChat and Facebook Messenger. When supporting Instant Messenger transactions, you will need more than purchase buttons. Sourcing AI-bot technology will be a requirement to support the presale and purchase conversations via these messenger applications.


U! S! A! Number One!!

We have heard the chant at ball games, the Olympics, and in convention centers. It seems an irrevocable truth (to Americans). Even though it was never true in every circumstance, information technology is one sector where that leadership has held for decades.

From the IBM mainframes of the 1950s, through distributed systems, PCs, the internet, smartphones, and wearables, the US has commonly led the invention and adoption of technology for consumer usage. Competition drove this leadership.

US mobile applications began with a potpourri approach.

Currently, the US mobile application economy relies on a flurry of different apps, each eyeing a small piece of the pie of our needs and services. Need a car, try Uber or Lyft. Need to park a car, try Luxe. Need food, try Seamless. Need accommodation, try Airbnb. Want to update your friends, try Facebook. Catching up with friends on video, try Skype. Want to buy a cup of coffee, try Starbucks. Looking for a camera, try B&H. Want to listen to music, try Spotify. Want to shop online, try Amazon. Want to know if a restaurant is good, try Yelp. Want to pay someone you do not know, try PayPal. (Hmm, I just realized I have 190 applications on my smartphone. Need help managing credentials for all these applications, try 1Password.)

This is competition at its best, right? Players competing in the specific field until one comes out as the consumer default. Sometimes though, don’t you get tired of entering credit card details in each, and creating a new login and thinking of a new password?

China disrupts with a unified model.

WeChat is the ultimate ‘platform’ player, offering all the services described above and more embedded within a single application. In China, fixed-price items, like a croissant, come with a QR code; just scan it from WeChat and you have paid. One login for shopping, food, travel advice, accommodation, and staying in touch with friends. To give you an idea of scale: by offering a ‘lottery’ approach to gifting friends, WeChat Wallet grew by 100 million users in a single month (1) – over three million per day – from what started as a messaging application.

US leadership in mobile applications initially defined the market. Entering the field later let WeChat observe the existing players on the US playing field and gather their approaches into a single platform for the Chinese market it understood well.

So far, WeChat remains largely a Chinese artifact, with only 10% of its active users outside China. I first learned of the application from Chinese students while serving as a panel judge for an NYU Masters course. For now, it does not need to worry about competition from US players in the Chinese market. The diversity of applications needed to match WeChat makes its existing user base a tough mountain for any intruder to climb.

The challenge for WeChat will be to find a footing in other countries against incumbents. If your family is already on Facebook/Instagram/Snapchat, what will motivate you to switch to WeChat? (Hint: consultants with strong in-country connections for regions with distinct cultural and business idioms, like the EU, should be pounding on owner Tencent Holdings’ door.)

US mobile application companies are now responding.

Facebook Messenger introduced the ability to transfer money inside the application. They did not see 100 million new users result from their approach, so expect further attempts.

While most coverage of Apple’s WWDC conference highlighted the planned addition of ‘stickers’ to Apple’s Messages application, the feature with subtle impact is the support for APIs. These will permit application developers to connect into Messages for communication, and also allow them to connect into Apple Pay for business transactions. Want to participate in a paid treasure hunt in New York City? Imagine registering on Messages and receiving a calendar reminder and map location in reply, giving you instant directions to a starting point and an invitation to a Messages group. When you meet up there, the guide shows a QR code for you to scan. Once your payment is approved, you receive a PDF in iBooks with the directions, clues and treasures to find. Everything happens on the Messages platform.

WeChat has a stronger feature set today. It is in front of US leaders like Facebook and Apple. The US does not like to be number two. The next two years are going to be very interesting in global mobile platforms.

  1. Acknowledging as the source of the 100 million new user number.

Is DevOps Enough in a SaaS World?


The Strategic Brief:

The DevOps approach was meant to give us agile development AND more manageable applications – reducing the cost of operations and making apps more reliable. Being close to ops helps, but it is not the full story. DevOps works well for individual applications, but can miss the complex interactions between applications, and between applications and the users or consumers who operate them. The Sprint approach that grew up within Google is good as a starting point, but a more continuous relationship needs to grow out of it.



When I was a systems programmer, I worked on a project setting up an HL7 (1) communication network for a group of hospitals. In early configurations, different applications processed HL7 transactions at different rates, so you needed a queuing mechanism that let backlogs clear without losing data. So far, so good. Except each laboratory application and queuing application was developed separately and often came from a different vendor. Even within the same application, each component might have its own commands for collecting information on transactions. Developers gave no thought to the operational need to measure and manage the overall queues once in production. (Developers over here, operations over there.)
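The store-and-forward idea can be sketched in a few lines. This is a toy in-memory illustration, not the vendor interface engines the project actually used; all names here are hypothetical. The key property is that a slow or failing consumer builds a backlog rather than dropping messages – which is also how a single malformed message can silently block everything behind it.

```python
from collections import deque

class HL7Queue:
    """Toy store-and-forward queue: messages are never discarded;
    a slow consumer simply builds a backlog."""

    def __init__(self, name):
        self.name = name
        self.pending = deque()

    def enqueue(self, message):
        # Never drop a message; the backlog grows instead.
        self.pending.append(message)

    def drain(self, process, max_messages):
        """Process up to max_messages; a message that fails stays at
        the head of the queue, blocking everything behind it."""
        done = 0
        while self.pending and done < max_messages:
            message = self.pending[0]
            if not process(message):  # e.g. a format error
                break                 # message is 'stuck' in the queue
            self.pending.popleft()
            done += 1
        return done

# Usage: a consumer that rejects malformed messages leaves them stuck.
q = HL7Queue("lab-results")
for msg in ["MSH|ok|1", "BAD", "MSH|ok|2"]:
    q.enqueue(msg)
q.drain(lambda m: m.startswith("MSH|"), max_messages=10)
print(len(q.pending))  # 2: the stuck message and the one behind it
```

Note how one bad message holds up a valid one queued after it – exactly the failure mode described below.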

This hot mess went live.

The phone calls began. Hospitals complaining about laboratory results not getting back to the wards in time. It turned out that some format errors caused transactions to get stuck in the queues (the application code definitely under-delivered in dealing with variances in field content). Operations needed a way to see what was going on between the applications.

An in-house developer created a complex Perl script that triggered the various commands, cleaned up the results, and reported on the queues. The first attempt revealed all the details of the queues. Yippee, development solved the problem.

DevOps will only solve part of the problem.

Now operations had ALL the details, but no meaningful way to interpret and act on them. Operations did not know what to identify as an impending problem. How long a delay was acceptable in a hospital? (Developers over here, operations over there, customers further out.)

Finally, the developers met with some nurses (customers!) to understand the ‘time’ requirement. The goal was that a transaction from labs to bedside should always be faster than the time for a human (generally a nurse) to physically run there.

A rewrite of the script highlighted queues approaching thresholds. Now meaningful thresholds could be set and recommended actions added. Stuck transactions were flagged for development to sort out. If queues grew too long, the relevant hospital could be told to switch over to manual delivery of results to the wards. The angry phone calls stopped.
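The threshold logic amounts to comparing queue delay against the customer-defined limit. A minimal sketch of that check, with an assumed five-minute "nurse run time" and a warning margin – both figures and all names are illustrative, not from the original system:

```python
# Illustrative limit: how long a human courier would take to carry
# results from the lab to the ward.
NURSE_RUN_SECONDS = 300

def check_queue(queue_name, oldest_wait_seconds, warn_fraction=0.8):
    """Return a recommended action for operations based on how long
    the oldest message in the queue has been waiting."""
    if oldest_wait_seconds >= NURSE_RUN_SECONDS:
        # Slower than a human: switch the hospital to manual delivery.
        return f"{queue_name}: SWITCH to manual delivery"
    if oldest_wait_seconds >= warn_fraction * NURSE_RUN_SECONDS:
        # Approaching the limit: investigate stuck transactions now.
        return f"{queue_name}: WARN - check for stuck transactions"
    return f"{queue_name}: OK"

print(check_queue("lab-to-ward", 120))  # lab-to-ward: OK
print(check_queue("lab-to-ward", 250))  # lab-to-ward: WARN - check for stuck transactions
print(check_queue("lab-to-ward", 400))  # lab-to-ward: SWITCH to manual delivery
```

The point is not the code but the source of the numbers: the threshold came from the customer (the nurses), not from development or operations.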

  1. A little background: HL7 is a standard for the transfer of clinical information between healthcare applications.

The ‘I’ in CIO is for Information

The Strategic Brief:

For the last sixty years, the title for the person in charge of IT should really have been the Chief Digitization Officer rather than Chief Information Officer (CIO). Today’s technology enables the CIO to focus on information as well as technology. As CIO, you must own the connection of your customers to your business – the customer experience (CX). Personalizing this experience will require collecting more information about your customers. There are multiple information collection approaches, and you must select those that will give you sufficient details, and more importantly match the type of relationship desired with your customers.

For decades, the purpose of information technology was to capture and store information about the business. The 1957 Hepburn/Tracy classic “Desk Set” tells the story of “two extremely strong personalities clashing over the digitization of a TV network’s research department.” (1) In the film, the research department consisted of humans, books, and papers, and all of that knowledge was being digitized. Digitizing existing information was all the rage.


Chief Information Officers (CIOs) have existed since World War II, yet for the first seventy years it may have been more accurate to call them Chief Digitization Officers. They were responsible for taking processes and information from analog to digital. Now, with mobile applications and the internet of things (IoT), the job is changing enough that they are finally earning the ‘I’ in the title.

For decades, CIOs spent most of their time acquiring real estate to house computers; operating the computers, networks and storage; developing or buying needed software; connecting devices to the centralized systems; and managing the people needed to make all this happen. IT was really digitizing existing processes – instead of bank tellers handwriting deposits in giant tomes with manually totaled results, they entered the deposit into a computer. Companies did not learn much more about their customers; they were merely digitizing existing information. In simple terms, the early CIO focused on the ‘technology’ part of information technology, not on the information part.

Leap forward sixty years to 2016’s omnipresent mobile applications and accompanying personalization. IT now moves well beyond the role of digitizing processes into owning the connection of the business to the customer. The new CIO will now own the frontline customer experience (CX). Customer experience personalization is a crucial survival tenet for 2016.

In addition to all the technical responsibilities above, the CIO must now focus on the ‘information’ part of IT. To support customer experience, and in particular personalization, a business must collect and understand significantly more information about the customer. A critical component of customer trust will be clearly explaining why you are collecting and using the information and how you are protecting the customer during and after collection. (This is so important, it will get a separate article with deeper analysis soon).

With each generation in society, the relationship between a customer and a business becomes more digital than IRL (2). Take the example of enterprise software vendors in a sales cycle for a new technology. Previously, the vendor team would visit the client and explain the new technology to initiate the sales process. Today, enterprise IT staff use the internet and social networks to discover and research new technologies. Once the technology is identified, the IT team seeks out the vendors offering the needed features. Most of the sales cycle is done before a salesperson even enters the conversation. Technology supporting self-service is only the first step in IT involvement in the customer experience.

The takeaway is that while existing information must still be aggregated and connected, significantly more new information must also be obtained to support improving the customer experience. This is now a key requirement for the business relationship with customers, and must be handled with finesse. Base how you collate information on the nature of the relationship you want with your customers. We can transform our relationships with our customers by making the technology part of IT take a step back and focusing on the information part. The CIO is finally earning the ‘information’ part of their title.

  1. For fun, compare how computers are portrayed in that film versus the ‘Mr Robot’ TV series. For more fun, compare Hepburn’s job to Google.
  2. In Real Life

NYU SPS Masters “Ask the Experts Series”

Adjunct Professor Steven Menges recently interviewed me for NYU’s Masters in Management and Systems (STEM) program. The interview is broken into six short videos for you to enjoy.

  • How is information technology transforming marketing?


  • What are investors looking for in a business plan pitch?


  • What is influencer marketing especially around business-to-business marketing?


  • How can a data set be used in marketing?


  • What new technologies should we be watching for?


  • What advice would you give for students in our Masters degree programs?