4/28/2016 Miami University ISA 496 Final Report
Understanding the LexisNexis Customer Experience
The Dream Team:
MIAMI UNIVERSITY ISA 496 Client Project

Table of Contents
I. Introduction
II. Project Approach and Outcomes
III. Methods
    A. Data Retrieval
    B. Data Understanding
    C. Data Preparation
    D. Data Modeling and Analysis
    E. Data Visualization
IV. Insights and Client Delivery
V. Concluding Remarks
VI. Appendix



ISA 496: LexisNexis Final Report

I. INTRODUCTION

As students in the ISA 496 practicum course, we provide analytics consulting to business clients, helping them work through and solve data-driven analytical problems. This semester, we were tasked with completing a client project for LexisNexis, a leading global provider of content-enabled workflow solutions designed specifically for professionals in the legal, risk management, corporate, government, law enforcement, accounting, and academic markets1. Our project centers on the customer feedback generated by LexisNexis products and how to understand it. The process of obtaining and storing this data is not currently standardized, leaving room for manual error. LexisNexis asked that we help them better understand customer sentiment related to their products, specifically Lexis Advance. Working from the project requirements and our client's expectations, we shaped the main goal of our analysis: to develop an automated process methodology for LexisNexis that will help them better manage customer feedback.
This process will allow LexisNexis to identify and extract subjective information in order to derive valuable business insights. Our team is particularly interested in how net promoter score (NPS) relates to the customer feedback data at hand. Through our deliverables, LexisNexis will be able to better understand their customers, enabling more informed decisions in the future.
II. PROJECT APPROACH & OUTCOMES
We understand that there is tremendous value to be gained from better utilizing customer data. In order to fully satisfy LexisNexis's needs, we took note of the specific questions they asked us to consider. These include:
● Who uses the products? What do users like/dislike about the product? What is their overall sentiment regarding LexisNexis?

1 "Solutions for Professionals Who Shape the World." Welcome to LexisNexis. RELX Group, 2016. Web. 03 Mar. 2016.

● What drives user evaluation through Net Promoter Score (NPS)?
● What aspects of the product should we improve? What aspects of the product should we test better?
Our goal is to answer these questions by completing four objectives:
1. A categorization or “binning” of the text data to better identify the top customer
concerns.
2. A visualization or dashboard to gauge customer service and sentiment.
3. An analysis or model relating the customer feedback data to the NPS.
4. An implementable process that will help LexisNexis assess changes over time, show
correlation between data streams, as well as correlation between usage and evaluation.

In order to complete these objectives, our team utilized the following process methodology
to structure our project.



Figure 1. Dream Team Project Process Plan

Please refer to Appendix A for descriptions of all tools and software packages utilized in the
development of our solution.


Figure 1 depicts the stages of our process: Data Retrieval → Data Understanding → Data Preparation → Data Analysis & Modeling → Data Visualization → Insights & Client Delivery.

III. METHODS
1. Data Retrieval
Upon project kickoff, LexisNexis provided a series of datasets for project use, described in the following table:

Table 1. Description of Data Received from LexisNexis

Dataset | File Type | No. of Observations | Source
Call Topics | .xlsx | 277,988 | Spectrum: The LexisNexis Customer Support Desktop
NL Feedback Emails | .msg | 3,607 | Customer emails to LexisNexis service representatives, with comments submitted through product webpages
Net Promoter Score Information | .xlsx | 15,262 | NPS questionnaire responses
VOC & Summary Information | .xlsx | 1,405 | Voice of Customer survey
LA (Lexis Advance) Feedback | .xlsx | 11,510 | Summarized call topics

After receiving this data, we set out to understand the collection methods currently used to retrieve it. Understanding how the data is created and collected lets us identify possible solutions and reduce bottlenecks in the processes; by identifying issues in current collection, we can suggest resolutions at the source. We first looked at the Call Topics dataset. Calls are split into different categories at initiation, ranging from account care, legal, news and financial, and technical support, among others. We were given a look at the current Spectrum tool used to collect information from these calls, which we were informed is a constantly evolving system. Example screenshots of this tool are provided in Appendix B, Figure 2. Because of the proactive move toward a new tool, our proposed changes to the collection process focus on the retrieval of email and NPS data rather than call topics.

The NPS questionnaire is currently sent out weekly with a response rate of 3.5%2. For the
most recent week of data provided, approximately 70 responses were received. This suggests that
about 2,000 NPS questionnaires are sent out weekly to select customers. A screenshot of the
NPS questionnaire is provided in Appendix B, Figure 3.
We saw the greatest opportunity for improvement in the collection of the NL Feedback emails. These emails are generated through a drop-down on each product's homepage, which offers a section for comments alongside an optional name and email address. A screenshot of a feedback email section is provided in Appendix B, Figure 4. Because this section consists entirely of free-form manual input, the responses are difficult to convert into a usable format for analysis. We are aware that the product is identified from the product page the email is prompted from, but we believe that offering customers standardized drop-downs for the issues they are experiencing could be very beneficial. We derived these options from the LDA topics found through our analysis and discuss the solutions further in the Insights & Client Delivery section.
2. Data Understanding
In order to understand the datasets, we explored them with different software packages. For the Call Topics and NPS Information datasets, we used JMP to analyze the distributions of key variables (included in Appendix C, Figures 1 and 2), generate usable information from the data, and identify areas requiring further preparation. These distributions reveal that the majority of customer call topics fall under the Call Type "Access Product/Service", the CSR Split "Legal Research Tasks" or "Technical Support", and the Market "Law Firm". Most of the responses to the NPS questionnaire come from Small Law customers. While most responders are considered detractors, the overwhelming majority say they definitely will continue using Lexis Advance.
When exploring the NL Feedback Emails dataset, we encountered difficulties due to the
file format. In addition, a portion of the emails were automatic replies and email chains with
customer service representatives, the removal of which was important to improving the quality of
the dataset.

2 “The return rate to NPS emails varies by customer segments, but on average it is approximately 3.5%” Quoted
from Jim Robinson, Google Groups Discussion: NPS Customer Survey Questionnaire Desired. Feb 11. 2016.

We determined that the VOC & Summary Information and LA Feedback datasets
contained data too summarized to be useful in accomplishing our objectives. Therefore, we will
not be using these datasets in the development of any models or processes.
3. Data Preparation
Each of the datasets provided required different tools to cleanse, transform, and manipulate it into useful information. These tools are listed in Appendix A, Table 1. The goal of this cleansing and transformation was to get the data into a format compatible with R so we could complete our analysis, create visualizations, and generate findings about the problem at hand.
The NPS and Call Topics data sets were easy to clean in Excel, after which they were
both imported into R for analysis. Appendix D, Tables 1-3 provide lookup tables of recoded
variables in the Call Topics data.
The Feedback emails were converted from Microsoft Outlook .msg files to raw text with a tool called Total Mail Converter, followed by a Python script that put the data into the proper format for R. Further preparation and processing in R is included as part of our analysis section.
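The conversion script itself is delivered separately with the report. Purely as an illustrative stand-in (written in Python, with hypothetical file names and a made-up auto-reply heuristic, not the team's actual script), gathering the raw text exports into a single CSV that R can ingest with read.csv() might look like this:

```python
import csv
import glob
import os

def emails_to_csv(input_dir, output_csv):
    """Collect raw .txt email exports into one CSV for import into R.
    Skips obvious automatic replies (hypothetical heuristic)."""
    rows = []
    for path in sorted(glob.glob(os.path.join(input_dir, "*.txt"))):
        with open(path, encoding="utf-8", errors="replace") as f:
            body = f.read()
        # Illustrative filter: drop automatic replies before analysis
        if "automatic reply" in body.lower():
            continue
        # Flatten whitespace so each email occupies one CSV row
        rows.append({"file": os.path.basename(path),
                     "text": " ".join(body.split())})
    with open(output_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["file", "text"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

The same one-row-per-email layout is what the downstream text-mining steps assume.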
4. Data Analysis & Modeling
Once data were prepared and compatible with R, we completed all data analysis using R
and its component packages. In order to accomplish our team’s project objectives, we utilized a
combination of text mining, sentiment analysis and topic modeling methods.
In order to conduct text mining, we used the “tm” package in R for all three data sets.
Within “tm”, we placed all of the open-ended text into a document term matrix. A document
term matrix is a mathematical matrix that describes the frequency of terms that occur in a
collection of documents. The rows correspond to documents in the collection and the columns
correspond to terms. A more detailed explanation of how we constructed the document term
matrix is located in Appendix F, List 1.
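As a sketch of the mechanics described above (written in Python rather than R's "tm", so this is an illustration of the data structure, not the team's code), a term-frequency document term matrix can be built like this:

```python
import re
from collections import Counter

def document_term_matrix(documents):
    """Build a document term matrix: one row per document, one column
    per term, each cell holding that term's frequency in the document.
    Mirrors the structure produced by tm::DocumentTermMatrix in R."""
    tokenized = [re.findall(r"[a-z']+", doc.lower()) for doc in documents]
    vocabulary = sorted(set(t for tokens in tokenized for t in tokens))
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts.get(term, 0) for term in vocabulary])
    return vocabulary, matrix
```

Rows correspond to documents and columns to terms, exactly as in the definition above.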

As part of objective two, we wanted to identify the most prevalent emotions3 customers feel toward LexisNexis's products. We conducted this analysis in RStudio with the "Syuzhet" package, calculating the proportion of each emotion in the text. The NRC sentiment function within "Syuzhet" implements Saif Mohammad's emotion lexicon. According to Mohammad, "the NRC emotion lexicon is a list of words and their associations with eight emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust)".4
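The counting mechanics can be sketched as follows. This Python snippet is an illustration only: the word-to-emotion pairs below are a toy stand-in (the real NRC lexicon maps thousands of words), and the report's actual analysis used Syuzhet's NRC sentiment function in R.

```python
import re

# Toy stand-in for the NRC emotion lexicon: word -> associated emotions.
# These specific associations are illustrative, not the real lexicon.
TOY_NRC = {
    "love": {"joy", "trust"},
    "broken": {"sadness", "fear"},
    "error": {"fear", "sadness"},
    "reliable": {"trust"},
}

EMOTIONS = ("anger", "fear", "anticipation", "trust",
            "surprise", "sadness", "joy", "disgust")

def nrc_emotion_counts(text, lexicon=TOY_NRC):
    """Count words associated with each of the eight NRC emotions,
    analogous to what Syuzhet's NRC sentiment function returns."""
    counts = dict.fromkeys(EMOTIONS, 0)
    for word in re.findall(r"[a-z']+", text.lower()):
        for emotion in lexicon.get(word, ()):
            counts[emotion] += 1
    return counts
```

Dividing each count by the total yields the percentage of emotion in the text, as reported above.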
For even further analysis of the NPS data and customer sentiment, we conducted
sentiment analysis using the “Stringr” package in R to apply a sentiment function frequently used
and studied by Richard T. Watson, a professor at the University of Georgia5. Sentiment analysis
is a popular and simple text mining method of measuring aggregate feeling. The function
matched the words within the document term matrix with a dictionary of positive words (score =
+1) and negative words (score = -1). The goal of this method is to calculate a sentiment score
(sum of positive words – sum of negative words) which can be aggregated at multiple levels of
the NPS data.
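The scoring step can be sketched in a few lines. This Python version is illustrative only: the positive/negative word lists below are toy stand-ins for the opinion dictionaries the team matched against in R.

```python
import re

# Toy positive/negative word lists standing in for the full opinion
# dictionaries used in the analysis (illustrative only).
POSITIVE = {"easy", "useful", "powerful", "quick", "quality"}
NEGATIVE = {"difficult", "complicated", "frustrating", "slow", "error"}

def sentiment_score(text):
    """Sentiment = (count of positive words) - (count of negative words),
    matching the +1/-1 dictionary method described above."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
```

These per-response scores can then be averaged within each NPS category, segment, or firm size, which is exactly the aggregation shown in Figures 8-10.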
Although a goal of this project was to understand the sentiment of all customers, our team chose not to conduct sentiment analysis on the NL Feedback emails dataset, because that dataset is likely skewed passive or negative: the email section on LexisNexis's product page is presented as an outlet for customer service issues, so positive feedback is unlikely. To still provide valuable analysis for LexisNexis, we analyzed the NL Feedback Emails dataset using Latent Dirichlet Allocation (LDA), a form of topic modeling. Diane J. Hu of the University of California, San Diego describes "Latent Dirichlet Allocation (LDA) [as] an unsupervised, statistical approach to document modeling that discovers latent semantic topics in large collections of text documents"6. Topic modeling allows LexisNexis to categorize and bin (in accordance with objective 1) their product-related

3 Mohammad, S. (n.d.). NRC Word-Emotion Association Lexicon. Retrieved April 21, 2016, from http://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm
4 Mohammad, S. (n.d.). NRC Word-Emotion Association Lexicon. Retrieved April 21, 2016, from http://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm
5 Watson, R. T. (1995). Data management: an organizational perspective. John Wiley & Sons, Inc.
6 Hu, D. J. (2009). Latent Dirichlet allocation for text, images, and music. University of California, San Diego. Retrieved April 26, 2013.

text issues in an interactive and adaptable way. We completed this analysis using R’s “lda”
package.
Topic modeling, specifically LDA, is typically evaluated by measuring the performance of a trained model on secondary, validation data. Fitting is iterative and can be implemented using a technique called Gibbs sampling, which is effective with probabilistic models such as LDA. Gibbs sampling allows us to draw samples from a probability distribution without computing intractable integrals directly. Further information regarding the implementation of Gibbs sampling for LDA evaluation can be found in the accompanying citation.7
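The production analysis used R's "lda" package; purely to illustrate how collapsed Gibbs sampling iteratively reassigns each token to a topic, here is a toy sampler in Python (the hyperparameters are arbitrary and this is not the approach shipped with the report):

```python
import random

def lda_gibbs(docs, n_topics, n_iters=100, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative sketch).
    docs: list of token lists. Returns topic assignments per token,
    the vocabulary, and the topic-word count matrix."""
    rng = random.Random(seed)
    vocab = sorted(set(w for d in docs for w in d))
    V = len(vocab)
    word_id = {w: i for i, w in enumerate(vocab)}

    # Count matrices: topic-word counts, document-topic counts, topic totals
    n_tw = [[0] * V for _ in range(n_topics)]
    n_dt = [[0] * n_topics for _ in docs]
    n_t = [0] * n_topics

    # Random initial topic assignment for every token
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            t = rng.randrange(n_topics)
            z_d.append(t)
            n_tw[t][word_id[w]] += 1
            n_dt[d][t] += 1
            n_t[t] += 1
        z.append(z_d)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t, wid = z[d][i], word_id[w]
                # Remove the current assignment from the counts
                n_tw[t][wid] -= 1; n_dt[d][t] -= 1; n_t[t] -= 1
                # Resample the topic from the full conditional distribution
                weights = [(n_tw[k][wid] + beta) / (n_t[k] + V * beta)
                           * (n_dt[d][k] + alpha) for k in range(n_topics)]
                t = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = t
                n_tw[t][wid] += 1; n_dt[d][t] += 1; n_t[t] += 1
    return z, vocab, n_tw
```

After enough iterations, the topic-word counts approximate the latent topics that packages like "lda" report.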
The combination of sentiment analysis and topic modeling methods we used is made even more impactful when visualized using techniques like word clouds and "LDAvis", which will be discussed in greater depth in the next section.
5. Data Visualization
Visualizations make the results of complex analyses accessible to LexisNexis and improve decision-making. We created a multitude of visualizations in order to make sense of the large datasets provided. Accompanying the report, we have provided the R script necessary to produce an interactive dashboard, which combines all visualizations into one environment, allowing LexisNexis to easily connect and understand them and removing the silos of the separate datasets. Through data visualization, we set out to answer the following questions prompted by LexisNexis.
Who uses the products? What do users like/dislike about the product? What
is their overall sentiment regarding LexisNexis?
NPS Sentiment Analysis
We used the NPS dataset to begin answering these questions. As previously mentioned, NPS ranks users into three groups: Detractors, Passives, and Promoters. It is a way of measuring customers' willingness to recommend a product or service to others and a general gauge of overall satisfaction with and loyalty toward a company.

7 A gentle introduction to topic modeling using R [Web log post]. (2015, September 29). Retrieved March 3, 2016, from https://eight2late.wordpress.com/2015/09/29/a-gentle-introduction-to-topic-modeling-using-r/#comments
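The report does not restate the rating thresholds, so the sketch below assumes the standard published NPS convention: ratings 0-6 are Detractors, 7-8 Passives, and 9-10 Promoters, and the score is the percentage of Promoters minus the percentage of Detractors.

```python
def nps_category(rating):
    """Standard NPS buckets: 0-6 Detractor, 7-8 Passive, 9-10 Promoter."""
    if rating >= 9:
        return "Promoter"
    if rating >= 7:
        return "Passive"
    return "Detractor"

def net_promoter_score(ratings):
    """NPS = %Promoters - %Detractors, on a -100..100 scale."""
    cats = [nps_category(r) for r in ratings]
    promoters = cats.count("Promoter") / len(cats)
    detractors = cats.count("Detractor") / len(cats)
    return 100 * (promoters - detractors)
```

The category labels produced here are the same Detractor/Passive/Promoter groupings used throughout the visualizations below.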
By manipulating the NPS data from the LexisNexis survey in R, we were able to create word clouds as a measure of sentiment within each group. The detractor word cloud shows that those responders describe Lexis Advance as difficult to search within, dislike the results, and often make comparisons to Westlaw, a product from Thomson Reuters. Responders categorized as passive, on the other hand, describe Lexis Advance as complicated and are frustrated with the results of their searches. Responders who fell into the promoter category described Lexis Advance as easy to use, useful, powerful, quick, up-to-date, and a quality tool.
Figures 2-4. Word Clouds for NPS Categories (Detractor, Passive, and Promoter)

NPS Emotions Analysis
We were also interested in visually gauging the emotional sentiment within each group.
To reiterate, the analysis is based off Saif Mohammad’s emotion lexicon and implemented using
R’s “Syuzhet” package. In reviewing the results, the most prevalent emotion shown by detractors
and passives is fear. On the other hand, the overwhelming emotion for promoters was trust.

Figure 5. Emotional Sentiment for Detractor



Figure 6. Emotional Sentiment for Passive

Figure 7. Emotional Sentiment for Promoter




Mean Sentiment for NPS Category, Segment, and Firm Size

Figure 8 shows the mean sentiment score for each NPS category. As one would hypothesize, the mean sentiment score for detractors is the most negative, while the mean for promoters is the most positive, at 5.17.

Figure 8. Mean Sentiment Score by NPS Category
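Aggregations like those in Figures 8-10 are simple group-by-mean calculations. As a small Python sketch (the actual plots were produced in R), over (group, score) pairs:

```python
from collections import defaultdict

def mean_by_group(records):
    """Average a numeric score within each group, as in the mean
    sentiment score by NPS category, segment, or firm size plots.
    records: iterable of (group, score) pairs."""
    totals = defaultdict(lambda: [0.0, 0])
    for group, score in records:
        totals[group][0] += score
        totals[group][1] += 1
    return {g: s / n for g, (s, n) in totals.items()}
```

Swapping the grouping key (NPS category, segment, firm size) reproduces each of the three aggregation views.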

In Figure 9, we measured the mean sentiment score by user segment. According to the NPS questionnaire responses, there are 8 segments: Unknown, BIS (Business Insight Solution), Corporate Large, Corporate Small, Large Law, Small Law, Federal Government, and State/Local Government. This visualization shows that small law firms have the lowest mean sentiment score, while BIS (Business Insight Solution) users have the highest and therefore feel more positively about Lexis Advance than the other segments. However, the highest average sentiment score is only about two, suggesting that none of these segments holds strongly positive feelings toward Lexis Advance.




Figure 9. Mean Sentiment Score by Segment



In Figure 10, we evaluated mean sentiment score by firm size. We found that firms with
1500-1999 employees have the highest mean sentiment score, meaning that they have more
positive feelings about Lexis Advance than any larger or smaller firms.


Figure 10. Mean Sentiment Score by Firm Size

Another important question we sought to answer was:
What drives user evaluation through Net Promoter Score (NPS)?
Predictive Modeling with NPS Data
Through data visualization and modeling, we found that Continue to Use (whether and to what degree a customer plans to continue using LexisNexis products) and NPS Rating are highly correlated; evidence can be seen in Appendix F, Table 1. On average, users who specify that they definitely will continue to use Lexis Advance have an NPS rating of 9.83, while users who specify that they definitely will not have a rating of 1.53.

We developed a decision tree in SAS Enterprise Miner modeling the NPS user evaluation data, excluding the Continue to Use variable in order to find other variables that might drive user evaluation. A full picture of the model is included in Appendix F, Figure 1. With Continue to Use excluded, the most significant variables in explaining the categorization of responders as Promoter (+1), Passive (0), or Detractor (-1) are whether the responder has Decision Authority and the extent to which the responder uses Lexis Advance (the variable Advance Usage). On average, an individual who uses LexisNexis products fewer than three times in 30 days and has some decision authority over the products they use is about 79% likely to be categorized as a detractor or passive; without those decision tree splits, the baseline likelihood of classification as a passive or detractor is 70%.

Additional NPS Data Visualizations
The relationship between the segment to which a customer belongs and the variable Continue to Use also suggests what drives user evaluation. Figure 11 presents an interesting story: while the majority of all users plan to continue using Lexis Advance, customers from the small law segment are the most likely to discontinue use.

Figure 11. Relationship between Segment and Continued Use

The bar charts below provide more information regarding NPS user evaluation. The first (Figure 12) shows the relationship between NPS Category and segment. To reiterate, the NPS questionnaire identifies 8 segments: Unknown, BIS (Business Insight Solution), Corporate Large, Corporate Small, Large Law, Small Law, Federal Government, and State/Local Government. We found that the segments that most often promoted Lexis Advance were Large Corporations, Large Law Firms, and Federal Government Agencies. The segments
that contained the most detractors included Small Law firms and State/Local Government
customers. These findings reveal that there is an association between smaller, more locally
focused customers and dissatisfaction with Lexis Advance.


Figure 12. Relationship between NPS Category and Segment
The second bar chart (Figure 13) shows the relationship between NPS Category and firm size by percentage. According to this plot, NPS ranking is associated with firm size: firms with fewer than fifty employees have more detractors than promoters, while firms with 1,000 or more employees have more promoters.




Figure 13. Relationship between NPS Category and Firm Size

The final question we worked to answer was:
What aspects of the product should we improve and which aspects
should we test better?
Call Topics Analysis
We created visualizations with the Call Topics dataset as well as the NL Feedback emails dataset. Our first insight comes from Call Topics, which reveals that the three most common call reasons for all three products are formulating searches, troubleshooting/access issues, and how to use a feature/function. Clearly, one area to address is customers' ability to find the information they need; for a search engine, that functionality is paramount.


Table 2. Top Ten Call Reasons by Product
NL Feedback Email Visualization
Our text analysis of the NL Feedback Emails data was also significant in helping identify areas of improvement in LexisNexis's products. Figure 14 gives an initial look at the emails prior to more in-depth topic modeling. The bar plot reveals that the most comments and complaints from these emails are associated with the FullDocView, Home, and Search page names. We therefore suggest more attention be placed on improving ease of navigation within LexisNexis's products.

Figure 14. Number of Comments & Complaints by Page Name

NL Feedback Emails Topics Modeling
To present the results of our LDA topic modeling, we utilized "LDAvis", an R package that strives to make a fitted topic model interpretable and understandable. Since LDA finds "hidden topics", a visualization is needed to discover and articulate the topics found. Through "LDAvis", we can explore questions about a topic model such as the meaning of a topic, the prevalence of a topic, and the relationships between the words contained within a topic.
Figure 15 shows a static rendering of the web-based LDA visualization for Lexis Advance, produced with R's "LDAvis" functionality. This visualization gives LexisNexis the ability to categorize and bin customers' open-ended comments, revealing areas of top customer concern (see Objective 1). Accompanying this report are the R scripts necessary to produce these interactive visualizations for all LexisNexis products. As the user navigates the interactive visualization, he or she can view the 30 most relevant terms for each topic; from there, the user applies judgment to identify the hidden topic's meaning. For Lexis Advance, the topics found through LDA modeling are listed in Table 3.

Topic | Frequent Terms | Topic Description
1 | Time, Frustrating, Work | User interaction with the product at work
2 | Footnotes, Citation, Format | Search content citations and references
3 | Error, Issues, Tried | Troubleshooting and access issues
4 | Search, Filter, View | Search-related functions
5 | Appellate, Court, Statute | Legal matters and case law
Table 3. Description of Lexis Advance LDA Topic Categories
An annotated version of Figure 15, containing a breakdown of the components of this visualization and a detailed description of each element, is located in Appendix F, Figure 3. Our goal was to stop at a number of topics that were clearly differentiated (no overlap with one another) and reasonably interpretable. Examples of outputs for other topic counts are included in Appendix F, Figure 4.



Figure 15. Lexis Advance LDA Visualization

IV. Insights & Client Delivery
What does this all mean? After conducting extensive analysis and creating many visualizations, we set out to identify the most valuable ways for LexisNexis to utilize our findings. Below are actionable insights and recommendations addressing the areas where LexisNexis can most improve its products and customer relationships.
Insight: Data collection is manual, open-ended, and has low response rates (especially NPS).
Recommendations:
● Add drop-down options for NL Feedback Email comments, with pre-filled choices that align with the 5 LDA topics and/or the most prominent call topic categories
● Cater each NPS survey question to all NPS categories: instead of binary responses, use ordinal -1, 0, +1 responses (refer to Appendix F, Figure 8 for an example)
● Incentivize users to take the survey to improve the current response rate (3.5%)
● Bolster web chat capabilities to better address immediate problems
Why? Better collection of data = better analysis = more effective solutions.

Insight: Mean sentiment scores are higher among larger firms, and small firms have some of the highest rates of detractors. Current pricing favors large firms with economies of scale8.
Recommendations:
● Continue to market heavily to larger firms, with potentially larger contracts and approximately 41% operational income growth
● Offer pricing options and discounts for small firms more likely to detract
● Prevent more detracting scores: companies whose NPS < 60 see a greater decline in growth from detractors (-54%) than increase in growth from promoters9
Why? Customers who are detractors are 5x more likely to attrite.

Insight: Detractors from the NPS data are apt to compare LexisNexis to WestLaw.
Recommendations:
● Offer those detractors more easily accessible outlets to express concerns with LexisNexis products
● Maintain communication with detractors and stress the strengths of LexisNexis over WestLaw
Why? Customers who are detractors are 5x more likely to attrite.

Insight: Customers struggle to formulate searches and to access the documents they want, when they want them.
Recommendation:
● As with the product feedback comments, bolster web chat capabilities to better address immediate problems
Why? As a technology platform, ease of use is a key determinant of actual use10.

8 http://www.lexisnexis.com/terms/21/pricing/
9 Eastman, D. (n.d.). The ROI of NPS: How a Focus on Customer Loyalty Delivers Financial Gains. Retrieved April 21, 2016.
Table 4. Insights and Recommendations
In order to provide the most value in LexisNexis's customer journey and build on the insights we have gathered, we have provided the following deliverables:
1. A categorization or “binning” of the text data to better identify the top customer
concerns.
Through topic modeling and sentiment analysis in R, we identified the top 10 reasons for inbound customer service calls by product according to the Call Topics dataset. The top three were consistent across all product lines: 1) formulating searches, 2) troubleshooting access, and 3) how to use features/functions.
In the email dataset, LDA modeling identified five distinguishable topics of concern for Lexis Advance, described in Table 3: customers' interaction with the product at work, requests for assistance when searching content citations and references, troubleshooting and access issues, comments on search-related functions, and comments regarding legal matters and case law.
All supporting R Scripts and LDA visualizations provide LexisNexis the ability to reproduce
this analysis and keep these categories in mind when conducting future analyses.
2. A visualization or dashboard to gauge customer service and sentiment.
Using Shiny, R's interactive web application framework, we have developed a dashboard merging the visualizations contained in this report into one interactive environment. We have provided all supporting R script as an attachment to a web copy of the report or by request.
3. An analysis or model relating the customer feedback data to the NPS.
Through the development of a decision tree model predicting NPS categorization, our team was able to identify the main drivers of user evaluation: Decision Authority (involvement in deciding which product the customer uses), Usage in the Last 30 Days, and Browser Type. The extremely high correlation between the Continue to Use variable and the composition of questions within the NPS survey made it difficult to relate other customer data sets to NPS. In Table 4 and Figure 16, we provide a potential solution to improve the collection of NPS data for ease of analysis in the future.

10 Technology Acceptance Model. https://en.wikipedia.org/wiki/Technology_acceptance_model. Retrieved April 27, 2016.
4. An implementable process that will help LexisNexis assess changes over time, show
correlation between data streams, as well as correlation between usage and evaluation.
Finally, the following page contains a process we believe can enable LexisNexis to prevent customer attrition, improve product development, and increase returns from a growing number of promoters.



Figure 16. Flow Map of Process to Manage Customer Feedback

V. CONCLUDING REMARKS
We hope that our analysis of the three primary customer data streams has provided LexisNexis with a better understanding of customer sentiment related to their products. Utilizing this data in a more automated and standardized way will allow LexisNexis to continue to be a market leader and further distance themselves from competitors. LexisNexis believes that, in the right hands, the information and technology they provide can enable people to change the world. Giving LexisNexis the tools, insights, and recommendations to utilize the sentiment of those people will enable them to deliver on that mission.
We thank the LexisNexis team for their time and effort in working with Miami University to
improve their organization. Please refer to the accompanying materials including our appendix,
presentation, and all code necessary to conduct analysis and visualizations to deliver our
solutions. If questions regarding our solution and its deployment come up, please do not hesitate
to contact team representative Lauren Curtis at [email protected].















APPENDIX
The following appendix contains supporting documentation, examples, and/or additional
visualizations not included in the final report. Figure and table numbering aligns with the section
that the appendix supports.
Appendix A: Project Approach and Outcomes
Tool — Description
- JMP Pro — Interactive software for desktop statistical discovery [11]
- R — Free software environment for statistical computing and graphics [12]
- Total Mail Converter — Converts emails (MSG, EML) to PDF, TXT, DOC, or PST in batch via a user interface or command line; approximately $60 for a license [13]
- Microsoft Excel and Power BI — Excel 2013 add-in for data preparation and visual analysis [14]
- Python — A programming language that lets you work quickly and integrate systems effectively [15]
- Microsoft Visio — Tool for process mapping and workflow visualization [16]
- SAS Enterprise Miner — Solution to create accurate predictive and descriptive models on large volumes of data across different sources of the organization [17]
TABLE A-1. Tools Utilized in Project



11. http://www.jmp.com/en_us/software/jmp-pro.html
12. R Core Team (2013). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.
13. Total Mail Converter (2016). Computer software. Coolutils. URL https://www.coolutils.com/TotalMailConverter.
14. https://powerbi.microsoft.com/en-us/
15. Python (2016). Computer software. Vers. 3.5.1. Python Software Foundation. URL https://www.python.org/.
16. https://products.office.com/en-us/Visio/flowchart-software
17. http://www.sas.com/en_ph/software/analytics/enterprise-miner.html


Appendix B: Data Retrieval

FIGURE B-2. Example Spectrum Tool


FIGURE B-3. Example NPS Questionnaire Questions


FIGURE B-4. Feedback Email Data Collection









Appendix C: Data Understanding

FIGURE C-1. Distribution of Key Variables in the Call Topics Dataset

FIGURE C-2. Distribution of Key Variables in the NPS Dataset

Appendix D: Data Preparation


TABLES D-1:D-3. Index of Recoding for Call Types, CSR Splits, and Market in Data Preparation

Appendix E: Data Analysis & Modeling
List E-1.
Steps to Creating a Document-Term Matrix
- Read in CSVs to create a data frame (essentially a data table, similar to an Excel sheet)
- Created a corpus from the Comments column of the data frame
- Pre-processed the corpus:
  o removed punctuation and numbers
  o removed stop words
  o converted text to lowercase
  o stemmed words to their root form
- Created a document-term matrix (DTM) of the remaining words
  o removed any documents left with no words as a result of pre-processing
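The steps above can be sketched in standard-library Python. Note this is an illustrative stand-in for the actual project code: the stop-word list is abbreviated, and the suffix-stripping "stemmer" is a naive substitute for a real one such as the Porter stemmer.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "is", "it", "and", "to", "of", "in"}  # abbreviated list

def preprocess(text):
    """Lowercase, strip punctuation/numbers, drop stop words, apply crude stemming."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())  # remove punctuation and numbers
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    stemmed = []
    for t in tokens:
        # Naive stand-in for a real stemmer: strip a few common suffixes
        for suffix in ("ing", "ed", "s"):
            if t.endswith(suffix) and len(t) > len(suffix) + 2:
                t = t[: -len(suffix)]
                break
        stemmed.append(t)
    return stemmed

def build_dtm(comments):
    """Build a document-term matrix (one row per document, one column per term),
    dropping any documents emptied by pre-processing."""
    docs = [Counter(preprocess(c)) for c in comments]
    docs = [d for d in docs if d]  # remove documents with no remaining words
    vocab = sorted(set().union(*docs)) if docs else []
    matrix = [[d.get(term, 0) for term in vocab] for d in docs]
    return vocab, matrix

# Hypothetical customer comments
comments = ["Searching is easy!", "The search filters confused me.", "123 ..."]
vocab, dtm = build_dtm(comments)  # third comment is dropped as empty
```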


Code  Call Type
1     Access Product/Service
2     Account Mgt
3     Consultant
4     Doc Delivery
5     Search
6     Usability
7     Unknown

Code  CSR Split  Description
1     LEGL       Legal Research
2     LNC        Old System Category
3     NF         News & Financial
4     NONC       NA
5     OPER       Access
6     TECH       Technical Support
7     Unknown    NA

Code  Market
1     Academic associated fields
2     Bar
3     Consumers
4     Corporate associated fields
5     External
6     Government
7     Internal
8     Law firm associated fields
9     N/A
10    Unknown
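Recoding indexes like these can be applied as simple dictionary lookups, with unmapped labels falling back to the "Unknown" code. A minimal sketch (the column labels here are hypothetical examples, not the exact raw values in the LexisNexis extracts):

```python
# Call Type recoding from the index above
CALL_TYPE_CODES = {
    "Access Product/Service": 1, "Account Mgt": 2, "Consultant": 3,
    "Doc Delivery": 4, "Search": 5, "Usability": 6, "Unknown": 7,
}

def recode(values, mapping, unknown_code=7):
    """Map raw category labels to integer codes; any label not in the index
    falls back to the 'Unknown' code so downstream models see only valid codes."""
    return [mapping.get(v, unknown_code) for v in values]

# Hypothetical raw call-type labels
calls = ["Search", "Usability", "Billing Question"]
codes = recode(calls, CALL_TYPE_CODES)
```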

Appendix F. Data Visualization

FIGURE F-1. NPS Decision Tree Model


FIGURE F-2. Decision Tree Cumulative Lift
The above cumulative lift chart shows that the decision tree model created to predict NPS is
only roughly 16% more accurate than random guessing. This supports our recommendation that
the available inputs and the structure of the questionnaire do not support model building as
well as they could.
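Cumulative lift at a given depth is the positive rate among the top-scored fraction of cases divided by the overall positive rate, so a lift of about 1.16 means the model beats random ranking by about 16%. A minimal sketch (the scores and labels below are hypothetical, not the SAS Enterprise Miner output):

```python
def cumulative_lift(scores, labels, depth=0.2):
    """Cumulative lift at a depth: the positive rate among the top-scored
    fraction of cases divided by the overall positive rate."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, int(len(ranked) * depth))          # size of the top-scored group
    top_rate = sum(label for _, label in ranked[:k]) / k
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate

# Hypothetical model scores and binary outcomes (e.g., promoter = 1)
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
lift_at_20 = cumulative_lift(scores, labels, depth=0.2)  # top 2 of 10 cases
```

At a depth of 100% the lift is 1.0 by construction, which is why lift curves converge to 1 at the right edge of the chart.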

TABLE F-1. Average NPS Rating by Whether Customers will Continue Use of LexisNexis
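For reference, the Net Promoter Score summarized in Table F-1 is conventionally computed from the 0-10 "likelihood to recommend" question as the percentage of promoters (ratings 9-10) minus the percentage of detractors (ratings 0-6). A minimal sketch with hypothetical survey responses:

```python
def nps(ratings):
    """Net Promoter Score on the standard 0-10 scale:
    % promoters (9-10) minus % detractors (0-6); passives (7-8) are ignored."""
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)

# Hypothetical survey responses
ratings = [10, 9, 8, 7, 6, 3, 10]
score = nps(ratings)  # 3 promoters, 2 detractors out of 7 respondents
```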



Figure F-3. Lexis Advance LDA Visualization
1. Left Panel: Inter-Topic Distance Map
- Serves as a "topic landscape"
- Provides a sense of topic similarity by approximating the distances between topics
- Inter-topic differences are calculated with the Jensen-Shannon divergence
- Scaling of the set of inter-topic distances defaults to Principal Components (PCs)
- Inter-topic differences mapped onto PCs allow the user to analyze correlations between topics
- The size of each circle represents term frequency (token frequency)
2. Right Panel: Top 30 Most Salient Terms
- Chuang's key-term quantities measure how much information a term conveys about topics by
computing the Kullback-Leibler divergence between the term and the marginal distribution of
topics (distinctiveness), optionally weighted by the term's overall frequency (saliency)
- Relevance: a measure used to rank terms within topics and interpret topics
  i. A compromise between the probability of a word given the topic and that probability
  divided by the word's overall frequency
  ii. How strongly does a given term (w) apply to a topic (t), and how frequently is it found in t?
  iii. The most relevant terms are displayed in the bar chart on the right side of the visualization
- By comparing the widths of the red and gray bars for a given term, users can quickly see
whether a term is highly relevant to the selected topic because of its lift (a high ratio of red
to gray) or its probability (the absolute width of red)
  i. Red = relevance to the selected topic
  ii. Gray = overall term frequency in the text corpus
3. Top Panel Slider: Relevance Metric
- Adjusts lambda in the relevance metric calculation
- Lambda addresses the compromise presented in the relevance metric:
  i. Lambda = 1: rank words solely on the width of the red bar (favors common words)
  ii. Lambda = 0: rank words solely on the ratio of red to gray (favors rare words)
- A recent user study suggests the ideal lambda value is approximately 0.6
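The two quantities the panels rely on — the Jensen-Shannon divergence behind the inter-topic distance map, and the lambda-weighted relevance metric behind the slider — can be sketched in a few lines of Python. The topic-term distributions below are toy values, not the fitted LDA output:

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, bounded KL against the mixture.
    Used to approximate inter-topic distances in the left panel."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def relevance(p_w_given_t, p_w, lam=0.6):
    """Relevance of each term w to a topic t:
    lam * log p(w|t) + (1 - lam) * log(p(w|t) / p(w)).
    lam = 1 favors common words; lam = 0 favors rare, topic-specific words."""
    return [lam * math.log(pt) + (1 - lam) * math.log(pt / pw)
            for pt, pw in zip(p_w_given_t, p_w)]

# Hypothetical topic-term distributions over a 4-term vocabulary
topic_a = [0.50, 0.30, 0.15, 0.05]   # p(w | topic A)
topic_b = [0.05, 0.15, 0.30, 0.50]   # p(w | topic B)
overall = [0.40, 0.10, 0.30, 0.20]   # marginal p(w) in the corpus

distance = js_divergence(topic_a, topic_b)
scores = relevance(topic_a, overall, lam=0.6)
```

Sliding lambda from 1 to 0 shifts the ranking from the most probable words in the topic toward the words with the highest lift, which is exactly the trade-off described above.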





Figure F-4. Lexis Advance LDA Visualization Four Topics (Top) and Six Topics (Bottom)






Figures F-5:F-7. Classification Count by Topic of Lexis Advance (F-5), NewLexis (F-6), and Research (F-7)


Figure F-8. Example Reformatted NPS Questionnaire Answers
