Data mining and data warehousing in the airline industry

Data Mining in Airlines – NOTE



The global airline industry has matured considerably in recent years, and air travel has become statistically one of the safest modes of transportation, in part because data mining is applied across the aviation sector. Data mining, often called knowledge discovery, is the process of analyzing data from different perspectives and summarizing it into useful information by means of analytical tools and techniques. The massive increase in demand for airspace has created a need to analyze alternatives that can increase airspace capacity while maintaining safety.
This article addresses the challenge of applying data mining techniques to that analysis. It examines the methods, the techniques and the kinds of problems that data mining can help to solve. As air travel grows rapidly, data mining identifies safety-related patterns and provides air safety officers with the information they need to formulate appropriate actions.

Furthermore, this article covers data mining methods, focusing on pre-processing, the selection of features, the evaluation of data patterns and the data sets gathered for the analysis. It is often challenging, however, to access this data in order to make the best use of it. Airlines need to analyze existing databases and identify relationships between existing data systems so as to avoid inconsistencies when these are moved to the warehouse. The goal of data warehousing is to drastically reduce the time required for well-informed decision making. It is important for all airlines to make real-time decisions on traffic patterns, weather-related issues, passenger scheduling, logistics, cargo deliveries, security, ticket pricing and changes in capacity planning.

Both data mining and data warehousing allow potential flight safety issues to be investigated by integrating multiple databases and documents. This helps organizations to improve their decision-making ability.
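As a small illustration of checking for inconsistencies before data from several systems is moved into a warehouse, the Python sketch below compares two source systems keyed on a passenger identifier and reports the fields that disagree. The system names, field names and records are hypothetical and serve only to show the idea; a real integration would also have to handle missing keys, formats and far larger volumes.

```python
def find_conflicts(source_a, source_b, fields):
    """Return {key: [fields that disagree]} for keys present in both sources."""
    conflicts = {}
    for key in sorted(source_a.keys() & source_b.keys()):
        differing = [
            f for f in fields
            if source_a[key].get(f) != source_b[key].get(f)
        ]
        if differing:
            conflicts[key] = differing
    return conflicts

# Hypothetical extracts from two operational systems.
booking_system = {
    "P001": {"email": "lee@example.com", "home_airport": "TPE"},
    "P002": {"email": "kim@example.com", "home_airport": "ICN"},
}
loyalty_system = {
    "P001": {"email": "lee@example.com", "home_airport": "TPE"},
    "P002": {"email": "kim@example.org", "home_airport": "ICN"},
    "P003": {"email": "ann@example.com", "home_airport": "LHR"},
}

conflicts = find_conflicts(booking_system, loyalty_system, ["email", "home_airport"])
print(conflicts)
```

Records flagged this way can be reconciled before loading, so that the warehouse does not inherit two versions of the same passenger.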


The aviation sector faces several issues, starting with technical ones. At times systems are unable to support vast volumes of data, and integrating data can be extremely complex, which may delay decision making. Furthermore, the size of the database and the complexity of queries may affect the type of system the organization needs.
Secondly, as more sources are combined, the quality of data may fall below standard. Data encryption might limit a company’s ability to use the information, which may degrade the company’s performance. Another major problem is that data mining may raise ethical and privacy challenges: users may be unaware that their information is being collected and used, which might result in a violation of their privacy.
Other problems include the use of data for purposes other than those for which it was originally collected, as unexpected patterns may emerge and lead the organization to divert from its original query.
Data warehousing has problems of its own: it is a laborious, resource-intensive process that requires expertise to build and maintain, which may lead to higher costs. Furthermore, while the cost of storage is decreasing, the infrastructure cost of data management is increasing, and airlines may not be able to secure the necessary funding. Insufficient information also makes it difficult for managers to make decisions, and they often cannot collect, retrieve and disseminate workers’ knowledge.


To overcome the major problems, companies increasingly turn to new technologies to meet their evolving data warehousing and data mining needs.
One solution is cloud computing, that is, services hosted over the internet. The service can take the form of either Software as a Service (SaaS) or Platform as a Service (PaaS). It allows customers to run their own applications and to access them via the internet. Another resource for business decision making is textual data (e.g. mail, contracts, etc.). One emerging technology for handling textual data is textual Extract, Transform, Load (ETL). To enhance the knowledge management process, companies can make use of existing technologies, e.g. data warehouses and data mining.
Furthermore, to tackle warehouse challenges, a knowledge warehouse must support storage, and the warehouse architecture must support feedback loops, online knowledge extraction and real-time storage requests. Airlines must gauge and meet their resource needs, and they need a broadband system to cope with the demand.
Airlines can improve their cash flows by comparing actual costs incurred more accurately, and they can use a third-party cloud computing vendor for their data warehousing needs, which could further reduce the total cost of ownership. Small carriers can thereby gain access to the level of data and processing power of a tier-one airline.
Over the years, advances in data capture, computer processing power, disk storage capabilities and statistical software have increased the accuracy of analysis and brought its cost down. If data is handled effectively, it could save airlines from bigger problems.


The analysis of data leads in turn to market analysis, which determines customers’ choices and how to meet their demands. Once customers’ choices are accommodated, the airline industry can flourish. A company’s resources can be managed effectively through financial planning, risks of all kinds can be evaluated and steps taken in advance to overcome them, and greater benefits can be enjoyed.

The weakness of this paper is that the analyst was unable to select appropriate and sufficient data, so the analysis brought no fruitful results for the airline company. Nothing is said about customers’ preferences or how to make the airline industry a ‘big success’. However, the paper does give the reader an understanding of the problems highlighted, better solutions for handling ‘Big data’ and some ideas for better business.
TOPIC: How Airlines Mine Personal Data In-Flight
Flight Attendants Are Likely to Know What Fliers Will Buy on Board

Main Theme:-
The theme of this paper is how different airlines process and use their customer data to improve customer satisfaction and increase airline revenue.
Introduction or background:-
In this short and simple paper the writer discusses how airline crews collect data from frequent fliers and how this data can help the airline serve those fliers better in future.
This practice is recent, and airlines are still studying how best to use this information, not only to improve their current revenue but also to innovate in their service.
The cabin crew would be familiar with a flier’s preferences and tastes and could provide the corresponding services without delay or hindrance.
Alongside the advantages of this data, some customers might not accept that their information is stored by the company, where it could be leaked.
According to this study, some customers would not mind if their preferences for travel, seats or food were recorded, but they certainly do not want their personal information to be stored.

Main problem:
The main problem identified in this paper is that most airlines are still living in the 1990s, stuck with conventional systems. They gather only information related to customer loyalty and booking preferences, which is useful only for operations. This information is not sufficient to measure customer satisfaction and preferences.
Methodology to record data:-
Data is key to most businesses and a core factor in business growth. Airlines are now learning to put this data to use. The common technique for recording customer data is to keep a history of customers’ behavior, which is useful for frequent fliers. As frequent customers are the main target segment, airlines record their past preferences and behaviors and arrange them in a digital library. Airlines also tend to store demographic records of their customers.
Solution and claim:
Some big airlines with vision, for instance American, British Airways and JetBlue Airways Corp., are using this data by converting it into digital libraries and organizing it to understand customer preferences better, with a unique identifier that may be a phone number, passenger number or email address.

This recorded information is then stored and provided to the cabin crew on a tablet or other smart device, which gives them specific information about a passenger, such as favorite seat, meal and drinks.

Some airlines are also in the business of selling other companies’ products in flight, and records show that they generate millions from this kind of sale.

These sales records, if kept, can help the airline to stock the most wanted products and services based on customer information and choices.


The concept of data use explained in this paper is very useful and practically applicable. It helps an airline to know its customers well and to improve its services. A customer who receives the service he or she is looking for would certainly be loyal and become a permanent flier. Not only that, but he or she also becomes a word-of-mouth marketer for that carrier.
While this practice is good for frequent fliers, it does not address service for non-regular customers, who may outnumber the regular ones. Focusing only on frequent fliers therefore loses the chance to win other potential customers.
On the other hand, the collection and storage of this data may be offensive to some customers, who would never wish personal data such as their address or travel destination to be kept on record. In the worst case it can cost the airline its reputation and, if not managed tactfully, become a disaster.
TOPIC: Using decision rules to achieve mass customization of airline services
•    The low-cost airline model is not always profitable, as it may compromise quality at the risk of disaster and loss of reputation. Instead, customers opt for quality service, which leads to customer satisfaction and brings high revenue.
•    Airlines are forced to cut costs and services as much as possible. Airline revenue comes from the carriage of passengers, cargo and mail, and from contracting services.
•    The Dominance-based Rough Set Approach (DRSA) helps airlines to cut costs, i.e. variable costs.
•    DRSA has its own advantages in the airline service industry for analyzing surveys on airline service quality. A set of decision rules is induced from passenger preference data, expressing the relationships between attributes, values and service ratings.
•    In the traditional approach, airline quality was evaluated by passengers, and methods like SERVQUAL, AHP and TOPSIS were used to rate airlines on courtesy, safety and comfort.
•    Importance-performance analysis was later used to construct evaluation maps of service attributes and identify areas for improvement, as passengers were more concerned about responsiveness and assurance.
•    A positioning map was established, which helped airlines learn about their competitors, their strengths, weaknesses, areas for improvement, etc.
•    DRSA removes dispensable attributes that do not affect the overall service rating.
•    DRSA is used to formulate airline service strategies by generating decision rules that model passenger preference for airline service quality.
•    DRSA could help airlines eliminate some services associated with dispensable attributes without affecting passenger perception of service quality.
•    DRSA could also help airlines achieve mass customization of airline services and generate additional revenue through active or passive targeting of quality services to passengers.
•    DRSA works on an ordered information table. Each row represents an object and each column represents an attribute.
•    Each cell in the table is an evaluation by the respondent of that row about the attribute of that column.
•    While most studies used traditional statistical techniques to test their hypotheses and make improvements, DRSA does the opposite by pruning away dispensable attributes.
•    DRSA rules express their antecedents using the more general preference relations “>” and “<” instead of exact attribute values.
•    DRSA induces the passenger preference model through classification examples given by passengers in a survey on airline services.
•    The result is a set of decision rules that are actionable by airline managers and could be used to formulate an airline’s service strategy.
•    Results are presented in sequence: results using reducts and the core, decision rules generated with service attributes, and decision rules generated with both personal and service attributes.
•    DRSA achieves the goals of decision analysis and gives recommendations for future decisions.
•    DRSA allows airlines to achieve mass customization of airline services while generating additional revenue for the airline.
•    The paper presents DRSA as unmatched by other techniques for analyzing airline service.
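
As a rough illustration of the ordered information table and the preference-based rules described above, the following Python sketch checks the dominance principle on a toy survey table and applies one hand-written rule. The attribute names, the 1 to 5 scale, the sample rows, the use of "at least" (>=) comparisons and the rule itself are assumptions made for illustration and are not taken from the paper; a real DRSA implementation would induce its rules from the data instead of hard-coding them.

```python
from itertools import permutations

# Hypothetical survey rows: each passenger rates service attributes (1-5)
# and gives an overall service rating (1-5). Higher is better everywhere.
CONDITIONS = ("courtesy", "comfort", "punctuality")
table = [
    {"courtesy": 5, "comfort": 4, "punctuality": 5, "rating": 5},
    {"courtesy": 4, "comfort": 4, "punctuality": 3, "rating": 4},
    {"courtesy": 2, "comfort": 3, "punctuality": 2, "rating": 2},
    {"courtesy": 4, "comfort": 4, "punctuality": 4, "rating": 3},
]

def dominates(a, b, attrs=CONDITIONS):
    """True if row a is at least as good as row b on every attribute."""
    return all(a[k] >= b[k] for k in attrs)

def inconsistent_pairs(rows, attrs=CONDITIONS):
    """Pairs (i, j) where row i dominates row j but has a worse rating."""
    return [
        (i, j)
        for (i, a), (j, b) in permutations(enumerate(rows), 2)
        if dominates(a, b, attrs) and a["rating"] < b["rating"]
    ]

def at_least_rule(row):
    """Hypothetical rule: courtesy >= 4 and comfort >= 4 => rating >= 4."""
    return row["courtesy"] >= 4 and row["comfort"] >= 4

violations = inconsistent_pairs(table)
matched = [i for i, row in enumerate(table) if at_least_rule(row)]
print(violations, matched)
```

In this toy table, row 3 scores at least as well as row 1 on every attribute yet receives a lower overall rating, so the pair is reported as inconsistent. Handling such pairs is what the rough approximations in DRSA are for; the sketch only detects them.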
TOPIC:  Efficient Computer Experiment-Based Optimization through Variable Selection
Section A
The study evaluates variable selection for regression in data mining, using designed experiments and statistical modeling to represent a complex objective function that can be evaluated only by solving an optimization subproblem (Shih et al., 2014).
Since large applications have a huge number of variables, using computer experiments directly requires an exceedingly large designed experiment and considerable computational effort. Naturally, variable selection can be conducted after running a small computer experiment. However, conventional variable selection techniques cannot be applied when the number of variables is very large. Therefore, the study explores the use of regression trees and multiple testing procedures, with the performance of the chosen variables evaluated by the coefficient of determination and relative errors (Shih et al., 2014).
Classification and regression trees (CART) are a common data mining technique in supervised learning. The CART algorithm uses binary recursive partitioning to divide the variable space into rectangular regions according to the similarity of the response values. A commonly used multiple testing procedure is the false discovery rate (FDR) procedure. Using the FDR procedure for variable selection requires a categorical response variable, which separates the data into groups (Shih et al., 2014).
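To make the idea of binary recursive partitioning concrete, here is a minimal Python sketch of the single step a regression tree repeats: choosing the variable and threshold whose split leaves the response values in each half as similar as possible (smallest sum of squared errors). The data and function names are illustrative assumptions, and this is not the procedure or code used by Shih et al.; a full CART implementation would recurse on each half and prune the resulting tree.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(rows, response):
    """Return (variable index, threshold, total SSE) of the best binary split.

    rows is a list of equal-length feature tuples; response is the list of
    response values. Candidate thresholds are midpoints between consecutive
    distinct values of each variable.
    """
    best = None
    for j in range(len(rows[0])):
        levels = sorted({r[j] for r in rows})
        for lo, hi in zip(levels, levels[1:]):
            threshold = (lo + hi) / 2
            left = [y for r, y in zip(rows, response) if r[j] <= threshold]
            right = [y for r, y in zip(rows, response) if r[j] > threshold]
            total = sse(left) + sse(right)
            if best is None or total < best[2]:
                best = (j, threshold, total)
    return best

# Toy data: the response depends on variable 0 only; variable 1 is noise.
X = [(1, 7), (2, 3), (3, 9), (8, 4), (9, 8), (10, 2)]
y = [10.0, 11.0, 10.5, 20.0, 21.0, 20.5]

variable, threshold, error = best_split(X, y)
print(variable, threshold, error)
```

On this toy data the sketch picks variable 0 with a threshold of 5.5 and never needs variable 1. That is the sense in which a fitted tree can serve as a variable selector: variables that are never chosen for a split are candidates for removal.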
Section B
Variable selection matters in the study because the large number of candidate explanatory variables is believed to contain many unnecessary or redundant ones. Conventional two-stage stochastic programming uses Benders’ approach. However, for large-scale problems two-stage stochastic programming tends to converge slowly, so the two-phase DACE (design and analysis of computer experiments) approach is used to reduce the computation needed for optimization (Shih et al., 2014).
The solution the study provides is variable selection that speeds up large-scale optimization based on the design and analysis of computer experiments. In the study, DACE organizes a set of computer experiment runs so that a statistical “metamodel” can be fitted to approximate the performance measure obtained from the computer experiment (Shih et al., 2014).
For an existing airline network with 50 stations and 2358 legs, the DACE phase reduced the decision variables from 6537 to 1264 dimensions, with a multi-step procedure deriving 141 preliminary extreme points that were later extrapolated to 3562 design points within the feasible region. The second-phase subproblem is then solved for every one of the design points. Of the 1264 decision variables, many unnecessary ones remain that can be identified through variable selection, giving a considerably smaller set of design points (Shih et al., 2014).
Subset selection in the study maximizes or minimizes an appropriate criterion. Two obvious subsets are the entire set of variables and the single best variable. The problem lies in selecting an intermediate subset that is superior to both extremes. The weakness lies in finding the necessary variables among the complete set of variables.
TOPIC: A novel decision rules approach for customer relationship management of the airline market
Section A
The topic is customer behavior and how firms develop lasting relationships with customers (Liou, 2009).
The problem the study tries to solve is forecasting customer behavior precisely, to help firms minimize the loss of existing customers by actively building enduring relationships with them (Liou, 2009).
The paper uses factor analysis and introduces the Variable Consistency Dominance-based Rough Set Approach (VC-DRSA) to customer relationship management (CRM) in the airline market (Liou, 2009).
Section B
The literature on data mining and its application to CRM in the airline market is virtually silent. Firms ought to be able to determine their customers’ value in order to keep, or even nurture, potentially profitable customers. Customer relationship management (CRM) remains an essential part of the airline business today owing to globalization, market saturation, increased competition and swift technological advances. CRM aims to understand the profitability associated with customers and to retain the profitable ones (Liou, 2009).
CRM is a dynamic process of managing the customer-company relationship so that customers choose to continue mutually valuable commercial exchanges and are dissuaded from exchanges that are unprofitable to the company. CRM remains an essential business strategy in which a firm focuses on the desires of its customers and integrates a customer-oriented approach throughout the organization (Liou, 2009).
The paper applies VC-DRSA to CRM in the airline market. In contrast to traditional statistical techniques such as discriminant analysis, the strength of rough set theory is that it requires no underlying statistical assumptions (Liou, 2009).
In many real-life problems, the ordering properties of the measured attributes play a significant role. For example, characteristics of objects such as product quality and market share are typically treated as criteria in economic problems (Liou, 2009).

TOPIC: Rogue components: their effect and control using logical analysis of data
Section A
The study is about rogue components, which plague the airline business by creating havoc in asset management programs. The study describes how rogue components develop, summarizes how they come to hamper an asset management program and examines the negative effects that ensue (Mortada et al., 2012).
The study’s main problem is controlling the development of rogue components.
The study uses a supervised learning technique of data mining known as logical analysis of data (LAD) within condition-based maintenance (CBM), with the aim of detecting rogue components within a population of repairable components. The study applies the resulting LAD decision model to a range of turbo compressors owned by an airline fleet (Mortada et al., 2012).
Section B
Maintenance and reliability programs for aircraft are essential in guaranteeing the safety and airworthiness of airplanes. Rogue components are very difficult to recognize and are capable of spreading throughout the component population. The major problem surfaces when rogue components find their way into asset management programs through the operator’s spare part inventory. Discovering such components is essential to guarantee reliability. A component develops a rogue failure because repair or overhaul tests do not address 100% of the component’s functions, features or environment (Mortada et al., 2012).
In CBM, faults can be detected only where indicators exist that reveal information about the asset’s status when monitored. The ability to exploit these indicators depends on where in the maintenance procedure rogue component detection occurs. The LAD algorithm can be applied at one point in the process, either before or after the component is stored in the repair shop. When detection is carried out before repair, resources that would otherwise be spent needlessly on a rogue component are saved (Mortada et al., 2012).
The tests in the study showed that the LAD technique can detect rogue components automatically when the performance history of the components is fed into the LAD algorithm. Automatic detection removes the need to sift through records and evaluate each component visually (Mortada et al., 2012).
The benefit of rogue component detection is that it saves a great deal of time and resources, since LAD can accomplish in seconds a task that currently takes days in the industry. However, performing detection before repair means that the LAD algorithm relies on other indicators to reach a decision about whether a component is rogue (Mortada et al., 2012).
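The core idea of LAD can be sketched in a few lines of Python: numeric indicators are binarized with cut points, and a "pattern" is a conjunction of binary conditions that covers some observations of one class and none of the other. The indicator names, cut points and records below are hypothetical, and the pattern is written by hand; the LAD algorithm in the study generates its patterns automatically from component performance histories.

```python
# Hypothetical repair histories: removals per 1000 flight hours and the
# share of shop visits ending in "no fault found", plus a known label.
history = [
    {"removal_rate": 4.1, "nff_share": 0.70, "rogue": True},
    {"removal_rate": 3.8, "nff_share": 0.55, "rogue": True},
    {"removal_rate": 1.2, "nff_share": 0.60, "rogue": False},
    {"removal_rate": 3.9, "nff_share": 0.10, "rogue": False},
    {"removal_rate": 0.9, "nff_share": 0.05, "rogue": False},
]

# Assumed cut points used to turn each indicator into a yes/no attribute.
CUTS = {"removal_rate": 3.0, "nff_share": 0.5}

def binarize(record):
    """Map each numeric indicator to True when it exceeds its cut point."""
    return {name: record[name] > cut for name, cut in CUTS.items()}

def covers(pattern, record):
    """A pattern is a dict of required binary values (a conjunction)."""
    bits = binarize(record)
    return all(bits[name] == wanted for name, wanted in pattern.items())

def coverage(pattern, records):
    """Count the rogue and non-rogue records covered by the pattern."""
    hits = [r["rogue"] for r in records if covers(pattern, r)]
    return sum(hits), len(hits) - sum(hits)

# Hand-written candidate: high removal rate AND high no-fault-found share.
pattern = {"removal_rate": True, "nff_share": True}
rogue_hits, normal_hits = coverage(pattern, history)
print(rogue_hits, normal_hits)
```

A pattern that covers rogue records but no normal ones, as this one does on the toy data, is the kind of evidence a LAD decision model combines when it classifies a new component.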
TOPIC: Security investment in aviation industry: a longitudinal analysis
Section A
The study evaluates how environmental change and project features affect the distribution of security-related Airport Improvement Program (AIP) funding within the aviation industry (Wang, Andoh-Baidoo & Sun, 2014).
Attention to secure transportation dates back to a succession of terror attacks on surface and airborne vehicles. A terrorist attack on a vehicle or transportation facility may produce disastrous personal injury, enormous loss of assets and vast effects on public psychology. One significant function of government agencies is providing financial and technical support to transport carriers in order to improve security procedures (Wang, Andoh-Baidoo & Sun, 2014).
The study evaluates the longitudinal trend in federal security investment in transportation and its cross-sectional components (Wang, Andoh-Baidoo & Sun, 2014).
Section B
Among the various parts of transportation security, airport security remains the most contentious because it receives heavy investment yet still raises concerns. Before the September 11, 2001 terrorist attacks, the Federal Aviation Administration (FAA) was responsible for guaranteeing airport security, with the Airport Improvement Program (AIP) as the major source of security funding. Following the attacks, however, an act of Congress signed by then President Bush created the Transportation Security Administration (TSA) and tasked it with most of the “land-side” security responsibility. Despite the transfer of security-related tasks from the FAA to the TSA, the FAA is still responsible for a portion of airport security, particularly the land-side functions (Wang, Andoh-Baidoo & Sun, 2014).
A conceptual link to the word “security” reveals several aspects of the AIP security initiatives supported by government funding.
The results of the study support the theoretical framework that two levels influence the distribution of transportation security grants: environmental changes at the macro level and project features at the micro level. Moreover, the analysis indicates that the events of 9/11 and the 2008 economic crisis greatly affected the provision of security-related grants and of all grants, in different ways (Wang, Andoh-Baidoo & Sun, 2014).
The main weakness of the study’s analysis is that the choice of variables is constrained by the use of secondary data. For instance, airport region and type are the only two variables describing project characteristics, and they indicate location and scope only approximately. Moreover, the FAA discloses only funded AIP projects, which makes it harder to investigate the factors that influence the awarding of grants. Despite these weaknesses, the study’s findings still have significant implications for both practitioners and researchers. In particular, they show that investment in transportation security responds positively to terrorist attacks like 9/11, although not to economic events like the 2008 economic crisis (Wang, Andoh-Baidoo & Sun, 2014).
TOPIC: Applying a New Model of Customer Value on International Air Passengers’ Market in Taiwan
Main Theme:-
This paper proposes a new model to discover customer value of air passengers by using data mining technologies. The results of this research can be applied in database marketing systems.


•    The procedure applies the See5/C5.0 (RuleQuest Research Pty Ltd, St Ives, New South Wales, Australia) decision tree; transaction records; the Frequency, Price Discount, Destination and No-Show (FPDN) model variables, which are based on the Recency, Frequency and Monetary (RFM) model; and socio-economic variables to create decision rules for the airline business.
•    This research provides an approach to finding customer values and establishes a meaningful model composed of Frequency, Price Discount, Destination and No-Show.
•    The procedure of this paper can be applied in a database marketing system.
•    Data mining is defined as ‘using data analysis and machine learning methods to process data to create meaningful models’. The technologies of data mining, which usually are classification, prediction, estimation, rules, clustering and visualization, are widely used in marketing systems to gain knowledge about customers and to enhance their loyalty and contribution.
•    In terms of cluster analysis, customers can be partitioned by the RFM model according to customer value in order to extend the customer life cycle. Businesses apply different marketing plans to each cluster to increase shopping frequency.
•    Usually, customer values are estimated using the RFM model to increase business profits. This research, however, replaces the RFM variables with other profit variables for the air passenger market.
•    This research proposes a model for finding the true customer values of airlines. The RFM variables are profit variables for businesses; thus, the FPDN variables are designed to be profit variables of airlines in estimating passengers’ customer values.
•    The FPDN variables consist of positive and negative variables: F (Frequency) and D (Destination) are positive profit variables for airlines, whereas P (Price Discount) and N (No-Show) are negative profit variables for airlines.
•    The FPDN model is composed of four variables: Frequency, Price Discount, Destination and No-Show, which are designed for understanding passengers’ shopping behaviors and the benefit to airlines.
•    Passengers are high-profit contributors for airlines if they travel frequently or if their destinations are far, whereas they are low-profit contributors if they usually purchase heavily discounted tickets or if their number of no-shows is high.

Claim/ Logic of research
•    This paper proposes a procedure to discover customer knowledge, adopting the FPDN model (variables), the transaction records and the socio-economic data of international air passengers as research variables.
•    This study uses the See5/C5.0 decision tree algorithm to process the FPDN model, the socio-economic variables and the FPDN markets to create decision rules for database marketing plans.
•    This research adopts data mining technologies to perform clustering and classification for creating useful models.

The research methods are mainly the See5/C5.0 decision tree algorithm, market segmentation and the FPDN model analysis.
1.    Questionnaire design and sample size
a.    To focus on this market, this study used a questionnaire survey to collect data. The questionnaire was designed around the FPDN model variables, socio-economic variables and transaction records.
b.    All the data collected using the purposive sampling method was first-hand data. The required sample size was calculated using $n = Z^2 P(1 - P) / e^2$, where n is the sample size, under the following conditions: confidence level 99% (Z = 2.58), tolerance e = 0.03 and probability ratio P = 0.5.
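
As a worked check of the sample size formula n = Z^2 P(1 - P) / e^2, the short Python sketch below plugs in the stated values (Z = 2.58, P = 0.5, e = 0.03). The function name is hypothetical, and decimal arithmetic is used only to avoid floating-point rounding when the result is rounded up.

```python
from decimal import Decimal, ROUND_CEILING

def sample_size(z, p, e):
    """Minimum sample size n = Z^2 * P * (1 - P) / e^2, rounded up."""
    z, p, e = Decimal(z), Decimal(p), Decimal(e)
    n = (z ** 2) * p * (1 - p) / (e ** 2)
    return int(n.to_integral_value(rounding=ROUND_CEILING))

required = sample_size("2.58", "0.5", "0.03")
print(required)
```

With these inputs the formula gives n = 1849 respondents.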

2.    Market segmentation by FPDN model
The questionnaire data are arranged for this analysis. Each FPDN variable is divided into five levels, with levels one to five ordered from the lowest to the highest value. As mentioned in the Decision tree section, the lowest customer value is 1-1-1-1, whereas the highest is 5-5-5-5.
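
One way to picture the five-level coding is the Python sketch below, which ranks passengers on each FPDN variable and assigns levels 1 to 5 by quintile. How the paper actually draws the level boundaries is not stated here, so the equal-sized quintile bins, the reversed scoring of the negative variables P and N (so that 5 always means most valuable) and the sample passengers are all assumptions made for illustration.

```python
# Hypothetical passengers: F = trips, P = average discount (%),
# D = average distance (km), N = number of no-shows.
passengers = {
    "p1": {"F": 12, "P": 5, "D": 9000, "N": 0},
    "p2": {"F": 8, "P": 10, "D": 7000, "N": 1},
    "p3": {"F": 5, "P": 20, "D": 3000, "N": 2},
    "p4": {"F": 2, "P": 35, "D": 1500, "N": 3},
    "p5": {"F": 1, "P": 50, "D": 800, "N": 5},
}

NEGATIVE = {"P", "N"}  # higher raw value means lower profit for the airline

def quintile_levels(values, reverse=False):
    """Assign levels 1-5 by rank; ties share the level of their first rank."""
    ordered = sorted(values, reverse=reverse)
    n = len(ordered)
    return {v: ordered.index(v) * 5 // n + 1 for v in values}

def fpdn_codes(data):
    """Return a code such as '5-5-5-5' for every passenger."""
    codes = {name: [] for name in data}
    for var in "FPDN":
        raw = [row[var] for row in data.values()]
        levels = quintile_levels(raw, reverse=var in NEGATIVE)
        for name, row in data.items():
            codes[name].append(str(levels[row[var]]))
    return {name: "-".join(parts) for name, parts in codes.items()}

codes = fpdn_codes(passengers)
print(codes)
```

In this toy set the frequent, long-haul, full-fare passenger p1 is coded 5-5-5-5 and the occasional, heavily discounted passenger p5 is coded 1-1-1-1, matching the two extremes named in the text.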

This paper uses the FPDN variables for market clustering. The air passenger market is partitioned into nine clusters based on the FPDN variables. The See5/C5.0 algorithm processes the data using cross-validation and 25% pruning to create three decision rules, which are shown in Table 2.

Airlines may apply the proposed procedure in their CRM or marketing systems to discover customer values, FPDN markets and decision rules. Three decision rules were generated by this empirical case study; airlines may apply them to the Taiwan market.
This research develops four profit variables for airlines and generates three rules for understanding air passengers. The study applies data mining technologies and proposes a procedure for discovering decision rules together with the FPDN model (based on the RFM model).
The results can be applied by airlines or by other businesses.

