Top 10 Essential Data Science Topics To Real-World Application From The Industry Perspectives

1. Introduction
Data science and statistics have never been more popular. John Tukey (Brillinger, 2014) remarked that the best part of being a statistician is that you get to play in everyone’s backyard. Xiao-Li Meng (2009) added that statisticians no longer just play in and clean up everybody’s backyard: they are now invited into everyone’s living room and given the responsibility of being their first quantitative nanny.

Professors Wing and He & Lin have written great summaries of research opportunities for statisticians. Education and research provide key knowledge and talent (supply) for society and industry. We give industry perspectives on Wing (2020) and He & Lin (2020) and also discuss other topics that are important for industry (demand). Researchers interested in industrial applications may find our discussion useful. To support our comments, we first present a framework for analytics.

2. Three Types of Analytics and Their Relationship to Causality
Figure 1 shows an industry framework with three types of analytics. Traditional statistical modeling is about drawing a random sample to learn about the population. While statistical inference is not primarily about prediction, today’s industry tends to embrace predictive analytics. Prescriptive analytics, at the top, guides decision making: what to do given that something will happen. Using weather as an example, descriptive analytics reports on past weather and identifies regional patterns. Predictive analytics is the weather forecast. Prescriptive analytics is deciding, given a forecast that a storm will hit tomorrow, whether to go to work or school. Ideally, you make a causal assessment of each decision option in terms of safety and benefit, and then balance the two to make the best decision. In general, prescriptive analytics asks what will happen if we do A or B, and calculating that effect requires causal inference. For further discussion on causality, see Bojinov et al. (2020).

3. Data Science Projects: What are the Typical Skill Requirements?
Figure 2 illustrates a typical data science project, which is similar to Wing’s (2019) Data Life Cycle. The project begins with analytic consulting to define the problem and scope, then proceeds to gathering and processing data. Next, models are built (analytics) and insights are extracted. Finally, reports are produced, and the business deploys the models and implements them in a system. These stages require four skill sets: Computer Science, Statistics/Mathematics, Subject Matter Expertise (important for the application area), and Soft Skills (see Lo, 2019). Our 10 topics draw on these skill sets.

4. Comments on Wing, He & Lin Papers, and Other Topics
The first five topics below relate to the lists of Wing and He & Lin. The next five topics are highly relevant to industry and are additions to their lists.

4.1. Causal Inference
He & Lin’s seventh area, Causal Inference, which sits between predictive and prescriptive analytics in Figure 1, applies not only to medicine but also to marketing, policy, and political elections. Its relevance to business can be seen in the middle column of Table 1, which contains examples organized by the marketing 4Ps. Although the randomized controlled trial (RCT) is the most popular method of causal measurement, for example in online advertising to compare advertisements A and B, it is not always possible. When an RCT is not possible, causal inference techniques for observational data are available. For example, you might want to use observational data to assess the impact of a sales call on purchase rates (see Figure 3). Sales associates have historically tended to contact older or wealthier customers, and age and wealth may themselves affect purchase rates. This is classic confounding because the treatment’s connection to the outcome has a “backdoor” route through the confounders. Propensity score matching can be used to address this problem (see Rubin, 2006; Imbens & Rubin, 2015). Statisticians, economists, and epidemiologists tend to concentrate on estimating the effects of causes in their methods.
Directed acyclic graphs (DAGs) and Bayesian networks (BNs) are two powerful AI techniques that help discover causal relationships. This branch is much more common in computer science than it is in computer engineering (Pearl, 2000; Pearl and MacKenzie, 2019). Researchers and practitioners could benefit from guidance on when to use the various approaches.
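
The sales-call confounding described above can be illustrated with a minimal sketch. All numbers here (call rates, purchase rates, and a true call effect of 5 percentage points) are assumptions made up for illustration, not figures from the article. With a single binary confounder, stratifying on it is a simplified stand-in for propensity score matching, since the propensity score takes only one value per stratum.

```python
import random

def simulate_customers(n=50000, seed=0):
    """Simulate (wealthy, called, purchased) rows with confounding by wealth."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        wealthy = rng.random() < 0.3
        # Assumed: wealthy customers are called more often...
        called = rng.random() < (0.7 if wealthy else 0.2)
        # ...and buy more often regardless of the call. Assumed true effect: +0.05.
        base_rate = 0.30 if wealthy else 0.10
        purchased = rng.random() < base_rate + (0.05 if called else 0.0)
        rows.append((wealthy, called, purchased))
    return rows

def purchase_rate(rows):
    return sum(purchased for _, _, purchased in rows) / len(rows)

def naive_effect(rows):
    """Difference in purchase rate, called minus not called, ignoring wealth."""
    treated = [r for r in rows if r[1]]
    control = [r for r in rows if not r[1]]
    return purchase_rate(treated) - purchase_rate(control)

def stratified_effect(rows):
    """Backdoor adjustment: effect within each wealth stratum, weighted by stratum size."""
    effect = 0.0
    for level in (False, True):
        stratum = [r for r in rows if r[0] == level]
        effect += naive_effect(stratum) * len(stratum) / len(rows)
    return effect

rows = simulate_customers()
print(f"naive: {naive_effect(rows):.3f}  adjusted: {stratified_effect(rows):.3f}")
```

On this simulated data the naive comparison overstates the effect of the call, while the stratified estimate lands close to the assumed 0.05.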

4.2. Heterogeneous treatment effect / Uplift modeling
He & Lin’s first area, Quantitative Precision-X, is closely related to causal inference. Causal inference typically measures the overall treatment effect (average treatment effect, ATE), whereas Precision-X aims at the individual or subgroup treatment effect (heterogeneous treatment effect, HTE) for personalization. Radcliffe & Surry (1999) and Lo (2002, 2008) pioneered uplift modeling, which shares similarities with subgroup analysis. Its impact on business has led to it becoming a subfield. There are many techniques and packages for it; see Rzepakowski & Jaroszewicz (2012), Yong (2015), Zhao (2017), and Zhang et al. (2020).
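
As a rough sketch of the idea, uplift can be estimated per subgroup from a randomized campaign as the response rate of the treated group minus that of the control group. The segment names and counts below are hypothetical, and real uplift models use far richer methods than this subgroup difference.

```python
from collections import defaultdict

def uplift_by_segment(records):
    """records: iterable of (segment, treated, responded) tuples.

    Returns {segment: treated response rate - control response rate}.
    Assumes every segment contains both treated and control customers.
    """
    # counts[segment][treated] = [responders, total]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, treated, responded in records:
        cell = counts[segment][bool(treated)]
        cell[0] += int(responded)
        cell[1] += 1
    uplift = {}
    for segment, cells in counts.items():
        treated_rate = cells[True][0] / cells[True][1]
        control_rate = cells[False][0] / cells[False][1]
        uplift[segment] = treated_rate - control_rate
    return uplift

def make_records(segment, treated, responders, total):
    """Helper to build hypothetical campaign data."""
    return [(segment, treated, i < responders) for i in range(total)]

records = (
    make_records("new", True, 30, 100) + make_records("new", False, 10, 100)
    + make_records("loyal", True, 50, 100) + make_records("loyal", False, 50, 100)
)
print(uplift_by_segment(records))
```

In this made-up data the overall treated-minus-control difference is 10 percentage points, yet all of it comes from the "new" segment, which is the kind of heterogeneity uplift modeling is meant to expose.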

4.3. Data Science Ethics
He & Lin’s second area, Fair and Interpretable Learning and Decision Making, is becoming increasingly important. It falls within the broader area of AI/data science ethics, which encompasses both quantitative and non-quantitative disciplines, as illustrated in Figure 4.

– We should always ask ourselves whether the data used to train models are accurate. Are the predictors we use ethically acceptable? While we agree with Wing’s view that solid methods are needed to solve the problems of precious data, data scientists should also be aware of possible biases in found data.

– Policy and data privacy are essential requirements. Data scientists have long worked to safeguard individual identities, the subject of Wing’s ninth area. Privacy-preserving data mining is a collection of approaches to this problem, in which data are altered or perturbed in a number of ways prior to model development while retaining their value for knowledge discovery. Such transformations are summarized by Aldeen et al. (2015) and Mendes & Vilela (2017).

– Can fairness metrics be used to detect bias in a model after it is created? Is it possible to remove biases algorithmically?

– Model transparency is needed. Should we prefer an inherently interpretable model or one that requires after-the-fact explanation? See Rudin (2019).

– Do we need a governance system to oversee data science and deal with gray areas? See Sandler & Basl (2019).
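
One common check on a model after it is built, sketched here, is the demographic parity difference: the gap between groups in the rate of positive predictions. The predictions and group labels are made-up illustrations, and this is only one of several fairness metrics, which can conflict with one another.

```python
def selection_rates(predictions, groups):
    """Share of positive predictions (1s) within each group."""
    totals, positives = {}, {}
    for pred, group in zip(predictions, groups):
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + int(pred)
    return {g: positives[g] / totals[g] for g in totals}

def demographic_parity_difference(predictions, groups):
    """Largest gap in selection rate between any two groups (0 means parity)."""
    rates = selection_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())

# Hypothetical model outputs for two groups, "a" and "b".
preds = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(selection_rates(preds, groups))
print(demographic_parity_difference(preds, groups))
```

A large gap does not by itself prove unfair treatment, but it flags where to look more closely.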

4.4. Deep Learning with Unstructured Data
Wing’s first area is deep learning, a technique with many uses. Although we may not fully understand why deep learning works, Hornik (1991) showed that a multilayer perceptron can serve as a universal approximator. Modern techniques such as weight sharing, regularization, and many layers have allowed it to reach new heights. Although deep learning is only a subclass of predictive analytics (level 2 of Figure 1), it has a broad impact on unstructured data, such as images, text, and speech, which make up most of Big Data. These data are briefly covered in the tenth area of He & Lin and the fourth of Wing. However, the success of neural nets on unstructured data deserves a special highlight. Text is an especially valuable type of unstructured data. Large volumes of text hold information waiting to be analyzed by natural language processing (NLP) or deep learning. NLP can identify rules and agreements in legal and contractual documents, which are crucial data sources in compliance and legal analysis. Survey verbatims, electronic health records, and doctors’ notes are other potential sources. Website owners may also offer a keyword search capability, which requires NLP to interpret queries and return answers.
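
To make the keyword search idea concrete, here is a minimal sketch of a classical baseline: ranking documents by a simple TF-IDF score. This is not deep learning, the smoothing formula is one of several common variants, and the sample documents are invented for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def tfidf_search(query, documents):
    """Return indices of matching documents, best match first."""
    doc_counts = [Counter(tokenize(d)) for d in documents]
    n_docs = len(documents)
    query_terms = set(tokenize(query))
    scored = []
    for i, counts in enumerate(doc_counts):
        total_terms = sum(counts.values()) or 1
        score = 0.0
        for term in query_terms:
            doc_freq = sum(1 for c in doc_counts if term in c)
            if doc_freq == 0:
                continue
            # Smoothed inverse document frequency (one common variant).
            idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0
            score += (counts[term] / total_terms) * idf
        scored.append((score, i))
    ranked = sorted(scored, key=lambda s: (-s[0], s[1]))
    return [i for score, i in ranked if score > 0]

docs = [
    "The tenant shall pay rent monthly.",
    "The patient reported mild chest pain.",
    "Rent increases require written notice to the tenant.",
]
print(tfidf_search("tenant rent", docs))
```

Neural approaches replace the word counts with learned representations, but the interface (query in, ranked documents out) stays the same.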

4.5. Computational Tools and Technology
These are essential for the execution of projects. Reinsel et al. (2020) reported that the world’s data would grow about threefold, to 175 ZB, in the coming years. Growing data volumes and the rise of deep learning will require more computing power. What computational technologies should we acquire? Wing’s seventh area focuses on the importance of computing systems in data-intensive applications and recommends new system designs with efficient data access and processing. He & Lin’s fifth area mentions cloud-based distributed statistical analysis. These are essential for research, while the following key skills apply to industry applications:

– Data Knowledge: We need to have a solid understanding of data in order to properly analyze it. This is especially true for large volumes of data that come from many sources. Understanding the business requires knowledge of the data.

– Extract, Transform, and Load (ETL): This skill is vital, especially when working with Big Data. Professional data engineers can often assist.

– Model production: While statistical analysis may produce results that are never used in a production environment, data science tends towards completing the production cycle through ongoing prediction and support (see Figure 2). Containerization tools are used to modernize model deployment, making it easy and cost-effective to deploy scoring code and adapt to changes. See Rao (2019) and Kelleher and Kelleher (2018).
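
The ETL step listed above can be sketched in a few lines using only the Python standard library. The column names, the sample rows, and the rule of dropping rows with missing spend are all assumptions for illustration; production pipelines typically run on dedicated data engineering tools.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract with a stray space and a missing value.
RAW_CSV = """customer_id,signup_date,monthly_spend
1,2021-01-15, 120.50
2,2021-02-03,
3,2021-02-20,87.25
"""

def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim whitespace, cast types, drop rows with missing spend."""
    clean = []
    for row in rows:
        spend = row["monthly_spend"].strip()
        if not spend:
            continue
        clean.append((int(row["customer_id"]), row["signup_date"], float(spend)))
    return clean

def load(rows, conn):
    """Load: write the cleaned rows into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(customer_id INTEGER PRIMARY KEY, signup_date TEXT, monthly_spend REAL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT * FROM customers").fetchall())
```

Keeping the three stages as separate functions makes each one easy to test and to swap out as data sources change.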

4.6. Analytic Consulting, Communication, and Soft Skills
These skills are crucial for industry applications. As Kruhse-Lehtonen & Hofmann (2020) explain: “The highest level AI maturity is when all employees move in unison, silos dissolve, and data- and AI are used daily by everyone.” They include the following:

– Business Consulting: This may include identifying opportunities, initiating projects, and drafting proposals.

– General Business Communication: This involves understanding your business audience and speaking their language. It also includes storytelling and visualization. These skills may be part of business analysis programs.

– Communication with IT professionals: Figure 2 shows how data scientists collaborate with data/tech professionals in multiple phases. Communicating well with them makes development and deployment more efficient.

4.7. Descriptive analytics
Descriptive analytics is the first level in Figure 1 and includes data visualization. This approach was strongly advocated by John Tukey and Florence Nightingale. However, it may not be as prominent in current data science and statistics programs.

– Data Visualization and Statistical Graphics: Well-designed graphics can be powerful tools for sharing business insights with stakeholders. Unwin (2020), for example, noted that a picture is not a substitute for a thousand words when telling data stories with graphics.

– Reports, summary statistics, and profiling: For example, one can analyze customer usage and ownership behavior and profile customer demographics to understand customers better. Significance testing can be used to assess differences between customer groups.

– Feature selection: The steps above can also be used to select features for predictive analytics.
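
As an illustration of significance testing between customer groups, the sketch below runs a two-sided two-proportion z-test with a pooled standard error. The choice of test and the counts are assumptions for illustration; the article does not prescribe a specific test.

```python
import math

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference in proportions (pooled standard error)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Standard normal CDF via the error function.
    normal_cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return z, p_value

# Hypothetical: 120 of 1,000 customers in group A own the product vs 90 of 1,000 in group B.
z, p_value = two_proportion_z_test(120, 1000, 90, 1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

The normal approximation behind this test is reasonable for large groups like these; for small counts an exact test would be the safer choice.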

4.8. Prescriptive analytics
Prescriptive analytics for decision making, the top level of Figure 1, often receives too little attention in data science and statistics programs. Typical decision questions arise across the marketing 4Ps. If we can answer them causally, we can use the results to optimize. Operations Research / Industrial Engineering covers optimization, using a combination of linear, dynamic, and stochastic programming techniques. The success of reinforcement learning in autonomous vehicles and chess has attracted a lot of attention recently; it integrates predictive and prescriptive analytics. Prescriptive analytics can be achieved by adopting an optimization mindset: start with a clear objective function and constraints, then work through the problem in a systematic manner. Customer relationship management (CRM) is a common business application. Figure 5, which illustrates a marketing campaign with many channels and multiple messages/offers to millions of customers, shows how it can be applied to optimize customer contacts.
Which channel and message should each customer be assigned? Uplift modeling can estimate the impact of each channel/message combination on each customer, but the large number of combinations makes the optimization difficult. See Lo and Pachamanova (2015).
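
The optimization mindset can be shown on a toy version of the contact problem: the objective is total estimated uplift, and the constraints are channel capacities. The uplift values, channel names, and capacities below are hypothetical, and the exhaustive search is only feasible for a handful of customers; at real scale one would turn to the mathematical programming techniques mentioned above.

```python
from itertools import product

def best_assignment(uplift, capacity):
    """Assign each customer one channel to maximize total estimated uplift.

    uplift[c][k]: estimated incremental response of customer c via channel k.
    capacity[k]:  maximum number of customers channel k can contact.
    Exhaustive search, so only suitable for tiny illustrative instances.
    """
    n_customers = len(uplift)
    n_channels = len(capacity)
    best_value, best_plan = float("-inf"), None
    for plan in product(range(n_channels), repeat=n_customers):
        # Constraint: respect each channel's capacity.
        if any(plan.count(k) > capacity[k] for k in range(n_channels)):
            continue
        # Objective: sum of uplift over all customer/channel assignments.
        value = sum(uplift[c][k] for c, k in enumerate(plan))
        if value > best_value:
            best_value, best_plan = value, plan
    return best_plan, best_value

channels = ["email", "phone", "no contact"]
uplift = [
    [0.05, 0.02, 0.0],  # customer 0
    [0.04, 0.03, 0.0],  # customer 1
    [0.01, 0.02, 0.0],  # customer 2
]
capacity = [1, 1, 3]  # one email slot, one phone slot, unlimited "no contact"

plan, value = best_assignment(uplift, capacity)
print([channels[k] for k in plan], round(value, 2))
```

With 3 channels and 3 customers there are only 27 plans to check, but the count grows exponentially with the number of customers, which is the difficulty the text refers to.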

4.9. Social Sciences
The social sciences comprise many fields that data scientists can draw on to enrich analytics and insights.

– Microeconomics studies the decisions of individuals and firms. Suppose you are selling ice cream and want to find the price that maximizes profit. If you take historical data on sales at different prices and fit a line (Figure 6), a downward relationship between price and sales becomes apparent. Price elasticity determines the slope of the curve. Unit profit is price minus unit cost (say, $2), and multiplying estimated sales by unit profit yields total profit. In this example the best price is $5. This illustrates the connection between microeconomics and marketing statistics.

– Macroeconomics. One may need to build a causal map of how macroeconomic variables influence outcomes such as sales, revenue, or risk. Future macroeconomic conditions are not known, but one can combine simulation and forecasting to explore them. Refer to Oxelheim & Wihlborg (2008) or Leamer (2010).

– Behavioral Economics. Behavioral economics brings psychology and economics together. The field has produced several Nobel laureates and includes nudge theory, as summarized by Thaler & Sunstein (2009). For example, if students are enrolled by default in the webinars you offer and must opt out, many will attend; if they first have to register (opt in), only a few may sign up. An opt-out design therefore yields higher participation than an opt-in one. Another example of choice architecture is that fewer choices are often better than too many: people tend to avoid products with too many options. While some of these phenomena have been well researched, others remain open questions. We can use what is already known to design randomized experiments that examine them and then influence human behavior.
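
Returning to the ice cream pricing example under Microeconomics, the calculation can be sketched as follows: fit a line to historical price and sales data, then choose the price that maximizes (price minus unit cost) times predicted sales. The data points are hypothetical and the unit cost of 2 is an assumption; they were chosen so that the result matches the $5 quoted in the text.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def profit_maximizing_price(intercept, slope, unit_cost):
    """Maximize (p - c) * (intercept + slope * p) for a downward slope.

    Setting the derivative to zero gives p = (c - intercept / slope) / 2.
    """
    if slope >= 0:
        raise ValueError("demand must slope downward")
    return (unit_cost - intercept / slope) / 2

# Hypothetical history: daily sales observed at each price.
prices = [3, 4, 5, 6, 7]
sales = [50, 40, 30, 20, 10]

intercept, slope = fit_line(prices, sales)
best_price = profit_maximizing_price(intercept, slope, unit_cost=2)
print(f"demand = {intercept:.0f} + ({slope:.0f}) * price, best price = {best_price:.2f}")
```

Because historical prices are rarely set at random, a fitted line like this is only as trustworthy as the assumption that nothing else drove both price and sales.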

4.10. Expertise in the Application Field
Each common area of application carries a lot of domain knowledge (see Figure 7). Some programs offer an entire course on applications, while others have elective courses. Statistics and data science degrees may not include these subjects, so students and practitioners may have to acquire the knowledge themselves. For example, if you want to use data science in marketing and sales, you will need to work with experts in this area. Data scientists working in risk management need to be familiar with operational, credit, and market risk. Once you are familiar with the domain, you can identify potential opportunities and use analytics effectively.

5. Summary
Data science covers a broad range of applied and academic disciplines. Statisticians and data analysts can extend their knowledge by acquiring modern computational methods such as NLP and deep learning. This article provides industry perspectives on 10 topics that could be used to broaden the scope of professional education programs:

– Causal Inference
– Heterogeneous Treatment Effect / Uplift Modeling
– Data Science Ethics
– Deep Learning with Unstructured Data
– Computational Tools and Technology
– Analytic Consulting, Communication, and Soft Skills
– Descriptive Analytics
– Prescriptive Analytics
– Social Sciences
– Expertise in the Application Field

These topics also present opportunities for applied research. To bridge the gap between supply and demand in research and education, it is important to encourage collaboration between academia and industry. Internships and exposure to real-world applications can stimulate students’ curiosity and help them develop skills.

Author

  • julissabond

    Julissa Bond is an educational blogger and volunteer. She works as a content and marketing specialist for a software company and has been a full-time student for two years now. Julissa is a natural writer and has been published in several online magazines. She holds a degree in English from the University of Utah.
