
Modeling of moral decisions with deep learning


One example of an artificial intelligence ethical dilemma is the autonomous vehicle situation presented by Massachusetts Institute of Technology researchers in the Moral Machine Experiment. To solve such dilemmas, the MIT researchers used a classic statistical method known as the hierarchical Bayesian (HB) model. This paper builds upon previous work for modeling moral decision making, applies a deep learning method to learn human ethics in this context, and compares it to the HB approach. These methods were tested to predict moral decisions of simulated populations of Moral Machine participants. Overall, test results indicate that deep neural networks can be effective in learning the group morality of a population through observation, and outperform the Bayesian model in the cases of model mismatches.


With the rapid development toward automation, future reliance on artificial intelligence (AI) for everyday tasks is clear. Often embedded within these tasks are small moral decisions: for example, is violating a minor traffic law justified when it saves the time of others? While humans take these small ethical decisions for granted, society must properly equip AI products with moral compasses if we are to entrust machines even with small daily tasks. Furthermore, confidence in an AI’s ability to make sensible moral decisions is key to winning public acceptance of such systems.

Public acceptance of AI as responsible moral agents is one of the greatest obstacles facing automation and machine learning. Bigman and Gray [1] highlight that, across multiple studies, people have shown a distinct aversion to entrusting machines with ethical decisions, despite the fact that AI has demonstrated superior judgement to humans in certain domains. Other research and surveys indicate that a person’s previous exposure to machine-made decisions plays a crucial role in their confidence in ethical AI [2]. Formulating and demonstrating an easily applicable approach to programming moral agents is the first step in earning public trust in this domain.

Incorporating moral sensibility into machines remains challenging, as it is difficult to derive a quantitative model for objectively determining moral decisions. Current research in AI moral decision making often theorizes abstract and general approaches to training moral agents [3, 4]. For example, Shaw et al. [4] propose a machine learning framework in which a group of statistically trained models determines a moral action based on each individual model’s decision and the confidence each model has in the morality of the other models. Still, reducing complex moral scenarios to a form that such a framework can easily digest remains difficult.

As with many problems, researchers can find inspiration in human cognitive abilities, including moral determination. English philosopher Jeremy Bentham theorized that individuals choose actions that yield the greatest social utility when faced with ethical dilemmas [5]. Research in universal moral grammar has supported this notion, additionally noting that the moral value of a decision also depends on the context and actions an agent must take within that decision, and not just the net result [6]. As such, it may be possible to model ethical decisions based upon the social utility of each option within a decision. In this paper, we investigate a deep learning based moral decision model, taking a hypothetical autonomous vehicle dilemma as an example.

The moral machine experiment

One scenario relevant to ethical modeling with social utility is the imminent crash of a self-driving vehicle: in this hypothetical situation, an autonomous vehicle with a catastrophic brake failure must decide between killing one of two distinct groups of people. This scenario is a suitable starting point for discussing ethical AI decision making, as it has been investigated extensively in the Moral Machine Experiment [7]. This experiment surveyed thousands of people worldwide for their preferences in autonomous vehicle ethical dilemmas [7]. In any given instance, a participant would be presented with an unwinnable scenario in which only one of two groups of people could survive (see Fig. 1). The survey aggregated answers based on region and evaluated the moral values that societies generally place on different abstract dimensions, such as age, social status, law adherence, and gender. Observing the data and attempting to transfer scenarios into comparative costs based on abstract values is the first step in creating a model that can ethically make these decisions.

Fig. 1

Moral Machine Example: Example screenshot of a Moral Machine scenario [8]. The participant must decide the more ethical course of action for the self-driving vehicle: swerving would kill all vehicle passengers (right), while maintaining the course would result in the death of all pedestrians (left)

Hierarchical Bayesian modeling for moral decision making

Before describing a deep learning model, it is necessary to both credit and summarize the work by Kim, et al. in A Computational Model of Commonsense Moral Decision Making [9]. This paper observes the same scenario from the Moral Machine Experiment, and models human moral decisions as a random process based on the perceived social utilities between options in a scenario. Each autonomous vehicle scenario contains two options (y=0,1), and option y can be characterized by vector θy. The characters within θy hold various features (such as male, human, doctor, young, etc.). The total features of θy can be found after applying a linear transformation λ=F(θ)=Aθ, where λ is the sum of features in θ. Figure 2 shows the linear transform A used in Kim, et al. [9].

Fig. 2

Feature Transform: Binary transformation matrix A used in Kim, et al., which converts a set of character traits θ into quantifiable features [9]

Kim, et al. models human decisions in these scenarios as a comparison of perceived social utilities. The social utility u of option i is calculated as follows:

$$\begin{array}{@{}rcl@{}} u\left(\theta_{i}\right) = w^{\top} F\left(\theta_{i}\right) \end{array} $$

where the vector w holds an individual’s abstract moral weights for each feature [9]. For each scenario containing two options, i=0,1 (non-intervention and intervention respectively), the probability that the individual will choose the intervention option is modeled as [9]

$$\begin{array}{@{}rcl@{}} P(Y=1|\theta_{0,1}) = \frac{1}{1+e^{{-(u(\theta_{1})-u(\theta_{0}))}}} \end{array} $$
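The two equations above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the matrix `A`, the weights `w`, and the character vectors are small hypothetical stand-ins rather than the values used by Kim, et al.

```python
import numpy as np

def utility(w, A, theta):
    """Social utility u(theta) = w^T F(theta), with F(theta) = A @ theta."""
    return float(w @ (A @ theta))

def p_intervene(w, A, theta0, theta1):
    """Probability of choosing option 1 (intervention): sigmoid(u1 - u0)."""
    diff = utility(w, A, theta1) - utility(w, A, theta0)
    return 1.0 / (1.0 + np.exp(-diff))

# Hypothetical toy setup: 3 character types mapped to 2 abstract features.
A = np.array([[1, 1, 0],    # feature 0 is carried by character types 0 and 1
              [0, 1, 1]])   # feature 1 is carried by character types 1 and 2
w = np.array([0.5, -0.2])   # illustrative moral weights, not values from the paper
theta0 = np.array([1, 0, 0])  # characters in option 0 (non-intervention)
theta1 = np.array([0, 0, 1])  # characters in option 1 (intervention)

p = p_intervene(w, A, theta0, theta1)
```

When both options carry identical features, the utility difference is zero and the modeled probability of intervention is exactly 0.5.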

At the heart of this model, the w vector for a participant quantifies the abstract moral values that the individual holds for different features (for example, the value of the 4th vector element corresponds to the social value that the individual places on youth). Furthermore, Kim et al. assume that the distribution of these moral values within a culture can be characterized as a multivariate normal distribution, where the mean wg represents the group average, and the covariance matrix Σg represents in-group variances and co-dependence of values (for example, the value placed on pregnancy is correlated with the value placed on infancy).

$$\begin{array}{@{}rcl@{}} w_{i} \sim\ Normal\left(w^{g}, \Sigma^{g}\right) \end{array} $$

Assuming this underlying model, Kim et al. proposed a hierarchical Bayesian model, which observes participants’ decisions in the Moral Machine experiment and predicts individual decisions by inferring the underlying moral value set wi for each individual.

Creating a model without assuming an underlying normal distribution

The Hierarchical Bayesian model proves valuable in predicting decisions in the Moral Machine experiment [9]. Its efficacy, however, relies upon the assumption that the modeled abstract values are normally distributed. Indeed, many moral values result from a linear summation of other values, and thus will tend toward a normal distribution by the Central Limit Theorem. It is also possible, however, that other moral values are a more complex, non-linear function of other factors. Thus, it may not always be safe to assume an underlying normal distribution of moral values pertaining to a specific ethical dilemma.

In this paper, we propose the use of a deep neural network for predicting individual moral outcomes. While deep neural networks generally require sufficient amounts of training data, they do not require any prior assumptions regarding the decision process or population distributions of moral principles. Rather, a neural network implicitly learns these aspects through observation. In the following work, we train a deep neural network to predict individual moral decisions in the autonomous vehicle scenario, and compare its performance to that of a Hierarchical Bayesian model. We simulate participant decisions by maintaining the same decision-making process from Kim, et al. [9], but vary the underlying distribution of w, representing different possible distributions of moral principles.


Three models are considered for predicting individual moral decisions from Moral Machine scenarios: a deep learning model, a Bayesian model in which an underlying distribution was assumed, and a likelihood model where no distribution assumptions are made. These models were tested with simulated Moral Machine survey data, which were generated with various underlying distributions of moral values.

Participant simulation

Each virtual participant is characterized by their personal moral vector w, sampled from a population’s multivariate distribution. To create a normally distributed dataset, participants were i.i.d. sampled per Eq. 3, where group mean wg and covariance Σg are specified in Figs. 3 and 4 respectively. These parameters were selected roughly based upon the inferred distribution parameters for Danish participants in Kim, et al. [9].
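As a rough sketch of this sampling step, the snippet below draws participants from a multivariate normal distribution. The three-dimensional `w_g` and `sigma_g` are hypothetical placeholders (the actual parameters are those reported in Figs. 3 and 4), and the sample size is chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the group mean and covariance.
w_g = np.array([0.3, 0.1, -0.2])
sigma_g = np.array([[0.20, 0.05, 0.00],
                    [0.05, 0.10, 0.02],
                    [0.00, 0.02, 0.15]])

# One row per simulated participant, drawn i.i.d. from Normal(w_g, sigma_g).
W = rng.multivariate_normal(w_g, sigma_g, size=1000)
```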

Fig. 3

Moral Principle Mean: Mean value for all underlying distributions of the moral principle vector w

Fig. 4

Moral Principle Covariance: Covariance matrix for all underlying distributions of the moral principle vector w

Then, five generalized Gaussian multivariate distributions were created, with the probability density function (PDF) f(x):

$$\begin{array}{@{}rcl@{}} {}{f(x)} &=& \frac{\phi(y)}{\alpha-\kappa\left(x-\xi\right)}; \end{array} $$
$$\begin{array}{@{}rcl@{}} y &=& -\frac{ln\left(1-\frac{\kappa\left(x-\xi\right)}{\alpha}\right)}{\kappa} \end{array} $$

where ϕ is the standard normal PDF, and κ, α, and ξ are the shape parameter, scale parameter, and median, respectively. The mean and variance of this distribution are as follows [10]:

$$\begin{array}{@{}rcl@{}} \mu &=& \xi - \frac{\alpha}{\kappa}\left(e^{\kappa^{2}/2}-1\right) \end{array} $$
$$\begin{array}{@{}rcl@{}} \sigma^{2} &=& \frac{\alpha^{2}}{\kappa^{2}}e^{\kappa^{2}}\left(e^{\kappa^{2}}-1\right) \end{array} $$

Each of these distributions was sampled by first sampling the previous normal distribution. Then, each element i in sample w was transformed to a target non-Gaussian distribution with the equation:

$$\begin{array}{@{}rcl@{}} \hat{w}_{i} &=& \xi_{i} + \frac{\alpha_{i}}{\kappa_{i}}\left(1-e^{-\kappa_{i} \frac{w_{i}-\mu_{i}}{\sigma_{i}}}\right) \end{array} $$

μi and σi are the marginal mean and standard deviation respectively for the moral component value i. κ values for a target distribution were generated (see Table 1). The scale parameter αi and median ξi were then calculated based upon κi with the following equations, ensuring that the mean and covariance of the distribution are preserved during the transformation:

$$\begin{array}{@{}rcl@{}} \alpha_{i} &=& \frac{\mid \kappa_{i} \sigma_{i} \mid}{\sqrt{e^{\kappa_{i}^{2}}\left(e^{\kappa_{i}^{2}}-1\right)}} \end{array} $$
$$\begin{array}{@{}rcl@{}} \xi_{i} &=& \mu_{i} + \frac{\alpha_{i}}{\kappa_{i}}\left(e^{\frac{\kappa_{i}^{2}}{2}}-1\right) \end{array} $$

Table 1 κ values generated for five non-Gaussian distributions
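
A short check, not part of the original derivation, shows why these choices of αi and ξi preserve the marginal mean and variance. It assumes the transform has the form \(\hat{w}_{i} = \xi_{i} + \frac{\alpha_{i}}{\kappa_{i}}\left(1-e^{-\kappa_{i} z_{i}}\right)\) with \(z_{i} = (w_{i}-\mu_{i})/\sigma_{i}\) standard normal, i.e., the inverse of the relation between y and x in the PDF definition. It also uses the lognormal moments \(E\left[e^{-\kappa z}\right] = e^{\kappa^{2}/2}\) and \(\text{Var}\left[e^{-\kappa z}\right] = e^{\kappa^{2}}\left(e^{\kappa^{2}}-1\right)\):

$$ \begin{aligned} E\left[\hat{w}_{i}\right] &= \xi_{i} + \frac{\alpha_{i}}{\kappa_{i}}\left(1-e^{\kappa_{i}^{2}/2}\right) = \mu_{i}, \\ \text{Var}\left[\hat{w}_{i}\right] &= \frac{\alpha_{i}^{2}}{\kappa_{i}^{2}} e^{\kappa_{i}^{2}}\left(e^{\kappa_{i}^{2}}-1\right) = \sigma_{i}^{2}. \end{aligned} $$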

The κ values for the five non-Gaussian distributions were randomly generated such that each distribution displayed a different degree of skew, as evidenced by each distribution’s approximate average κ magnitude. Each distribution was i.i.d. sampled to create a dataset of 3,000 participants. For each dataset, 1,000 of these samples were set aside as a test set. An example of the transform’s effect on a marginal distribution is shown in Fig. 5.
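The marginal transform can be sketched as follows. This is a minimal illustration for a single component, with ξ taken as an additive offset (the inverse of the y–x relation in the PDF definition). The values of `mu`, `sigma`, and `kappa` are illustrative only and are not taken from Table 1. The function assumes a nonzero `kappa`; a value of zero corresponds to the untransformed Gaussian case.

```python
import numpy as np

def skew_transform(w, mu, sigma, kappa):
    """Map normal samples w (marginal mean mu, std sigma) to a generalized
    normal marginal with shape kappa, keeping the mean and variance."""
    e_k2 = np.exp(kappa ** 2)
    alpha = np.abs(kappa * sigma) / np.sqrt(e_k2 * (e_k2 - 1.0))   # scale
    xi = mu + (alpha / kappa) * (np.exp(kappa ** 2 / 2.0) - 1.0)   # median
    z = (w - mu) / sigma                                           # standardize
    return xi + (alpha / kappa) * (1.0 - np.exp(-kappa * z))

rng = np.random.default_rng(0)
mu, sigma, kappa = 0.4, 0.5, 0.6          # illustrative values only
w = rng.normal(mu, sigma, size=200_000)
w_hat = skew_transform(w, mu, sigma, kappa)

# Empirical moments: the mean and standard deviation should stay close to
# mu and sigma, while the third central moment becomes clearly nonzero.
sample_mean, sample_std = w_hat.mean(), w_hat.std()
third_moment = ((w_hat - sample_mean) ** 3).mean()
```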

Fig. 5

Distribution Transform: Histograms of marginal distributions of the ’Human’ value for the normal dataset (left) and one of the transformed datasets (right)

Scenario and decision simulation

Similar to the actual Moral Machine experiment, each of the simulated participants in each group was given thirteen moral machine scenarios. The parameters θ0,θ1 of each scenario were randomly generated. Each option randomly has 0-5 people present, with a 75% chance that each option in one scenario contains an equal number of people (this is done to avoid trivial comparisons, such as 2 people vs 12, and having all decisions be completely dominated by the total number of people in either option). Certain values were preset or selected as a binary to ensure that scenarios were feasible within the Moral Machine framework (for example, the ’Intervene’ value was always 1 for θ1 and 0 for θ0). It should be noted that the parameter generation of each scenario is largely random, while survey questions in the Moral Machine experiment are mostly targeted towards isolating a single factor (gender, social status, etc.) [7].
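One possible reading of the group-size rule is sketched below. It assumes that the 75% figure means the second group is forced to match the first with probability 0.75 and is otherwise drawn independently; the text does not spell out this detail, so the function and its parameters are hypothetical.

```python
import numpy as np

def sample_group_sizes(rng, max_people=5, p_equal=0.75):
    """Draw the number of characters in each of the two options."""
    n0 = int(rng.integers(0, max_people + 1))
    if rng.random() < p_equal:
        n1 = n0                                     # force equal group sizes
    else:
        n1 = int(rng.integers(0, max_people + 1))   # may still match by chance
    return n0, n1

rng = np.random.default_rng(0)
sizes = [sample_group_sizes(rng) for _ in range(13)]  # thirteen scenarios per participant
```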

The decisions from each participant were modeled as the random process described with Eqs. 1 and 2, and as outlined in Kim, et al. [9]. In separate tests, decisions were simulated with a deterministic version of Eq. 2, in which the maximum likelihood decision was always chosen (discussed in “Results and discussions” section).
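The difference between the stochastic and deterministic decision processes can be illustrated with a short sketch. The function name and the tie-breaking rule at a probability of exactly 0.5 (non-intervention) are assumptions made for this example.

```python
import numpy as np

def simulate_decisions(p_intervene, rng=None, deterministic=False):
    """Turn choice probabilities from Eq. 2 into binary decisions."""
    p = np.asarray(p_intervene, dtype=float)
    if deterministic:
        return (p > 0.5).astype(int)               # always pick the more likely option
    if rng is None:
        rng = np.random.default_rng()
    return (rng.random(p.shape) < p).astype(int)   # one Bernoulli draw per scenario

probs = [0.1, 0.9, 0.6]
y_det = simulate_decisions(probs, deterministic=True)
y_rand = simulate_decisions(probs, rng=np.random.default_rng(0))
```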

Model creation and testing

A hierarchical Bayesian (HB), maximum likelihood (ML), and deep learning (DL) model for predicting moral decisions were created and tested with representative participant distributions. The key details are described in the following.

Hierarchical Bayesian model

For participant i, who handled scenarios \(\Theta _{i} = \left [\Theta _{i}^{1},... \Theta _{i}^{N}\right ]\) with decisions \(Y_{i} = \left [y_{i}^{1},... y_{i}^{N}\right ]\), the HB model maximized the posterior probability of wi, and used this estimate to predict other scenario decisions. The Bayesian model is based on the following equations from Kim et al. [9]:

$$\begin{array}{@{}rcl@{}} P\left(w_{i}, w^{g}, \Sigma^{g}|\Theta_{i}, Y_{i}\right) &\propto& P\left(\Theta_{i}, Y_{i}|w_{i}\right) P\left(w_{i}|w^{g}, \Sigma^{g}\right) P\left(w^{g}\right)P\left(\Sigma^{g}\right) \end{array} $$

with likelihood:

$$ \begin{aligned} P\left(\Theta_{i}, Y_{i}|w_{i}\right) = \prod\limits_{k=1}^{N} P\left(y_{i}^{k} = 1 |\Theta_{i}^{k}\right)^{y_{i}^{k}} \left(P\left(y_{i}^{k} = 0 | \Theta_{i}^{k}\right)^{\left(1-y_{i}^{k}\right)}\right) \end{aligned} $$

This model was given the exact wg and Σg, but always assumed an underlying normal distribution of w; i.e., P(wg)=P(Σg)=1 and P(wi|wg,Σg)=ϕ(wi), where ϕ is the multivariate normal PDF with mean wg and covariance Σg. Thus, this model represents a best-case scenario in which a Bayesian model infers the distribution hyperparameters optimally but assumes a normal underlying distribution, potentially mismatching the true underlying distribution.

Maximum likelihood model

The ML model is similar to the HB model, but does not assume any information regarding the underlying distribution. Thus, for each individual, the ML model estimates wi by maximizing the likelihood in Eq. 10.
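The contrast between the two estimators can be sketched with plain gradient ascent: the HB-style estimate maximizes the log-posterior under a Gaussian prior with known mean and covariance, while the ML estimate drops the prior term. This is only an illustrative sketch. The optimizer, step size, and toy data are assumptions, and the paper does not state which optimization procedure was used.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def log_likelihood(w, D, y):
    """Bernoulli log-likelihood with P(y_k = 1) = sigmoid(w @ D[k])."""
    signs = 2.0 * y - 1.0
    return float(-np.sum(np.logaddexp(0.0, -signs * (D @ w))))

def fit_w(D, y, prior_mean=None, prior_cov=None, lr=0.02, steps=5000):
    """Gradient ascent on the log-likelihood (ML) or log-posterior (MAP).

    D[k] holds the feature difference A @ theta1_k - A @ theta0_k of scenario k.
    Pass both prior_mean and prior_cov to include the Gaussian prior.
    """
    w = np.zeros(D.shape[1]) if prior_mean is None else np.array(prior_mean, dtype=float)
    precision = None if prior_cov is None else np.linalg.inv(prior_cov)
    for _ in range(steps):
        grad = D.T @ (y - sigmoid(D @ w))                 # likelihood term
        if precision is not None:
            grad = grad - precision @ (w - prior_mean)    # Gaussian prior term
        w = w + lr * grad
    return w

# Toy data: 8 observed scenarios for one hypothetical participant.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -0.5, 0.3])
D = rng.integers(-2, 3, size=(8, 3)).astype(float)
y = (rng.random(8) < sigmoid(D @ w_true)).astype(float)

w_g = np.array([0.8, -0.3, 0.2])      # hypothetical group mean
sigma_g = 0.25 * np.eye(3)            # hypothetical group covariance

w_ml = fit_w(D, y)                                       # no prior
w_map = fit_w(D, y, prior_mean=w_g, prior_cov=sigma_g)   # Gaussian prior
```

With only a handful of scenarios per participant, the likelihood alone can be nearly flat or unbounded in some directions, which is one intuition for why a reasonable prior helps.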

Deep learning model

Unlike the other two models, the DL model does not explicitly estimate any moral principle vector w. Rather, it directly predicts a decision y from a vector of scenario parameters θ. Scenario i is input into the network as a length 24 vector reflecting \(\Theta ^{i}_{1}\) and \(\Theta ^{i}_{0}\). The model architecture consists of a sequence of densely connected layers with batch normalization and ReLU or sigmoid activation (Fig. 6). Binary cross entropy is used as a loss function. A learning rate of 5e−4 with a decay rate of 0.1 was used for training. For each distribution, the network was trained with sample sizes of 25, 50, 200, 500, 1,000, and 2,000 participants (per simulated participant, 8 questions were used for training, and 5 were used for validation). Finally, testing was conducted on the test set for each distribution, in which predictions from five test questions per simulated participant were evaluated. Initially, the neural network underwent an ’individual fine-tune’ in which the network parameters were briefly fine-tuned with eight questions’ worth of individual-specific data, but this practice was discarded, as it was found to have no significant effect on performance (discussed in “Results and discussions” section).
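For orientation, the forward pass and loss of such a network can be sketched in NumPy. Only the input length of 24, the layer types, and the binary cross entropy loss come from the description above. The hidden widths, the random weights, and the placeholder data are hypothetical, and no training loop is shown.

```python
import numpy as np

def batchnorm(x, eps=1e-5):
    """Normalize each feature over the batch (batch statistics, no learned scale/shift)."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def forward(x, params):
    """Dense -> batchnorm -> ReLU for hidden layers, then dense -> sigmoid."""
    h = x
    for W, b in params[:-1]:
        h = np.maximum(batchnorm(h @ W + b), 0.0)
    W, b = params[-1]
    logits = (h @ W + b).ravel()
    return 1.0 / (1.0 + np.exp(-logits))

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred)))

rng = np.random.default_rng(0)
sizes = [24, 32, 16, 1]   # input length 24 from the text; hidden widths are hypothetical
params = [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

x = rng.integers(0, 3, size=(64, 24)).astype(float)   # placeholder scenario vectors
y = rng.integers(0, 2, size=64).astype(float)         # placeholder decisions
p = forward(x, params)                                 # predicted P(y = 1) per scenario
loss = binary_cross_entropy(y, p)
```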

Fig. 6

Neural Network Architecture: Network architecture for the DL moral decision model. Batchnorm signifies a batch normalization layer

Generation/Testing on generalized data

To generalize from the data found in the Moral Machine Experiment, an abstract dilemma in the same decision framework was also simulated and tested. This dilemma arbitrarily featured length 16 parameter vectors θ. To keep the dilemma as general as possible, the decision process is still modeled as an evaluation of utility u(θ), but no overlap between these parameters is assumed (i.e., without loss of generality, the transform A is simply the identity matrix). The randomly generated wg, Σg, and κ values used for this simulation are included in Figs. 7, 8 and Table 2.

Fig. 7

Generated Distribution Mean: Mean values of the principle vector w used to synthesize more general data

Fig. 8

Generated Distribution Covariance: Covariance matrix of the principle vector w used to synthesize more general data

Table 2 κ values generated for five non-Gaussian distributions used to synthesize more general data

Results and discussions

Training of DL model

Figure 9 illustrates the accuracy of the DL model with various training sample sizes and underlying distributions of w. Each distribution is denoted by the approximate average of the absolute value of its shape factors k, written \(\bar {\left |k\right |}\), which correlates with the average skewness of the underlying marginal distributions.

Fig. 9

DL Accuracy: DL model predictive accuracy with different sample sizes and underlying distributions of w. Various distributions for w are denoted by the approximate average of absolute shape factors k (\(\left |\bar {k}\right |=0\) denotes the Gaussian distribution)

Unsurprisingly, the predictive accuracy increases with more training samples, as the model is given more data from which to learn the distribution (although a size of 1,000 participants appears to perform slightly worse in certain instances). What is slightly unanticipated is the significant performance increase with increasing skewness of the w distribution. Despite all distributions sharing the same mean and covariance, the network can better predict decisions from participants sampled from a more skewed distribution. This is likely because an unskewed Gaussian distribution maximizes entropy for a fixed variance. That is, distribution entropy may decrease with increasing skew.
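
The entropy argument rests on a standard result, stated here for reference: among all continuous distributions on \(\mathbb{R}^{n}\) with a given covariance matrix Σ, the Gaussian has the largest differential entropy h, so any non-Gaussian distribution with the same covariance satisfies

$$ \begin{aligned} h(w) \leq \frac{1}{2}\ln\left(\left(2\pi e\right)^{n} \left|\Sigma\right|\right), \end{aligned} $$

with equality only in the Gaussian case.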

Within the results above, all model instances are performance-limited by the stochastic nature of the decision making process in Eq. 2: even if a model implicitly estimates moral values perfectly, the model can still only predict the maximum likelihood decision, and not the decision itself. To observe trends without this source of randomness, models were also tested with data simulated from a deterministic version of Eq. 2, where the maximum likelihood decision was always selected (results shown in Fig. 10). It can be seen that this change generally amplifies differences in performance between the distributions.

Fig. 10

DL Accuracy (Deterministic Decision Process): DL model predictive accuracy with different sample sizes and underlying distributions of w. Various distributions for w are denoted by the approximate average of absolute shape factors k (\(\left |\bar {k}\right |=0\) denotes the Gaussian distribution). In this test, decisions were simulated with the deterministic version of Eq. 2

Interestingly, fine-tuning the baseline DL model with 8 individual-specific samples did not significantly increase the model’s performance. As such, the predictions of the DL model are based purely on group observations and do not account for individual differences. This suggests that, in this instance, 8 individual-specific questions are insufficient to improve the predictive accuracy of the DL model. It is hypothesized that as in-group variances increase, the need to account effectively for individual differences would increase.

Comparison of model performances

Figure 11 compares model performances over different distributions. The ‘ground truth’ (GT) model is a predictive model where individual values w are exactly known. Thus, GT represents an upper limit on predictive performance, limited only by the inherent randomness in Eq. 2. By comparison, Fig. 12 illustrates model performances when decisions are based on the maximum likelihood of Eq. 2 (GT accuracy = 100%).
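The GT upper limit can be made concrete with a small calculation. If the true probability of intervention in a scenario is p, a predictor that always chooses the more likely decision is correct with probability max(p, 1 − p). The sketch below averages this quantity over scenarios. The function name and the example probabilities are illustrative.

```python
import numpy as np

def gt_expected_accuracy(p_intervene):
    """Expected accuracy when always predicting the more likely decision."""
    p = np.asarray(p_intervene, dtype=float)
    return float(np.mean(np.maximum(p, 1.0 - p)))

acc = gt_expected_accuracy([0.5, 0.9, 0.2])   # (0.5 + 0.9 + 0.8) / 3
```

Under the deterministic decision process every scenario effectively has p equal to 0 or 1, so this bound becomes 100%, which matches the GT accuracy quoted for Fig. 12.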

Fig. 11

Model Performances: Comparison of model performances (DL trained with 2,000 participants involved) over datasets with various underlying distributions of w, denoted by the approximate average of absolute shape factors k (\(\left |\bar {k}\right |=0\) denotes the Gaussian distribution). ‘GT’ denotes the predictive model in which the exact w for each participant is known

Fig. 12

Model Performances (Deterministic Decision Process): Comparison of the model performances (DL trained with 2,000 participants involved) over datasets with various underlying distributions of w, denoted by the approximate average of absolute shape factors k (\(\left |\bar {k}\right |=0\) denotes the Gaussian distribution). In this test, decisions were simulated with the deterministic version of Eq. 2

The results shown in Figs. 11 and 12 indicate that a deep learning based model outperforms a hierarchical Bayesian model when the underlying distribution of w is skewed. On one hand, this is unsurprising, since the HBM’s prior assumed a normally distributed w. However, it is also worth noting that DL was able to achieve this accuracy without individual-specific data. Of course, we do not suggest that individual-specific data is unimportant when modeling moral principles; it is logical that individual-specific data would become increasingly important for accurate modeling as in-group variance increases. Rather, we believe this result implies a robust ability for neural networks to learn underlying group trends and decision processes, given enough training data. This ability is crucial, as the true distributions of population moral values, as well as how they affect moral decisions, are unknown.

In contrast to the DL model, the ML model’s performance was mostly invariant to the underlying distribution of w because it based predictions only on limited, individual-specific data, without any prior assumption. This implies that the information used by the DL and ML models is largely disjoint. As such, a model that effectively leverages population information via deep learning and limited individual-specific data via maximum likelihood estimation could be superior.

It is also worth noting that the HBM still outperformed the ML model in all instances, indicating that in this case a normal prior is still superior to no prior, as this assumption is still close to the actual underlying distribution. In fact, an increasing trend in the HBM accuracy beyond \(\bar {\left |k\right |}=0.5\) suggests that performance gains due to lower entropy in more skewed distributions counteracted performance losses from an incorrect prior assumption.

Further evaluations with generalized data

Models tested with the randomly generated abstract data were also analyzed, assuming both a random and deterministic decision process. Figures 13 and 14 plot the DL predictive accuracy over training size with the abstract data, while Figs. 15 and 16 compare the performances for each model type with this dataset. Overall, the trends seen in these experiments are consistent with the results seen in the specific, autonomous vehicle scenario. This demonstrates that these findings are not unique to the specific population and scenario parameters found in the autonomous vehicle instance.

Fig. 13

DL Model Accuracy (Generated): DL model predictive accuracy with different sample sizes and underlying distributions of w, applied to the generalized, abstract data. Various distributions for w are denoted by the approximate average of absolute shape factors k (\(\left |\bar {k}\right |=0\) denotes the Gaussian distribution)

Fig. 14

DL Model Accuracy (Generated; Deterministic Process): DL model predictive accuracy with different sample sizes and underlying distributions of w, applied to the generalized, abstract data. Various distributions for w are denoted by the approximate average of absolute shape factors k (\(\left |\bar {k}\right |=0\) denotes the Gaussian distribution). In this test, decisions were simulated with the deterministic version of Eq. 2

Fig. 15

Model Performances (Generated): Comparison of model performances (DL trained with 2,000 participants involved) over datasets with various underlying distributions of w, denoted by the approximate average of absolute shape factors k (\(\left |\bar {k}\right |=0\) denotes the Gaussian distribution), as applied to the generalized, abstract data. ‘GT’ is the predictive model in which the exact w for each test participant is known

Fig. 16

Model Performances (Generated; Deterministic Process): Comparison of model performances (DL trained with 2,000 participants involved) over datasets with various underlying distributions of w, denoted by the approximate average of absolute shape factors k (\(\left |\bar {k}\right |=0\) denotes the Gaussian distribution), as applied to the generalized, abstract data. In this test, decisions were simulated with the deterministic version of Eq. 2


Overall, we have demonstrated that a deep learning based model can be effective both in learning moral values and in making moral decisions in a data-driven fashion. Furthermore, the deep learning model is highly adaptive to training examples, requiring no assumption regarding the distribution of moral values in a population, or the decision process as a function of moral values. Given sufficient training data, this deep learning approach has a distinct advantage, since underlying moral value distributions and decision processes are generally unknown. Given the robust results that deep neural networks show in this example, we believe these initial findings support the use of deep neural networks for creating moral agents. We are also confident that the parameterization of a moral dilemma used in this experiment can be applied to other, more complex moral scenarios and network architectures.

Future work on machine learning of individual moral decision making should apply the basic deep learning method shown here to more complicated models involving multiple AI agents, such as the approach suggested in [4]. Other research could also leverage a deep learning model combined with a maximum likelihood component to better extract both group trends and individual-specific information from limited data, or could train deep neural networks to weigh both moral and legal considerations, which is an issue explored in [11]. We are particularly interested in two important applications of this proposed deep learning approach for decision making: 1) democratized re-opening decisions in a pandemic situation, and 2) AI-aided consent processes in healthcare. Further research opportunities are numerous.

Availability of data and materials

The datasets generated and/or analysed during the current study are available in the WANG-AXIS / Modeling-of-Moral-Decisions-with-Deep-Learning repository,



Abbreviations

AI: Artificial intelligence
DL: Deep learning
PDF: Probability density function
HB: Hierarchical Bayesian
ML: Maximum likelihood
GT: Ground truth


  1. Bigman YE, Gray K (2018) People are averse to machines making moral decisions. Cognition 181:21–34.


  2. Kramer MF, Schaich Borg J, Conitzer V, Sinnott-Armstrong W (2018) When do people want AI to make decisions? In: Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society. AIES ’18, 204–209. Association for Computing Machinery, New York, NY, USA.


  3. Marcus G (2020) The Next Decade in AI: Four Steps Towards Robust Artificial Intelligence. arXiv:2002.06177.

  4. Shaw NP, Stöckel A, Orr RW, Lidbetter TF, Cohen R (2018) Towards provably moral AI agents in bottom-up learning frameworks. AIES ’18, 271–277. Association for Computing Machinery, New York, NY, USA.

  5. Bentham J (1789) An introduction to the principles of morals and legislation. Oxford University Press, Oxford.


  6. Mikhail J (2007) Universal moral grammar: Theory, evidence, and the future. Trends Cogn Sci 11:143–52.


  7. Awad E, Dsouza S, Kim R, Schulz J, Henrich J, Shariff A, Bonnefon J-F, Rahwan I (2018) The moral machine experiment. Nature 563(7729):59–64. Accessed 04 Dec 2019.


  8. Moral machine. Accessed 20 Mar 2020.

  9. Kim R, Kleiman-Weiner M, Abeliuk A, Awad E, Dsouza S, Tenenbaum J, Rahwan I (2018) A computational model of commonsense moral decision making. CoRR abs/1801.04346.

  10. Hosking JRM, Wallis JR (1997) Regional frequency analysis: an approach based on l-moments. Cambridge University Press, Cambridge.


  11. Eliot L (2020) The neglected dualism of artificial moral agency and artificial legal reasoning in ai for social good In: Harvard University, Harvard Center for Research on Computation and Society, AI for Social Good Conference, Cambridge.



Not applicable.


Not applicable.

Author information

Authors and Affiliations



GW and CW designed the research; CW created the deep learning model, ran all experiments, and drafted the paper; GW and UK revised the paper; UK added considerable material in the introduction and conclusion regarding current research. All author(s) read and approved the final manuscript.

Corresponding author

Correspondence to Ge Wang.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


About this article


Cite this article

Wiedeman, C., Wang, G. & Kruger, U. Modeling of moral decisions with deep learning. Vis. Comput. Ind. Biomed. Art 3, 27 (2020).



  • Artificial intelligence
  • Deep learning
  • Bayesian method
  • Moral machine experiment