A reflection on the evolution of thought and the current discussion on AI concerns. https://www.mariovela.com/2023/04/04/a-reflection-on-the-evolution-of-thought-and-the-current-discussion-on-ai-concerns/ Tue, 04 Apr 2023 21:17:43 +0000

“Any sufficiently advanced technology is indistinguishable from magic.” Arthur C. Clarke

ChatGPT is at the center of much controversy today, bringing both excitement and concern. Some people see a world of possibilities, while others see a world of dangers. I see both worlds happening simultaneously, but, more importantly, I think this dilemma has a deeper meaning that reflects a potential issue in the way science and knowledge have been approached for a long time. Curiously, people are now concerned about the ethics, transparency, interpretability, morals, conscience, and values of these new machines and the people using them. However, it seems paradoxical that these characteristics correspond to the metaphysics that was rejected when we adopted the “enlightenment” agenda a few centuries ago, of which Descartes was one of the essential proponents and whose ideas underlie the mindset of today’s positivist scientific inquiry.

This contrast made me realize a few historical trends. In the ancient world and the Middle Ages, being itself was truth. Things were because they were thought: God, pure intellect, made things by thinking them; hence all beings are thought into being. Because of that, we can think about our own being, know something about it, and discern what our role in this world is supposed to be. Attaining this wisdom was considered the purpose of science. It is also important to note that, from such a perspective, man’s work was considered secondary, contingent, and transitory, never real cognition but only “techne”: manual skill, not real science. Only later, with Descartes and others, was knowledge redefined as the things we can explain with mathematics and the things that can be made: the facts. Knowledge becomes only what can be explained and observed and whose causes can be understood, which is basically the call for the death of the metaphysics that, curiously, we are trying to resurrect now, in an interesting contradiction to what seems to be the agenda of the advocates of positivism and the new “scientific” community. Another important consequence of this period is that man is reduced to a mere historical event, a product of a sequence of historical steps that, supposedly, could eventually be completely described and explained.

The situation gets even more challenging when philosophers like Karl Marx propose an even more aggressive agenda. Marx argues that it is not enough for man to know what he has made or understood; he needs to change it. The truth with which man is concerned is not the truth of being, nor even the truth of what he has made or the deeds he has accomplished, but the truth of changing and molding the world to his desires: a truth based on change and action. Technology, in this scenario, becomes the primary tool to reach that supposed truth and purpose of man. What has been done gives way to what can be done. Man is no longer satisfied with what he is or has done; he needs to change it. And that is the new agenda.

In this new paradigm, technology is no longer subordinate; it is the primary focus for the realization of man’s purpose: an ever-changing world based on repeatable facts that could be fully understood as long as it is subject to experimentation, and in which man is just another experimental subject in the physical reality we need to understand and change.

I believe that these recent AI concerns are representative of an underlying cry for a reconsideration of the “change” agenda today. We need to recover the wisdom of studying what it means to be human first, and not just change it for the sake of progress. This scenario implies opening our minds to the possibility that intelligence and consciousness are not merely properties of matter but properties of a soul, a soul that was thought into existence by God. Finally, we may be surprised to find that we do not need Artificial General Intelligence (AGI) to discover the purpose of man, but just a good, simple, natural intelligence focused on the question of what we are, not on what we can do.

Reproducibility in Keras https://www.mariovela.com/2021/05/07/reproducibility-in-keras/ Fri, 07 May 2021 13:09:03 +0000

I bet I am not the only one who has tried to figure this out! Of course I googled around but could not find a solution that makes a Keras model return the same results every time I execute the notebook cell that compiles and fits the model. I read and applied the recipe in the official Keras documentation, but my code was still returning different results. The solution came when I read the details in the TensorFlow documentation for tf.random.set_seed. It turns out that, to make the compile and fit methods of a Keras model reproducible, we need to wrap the model in a function to leverage what the TensorFlow documentation says about functions:

“Note that tf.function acts like a re-run of a program in this case. When the global seed is set but operation seeds are not set, the sequence of random numbers are the same for each tf.function.”
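
To see that behavior in isolation, here is a minimal sketch in plain TensorFlow (an illustration, not code from the original post): resetting the global seed replays the same sequence of random numbers.

import tensorflow as tf

tf.random.set_seed(1234)
print(tf.random.uniform([1]))  # some value A
print(tf.random.uniform([1]))  # a different value B: the internal state advanced

tf.random.set_seed(1234)       # reset the global seed...
print(tf.random.uniform([1]))  # ...and value A is replayed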

The solution is to reset the seed in the cell that calls the function that instantiates the model, as described in the example below!

Let’s start by seeding all the generators at the beginning of the notebook…

# For reproducibility, use this block at the beginning of any notebook!!
# following https://keras.io/getting_started/faq/#how-can-i-obtain-reproducible-results-using-keras-during-development
# VERY IMPORTANT: these 2 lines must be before importing TensorFlow
# There are 4 different random number generators that need to be "seeded"
import os
os.environ['PYTHONHASHSEED'] = '0'

import numpy as np
import tensorflow as tf
import random as rn
# The below is necessary for starting Numpy generated random numbers
# in a well-defined initial state.
np.random.seed(123)

# The below is necessary for starting core Python generated random numbers
# in a well-defined state.
rn.seed(123)

# The below set_seed() will make random number generation
# in the TensorFlow backend have a well-defined initial state.
# For further details, see:
# https://www.tensorflow.org/api_docs/python/tf/random/set_seed
tf.random.set_seed(1234)

Now let’s create a simple Keras Sequential model and some data

from tensorflow import keras
from tensorflow.keras import layers

# create a simple regression model
model = keras.Sequential()
model.add(layers.Dense(2, activation="relu"))
model.add(layers.Dense(3, activation="relu"))
model.add(layers.Dense(1, activation="sigmoid"))
model.build(input_shape=[10,3])

# some data
x = tf.ones((3, 3))
y = np.array([3,2,1])

To monitor the randomization of the weights, I reused some code I found in an excellent repo from the NVIDIA folks that tracks how the seeds impact the weights.

# this function is used to track the value of the weights
# captured from https://github.com/NVIDIA/framework-determinism
def summarize_keras_trainable_variables(model, message):
  s = sum(map(lambda x: x.sum(), model.get_weights()))
  print("summary of trainable variables %s: %.13f" % (message, s))
  return s

Now, if we monitor what happens with the weights during compile and fit, we see the following results:

tf.random.set_seed(1234)
adm = keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
model.compile(loss='mse',
              optimizer=adm,
              metrics=['accuracy'])
summarize_keras_trainable_variables(model, "before training")
history = model.fit(x, y, epochs=10, batch_size=1,
                    validation_split=0.1, verbose=0)
summarize_keras_trainable_variables(model, "after training")

summary of trainable variables before training: -1.3297259211540
summary of trainable variables after training: -1.2190857985988

If we repeat the code again, we will get the following results:

summary of trainable variables before training: -1.2190857985988
summary of trainable variables after training: -1.0630745515227

Notice how the last value of the first run became the first value of the second run. This is expected, as described in the TensorFlow documentation. Obviously, we do not want this, so let’s wrap the model in a function and see what happens!

# wrap the model in a function!
def MLP():
    # create a simple regression model
    model = keras.Sequential()
    model.add(layers.Dense(2, activation="relu"))
    model.add(layers.Dense(3, activation="relu"))
    model.add(layers.Dense(1, activation="sigmoid"))
    model.build(input_shape=[10,3])
    return model

and run a new block of code that includes the instantiation of the model using the function:

# compile and fit the model and observe the weights
tf.random.set_seed(1234)
model = MLP()
adm = keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
model.compile(loss='mse',
              optimizer=adm,
              metrics=['accuracy'])
summarize_keras_trainable_variables(model, "before training")
history = model.fit(x, y, epochs=10, batch_size=1,
                    validation_split=0.1, verbose=0)
summarize_keras_trainable_variables(model, "after training")

This time the results are:

summary of trainable variables before training: -1.3297259211540
summary of trainable variables after training: -1.2190857985988

and if we run the code again, we should get the same values!!!

summary of trainable variables before training: -1.3297259211540
summary of trainable variables after training: -1.2190857985988

This is what I was looking for! In a nutshell, the recipe consists of:
1) Use the code suggested by the Keras documentation at the beginning of the notebook.
2) Wrap the model in a function.
3) Reset the TF seed before instantiating, compiling, and fitting the model.

I hope this helps you make your TF models reproducible. Have a great day!!

Foundations: is low R-squared a problem? https://www.mariovela.com/2021/03/11/foundations-is-low-r-squared-a-problem/ Thu, 11 Mar 2021 02:57:12 +0000

Nowadays, R-squared is a metric that is rarely even mentioned in the ML community because it is not used to evaluate modern ML models. However, understanding the intuition behind it and its proper interpretation helps to understand how noise in the data can impact model parameters. This post comes from a white paper I wrote several years ago, but I think it is still a good piece to reflect on that question.

R-squared is a typical metric used to assess the performance of a regression model. However, sometimes the analyst encounters model results with a very low R-squared value while the coefficients are statistically significant. Should the analyst proceed, trust the significant coefficients, and ignore the low R-squared value? Or should the analyst discard the model completely? This post tries to address this apparent dilemma and provide some insight into how to face the challenge.

R-squared is usually understood as the “percentage of variance explained” by the independent variables in the model. Since the definition refers to the explanatory power of the model to describe a “change” in the dependent variable, R-squared has significant implications for the prediction performance of the model. A low R-squared means a large residual standard error and wide confidence intervals for prediction purposes, while a high R-squared usually means a small standard error and narrower confidence intervals for prediction, which ultimately means more accurate predictions.
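
As a reminder of what the metric computes, here is a minimal Python sketch (an illustration, not code from the original white paper):

import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)        # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot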

Furthermore, these relationships are only relevant if the regression assumptions are valid, in particular the residual assumptions of normality, zero mean, and constant variance.

One thing that is not so evident is that it is possible to accurately capture the linear model structure within a very noisy data set (low R-squared) if we are careful to validate that the residuals comply with the regression assumptions. However, if the residuals do not comply with the normality assumptions, a low R-squared can be a warning sign that we may have captured a completely different linear model than the real one hiding behind the noise, which can ultimately lead the analyst to incorrect conclusions.

Let’s explore a few scenarios to understand the implications of noise for R-squared and the model structure. First, I will demonstrate that if there exists an intrinsic linear pattern in the data and this model is affected by white noise, linear regression can still detect the linear model structure with strong confidence regardless of the R-squared value. Second, in contrast, I will show how noise that is not normal can trick the regression process into returning an incorrect but significant model.

A Linear Model + White Noise

To understand the meaning of the R-squared metric, let’s assume we have a line and add white noise to each point on the line to create a dataset of signal + noise. I will change the noise level to demonstrate that under this condition (i.e., a strong underlying linear model plus white noise), linear regression can return a valid model structure regardless of the R-squared value and noise level.

Linear Model Y = 5000 + 3X and Small White Noise

A good start is to look at a linear model with a small amount of white noise. We should expect a good regression model.

# Creating a dataset of 1000 points from a normal distribution
set.seed(500)
noise_small <- rnorm(1000, mean = 0, sd = 500)  # "small" noise; sd assumed, consistent with the ~75% R-squared discussed below

# the underlying line Y = 5000 + 3X, plus the noise
x <- 1:1000
line <- 5000 + 3 * x
y <- line + noise_small

# Let's perform the regression...
data <- data.frame(x = x, y = y)
model <- lm(y ~ x, data = data)
summary(model)

##             Estimate Std. Error t value
## (Intercept) 4.97e+03   3.16e+01   157.2
Note that the intercept and coefficient are 4972.0053 and 3.0072, respectively. The model fits the underlying line nicely, as reflected in the significant coefficients and the close match in the picture shown below. This result is what we would expect if the noise is really white noise.

plot(x, y, main="Linear Model + Small Noise")
lines(line, col="red", lwd=2)
lines(predict(model, data.frame(x=x)), col="blue", lwd=2)
legend("topleft", legend=c("original line", "model fit"), col=c("red","blue"), lty=1, lwd=2)

The linear model captures the underlying line properly because the added noise is white noise.

Linear Model Y = 5000 + 3X and Large White Noise

Now, let’s compare how the regression performs with a large white noise component.

set.seed(500)
noise_large <- rnorm(1000, mean = 0, sd = 3000)  # "large" noise; sd assumed, consistent with the ~8% R-squared discussed below
y <- line + noise_large

# Let's perform the regression...
data <- data.frame(x = x, y = y)
model <- lm(y ~ x, data = data)
summary(model)

##             Estimate Std. Error t value
## (Intercept) 4832.032    189.777   25.46

Notice how the coefficients (4832.0315, 3.0434) still fit the original model parameters very closely, as reflected in the significant coefficients and the close match in the picture shown below. What is more interesting is that the R-squared degrades considerably (~8% with large noise vs. ~75% with small noise). The analyst can still draw important conclusions from the coefficients of the model, but probably cannot use the model for prediction purposes because of the low R-squared and the large standard error, which leave the predictions with a high level of uncertainty. This scenario is something I always verify in my ML models: if the errors of my model behave like white noise, then I know the model picked up the "signal" in the data correctly, and at least I know I can use my model for explanatory purposes.

The model continues to capture the underlying line even with a higher level of noise because it is white noise.

A Linear Model and Non-White Noise…

Now, assume we have the linear model Y = X and we add noise around the line, but we constrain the noise to the shape of an ellipse with one of its axes on the line Y = X. This time the noise will not be white noise, since its distribution is altered. We will notice how things can go wrong here…

# the ellipse of noise will be centered at (5000,5000) but rotated 45 degrees to match
# the underlying linear model Y = X (the ellipse dimensions below are assumed)
set.seed(500)
noisepoints <- 10000                 # extent of the line Y = X on the plot
n <- 1000
angle <- runif(n, 0, 2 * pi)
radius <- sqrt(runif(n))             # uniform sampling inside the unit disk
ex <- 3500 * radius * cos(angle)     # semi-major axis along the line (assumed)
ey <- 2000 * radius * sin(angle)     # semi-minor axis across the line (assumed)
x <- 5000 + (ex - ey) / sqrt(2)      # rotate 45 degrees, center at (5000,5000)
y <- 5000 + (ex + ey) / sqrt(2)
points <- cbind(x, y)

# Executing the regression
model <- lm(y ~ x)
summary(model)

##             Estimate Std. Error t value
## (Intercept) 2.41e+03   1.58e+02    15.3

This time the coefficients (2414.3008, 0.5176) do not capture the underlying structure of the line Y = X very well. This is because of the weak directionality the noise imposes on the dataset, which “confuses” the regression process. The model coefficients are highly significant, but the R-squared is considerably lower than in the previous case. This combination should be a warning sign to the analyst, who should take the proper steps to verify the residual assumptions. This model might not be a good model either for predicting "Y" or for getting insights from the coefficients.

# Looking at the underlying line and model fit... not so lucky this time...
plot(points, main="Line + Ellipse Noise")
lines(seq(1:noisepoints), seq(1:noisepoints), col="red", lwd=2)  # the real axis, Y = X
lines(seq(1:noisepoints), predict(model, data.frame(x=seq(1:noisepoints))), col="blue", lwd=2)
legend("topleft", legend=c("original line", "model fit"), col=c("red","blue"), lty=1, lwd=2, bg="white")

This time the model does not follow the underlying line because of the violation of the residual assumptions.

This is a good example of how the combination of large noise (captured by the low R-squared value) and the violation of the residual assumptions makes the model unreliable for statistical insights, even if the model and coefficients are significant.

Conclusion

Is low R-squared a problem? The answer is: “it depends.” As long as we verify the regression assumptions, in particular the residual assumptions, and use good judgment when verifying them, we should be able to get insights from the significant coefficients of a regression model even if the R-squared is low. If the assumptions are violated and the R-squared is low, then most probably the model is not reliable, even with significant coefficients. The practical implication for ML practitioners is that we should also look at the properties of the error distributions to verify that they are not showing any patterns that could mean our model did not capture the "signal" in the data. Good night!!
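
As a quick illustration of that last point, here is a hedged Python sketch (the data and noise level are made up for the example): fit a line and verify that the residuals are centered at zero and consistent with normality.

import numpy as np
from scipy import stats

rng = np.random.default_rng(500)
x = np.arange(1, 1001)
y = 5000 + 3 * x + rng.normal(0, 3000, size=x.size)  # line + large white noise

# fit a simple least-squares line and inspect the residuals
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# white-noise residuals should have (near) zero mean and pass a normality test
stat, p = stats.shapiro(residuals)
print(f"residual mean = {residuals.mean():.2f}, Shapiro-Wilk p-value = {p:.4f}")
# a large p-value is consistent with white-noise residuals; a tiny one is a
# warning sign to dig deeper before trusting the coefficients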

Data Science Ethics: What is the foundational standard? https://www.mariovela.com/2021/03/08/data-science-ethics-what-is-the-foundational-standard/ Mon, 08 Mar 2021 19:50:14 +0000

To address the question of ethics in any arena, including Data Science, we first need to ask ourselves what standard is used to define what is “good” and “bad.” Knowing such a standard is fundamental, since choosing the wrong standard can generate false definitions of “good” and “bad,” with a variety of consequences for society and, in this case, for the practice and use of Data Science. Hence, the standard must be absolute, because if it changes, then the meaning of “good” and “bad” is lost, and we fall into moral relativism.

Kreeft (2004) suggests that to talk about ethics, we must ask ourselves which moral standard we use in our daily lives. If we cannot answer such a question, we should embark on the search for the answer using logic and reason. Kreeft (2004) argued that to answer such a question we have two options: either our core moral values are objective or they are subjective; either they are discovered, as scientists discover the laws of physics, or they are created, like the rules of a game or a piece of art. He also noticed that pre-modern cultures believed that core moral values were objective, and it is only in recent times that society started to believe that those core moral values are subjective, human-made, and changeable over time. The latter scenario is what is called moral relativism, a very common and dangerous ideology in modern times.

Regardless of which option we believe, there are significant consequences for achieving a good set of moral rules for Data Science. For instance, if we believe moral values are objective, we should “find” them; but if we believe they are subjective, then we must “create” them.

For Data Science, the practical implications are that we should take a position about ethics regarding objective or subjective moral values. If we decide that moral values are objective, we should identify the unchanging core moral values and build our analytics ethical practices around them. In contrast, if we say that our moral values are subjective, then we need to create those moral values and agree on using them among the community.

Each option has challenges, but we know only one can be true. Subjective moral values immediately move us into a dangerous moral relativism, which can be abused by interested groups and poses adoption issues, since not all interested parties might agree on them. On the other hand, objective moral values present the challenge that, in order not to fall into a subjective approach, these moral principles need to be discovered and cannot be created by humans. They necessarily need to exist independently of us, and because of that, they offer the benefit of being unquestionable and provide less resistance to adoption. Hence, the search should take us into metaphysical research and inquiry.

I want to propose that it is in this metaphysical inquiry that we will find not only the necessary objective ethical standard for our Data Science practice, but a beautiful and fulfilling encounter: an encounter that will transform our lives and provide clarity on topics as complex as ethics in Data Science. Let’s search for those core moral values and meet the One who provided them!!!

References

Kreeft, Peter. Ethics: A History of Moral Thought. Recorded Books, LLC, 2004.

ETSI GANA https://www.mariovela.com/2021/03/07/etsi-gana/ Sun, 07 Mar 2021 22:51:34 +0000

ETSI provides a great framework for guiding the strategy of wireless operators toward the vision of autonomous networks: the GANA framework. The major contribution of this framework is the “Decision Engine” (DE) concept, which enables the integration of AI components into the different planes of future wireless network architectures. Check the white paper for far more details here: https://www.etsi.org/images/files/etsiwhitepapers/etsi_wp16_gana_ed1_20161011.pdf
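
As a rough mental model of the DE concept (a hypothetical sketch, not an interface defined in the white paper), a Decision Engine closes an autonomic monitor-decide-act loop over a managed entity, and the decision logic is where AI components plug in:

from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class ManagedEntity:
    """A network resource a DE supervises (name and fields are illustrative)."""
    name: str
    metrics: Dict[str, float] = field(default_factory=dict)

class DecisionEngine:
    """Minimal monitor-decide-act loop; the policy could be rules or an ML model."""
    def __init__(self, policy: Callable[[Dict[str, float]], Dict[str, float]]):
        self.policy = policy

    def control_loop(self, entity: ManagedEntity) -> Dict[str, float]:
        actions = self.policy(entity.metrics)  # decide based on observed metrics
        entity.metrics.update(actions)         # act on the managed entity
        return actions

# usage: double capacity when utilization crosses a threshold
de = DecisionEngine(lambda m: {"capacity": m["capacity"] * 2} if m["utilization"] > 0.8 else {})
cell = ManagedEntity("cell-42", {"utilization": 0.9, "capacity": 100.0})
print(de.control_loop(cell))  # {'capacity': 200.0}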
