Learn

lassobbn.learn.do_learn(df_path: str, nodes: List[str], seen: Dict[str, List[str]], ordering_map: Dict[str, List[str]], n_way=3, ignore_neg_gt=-0.1, ignore_pos_lt=0.1, n_regressions=10, solver='liblinear', penalty='l1', C=0.2, robust_threshold=0.9) → None

Recursively learns parents or robust independent variables associated with each variable.

Parameters

df_path – CSV path.
nodes – List of variables.
seen – Dictionary storing processed/seen variables.
ordering_map – Ordering map.
n_way – Number of n-way interactions. Default is 3.
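ignore_neg_gt – Threshold for ignoring negative coefficients (negative coefficients greater than this value are ignored). Default is -0.1.
ignore_pos_lt – Threshold for ignoring positive coefficients (positive coefficients less than this value are ignored). Default is 0.1.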
n_regressions – The number of regressions to do. Default is 10.
solver – Solver. Default is liblinear.
penalty – Penalty. Default is l1.
C – Regularization strength, following scikit-learn's convention for C (inverse strength; smaller values regularize more). Default is 0.2.
robust_threshold – Robustness threshold. Default is 0.9.

Returns

None.

lassobbn.learn.do_regression(X_cols: List[str], y_col: str, df: pandas.core.frame.DataFrame, solver='liblinear', penalty='l1', C=0.2) → sklearn.linear_model._logistic.LogisticRegression

Performs regression.

Parameters

X_cols – Independent variables.
y_col – Dependent variable.
df – Data frame.
solver – Solver. Default is liblinear.
penalty – Penalty. Default is l1.
C – Regularization strength. Default is 0.2.

Returns

Logistic regression model.

lassobbn.learn.do_robust_regression(X_cols: List[str], y_col: str, df_path: str, n_way=3, ignore_neg_gt=-0.1, ignore_pos_lt=0.1, n_regressions=10, solver='liblinear', penalty='l1', C=0.2, robust_threshold=0.9) → Dict[str, Union[str, List]]

Performs robust regression.

Parameters

X_cols – List of independent variables.
y_col – Dependent variable.
df_path – Path of CSV file.
n_way – Number of n-way interactions. Default is 3.
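ignore_neg_gt – Threshold for ignoring negative coefficients (negative coefficients greater than this value are ignored). Default is -0.1.
ignore_pos_lt – Threshold for ignoring positive coefficients (positive coefficients less than this value are ignored). Default is 0.1.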
n_regressions – The number of regressions to do. Default is 10.
solver – Solver. Default is liblinear.
penalty – Penalty. Default is l1.
C – Regularization strength. Default is 0.2.
robust_threshold – Robustness threshold. Default is 0.9.

Returns

A dictionary storing parents of a child. The parents are said to be robust.

lassobbn.learn.expand_data(df_path: str, parents: Dict[str, List[str]]) → pandas.core.frame.DataFrame

Expands data with additional columns defined by parent-child relationships.

Parameters

df_path – CSV path.
parents – Parent-child relationships.

Returns

Data frame.

lassobbn.learn.extract_meta(meta_path: str) → Tuple[Dict[str, List[str]], List[str]]

Extracts metadata.

Parameters: meta_path – Metadata path (JSON file).
Returns: Tuple; (ordering map, start nodes).

lassobbn.learn.extract_model_params(independent_cols: List[str], y_col: str, model: sklearn.linear_model._logistic.LogisticRegression) → Dict[str, Union[str, float]]

Extracts parameters from models (e.g. coefficients).

Parameters

independent_cols – List of independent variables.
y_col – Dependent variable.
model – Logistic regression model.

Returns

Parameters (e.g. coefficients of each independent variable).

lassobbn.learn.get_data(df_path: str, X_cols: List[str], y_col: str, n_way=3) → pandas.core.frame.DataFrame

Gets a data frame with additional columns representing the n-way interactions.

Parameters

df_path – Path to CSV file.
X_cols – List of variables.
y_col – The dependent variable.
n_way – Number of n-way interactions. Default is 3.

Returns

Data frame.

lassobbn.learn.get_graph(parents: Dict[str, List[str]]) → networkx.classes.digraph.DiGraph

Gets a directed graph (nx.DiGraph) built from the parent-child relationships.

Parameters: parents – Dictionary; keys are children, values are lists of parents.
Returns: Graph.
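
To make the shape of the parents dictionary concrete, here is a minimal sketch that turns it into a list of directed edges. The helper name parent_child_edges is hypothetical, and the parent → child edge direction is an assumption (it is the usual convention for Bayesian networks); the library's own get_graph may differ in detail.

```python
from typing import Dict, List, Tuple


def parent_child_edges(parents: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Hypothetical helper: flatten {child: [parents]} into (parent, child) edges."""
    return [(pa, ch) for ch, pas in parents.items() for pa in pas]


# Toy structure: c has parents a and b; d has parent c.
edges = parent_child_edges({"c": ["a", "b"], "d": ["c"]})
print(edges)
```

A list of edges like this can be passed to networkx's DiGraph.add_edges_from to obtain a directed graph.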

lassobbn.learn.get_n_way(X_cols: List[str], n_way=3) → List[Tuple[str, ...]]

Gets all interactions up to n-way.

Parameters

X_cols – List of variables.
n_way – Maximum n-way interactions. Default is 3.

Returns

List of n-way interactions.
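
As an illustration of what "up to n-way" can mean, the sketch below enumerates combinations with itertools. The function name n_way_interactions is hypothetical, and starting at 2-way combinations (rather than including single variables) is an assumption; it is not taken from the library's implementation.

```python
from itertools import combinations
from typing import List, Tuple


def n_way_interactions(x_cols: List[str], n_way: int = 3) -> List[Tuple[str, ...]]:
    """Hypothetical stand-in: all 2-way up to n_way-way combinations of x_cols."""
    upper = min(n_way, len(x_cols))  # cannot combine more variables than exist
    interactions: List[Tuple[str, ...]] = []
    for k in range(2, upper + 1):
        interactions.extend(combinations(x_cols, k))
    return interactions


print(n_way_interactions(["a", "b", "c"], n_way=3))
```

With three variables and n_way=3 this yields the three pairs followed by the single triple.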

lassobbn.learn.get_ordering_map(meta: Dict[str, any]) → Dict[str, List[str]]

Gets a dictionary specifying ordering. Each key is a variable; each value is the list of variables that come before it.

Parameters: meta – Metadata.
Returns: Ordering.
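
One way to read the ordering map is as a constraint on which variables may act as parents of which. The sketch below checks a parents dictionary against an ordering map of the shape described above. Both the helper name ordering_violations and the idea that parents must come from the "comes before" list are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def ordering_violations(
    parents: Dict[str, List[str]],
    ordering_map: Dict[str, List[str]],
) -> List[Tuple[str, str]]:
    """Hypothetical check: return (parent, child) pairs where the parent
    is not listed as coming before the child in ordering_map."""
    bad: List[Tuple[str, str]] = []
    for child, pas in parents.items():
        allowed = set(ordering_map.get(child, []))
        bad.extend((pa, child) for pa in pas if pa not in allowed)
    return bad


ordering = {"a": [], "b": ["a"], "c": ["a", "b"]}
print(ordering_violations({"c": ["a", "b"], "b": ["c"]}, ordering))
```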

lassobbn.learn.get_robust_stats(robust: pandas.core.frame.DataFrame, robust_threshold=0.9) → pandas.core.frame.DataFrame

Computes the robustness statistics.

Parameters

robust – Data frame of robustness indicators.
robust_threshold – Threshold for robustness. Default is 0.9.

Returns

Data frame of variables that are robust.
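
A plain-Python sketch of the thresholding idea: given 0/1 robustness indicators collected over several regressions, keep the variables whose share of 1's reaches robust_threshold. The helper name robust_variables, the dictionary-of-lists input (instead of a data frame), and the use of >= rather than > are all assumptions.

```python
from typing import Dict, List


def robust_variables(
    flags: Dict[str, List[int]], robust_threshold: float = 0.9
) -> Dict[str, float]:
    """Hypothetical stand-in: share of regressions in which each variable
    was flagged robust, keeping only shares at or above the threshold."""
    kept: Dict[str, float] = {}
    for name, values in flags.items():
        if not values:
            continue  # no regressions recorded for this variable
        share = sum(values) / len(values)
        if share >= robust_threshold:
            kept[name] = share
    return kept


flags = {"a": [1] * 10, "b": [1] * 9 + [0], "c": [1] * 5 + [0] * 5}
print(robust_variables(flags))
```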

lassobbn.learn.get_start_nodes(meta: Dict[str, any]) → List[str]

Gets a list of start variables/nodes to kick off the algorithm.

Parameters: meta – Metadata.
Returns: Start nodes.

lassobbn.learn.learn_parameters(df_path: str, pas: Dict[str, List[str]]) → Tuple[Dict[str, List[str]], networkx.classes.digraph.DiGraph, Dict[str, List[float]]]

Learns the parameters (variable domains and probabilities) for a given structure.

Parameters

df_path – CSV file.
pas – Parent-child relationships (structure).

Returns

Tuple; first item is dictionary of domains; second item is a graph; third item is dictionary of probabilities.

lassobbn.learn.learn_structure(df_path: str, meta_path: str, n_way=3, ignore_neg_gt=-0.1, ignore_pos_lt=0.1, n_regressions=10, solver='liblinear', penalty='l1', C=0.2, robust_threshold=0.9) → Dict[str, List[str]]

Kicks off the learning process.

Parameters

df_path – CSV path.
meta_path – Metadata path.
n_way – Number of n-way interactions. Default is 3.
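ignore_neg_gt – Threshold for ignoring negative coefficients (negative coefficients greater than this value are ignored). Default is -0.1.
ignore_pos_lt – Threshold for ignoring positive coefficients (positive coefficients less than this value are ignored). Default is 0.1.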
n_regressions – The number of regressions to do. Default is 10.
solver – Solver. Default is liblinear.
penalty – Penalty. Default is l1.
C – Regularization strength. Default is 0.2.
robust_threshold – Robustness threshold. Default is 0.9.

Returns

Dictionary where keys are children and values are lists of parents.

lassobbn.learn.posteriors_to_df(jt: pybbn.graph.jointree.JoinTree) → pandas.core.frame.DataFrame

Converts posteriors to data frame.

Parameters: jt – Join tree.
Returns: Data frame.

lassobbn.learn.to_bbn(d: Dict[str, List[str]], s: networkx.classes.digraph.DiGraph, p: Dict[str, List[float]]) → pybbn.graph.dag.Bbn

Converts the structure and parameters to a BBN.

Parameters

d – Domain of each variable.
s – Structure.
p – Parameter.

Returns

BBN.

lassobbn.learn.to_join_tree(bbn: pybbn.graph.dag.Bbn) → pybbn.graph.jointree.JoinTree

Converts a BBN to a Join Tree.

Parameters: bbn – BBN.
Returns: Join Tree.
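
Judging from the signatures documented in this module, learn_structure, learn_parameters, to_bbn, to_join_tree and posteriors_to_df appear to chain into a single pipeline. The sketch below wires them together using only the documented arguments and return types. The wrapper name build_posteriors and the file names in the usage comment are hypothetical, and the chaining itself is inferred from the signatures rather than confirmed.

```python
def build_posteriors(df_path: str, meta_path: str):
    """Hypothetical wrapper: CSV + JSON metadata -> data frame of posteriors."""
    # Imported lazily so this sketch can be defined without lassobbn installed.
    from lassobbn.learn import (
        learn_parameters,
        learn_structure,
        posteriors_to_df,
        to_bbn,
        to_join_tree,
    )

    # Structure: dictionary mapping each child to its list of parents.
    parents = learn_structure(df_path, meta_path)
    # Parameters: (domains, graph, probabilities), per the documented return tuple.
    d, g, p = learn_parameters(df_path, parents)
    bbn = to_bbn(d, g, p)
    jt = to_join_tree(bbn)
    return posteriors_to_df(jt)


# Usage (paths are placeholders, not files shipped with the library):
# posteriors = build_posteriors("data.csv", "meta.json")
```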

lassobbn.learn.to_robustness_indication(params: pandas.core.frame.DataFrame, ignore_neg_gt=-0.1, ignore_pos_lt=0.1) → pandas.core.frame.DataFrame

Checks if each coefficient value is “robust”. A coefficient is NOT robust if it is negative and greater than ignore_neg_gt, or if it is positive and less than ignore_pos_lt.

Parameters

params – Data frame of parameters.
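ignore_neg_gt – Threshold for negative coefficients; those greater than this value are not robust. Default is -0.1.
ignore_pos_lt – Threshold for positive coefficients; those less than this value are not robust. Default is 0.1.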

Returns

Data frame (all 1’s and 0’s) indicating robustness.
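
The thresholding can be sketched in plain Python. This reads the parameter names as "ignore negatives greater than" and "ignore positives less than", so any coefficient strictly between the two thresholds is flagged 0. The helper name robustness_flags, the dictionary-of-lists input (instead of a data frame), and the treatment of zero and of values exactly on a threshold are assumptions.

```python
from typing import Dict, List


def robustness_flags(
    coefs: Dict[str, List[float]],
    ignore_neg_gt: float = -0.1,
    ignore_pos_lt: float = 0.1,
) -> Dict[str, List[int]]:
    """Hypothetical stand-in: 1 if a coefficient is far enough from zero,
    0 if it falls strictly between ignore_neg_gt and ignore_pos_lt."""
    return {
        name: [0 if ignore_neg_gt < v < ignore_pos_lt else 1 for v in values]
        for name, values in coefs.items()
    }


# One list of coefficients per variable, one entry per regression.
print(robustness_flags({"x": [0.05, -0.3, 0.2, 0.0]}))
```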

lassobbn.learn.trim_parents(parents: List[str]) → List[str]

Prunes or trims down the list of parents. There might be duplicates as a result of compound or n-way interactions.

Parameters: parents – List of parents.
Returns: List of (pruned/trimmed) parents.
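
To illustrate why duplicates can arise, suppose a compound interaction is encoded as variable names joined by a separator. Splitting such names and removing repeats gives one way to trim a parent list. The helper name dedupe_parents and the "!" separator are hypothetical; this section does not specify how compound parents are represented.

```python
from typing import List


def dedupe_parents(parents: List[str], sep: str = "!") -> List[str]:
    """Hypothetical stand-in: split compound names on sep and drop
    duplicates while preserving first-seen order."""
    singles = [part for name in parents for part in name.split(sep)]
    # dict preserves insertion order, so this keeps the first occurrence of each name.
    return list(dict.fromkeys(singles))


print(dedupe_parents(["a", "a!b", "b!c"]))
```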

lassobbn.learn.trim_relationships(rels: Dict[str, List[str]]) → Dict[str, List[str]]

Trims/prunes parent-child relationships.

Parameters: rels – Dictionary of parent-child relationships.
Returns: Dictionary of trimmed parent-child relationships.