Learn

lassobbn.learn.do_learn(df_path: str, nodes: List[str], seen: Dict[str, List[str]], ordering_map: Dict[str, List[str]], n_way=3, ignore_neg_gt=-0.1, ignore_pos_lt=0.1, n_regressions=10, solver='liblinear', penalty='l1', C=0.2, robust_threshold=0.9) → None

Recursively learns the parents of each variable, i.e. the robust independent variables associated with it.

Parameters
  • df_path – CSV path.

  • nodes – List of variables.

  • seen – Dictionary storing processed/seen variables.

  • ordering_map – Ordering map; each key is a variable and its value is the list of variables that come before it.

  • n_way – Number of n-way interactions. Default is 3.

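  • ignore_neg_gt – Threshold for ignoring negative coefficients: negative coefficients greater than this value are treated as not robust. Default is -0.1.

  • ignore_pos_lt – Threshold for ignoring positive coefficients: positive coefficients less than this value are treated as not robust. Default is 0.1.

  • n_regressions – Number of regressions to perform. Default is 10.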

  • solver – Solver. Default is liblinear.

  • penalty – Penalty. Default is l1.

  • C – Inverse of regularization strength; smaller values specify stronger regularization. Default is 0.2.

  • robust_threshold – Robustness threshold. Default is 0.9.

Returns

None.
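The recursion can be pictured with the dependency-free sketch below. It is a hypothetical illustration, not the library's code: `learn_recursively` and the `find_parents` callback (which stands in for the robust regression step) are assumptions, and the sketch assumes the recursion moves from each processed variable to its newly found parents, recording results in `seen` in place.

```python
from typing import Callable, Dict, List


def learn_recursively(
    nodes: List[str],
    seen: Dict[str, List[str]],
    ordering_map: Dict[str, List[str]],
    find_parents: Callable[[str, List[str]], List[str]],
) -> None:
    """Hypothetical sketch of a do_learn-style recursion.

    find_parents stands in for the robust regression step: given a child
    and its candidate predecessors, it returns the parents it keeps.
    """
    for node in nodes:
        if node in seen:
            continue  # already processed; skip repeated work
        candidates = ordering_map.get(node, [])
        # Only variables that come before this node are candidate parents.
        parents = find_parents(node, candidates) if candidates else []
        seen[node] = parents  # results are stored in place; nothing is returned
        learn_recursively(parents, seen, ordering_map, find_parents)
```

With `ordering_map = {"c": ["a", "b"], "b": ["a"], "a": []}` and a `find_parents` that keeps every candidate, starting from `["c"]` fills `seen` with `c → [a, b]`, `b → [a]` and `a → []`.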

lassobbn.learn.do_regression(X_cols: List[str], y_col: str, df: pandas.core.frame.DataFrame, solver='liblinear', penalty='l1', C=0.2) → sklearn.linear_model._logistic.LogisticRegression

Fits a logistic regression model of y_col on X_cols.

Parameters
  • X_cols – Independent variables.

  • y_col – Dependent variable.

  • df – Data frame.

  • solver – Solver. Default is liblinear.

  • penalty – Penalty. Default is l1.

  • C – Inverse of regularization strength; smaller values specify stronger regularization. Default is 0.2.

Returns

Logistic regression model.

lassobbn.learn.do_robust_regression(X_cols: List[str], y_col: str, df_path: str, n_way=3, ignore_neg_gt=-0.1, ignore_pos_lt=0.1, n_regressions=10, solver='liblinear', penalty='l1', C=0.2, robust_threshold=0.9) → Dict[str, Union[str, List]]

Performs robust regression.

Parameters
  • X_cols – List of independent variables.

  • y_col – Dependent variable.

  • df_path – Path of CSV file.

  • n_way – Number of n-way interactions. Default is 3.

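  • ignore_neg_gt – Threshold for ignoring negative coefficients: negative coefficients greater than this value are treated as not robust. Default is -0.1.

  • ignore_pos_lt – Threshold for ignoring positive coefficients: positive coefficients less than this value are treated as not robust. Default is 0.1.

  • n_regressions – Number of regressions to perform. Default is 10.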

  • solver – Solver. Default is liblinear.

  • penalty – Penalty. Default is l1.

  • C – Inverse of regularization strength; smaller values specify stronger regularization. Default is 0.2.

  • robust_threshold – Robustness threshold. Default is 0.9.

Returns

A dictionary storing the parents of a child, i.e. its robust independent variables.

lassobbn.learn.expand_data(df_path: str, parents: Dict[str, List[str]]) → pandas.core.frame.DataFrame

Expands data with additional columns defined by parent-child relationships.

Parameters
  • df_path – CSV path.

  • parents – Parent-child relationships.

Returns

Data frame.

lassobbn.learn.extract_meta(meta_path: str) → Tuple[Dict[str, List[str]], List[str]]

Extracts metadata.

Parameters

meta_path – Metadata path (JSON file).

Returns

Tuple; (ordering map, start nodes).

lassobbn.learn.extract_model_params(independent_cols: List[str], y_col: str, model: sklearn.linear_model._logistic.LogisticRegression) → Dict[str, Union[str, float]]

Extracts parameters (e.g. coefficients) from a model.

Parameters
  • independent_cols – List of independent variables.

  • y_col – Dependent variable.

  • model – Logistic regression model.

Returns

Parameters (e.g. coefficients of each independent variable).

lassobbn.learn.get_data(df_path: str, X_cols: List[str], y_col: str, n_way=3) → pandas.core.frame.DataFrame

Gets a data frame with additional columns representing the n-way interactions.

Parameters
  • df_path – Path to CSV file.

  • X_cols – List of independent variables.

  • y_col – The dependent variable.

  • n_way – Number of n-way interactions. Default is 3.

Returns

Data frame.
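As a rough illustration of what the extra columns might contain, the sketch below assumes binary (0/1) variables, where an interaction term is the product of its member columns. The name `add_interactions`, the `!` separator in the new column names, and the choice of 2-way through n_way-way terms are all hypothetical; it works on plain dictionaries rather than a pandas data frame so that it runs without dependencies.

```python
from itertools import combinations
from typing import Dict, List


def add_interactions(
    rows: List[Dict[str, int]], x_cols: List[str], n_way: int = 3, sep: str = "!"
) -> List[Dict[str, int]]:
    """Return copies of rows with one extra column per 2..n_way interaction.

    Assumes 0/1 variables, so an interaction equals the product of its
    members. The separator `sep` used in column names is hypothetical.
    """
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left unchanged
        for k in range(2, n_way + 1):
            for combo in combinations(x_cols, k):
                value = 1
                for col in combo:
                    value *= row[col]
                new_row[sep.join(combo)] = value
        out.append(new_row)
    return out
```

For three independent variables and n_way=3, each row gains four columns: three 2-way terms and one 3-way term.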

lassobbn.learn.get_graph(parents: Dict[str, List[str]]) → networkx.classes.digraph.DiGraph

Gets a directed graph (nx.DiGraph) from parent-child relationships.

Parameters

parents – Dictionary; keys are children, values are lists of parents.

Returns

Graph.
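The key detail is edge direction: each parent points to its child. A minimal sketch, using the hypothetical helper `to_edges`, that turns the `parents` dictionary into (parent, child) pairs:

```python
from typing import Dict, List, Tuple


def to_edges(parents: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """List directed (parent, child) edges from a child -> parents mapping."""
    return [(pa, child) for child, pas in parents.items() for pa in pas]
```

With networkx installed, `nx.DiGraph(to_edges(parents))` would build a graph from these pairs. An edge list only creates nodes that appear in some edge, so a variable with no parents that is also no one's parent would have to be added separately, e.g. with `add_nodes_from`.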

lassobbn.learn.get_n_way(X_cols: List[str], n_way=3) → List[Tuple[str, ...]]

Gets all interactions up to n-way, i.e. those involving at most n_way variables.

Parameters
  • X_cols – List of variables.

  • n_way – Maximum n-way interactions. Default is 3.

Returns

List of n-way interactions.
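Enumerating interactions maps naturally onto `itertools.combinations`. The sketch below assumes "up to n-way" means every combination of size 1 through `n_way`; whether single variables (size 1) are included is an assumption, and `n_way_interactions` is a hypothetical name.

```python
from itertools import combinations
from typing import List, Tuple


def n_way_interactions(x_cols: List[str], n_way: int = 3) -> List[Tuple[str, ...]]:
    """All combinations of x_cols of size 1 through n_way (assumed range)."""
    return [combo for k in range(1, n_way + 1) for combo in combinations(x_cols, k)]
```

For three variables and n_way=3 this yields 3 + 3 + 1 = 7 tuples; the count grows quickly with the number of variables, which is one reason to keep n_way small.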

lassobbn.learn.get_ordering_map(meta: Dict[str, any]) → Dict[str, List[str]]

Gets a dictionary specifying ordering. Each key is a variable, and its value is the list of variables that come before it.

Parameters

meta – Metadata.

Returns

Ordering.
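One way such a map could be derived is from ordered tiers of variables. The metadata layout here (a hypothetical `"ordering"` key holding a list of tiers, earliest first) is an assumption, not the documented JSON format, and `ordering_map_from_tiers` is a hypothetical name.

```python
from typing import Any, Dict, List


def ordering_map_from_tiers(meta: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each variable to every variable in earlier tiers.

    Assumes a hypothetical layout: meta["ordering"] is a list of tiers
    (lists of variable names), ordered from earliest to latest.
    """
    ordering: Dict[str, List[str]] = {}
    before: List[str] = []
    for tier in meta["ordering"]:
        for var in tier:
            ordering[var] = list(before)  # copy so later tiers don't alter it
        before.extend(tier)
    return ordering
```

Variables in the same tier do not precede one another, so neither can become a parent of the other.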

lassobbn.learn.get_robust_stats(robust: pandas.core.frame.DataFrame, robust_threshold=0.9) → pandas.core.frame.DataFrame

Computes the robustness statistics.

Parameters
  • robust – Data frame of robustness indicators.

  • robust_threshold – Threshold for robustness. Default is 0.9.

Returns

Data frame of variables that are robust.
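A minimal sketch of one plausible robustness statistic, assuming each regression run yields a row of 0/1 indicators and a variable is kept when the fraction of runs in which it is robust reaches `robust_threshold`. Both the row layout and the `>=` comparison are assumptions; `robust_variables` is hypothetical and uses plain dictionaries instead of a data frame.

```python
from typing import Dict, List


def robust_variables(
    indicators: List[Dict[str, int]], robust_threshold: float = 0.9
) -> Dict[str, float]:
    """Return {variable: fraction of runs robust} for variables that pass.

    Assumes one dict of 0/1 robustness indicators per regression run and
    keeps variables whose fraction is >= robust_threshold (assumed rule).
    """
    n = len(indicators)
    variables = indicators[0].keys() if indicators else []
    fractions = {v: sum(row[v] for row in indicators) / n for v in variables}
    return {v: f for v, f in fractions.items() if f >= robust_threshold}
```

With 10 runs and the default threshold, a variable must be robust in at least 9 of them to be kept under this rule.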

lassobbn.learn.get_start_nodes(meta: Dict[str, any]) → List[str]

Gets a list of start variables/nodes to kick off the algorithm.

Parameters

meta – Metadata.

Returns

Start nodes.

lassobbn.learn.learn_parameters(df_path: str, pas: Dict[str, List[str]]) → Tuple[Dict[str, List[str]], networkx.classes.digraph.DiGraph, Dict[str, List[float]]]

Learns the parameters from data, given the structure.

Parameters
  • df_path – CSV file.

  • pas – Parent-child relationships (structure).

Returns

Tuple; first item is dictionary of domains; second item is a graph; third item is dictionary of probabilities.
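For discrete data, conditional probabilities are commonly estimated by counting (maximum likelihood). The sketch below shows that standard technique for a single child; it is not necessarily what `learn_parameters` does, and it does not reproduce the flat `List[float]` layout the function returns. `conditional_probs` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, List, Tuple


def conditional_probs(
    rows: List[Dict[str, str]], child: str, parents: List[str]
) -> Dict[Tuple[str, ...], Dict[str, float]]:
    """Estimate P(child | parents) by counting (maximum likelihood)."""
    # Count each (parent values, child value) pair and each parent configuration.
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in rows)
    marginal = Counter(tuple(r[p] for p in parents) for r in rows)
    cpt: Dict[Tuple[str, ...], Dict[str, float]] = {}
    for (pa_vals, child_val), count in joint.items():
        cpt.setdefault(pa_vals, {})[child_val] = count / marginal[pa_vals]
    return cpt
```

Parent configurations or child values never seen in the data are simply absent from the result; no smoothing is applied.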

lassobbn.learn.learn_structure(df_path: str, meta_path: str, n_way=3, ignore_neg_gt=-0.1, ignore_pos_lt=0.1, n_regressions=10, solver='liblinear', penalty='l1', C=0.2, robust_threshold=0.9) → Dict[str, List[str]]

Kicks off the structure learning process.

Parameters
  • df_path – CSV path.

  • meta_path – Metadata path (JSON file).

  • n_way – Number of n-way interactions. Default is 3.

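  • ignore_neg_gt – Threshold for ignoring negative coefficients: negative coefficients greater than this value are treated as not robust. Default is -0.1.

  • ignore_pos_lt – Threshold for ignoring positive coefficients: positive coefficients less than this value are treated as not robust. Default is 0.1.

  • n_regressions – Number of regressions to perform. Default is 10.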

  • solver – Solver. Default is liblinear.

  • penalty – Penalty. Default is l1.

  • C – Inverse of regularization strength; smaller values specify stronger regularization. Default is 0.2.

  • robust_threshold – Robustness threshold. Default is 0.9.

Returns

Dictionary where keys are children and values are lists of parents.

lassobbn.learn.posteriors_to_df(jt: pybbn.graph.jointree.JoinTree) → pandas.core.frame.DataFrame

Converts the posteriors of a join tree to a data frame.

Parameters

jt – Join tree.

Returns

Data frame.

lassobbn.learn.to_bbn(d: Dict[str, List[str]], s: networkx.classes.digraph.DiGraph, p: Dict[str, List[float]]) → pybbn.graph.dag.Bbn

Converts the structure and parameters to a BBN.

Parameters
  • d – Domain of each variable.

  • s – Structure.

  • p – Parameters (probabilities).

Returns

BBN.

lassobbn.learn.to_join_tree(bbn: pybbn.graph.dag.Bbn) → pybbn.graph.jointree.JoinTree

Converts a BBN to a join tree.

Parameters

bbn – BBN.

Returns

Join Tree.

lassobbn.learn.to_robustness_indication(params: pandas.core.frame.DataFrame, ignore_neg_gt=-0.1, ignore_pos_lt=0.1) → pandas.core.frame.DataFrame

Checks if each coefficient value is “robust”. A coefficient is NOT robust if it is greater than ignore_neg_gt and less than ignore_pos_lt, that is, if it lies too close to zero.

Parameters
  • params – Data frame of parameters.

  • ignore_neg_gt – Threshold for negative coefficients. Default is -0.1.

  • ignore_pos_lt – Threshold for positive coefficients. Default is 0.1.

Returns

Data frame (all 1’s and 0’s) indicating robustness.
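The rule above treats coefficients that lie too close to zero as not robust. A sketch of that check for a plain list of coefficients, assuming strict inequalities at both thresholds (so a coefficient exactly equal to a threshold counts as robust); `robustness_indicators` is a hypothetical name.

```python
from typing import List


def robustness_indicators(
    coefs: List[float], ignore_neg_gt: float = -0.1, ignore_pos_lt: float = 0.1
) -> List[int]:
    """Return 1 for each robust coefficient and 0 otherwise (assumed rule)."""

    def indicator(c: float) -> int:
        # Not robust: strictly between the two thresholds, i.e. near zero.
        if ignore_neg_gt < c < ignore_pos_lt:
            return 0
        return 1

    return [indicator(c) for c in coefs]
```

Under these defaults, a coefficient of -0.05 or 0.05 is flagged 0, while -0.5 and 0.3 are flagged 1.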

lassobbn.learn.trim_parents(parents: List[str]) → List[str]

Prunes or trims down the list of parents. There might be duplicates as a result of compound or n-way interactions.

Parameters

parents – List of parents.

Returns

List of (pruned/trimmed) parents.
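Because an interaction term names several variables, trimming amounts to splitting compound names into their members and keeping each variable once. A sketch, assuming a hypothetical `!` separator in compound names and order-preserving deduplication; `trim_parent_names` is not the library function.

```python
from typing import List


def trim_parent_names(parents: List[str], sep: str = "!") -> List[str]:
    """Split compound (interaction) names into members and drop duplicates.

    The separator `sep` is hypothetical; first-seen order is preserved.
    """
    trimmed: List[str] = []
    for name in parents:
        for member in name.split(sep):
            if member not in trimmed:
                trimmed.append(member)
    return trimmed
```

For example, the parents `["a", "a!b", "b!c"]` trim to `["a", "b", "c"]`.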

lassobbn.learn.trim_relationships(rels: Dict[str, List[str]]) → Dict[str, List[str]]

Trims (prunes) parent-child relationships.

Parameters

rels – Dictionary of parent-child relationships.

Returns

Dictionary of trimmed parent-child relationships.
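Applied to a whole structure, the same idea runs over every child. A self-contained sketch under the same assumptions (hypothetical `!` separator in compound names, order-preserving deduplication); `trim_all` is a hypothetical name.

```python
from typing import Dict, List


def trim_all(rels: Dict[str, List[str]], sep: str = "!") -> Dict[str, List[str]]:
    """Split each child's compound parent names and deduplicate, keeping order."""
    return {
        # dict.fromkeys keeps the first occurrence of each member, in order.
        child: list(dict.fromkeys(m for name in parents for m in name.split(sep)))
        for child, parents in rels.items()
    }
```

Children with no parents keep an empty list, so every variable stays in the result.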