sevnpy.io.LogSchema

class sevnpy.io.logschema.LogSchema(schema: Dict[str, str | Type[int] | Type[float] | Tuple[str | Type[int] | Type[float], str]] = {})[source]

Bases: object

This class defines and handles the structure of a given log file and produces the corresponding matching pattern. The following keywords are used throughout the class:

name: the label used to identify the data stored in a given position of the log structure; it can be used as the name of a column in a dataframe.

description: short description of the data stored at a given name

kind: the type of the data identified by the name. It can be "int", "id", "float", "name", int, str, float or any other string. If it is a string other than "int", "id", "float", "name", "type", it is called a named_kind and the kind itself is used verbatim as the search pattern; otherwise the search pattern is derived from the value stored in kind. For example, if kind="S" the search pattern is exactly "S", but if kind="id" the search pattern is "[0-9]+".

pattern: is the regex searching pattern associated with name and kind

type: data type associated with each name, it depends on the kind value:

“id”, “type”, “int” or int -> int

“float” or float -> float

“name” -> str
In all other cases, i.e. when kind is a string other than "int", "id", "float", "name", "type", the type is inferred from the string itself:

If the string can be transformed to an integer -> int

If the string can be transformed to a float -> float

Otherwise -> str
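The kind-to-type rules above can be sketched in a few lines of Python. This is only an illustration of the documented mapping, not the library's implementation: the helper name infer_type is hypothetical, and mapping the kind str to the type str is an assumption, since the list above does not mention it.

```python
from typing import Type, Union

Kind = Union[str, Type[int], Type[float], Type[str]]


def infer_type(kind: Kind) -> type:
    """Hypothetical helper: return the data type implied by a kind."""
    if kind in ("id", "type", "int") or kind is int:
        return int
    if kind == "float" or kind is float:
        return float
    # Assumption: the str class is treated like the "name" kind.
    if kind == "name" or kind is str:
        return str
    # Named kind: infer the type from the literal string itself.
    try:
        int(kind)
        return int
    except ValueError:
        pass
    try:
        float(kind)
        return float
    except ValueError:
        return str


inferred = {kind: infer_type(kind) for kind in ("id", "name", "CIRC", "12", "1.5")}
```

With these rules a literal marker such as "CIRC" is typed as str, while a named kind such as "12" is typed as int.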

So, each element in a LogSchema is defined by its name, kind and pattern. Name and kind are defined by the user at class instantiation or through the method add_item (see the examples below). The pattern and the type are instead generated by the class based on the kind value (see above). A further element, the regex_pattern, is computed only when the method regex_pattern is called; it consists of the pattern plus the extra parentheses needed to create a capturing or non-capturing group, depending on the method input.

Examples

The class is used to define an object containing information about the structure of a SEVN-like logfile. For example, assume that we have the following log structure: B;<name>;<id>;CIRC;<time>;<semimajor_axis_ini>:<eccentricity_ini>:<semimajor_axis_post>:<eccentricity_post>, e.g.: B;857175750378006;0;CIRC;1.874849e+01;38.2411:0.000633313:38.2411:0

We can initialize a LogSchema for the header like this:

>>> header=LogSchema({"logtype":("B",""), "name":("name","unique identifier"), "ID":("id",""),"event":("CIRC",""),"Worldtime":(float,"time in Myr")})

For the body we can use another LogSchema; let's start from an empty initialisation:

>>> body=LogSchema()
>>> body.add_item("semimajor_axis_ini",float,"pre circularisation semimajor axis Rsun")
>>> body.add_item("eccentriciy_ini",float,"pre circularisation eccentricity")
>>> body.add_item("semimajor_axis_post",float,"post circularisation semimajor axis Rsun")
>>> body.add_item("eccentriciy_post",float,"post circularisation eccentricity")

If we want to check the different schemas:

>>> body.kind_schema
{'semimajor_axis_ini': <class 'float'>, 'eccentriciy_ini': <class 'float'>, 'semimajor_axis_post': <class 'float'>, 'eccentriciy_post': <class 'float'>}
>>> body.type_schema
{'semimajor_axis_ini': <class 'float'>, 'eccentriciy_ini': <class 'float'>, 'semimajor_axis_post': <class 'float'>, 'eccentriciy_post': <class 'float'>}
>>> body.pattern_schema
{'semimajor_axis_ini': '[+|-]?[0-9]+\.?[0-9]*(?i:e)?[+|-]?[0-9]*|(?i:nan)', 'eccentriciy_ini': '[+|-]?[0-9]+\.?[0-9]*(?i:e)?[+|-]?[0-9]*|(?i:nan)', 'semimajor_axis_post': '[+|-]?[0-9]+\.?[0-9]*(?i:e)?[+|-]?[0-9]*|(?i:nan)', 'eccentriciy_post': '[+|-]?[0-9]+\.?[0-9]*(?i:e)?[+|-]?[0-9]*|(?i:nan)'}

Finally, to get the regex pattern with only the semimajor axis properties as capturing items:

>>> body.regex_pattern(capturing_names=('semimajor_axis_ini','semimajor_axis_post'))
{'semimajor_axis_ini': '([+|-]?[0-9]+\.?[0-9]*(?i:e)?[+|-]?[0-9]*|(?i:nan))', 'eccentriciy_ini': '(?:[+|-]?[0-9]+\.?[0-9]*(?i:e)?[+|-]?[0-9]*|(?i:nan))', 'semimajor_axis_post': '([+|-]?[0-9]+\.?[0-9]*(?i:e)?[+|-]?[0-9]*|(?i:nan))', 'eccentriciy_post': '(?:[+|-]?[0-9]+\.?[0-9]*(?i:e)?[+|-]?[0-9]*|(?i:nan))'}
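To see what the capturing and non-capturing groups achieve, the following standalone sketch applies the float pattern shown above to the body of the example log line using only the standard re module. It does not call LogSchema. Joining the groups with ":" is an assumption based on the example log structure, not something the class is documented to do.

```python
import re

# Float pattern copied from the pattern_schema output above.
FLOAT_PATTERN = r"[+|-]?[0-9]+\.?[0-9]*(?i:e)?[+|-]?[0-9]*|(?i:nan)"

names = ["semimajor_axis_ini", "eccentriciy_ini",
         "semimajor_axis_post", "eccentriciy_post"]
capturing = ["semimajor_axis_ini", "semimajor_axis_post"]

# Capturing group for the selected names, non-capturing group for the others.
groups = [
    f"({FLOAT_PATTERN})" if name in capturing else f"(?:{FLOAT_PATTERN})"
    for name in names
]

# Assumption: the body fields are separated by ":" as in the example line.
body_regex = re.compile(":".join(groups))

match = body_regex.fullmatch("38.2411:0.000633313:38.2411:0")
values = dict(zip(capturing, (float(g) for g in match.groups())))
```

Only the two semimajor axis values are returned by match.groups(); the eccentricities must still match, but are not extracted.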

Methods

__init__

add_item

Add an item to the Schema

column_schema

Return a dictionary mapping the index of each column in the logschema to the corresponding name.

default_capturing_names

Get the default capturing name, i.e. the item with a kind that is a string but not "name","id","str","float","int".

full_schema

Create and return a dictionary containing all the information about the schema

pop

Remove an item of a given name from the schema

regex_pattern

reset

Reset the schema

update

__init__(schema: Dict[str, str | Type[int] | Type[float] | Tuple[str | Type[int] | Type[float], str]] = {})[source]

Parameters: schema – dictionary containing the schema with the triple name:kind:description

__repr__() → str[source]: Return repr(self).

__str__() → str[source]: Return str(self).

__weakref__: list of weak references to the object

add_item(name: str, kind: str | Type[int] | Type[float], description: str = '')[source]

Add an item to the Schema

Parameters:

name – name of the new item
kind – kind of the new item
description – a short description of the item

column_schema(offset: int = 0) → Dict[int, str][source]

Return a dictionary mapping the index of each column in the logschema to the corresponding name.

Parameters: offset (int) – Offset used to define the columns

Returns: column_schema – A dictionary containing index:name pairs
Return type: Dictionary
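As a rough illustration of the index:name mapping, the sketch below builds such a dictionary from a list of names. It assumes that the offset is simply added to the position of each item in the schema. The function name column_schema_sketch is hypothetical and the real method may behave differently.

```python
from typing import Dict, List


def column_schema_sketch(names: List[str], offset: int = 0) -> Dict[int, str]:
    """Hypothetical helper: map column indices to names, shifted by offset."""
    # Assumption: the first item of the schema sits at column index `offset`.
    return {index + offset: name for index, name in enumerate(names)}


body_names = ["semimajor_axis_ini", "eccentriciy_ini",
              "semimajor_axis_post", "eccentriciy_post"]

# The header of the example log has five fields, so the body would start at column 5.
body_columns = column_schema_sketch(body_names, offset=5)
```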

default_capturing_names() → List[str][source]: Get the default capturing names, i.e. the items with a kind that is a string but not "name", "id", "str", "float", "int".

Returns: capturing_names_temp – A list with the default capturing names (already sorted)
Return type: List

property description_schema: Dict: Dictionary containing the name:description pairs

property descriptions: List[str]: List of all the descriptions

full_schema() → Dict[str, _ItemDict][source]

Create and return a dictionary containing all the information about the schema

Returns: full_schema – A dictionary in which each item is a name:dictionary pair structured as follows: {<item_name>: {"name":<item_name>, "description":<item_description>, "kind":<item_kind>, "type":<item_type>, "pattern":<item_pattern>}}
Return type: Dictionary
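For concreteness, a single entry of such a dictionary could look like the sketch below. The TypedDict named ItemDictSketch is a hypothetical stand-in for the private _ItemDict type in the signature. The values follow the rules given in the class description: an item of kind "id" has type int and pattern "[0-9]+".

```python
from typing import Dict, Type, TypedDict, Union


class ItemDictSketch(TypedDict):
    """Hypothetical stand-in for the private _ItemDict type."""
    name: str
    description: str
    kind: Union[str, Type[int], Type[float]]
    type: Type
    pattern: str


# One entry, keyed by the item name, with the five documented fields.
full_schema_example: Dict[str, ItemDictSketch] = {
    "ID": {
        "name": "ID",
        "description": "",
        "kind": "id",
        "type": int,
        "pattern": "[0-9]+",
    },
}
```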

property kind_schema: Dict: Dictionary containing the name:kind pairs

property kinds: List[str | Type[int] | Type[float]]: List of all the kinds

property names: List[str]: List of all the names

property pattern_schema: Dictionary containing the name:pattern pairs

property patterns: List[str]: List of all the patterns

pop(name: str)[source]

Remove an item of a given name from the schema

Parameters: name – Name of the item in the schema to remove

regex_pattern(capturing_names: Literal['default', 'all'] | List[str] | Tuple[str, ...] = 'default') → Tuple[Dict[str, str], List[str]][source]

Parameters:

capturing_names ("default", "all", iterable) – A list of names to be included as capturing members. If the string "default" is used, all the items in the schema are captured except for the named items, i.e. the items with a kind that is a string but not "name", "id", "str", "float", "int". If the string "all" is used, all the items in the schema are captured.

Returns:

regex_pattern (Dictionary) – a dictionary containing the name:regex_pattern pairs
capturing_names (List) – A sorted list of the capturing names
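The selection rule for capturing_names can be sketched as follows. This is a standalone illustration of the documented behaviour, not the library code: the helper names are hypothetical, and returning the names in schema order is an assumption about what "sorted" means here.

```python
from typing import Dict, List, Sequence, Union

# Reserved kinds as listed in the parameter description above.
RESERVED_KINDS = ("name", "id", "str", "float", "int")


def is_named_kind(kind: object) -> bool:
    """A named kind is a string that is not one of the reserved kinds."""
    return isinstance(kind, str) and kind not in RESERVED_KINDS


def resolve_capturing_names(
    kinds: Dict[str, object],
    capturing_names: Union[str, Sequence[str]] = "default",
) -> List[str]:
    """Hypothetical helper: pick the names that become capturing groups."""
    if capturing_names == "all":
        selected = set(kinds)
    elif capturing_names == "default":
        # Capture everything except literal markers such as "B" or "CIRC".
        selected = {name for name, kind in kinds.items() if not is_named_kind(kind)}
    elif isinstance(capturing_names, str):
        raise ValueError("expected 'default', 'all' or an iterable of names")
    else:
        selected = set(capturing_names)
    # Assumption: "sorted" means the order in which items appear in the schema.
    return [name for name in kinds if name in selected]


header_kinds = {"logtype": "B", "name": "name", "ID": "id",
                "event": "CIRC", "Worldtime": float}
default_names = resolve_capturing_names(header_kinds)
all_names = resolve_capturing_names(header_kinds, "all")
```

For the header of the example log, the default selection skips the literal markers logtype and event and keeps name, ID and Worldtime.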

reset()[source]: Reset the schema

property schema: Dict[str, str | Type[int] | Type[float] | Tuple[str | Type[int] | Type[float], str]]: The schema of the log reader

property types: List[Type[int] | Type[float]]: List of all the types