sevnpy.io.LogSchema
- class sevnpy.io.logschema.LogSchema(schema: Dict[str, str | Type[int] | Type[float] | Tuple[str | Type[int] | Type[float], str]] = {})[source]
Bases: object
This class defines and handles the structure of a given log file and produces the corresponding matching pattern. The class relies on the following keywords:
- name: the label used to identify the data stored in a given position of the log structure; it can be used as the column name in a dataframe.
- description: short description of the data stored at a given name.
- kind: the kind of data identified by the name. It can be "int", "id", "float", "name", int, str, float, or any other string. If it is a string other than "int", "id", "float", "name", "type", it is called a named_kind and specifies the search pattern literally; otherwise the search pattern is defined by the value stored in kind. For example, if kind="S" the search pattern will contain exactly "S", but if kind="id" the search pattern will be "[0-9]+".
- pattern: the regex search pattern associated with name and kind.
- type: the data type associated with each name; it depends on the kind value:
  - "id", "type", "int" or int -> int
  - "float" or float -> float
  - "name" -> str
  In all other cases, i.e. when kind is a string other than "int", "id", "float", "name", "type", the type is inferred from the string:
  - If the string can be converted to an integer -> int
  - If the string can be converted to a float -> float
  - Otherwise -> str
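The kind-to-type rules listed above can be sketched in a few lines. This is only an illustration of the documented rules, not the library's implementation: `infer_type` is a hypothetical helper name, and mapping the `str` class to `str` is an assumption, since the list only states it for "name".

```python
from typing import Type, Union

Kind = Union[str, Type[int], Type[float], Type[str]]


def infer_type(kind: Kind) -> type:
    """Return the data type implied by a kind (hypothetical helper, not part of sevnpy)."""
    if kind in ("id", "type", "int") or kind is int:
        return int
    if kind == "float" or kind is float:
        return float
    # Assumption: the str class is treated like the "name" kind
    if kind == "name" or kind is str:
        return str
    # Named kind: infer the type from the literal string itself
    try:
        int(kind)
        return int
    except ValueError:
        pass
    try:
        float(kind)
        return float
    except ValueError:
        return str


print(infer_type("id"), infer_type("CIRC"), infer_type("1.5"))
```

With these rules a literal tag such as "CIRC" is stored as a string, while a named kind such as "12" would be stored as an integer.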
So, each element in a LogSchema is defined by its name, kind and pattern. Name and kind are set by the user at class instantiation or through the method add_item (see the example below). The pattern and the type are instead generated by the class from the kind value (see above). There is also a fifth element, the regex_pattern, which is computed only when the method regex_pattern is called; it consists of the pattern plus the extra parentheses needed to create a capturing or non-capturing group, depending on the method input.
Examples
The class is used to define an object containing the information about the structure of a SEVN-like log file. For example, assume that we have the following log structure:
B;<name>;<id>;CIRC;<time>;<semimajor_axis_ini>:<eccentricity_ini>:<semimajor_axis_post>:<eccentricity_post>
e.g.:
B;857175750378006;0;CIRC;1.874849e+01;38.2411:0.000633313:38.2411:0
We can initialize a LogSchema for the header like this:
>>> header=LogSchema({"logtype":("B",""), "name":("name","unique identifier"), "ID":("id",""),"event":("CIRC",""),"Worldtime":(float,"time in Myr")})
For the body we can use another LogSchema; let's start from an empty initialisation:
>>> body=LogSchema()
>>> body.add_item("semimajor_axis_ini",float,"pre circularisation semimajor axis Rsun")
>>> body.add_item("eccentriciy_ini",float,"pre circularisation eccentricity")
>>> body.add_item("semimajor_axis_post",float,"post circularisation semimajor axis Rsun")
>>> body.add_item("eccentriciy_post",float,"post circularisation eccentricity")
To inspect the different schemas:
>>> body.kind_schema
{'semimajor_axis_ini': <class 'float'>, 'eccentriciy_ini': <class 'float'>, 'semimajor_axis_post': <class 'float'>, 'eccentriciy_post': <class 'float'>}
>>> body.type_schema
{'semimajor_axis_ini': <class 'float'>, 'eccentriciy_ini': <class 'float'>, 'semimajor_axis_post': <class 'float'>, 'eccentriciy_post': <class 'float'>}
>>> body.pattern_schema
{'semimajor_axis_ini': '[+|-]?[0-9]+\.?[0-9]*(?i:e)?[+|-]?[0-9]*|(?i:nan)', 'eccentriciy_ini': '[+|-]?[0-9]+\.?[0-9]*(?i:e)?[+|-]?[0-9]*|(?i:nan)', 'semimajor_axis_post': '[+|-]?[0-9]+\.?[0-9]*(?i:e)?[+|-]?[0-9]*|(?i:nan)', 'eccentriciy_post': '[+|-]?[0-9]+\.?[0-9]*(?i:e)?[+|-]?[0-9]*|(?i:nan)'}
Finally, to get the regex pattern with only the semimajor axis properties as capturing items:
>>> body.regex_pattern(capturing_names=('semimajor_axis_ini','semimajor_axis_post'))
{'semimajor_axis_ini': '([+|-]?[0-9]+\.?[0-9]*(?i:e)?[+|-]?[0-9]*|(?i:nan))', 'eccentriciy_ini': '(?:[+|-]?[0-9]+\.?[0-9]*(?i:e)?[+|-]?[0-9]*|(?i:nan))', 'semimajor_axis_post': '([+|-]?[0-9]+\.?[0-9]*(?i:e)?[+|-]?[0-9]*|(?i:nan))', 'eccentriciy_post': '(?:[+|-]?[0-9]+\.?[0-9]*(?i:e)?[+|-]?[0-9]*|(?i:nan))'}
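To show how such per-item patterns could be used, the sketch below joins them into a single regular expression and applies it to the body of the sample log line. It uses only the standard `re` module with the patterns copied from the output above; joining the items with ":" is an assumption based on the sample line, not a documented behaviour of the class.

```python
import re

# Float pattern as printed by body.pattern_schema in the example above
FLOAT = r"[+|-]?[0-9]+\.?[0-9]*(?i:e)?[+|-]?[0-9]*|(?i:nan)"

# Per-item regex patterns as printed by body.regex_pattern(...) in the example
regex_items = {
    "semimajor_axis_ini": f"({FLOAT})",       # capturing group
    "eccentriciy_ini": f"(?:{FLOAT})",        # non-capturing group
    "semimajor_axis_post": f"({FLOAT})",      # capturing group
    "eccentriciy_post": f"(?:{FLOAT})",       # non-capturing group
}
capturing = ["semimajor_axis_ini", "semimajor_axis_post"]

# Assumption: body fields are separated by ":" as in the sample log line
body_regex = re.compile(":".join(regex_items.values()))

match = body_regex.fullmatch("38.2411:0.000633313:38.2411:0")
# Only the capturing groups are returned, in schema order
values = dict(zip(capturing, map(float, match.groups())))
print(values)  # {'semimajor_axis_ini': 38.2411, 'semimajor_axis_post': 38.2411}
```

Wrapping each pattern in its own group keeps the `|(?i:nan)` alternative local to one field, which is why the non-captured items still need a `(?:...)` group.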
Methods
Add an item to the Schema
Return a dictionary mapping the index of each column in the log schema to the corresponding name
Get the default capturing name, i.e. the item with a kind that is a string but not "name","id","str","float","int".
Create and return a dictionary containing all the information about the schema
Remove an item of given name from the schema
Reset the schema
update
- __init__(schema: Dict[str, str | Type[int] | Type[float] | Tuple[str | Type[int] | Type[float], str]] = {})[source]
- Parameters:
schema – dictionary containing the schema with the triple name:kind:description
- add_item(name: str, kind: str | Type[int] | Type[float], description: str = '')[source]
Add an item to the Schema
- Parameters:
name – name of the new item
kind – kind of the new item
description – a short description of the item
- column_schema(offset: int = 0) Dict[int, str][source]
Return a dictionary mapping the index of each column in the log schema to the corresponding name.
- Parameters:
offset (int) – Offset used to define the column indices
- Returns:
column_schema – A dictionary containing index:name pairs
- Return type:
Dictionary
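A minimal sketch of the index:name mapping described here, assuming the offset is simply added to the zero-based position of each item (the documentation does not spell this out). The function below is a standalone stand-in, not the method itself.

```python
from typing import Dict, List


def column_schema(names: List[str], offset: int = 0) -> Dict[int, str]:
    """Map column indices to names (illustrative stand-in for the method)."""
    # Assumption: index = zero-based position in the schema + offset
    return {index + offset: name for index, name in enumerate(names)}


header_names = ["logtype", "name", "ID", "event", "Worldtime"]
body_names = ["semimajor_axis_ini", "eccentriciy_ini"]

print(column_schema(header_names))
# Shift the body columns so they follow the five header columns
print(column_schema(body_names, offset=len(header_names)))
```

Under this assumption, an offset lets the body schema be numbered after the header schema when both describe the same log line.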
- default_capturing_names() List[str][source]
Get the default capturing names, i.e. the items with a kind that is a string but not "name", "id", "str", "float", "int".
- Returns:
capturing_names_temp – A list with the default capturing names (already sorted)
- Return type:
List
- property description_schema: Dict
Dictionary containing the pairs name:description
- property descriptions: List[str]
List of all the descriptions
- full_schema() Dict[str, _ItemDict][source]
Create and return a dictionary containing all the information about the schema
- Returns:
full_schema – A dictionary in which each item is a pair name:dictionary structured as follows: {<item_name>: {"name":<item_name>, "description":<item_description>, "kind":<item_kind>, "type":<item_type>, "pattern":<item_pattern>}}
- Return type:
Dictionary
- property kind_schema: Dict
Dictionary containing the pairs name:kind
- property kinds: List[str | Type[int] | Type[float]]
List of all the kinds
- property names: List[str]
List of all the names
- property pattern_schema
Dictionary containing the pairs name:pattern
- property patterns: List[str]
List of all the patterns
- pop(name: str)[source]
Remove an item of given name from the schema
- Parameters:
name – Name of the item in the schema to remove
- regex_pattern(capturing_names: Literal['default', 'all'] | List[str] | Tuple[str, ...] = 'default') Tuple[Dict[str, str], List[str]][source]
- Parameters:
capturing_names ("default", "all", iterable) – A list of names to be included as capturing members. If the string "default" is used, all the items in the schema will be captured except for the named items, i.e. the items with a kind that is a string but not "name", "id", "str", "float", "int". If the string "all" is used, all the items in the schema will be captured.
- Returns:
regex_pattern (Dictionary) – a dictionary containing the pairs name:regex_pattern
capturing_names (List) – A sorted list of the capturing names
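The wrapping step can be illustrated with a small standalone function. It is a sketch of the behaviour described above rather than the method's actual code: `wrap_patterns` is a hypothetical name, and the "default" option is deliberately left out because it depends on the kinds stored in the schema.

```python
from typing import Dict, Iterable, List, Tuple, Union


def wrap_patterns(
    patterns: Dict[str, str],
    capturing_names: Union[str, Iterable[str]] = "all",
) -> Tuple[Dict[str, str], List[str]]:
    """Wrap each pattern in a capturing or non-capturing group (hypothetical helper)."""
    if capturing_names == "all":
        selected = set(patterns)
    elif isinstance(capturing_names, str):
        # "default" needs the kind of each item, which this sketch does not model
        raise ValueError(f"unsupported option: {capturing_names!r}")
    else:
        selected = set(capturing_names) & set(patterns)
    wrapped = {
        name: f"({pattern})" if name in selected else f"(?:{pattern})"
        for name, pattern in patterns.items()
    }
    return wrapped, sorted(selected)


patterns = {"ID": "[0-9]+", "event": "CIRC"}
regex, captured = wrap_patterns(patterns, capturing_names=("ID",))
print(regex)     # {'ID': '([0-9]+)', 'event': '(?:CIRC)'}
print(captured)  # ['ID']
```

Returning the sorted list of captured names alongside the dictionary mirrors the two return values documented for the method.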
- property schema: Dict[str, str | Type[int] | Type[float] | Tuple[str | Type[int] | Type[float], str]]
The schema of the log reader
- property types: List[Type[int] | Type[float]]
List of all the types