Contents
1. Introduction
2. Requirement
3. "Hello World!" Sample
8.1. DefaultValue
8.2. ChoFallbackValue
8.3. Type Converters
8.4. Validations
8.5. ChoIgnoreMember
8.6. StringLength
8.6. Display
8.7. DisplayName
10.8 RecordWriteFieldError
11. Customization
12. Using Dynamic Object
13. Exceptions
15. Using MetadataType Annotation
16. Configuration Choices
16.1 Manual Configuration
16.2 Auto Map Configuration
16.3 Attaching MetadataType class
21.1. NullValueHandling
21.2. Formatting
21.3 WithFields
21.4 WithField
21.5. IgnoreFieldValueMode
21.6 ColumnCountStrict
21.7. Configure
21.8. Setup
22. FAQ
1. Introduction
ChoETL is an open source ETL (extract, transform and load) framework for .NET. It is a code-based library for extracting data from
multiple sources, transforming it, and loading it into your own data warehouse in a .NET environment. With it, you can have data in your data
warehouse in no time.
Apache Parquet is an open source file format for Hadoop. Parquet stores nested data structures in a flat columnar format.
Compared to the traditional approach, where data is stored row by row, Parquet is more efficient in terms of storage
and performance.
This article covers the ChoParquetWriter component offered by the ChoETL framework. It is a simple utility class for saving
Parquet data to a file or external data source.
Features:
https://www.codeproject.com/Articles/5271468/Cinchoo-ETL-Parquet-Writer?display=Print 2/35
19/06/2020 Cinchoo ETL - Parquet Writer - CodeProject
Uses Parquet.NET library under the hood, to generate Parquet file in seconds.
Supports culture specific date, currency and number formats while generating files.
Provides fine control of date, currency, enum, boolean, number formats when writing files.
Detailed and robust error handling, allowing you to quickly find and fix problems.
Shorten your development time.
2. Requirement
This framework library is written in C# and targets .NET Framework 4.5 / .NET Core 2.x.
Install-Package ChoETL.Parquet
Let's begin by looking at a simple example that generates a Parquet file with 2 columns (Id and Name).
There are a number of ways you can get the Parquet file generated with minimal setup.
rec2.Id = 2;
rec2.Name = "Jason";
objs.Add(rec2);
In the above sample, we give the list of dynamic objects to ParquetWriter at one pass to write them to Parquet file.
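The one-pass write described above can be sketched as follows. This is a minimal sketch, not the article's original listing: the output file name emp.parquet is an assumption, and it relies on the ChoETL.Parquet package exposing a ChoParquetWriter constructor that takes a file path and a Write method that accepts a list of records.

```csharp
using System.Collections.Generic;
using System.Dynamic;
using ChoETL;

class Program
{
    static void Main(string[] args)
    {
        // Build the two dynamic records used throughout this section.
        var objs = new List<ExpandoObject>();

        dynamic rec1 = new ExpandoObject();
        rec1.Id = 1;
        rec1.Name = "Mark";
        objs.Add(rec1);

        dynamic rec2 = new ExpandoObject();
        rec2.Id = 2;
        rec2.Name = "Jason";
        objs.Add(rec2);

        // "emp.parquet" is an assumed output path, chosen for illustration.
        using (var parser = new ChoParquetWriter("emp.parquet"))
        {
            // Hand the whole list to the writer in one pass.
            parser.Write(objs);
        }
    }
}
```

Wrapping the writer in a using block ensures the file is flushed and closed when the block exits.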
In the above sample, we take control of constructing each individual dynamic record and passing it to the ParquetWriter, which generates
the Parquet file using the Write overload.
In the above, the POCO class defines two properties matching the sample Parquet file template.
The above sample shows how to create a Parquet file from typed POCO class objects.
3.3. Configuration First Approach
In this model, we define the Parquet configuration with all the necessary parameters, along with the Parquet columns required to
generate the sample Parquet file.
In the above, the class defines two Parquet properties matching the sample Parquet file template.
The above sample code shows how to generate a Parquet file from a list of dynamic objects using a predefined Parquet configuration
setup. In the ParquetWriter constructor, we specified the Parquet configuration object so that the writer obeys the Parquet layout
schema while creating the file. Any mismatch in the name or count of Parquet columns is reported as an error and
stops the writing process.
The above sample code shows how to generate a Parquet file from a list of POCO objects with a Parquet configuration object. In the
ParquetWriter constructor, we specified the Parquet configuration object.
[ChoParquetRecordField]
[DefaultValue("XXXX")]
public string Name
{
get;
set;
}
The code above illustrates how to define a POCO object with the necessary attributes required to generate a Parquet file. First,
define a property for each record field and decorate it with ChoParquetRecordFieldAttribute to qualify it for Parquet record mapping.
Id is a required property, so we decorated it with RequiredAttribute. Name is given a default value using
DefaultValueAttribute. It means that if the Name value is not set in the object, ParquetWriter writes the default value
'XXXX' to the file.
We start by creating a new instance of the ChoParquetWriter object. That's all. All the heavy lifting of generating Parquet data from
the objects is done by the writer under the hood.
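Putting the pieces together, a typed write could look like the sketch below. It is an illustration under assumptions rather than the original listing: the file name emp.parquet is made up, and it assumes the generic ChoParquetWriter<T> accepts a file path in its constructor. The second record leaves Name unset so that, per the description above, the writer is expected to fall back to the declared default.

```csharp
using System.Collections.Generic;
using System.ComponentModel;
using System.ComponentModel.DataAnnotations;
using ChoETL;

[ChoParquetRecordObject]
public class EmployeeRec
{
    [ChoParquetRecordField]
    [Required]
    public int Id { get; set; }

    [ChoParquetRecordField]
    [DefaultValue("XXXX")]
    public string Name { get; set; }
}

class Program
{
    static void Main(string[] args)
    {
        var objs = new List<EmployeeRec>
        {
            new EmployeeRec { Id = 1, Name = "Mark" },
            // Name is left unset here on purpose (see the DefaultValue attribute).
            new EmployeeRec { Id = 2 }
        };

        // "emp.parquet" is an assumed output path.
        using (var parser = new ChoParquetWriter<EmployeeRec>("emp.parquet"))
        {
            parser.Write(objs);
        }
    }
}
```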
By default, ParquetWriter discovers and uses default configuration parameters while saving the Parquet file. These can be
overridden according to your needs. The following sections give in-depth details about each configuration attribute.
or:
This model keeps your code elegant, clean, easy to read and maintain.
5. Write Records Manually
This is an alternative way to write each individual record to the Parquet file, for cases where the POCO objects are constructed in a
disconnected way.
writer.Write(rec1);
writer.Write(rec2);
[ChoParquetRecordObject]
public class EmployeeRec
{
[ChoParquetRecordField]
public int Id { get; set; }
[ChoParquetRecordField]
[Required]
[DefaultValue("ZZZ")]
Here are the available attributes to customize the Parquet write operation on a file.
IgnoreAndContinue - Ignore the error; the record will be skipped and writing continues with the next one.
ReportAndContinue - Report the error to the POCO entity if it is of IChoNotifyRecordWrite type.
ThrowAndStop - Throw the error and stop the execution.
ObjectValidationMode - A flag to let the writer know about the type of validation to be performed on the record
object. Possible values are:
Here are the available members to add some customization to it for each property:
FieldName - Parquet field name. If not specified, the POCO object property name will be used as the field name.
Size - Size of the Parquet column value.
NullValue - Special null value text to be treated as a null value at the field level.
ErrorMode - This flag indicates whether an exception should be thrown if an expected field fails to convert and
write. Possible values are:
IgnoreAndContinue - Ignore the error and continue to write the other properties of the record.
ReportAndContinue - Report the error to the POCO entity if it is of IChoRecord type.
ThrowAndStop - Throw the error and stop the execution.
8.1. DefaultValue
Any POCO entity property can be given a default value using
System.ComponentModel.DefaultValueAttribute. It is the value written when the property value is null
(controlled via IgnoreFieldValueMode).
8.2. ChoFallbackValue
Any POCO entity property can be given a fallback value using ChoETL.ChoFallbackValueAttribute. It is the
value used when the property fails to be written to Parquet. The fallback value is only applied when ErrorMode is either
IgnoreAndContinue or ReportAndContinue.
There are a couple of ways you can specify the converters for each field:
Declarative Approach
Configuration Approach
This model is applicable to POCO entity objects only. If you have a POCO class, you can specify converters on each property to
carry out the necessary conversion. The samples below show how to do it.
In the example above, we defined a custom IntConverter class and showed how to format the 'Id' Parquet property with leading
zeros.
This model is applicable to both dynamic and POCO entity objects. It gives you the freedom to attach converters to each property at
runtime. It takes precedence over the declarative converters on POCO classes.
idConfig.AddConverter(new IntConverter());
config.ParquetRecordFieldConfigurations.Add(idConfig);
config.ParquetRecordFieldConfigurations.Add(new ChoParquetRecordFieldConfiguration("Name"));
config.ParquetRecordFieldConfigurations.Add(new ChoParquetRecordFieldConfiguration("Name1"));
In the above, we construct and attach the IntConverter to the 'Id' field using the AddConverter helper method on the
ChoParquetRecordFieldConfiguration object.
Likewise, if you want to remove a converter from it, you can use RemoveConverter on the ChoParquetRecordFieldConfiguration
object.
This approach allows you to attach a value converter to each Parquet member using the fluent API. It is a quick way to handle any odd
conversion process and avoids creating a value converter class.
With the fluent API, the sample below shows how to attach a value converter to the Id column.
8.4. Validations
ParquetWriter leverages both System.ComponentModel.DataAnnotations and Validation
Block validation attributes to specify validation rules for individual fields of POCO entity. Refer to the MSDN site for a list of
available DataAnnotations validation attributes.
[ChoParquetRecordObject]
public partial class EmployeeRec
{
[ChoParquetRecordField(1, FieldName = "id")]
[ChoTypeConverter(typeof(IntConverter))]
[Range(1, int.MaxValue, ErrorMessage = "Id must be > 0.")]
[ChoFallbackValue(1)]
public int Id { get; set; }
[ChoParquetRecordField]
[Required]
[DefaultValue("ZZZ")]
[ChoFallbackValue("XXX")]
public string Name { get; set; }
}
In some cases, you may want to take control and perform manual self-validation within the POCO entity class. This can be achieved
by implementing the IChoValidatable interface on the POCO object.
[ChoParquetRecordObject]
public partial class EmployeeRec : IChoValidatable
{
[ChoParquetRecordField]
[ChoTypeConverter(typeof(IntConverter))]
[Range(1, int.MaxValue, ErrorMessage = "Id must be > 0.")]
[ChoFallbackValue(1)]
public int Id { get; set; }
[ChoParquetRecordField]
[Required]
[DefaultValue("ZZZ")]
[ChoFallbackValue("XXX")]
public string Name { get; set; }
TryValidate - Validates the entire object; returns true if all validations pass, otherwise false.
TryValidateFor - Validates a specific property of the object; returns true if all validations pass, otherwise false.
8.5. ChoIgnoreMember
If you want to exclude a POCO class member from Parquet writing in OptOut mode, decorate it with
ChoIgnoreMemberAttribute. The sample below shows the Title member being excluded from the Parquet writing process.
8.6. StringLength
In OptOut mode, you can specify the size of the Parquet column by using
System.ComponentModel.DataAnnotations.StringLengthAttribute.
8.6. Display
In OptOut mode, you can specify the name of Parquet column mapped to member using
System.ComponentModel.DataAnnotations.DisplayAttribute.
Listing 8.6.1 Specifying name of Parquet field
8.7. DisplayName
In OptOut mode, you can specify the name of the Parquet column mapped to a member using
System.ComponentModel.DisplayNameAttribute.
Listing 8.7.1 Specifying name of Parquet field
In order to participate in the callback mechanism, either the POCO entity object or the DataAnnotation's MetadataType type object must
implement the IChoNotifyRecordWrite interface.
Tip: Any exceptions raised out of these interface methods will be ignored.
FileHeaderArrange - Raised before the Parquet file header is written to the file; an opportunity to rearrange the Parquet
columns.
FileHeaderWrite - Raised before the Parquet file header is written to the file; an opportunity to customize the header.
The sample below shows how to use the BeforeRecordLoad callback method to skip lines starting with '%' characters.
parser.Write(rec);
}
}
Likewise you can use other callback methods as well with ParquetWriter.
[ChoParquetRecordObject]
public partial class EmployeeRec : IChoNotifyRecordWrite
{
[ChoParquetRecordField]
[ChoTypeConverter(typeof(IntConverter))]
[Range(1, int.MaxValue, ErrorMessage = "Id must be > 0.")]
[ChoFallbackValue(1)]
public int Id { get; set; }
[ChoParquetRecordField]
[Required]
[DefaultValue("ZZZ")]
[ChoFallbackValue("XXX")]
public string Name { get; set; }
public bool RecordWriteError(object target, int index, object source, Exception ex)
{
throw new NotImplementedException();
}
}
The sample below shows how to attach a metadata class to a POCO class by using MetadataTypeAttribute on it.
[ChoParquetRecordObject]
public class EmployeeRecMeta : IChoNotifyRecordWrite
{
[ChoParquetRecordField]
[ChoTypeConverter(typeof(IntConverter))]
[Range(1, int.MaxValue, ErrorMessage = "Id must be > 0.")]
[ChoFallbackValue(1)]
public int Id { get; set; }
[ChoParquetRecordField]
[Required]
[DefaultValue("ZZZ")]
[ChoFallbackValue("XXX")]
public string Name { get; set; }
{
throw new NotImplementedException();
}
public bool RecordWriteError(object target, int index, object source, Exception ex)
{
throw new NotImplementedException();
}
}
[MetadataType(typeof(EmployeeRecMeta))]
public partial class EmployeeRec
{
public int Id { get; set; }
public string Name { get; set; }
}
The sample below shows how to attach a metadata class to a sealed or third-party POCO class by using ChoMetadataRefTypeAttribute on
it.
[ChoMetadataRefType(typeof(EmployeeRec))]
[ChoParquetRecordObject]
public class EmployeeRecMeta : IChoNotifyRecordWrite
{
[ChoParquetRecordField]
[ChoTypeConverter(typeof(IntConverter))]
[Range(1, int.MaxValue, ErrorMessage = "Id must be > 0.")]
[ChoFallbackValue(1)]
public int Id { get; set; }
[ChoParquetRecordField]
[Required]
[DefaultValue("ZZZ")]
[ChoFallbackValue("XXX")]
public string Name { get; set; }
public bool RecordWriteError(object target, int index, object source, Exception ex)
{
throw new NotImplementedException();
}
}
10.1 BeginWrite
This callback is invoked once at the beginning of the Parquet file write. source is the Parquet file stream object. Here you have a
chance to inspect the stream. Return true to continue the Parquet generation, or false to stop it.
10.2 EndWrite
This callback is invoked once at the end of the Parquet file generation. source is the Parquet file stream object. Here you have a
chance to inspect the stream and perform any post-processing steps on it.
10.3 BeforeRecordWrite
This callback is invoked before each POCO record object is written to the Parquet file. target is the instance of the POCO record
object. index is the line index in the file. source is the Parquet record line. Here you have a chance to inspect the POCO object and
generate the Parquet record line if needed.
Tip: If you want to skip writing the record, set the source to null.
Return true to continue the write process, otherwise return false to stop the process.
10.4 AfterRecordWrite
This callback is invoked after each POCO record object is written to the Parquet file. target is the instance of the POCO record object.
index is the line index in the file. source is the Parquet record line. Here you have a chance to perform any post-step operation on the
record line.
Return true to continue the write process, otherwise return false to stop the process.
10.5 RecordWriteError
This callback is invoked if an error is encountered while writing a POCO record object. target is the instance of the POCO record object. index
is the line index in the file. source is the Parquet record line. ex is the exception object. Here you have a chance to handle the
exception. This method is invoked only when Configuration.ErrorMode is ReportAndContinue.
Return true to continue the write process, otherwise return false to stop the process.
public bool RecordWriteError(object target, int index, object source, Exception ex)
{
return true;
}
10.6 BeforeRecordFieldWrite
This callback is invoked before each Parquet record column is written to the Parquet file. target is the instance of the POCO record object.
index is the line index in the file. propName is the Parquet record property name. value is the Parquet column value. Here you
have a chance to inspect the Parquet record property value and perform any custom validations.
Return true to continue the write process, otherwise return false to stop the process.
public bool BeforeRecordFieldWrite(object target, int index, string propName, ref object value)
{
return true;
}
10.7 AfterRecordFieldWrite
This callback is invoked after each Parquet record column value is written to the Parquet file. target is the instance of the POCO record
object. index is the line index in the file. propName is the Parquet record property name. value is the Parquet column value. Any post-field
operation can be performed here, such as computing other properties or running validations.
Return true to continue the write process, otherwise return false to stop the process.
public bool AfterRecordFieldWrite(object target, int index, string propName, object value)
{
return true;
}
10.8 RecordWriteFieldError
This callback is invoked when an error is encountered while writing a Parquet record column value. target is the instance of the POCO record
object. index is the line index in the file. propName is the Parquet record property name. value is the Parquet column value. ex is the
exception object. Here you have a chance to handle the exception. This method is invoked only after the following two
steps are performed by the ParquetWriter:
ParquetWriter looks for the FallbackValue of each Parquet property. If present, it tries to use it to write.
If the FallbackValue is not present and Configuration.ErrorMode is specified as ReportAndContinue, this callback will
be executed.
Return true to continue the write process, otherwise return false to stop the process.
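The order of those steps can be modelled in a few lines of plain C#. The sketch below is not ChoETL code: ErrorMode, FieldErrorOutcome and FieldErrorFlow are hypothetical names that only mirror the behaviour described in this article (fallback first, then the callback under ReportAndContinue, with ThrowAndStop rethrowing immediately).

```csharp
using System;

// Hypothetical stand-ins for the error modes named in this article.
public enum ErrorMode { IgnoreAndContinue, ReportAndContinue, ThrowAndStop }

// Hypothetical outcomes of a failed field write.
public enum FieldErrorOutcome { WroteFallback, InvokedCallback, Ignored }

public static class FieldErrorFlow
{
    // Decides what happens after a field fails to write, following the
    // sequence described in the text. This is a model, not library code.
    public static FieldErrorOutcome Resolve(ErrorMode mode, bool hasFallbackValue, Exception error)
    {
        // ThrowAndStop: surface the error and stop.
        if (mode == ErrorMode.ThrowAndStop)
            throw error;

        // Step 1: a fallback value, when present, is tried first.
        if (hasFallbackValue)
            return FieldErrorOutcome.WroteFallback;

        // Step 2: no fallback and ReportAndContinue, so the callback runs.
        if (mode == ErrorMode.ReportAndContinue)
            return FieldErrorOutcome.InvokedCallback;

        // IgnoreAndContinue without a fallback: the error is skipped.
        return FieldErrorOutcome.Ignored;
    }

    public static void Main()
    {
        var ex = new FormatException("sample failure");
        Console.WriteLine(Resolve(ErrorMode.ReportAndContinue, true, ex));
        Console.WriteLine(Resolve(ErrorMode.ReportAndContinue, false, ex));
        Console.WriteLine(Resolve(ErrorMode.IgnoreAndContinue, false, ex));
    }
}
```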
11. Customization
ParquetWriter automatically detects and loads the configuration settings from the POCO entity. At runtime, you can customize
and tweak these parameters before Parquet generation. ParquetWriter exposes a Configuration property, which is a
ChoParquetRecordConfiguration object. Using this property, you can perform the customization.
Listing 11.1 Customizing ParquetWriter at run-time
class Program
{
static void Main(string[] args)
{
List<ExpandoObject> objs = new List<ExpandoObject>();
dynamic rec1 = new ExpandoObject();
rec1.Id = 1;
rec1.Name = "Mark";
objs.Add(rec1);
class Program
{
static void Main(string[] args)
{
List<ExpandoObject> objs = new List<ExpandoObject>();
dynamic rec1 = new ExpandoObject();
rec1.Id = 1;
rec1.Name = "Mark";
objs.Add(rec1);
13. Exceptions
[MetadataType(typeof(EmployeeRecMeta))]
public class EmployeeRec
{
public int Id { get; set; }
public string Name { get; set; }
}
[ChoParquetRecordObject]
public class EmployeeRecMeta : IChoNotifyRecordWrite, IChoValidatable
{
[ChoParquetRecordField]
[ChoTypeConverter(typeof(IntConverter))]
[Range(1, 1, ErrorMessage = "Id must be > 0.")]
[ChoFallbackValue(1)]
public int Id { get; set; }
[ChoParquetRecordField]
[StringLength(1)]
[DefaultValue("ZZZ")]
[ChoFallbackValue("XXX")]
public string Name { get; set; }
public bool RecordWriteError(object target, int index, object source, Exception ex)
{
throw new NotImplementedException();
}
}
}
In the above, EmployeeRec is the data class. It contains only domain-specific properties and operations, which keeps it a very simple class to look
at.
We separate the validation, callback mechanism, configuration, etc. into the metadata type class, EmployeeRecMeta.
Manual Configuration
Auto Map Configuration
Attaching MetadataType class
I'm going to show you how to configure the POCO entity class below with each approach.
16.1 Manual Configuration
Define a brand new configuration object from scratch and add all the necessary Parquet fields to the
ChoParquetRecordConfiguration.ParquetRecordFieldConfigurations collection property. This option gives you greater flexibility to control
the configuration of Parquet writing. The downside is that it is easy to make mistakes, and the configuration is hard to manage if the
Parquet file layout is large.
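As an illustration of the manual approach, the sketch below builds the configuration by hand for a plain EmployeeRec class with Id and Name properties. The file name emp.parquet is an assumption, as is the constructor overload that takes the configuration object as its second argument (the article only states that the configuration is passed to the ParquetWriter constructor).

```csharp
using System.Collections.Generic;
using ChoETL;

// Plain data class with no Parquet attributes; all mapping is done in code.
public class EmployeeRec
{
    public int Id { get; set; }
    public string Name { get; set; }
}

class Program
{
    static void Main(string[] args)
    {
        // Start from an empty configuration and register each column explicitly.
        var config = new ChoParquetRecordConfiguration();
        config.ParquetRecordFieldConfigurations.Add(new ChoParquetRecordFieldConfiguration("Id"));
        config.ParquetRecordFieldConfigurations.Add(new ChoParquetRecordFieldConfiguration("Name"));

        var objs = new List<EmployeeRec>
        {
            new EmployeeRec { Id = 1, Name = "Mark" },
            new EmployeeRec { Id = 2, Name = "Jason" }
        };

        // "emp.parquet" is an assumed path; the (path, config) overload is assumed too.
        using (var parser = new ChoParquetWriter<EmployeeRec>("emp.parquet", config))
        {
            parser.Write(objs);
        }
    }
}
```

Every column has to be registered by name here, which is where the flexibility and the risk of mistakes described above both come from.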
16.2 Auto Map Configuration
This is an alternative and much less error-prone method: auto-mapping the Parquet columns for the POCO entity class.
First, define a schema class for the EmployeeRec POCO entity class as below.
[ChoParquetRecordField]
public string Name { get; set; }
}
Then you can use it to auto-map the Parquet columns by using the ChoParquetRecordConfiguration.MapRecordFields method.
This model accounts for everything by defining a MetadataType class and specifying the Parquet configuration parameters
declaratively. This is useful when your POCO entity is sealed and not a partial class. It is also one of the preferred and less
error-prone approaches to configuring Parquet writing of a POCO entity.
[ChoParquetRecordObject]
public class EmployeeRecMeta : IChoNotifyRecordWrite, IChoValidatable
{
[ChoParquetRecordField]
[ChoTypeConverter(typeof(IntConverter))]
[Range(1, 1, ErrorMessage = "Id must be > 0.")]
public int Id { get; set; }
[ChoParquetRecordField]
[StringLength(1)]
[DefaultValue("ZZZ")]
[ChoFallbackValue("XXX")]
public string Name { get; set; }
public bool RecordWriteError(object target, int index, object source, Exception ex)
{
throw new NotImplementedException();
}
//Attach metadata
ChoMetadataObjectCache.Default.Attach<EmployeeRec>(new EmployeeRecMeta());
1. ChoParquetRecordConfiguration.CultureInfo - Represents information about a specific culture, including the names of the
culture, the writing system, and the calendar used, as well as access to culture-specific objects that provide information for
common operations, such as formatting dates and sorting strings. Default is 'en-US'.
2. ChoTypeConverterFormatSpec - A global format specifier class that holds the formatting specs for all the intrinsic .NET types.
In this section, I'm going to talk about changing the default format specs for each .NET intrinsic data type according to your
needs.
ChoTypeConverterFormatSpec is a singleton class; the instance is exposed via the 'Instance' static member. It is thread local, which means that
a separate instance copy is kept on each thread.
There are 2 sets of format spec members for each intrinsic type, one for loading and another for writing the value, except
for the Boolean, Enum and DateTime types. These types have only one member for both loading and writing operations.
Specifying intrinsic data type format specs through ChoTypeConverterFormatSpec has a system-wide impact. For example,
setting ChoTypeConverterFormatSpec.IntNumberStyle = NumberStyles.AllowParentheses makes all integer members of Parquet
objects allow parentheses. If you want to override this behavior and have a specific Parquet data member handle its
own value differently from the global system-wide setting, you can do so by specifying a TypeConverter at the Parquet
field member level. Refer to section 13.4 for more information.
Listing 20.1.1 ChoTypeConverterFormatSpec Members
The sample below shows how to handle a Parquet data stream having 'se-SE' (Swedish) culture-specific data using ParquetWriter. The
input feed also comes with 'EmployeeNo' values containing parentheses. In order to make the operation successful, we have to set
ChoTypeConverterFormatSpec.IntNumberStyle to NumberStyles.AllowParentheses.
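To see what the parentheses number style does in isolation, the sketch below uses only the .NET base class library and no ChoETL types. ParenthesesDemo and ParseEmployeeNo are hypothetical names introduced for illustration, and the invariant culture is used so the result does not depend on the machine's locale.

```csharp
using System;
using System.Globalization;

public static class ParenthesesDemo
{
    // Parses an integer that may be wrapped in parentheses, e.g. "(123)".
    // With NumberStyles.AllowParentheses, a parenthesized number is read as negative.
    public static int ParseEmployeeNo(string text, CultureInfo culture)
    {
        return int.Parse(text, NumberStyles.AllowParentheses, culture);
    }

    public static void Main()
    {
        var culture = CultureInfo.InvariantCulture;

        Console.WriteLine(ParseEmployeeNo("(123)", culture)); // prints -123
        Console.WriteLine(ParseEmployeeNo("123", culture));   // prints 123

        // Without the style flag, the default integer style rejects parentheses.
        try
        {
            int.Parse("(123)", culture);
        }
        catch (FormatException)
        {
            Console.WriteLine("default style rejects \"(123)\"");
        }
    }
}
```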
20.2 Currency Support
Cinchoo ETL provides the ChoCurrency object to read and write currency values in Parquet files. ChoCurrency is a wrapper class that holds
the currency value as a decimal type, along with support for serializing it in text format.
parser.Write(objs);
}
}
The sample above shows how to output currency values using the dynamic object model. As the currency output will have a thousands
separator (comma), this will fail to generate the Parquet file. To overcome this issue, we instruct the writer to quote all fields.
PS: The format of the currency value is determined by ParquetWriter through ChoRecordConfiguration.Culture and
ChoTypeConverterFormatSpec.CurrencyFormat.
The sample below shows how to use a ChoCurrency Parquet field in a POCO entity class.
20.3 Enum Support
Cinchoo ETL implicitly handles parsing/writing of enum column values from Parquet files. If you want fine control over the handling of
these values, you can specify it globally via ChoTypeConverterFormatSpec.EnumFormat. The default
is ChoEnumFormatSpec.Value.
20.4 Boolean Support
Cinchoo ETL implicitly handles parsing/writing of boolean column values from Parquet files. If you want fine control over the
handling of these values, you can specify it globally via ChoTypeConverterFormatSpec.BooleanFormat. The default value
is ChoBooleanFormatSpec.ZeroOrOne.
rec2.Status = EmployeeType.Contract;
objs.Add(rec2);
20.5 DateTime Support
Cinchoo ETL implicitly handles parsing/writing of datetime column values from Parquet files using the system culture or a custom
culture. If you want fine control over the handling of these values, you can specify it globally
via ChoTypeConverterFormatSpec.DateTimeFormat. The default value is 'd'.
You can use any valid standard or custom .NET datetime format specification for the datetime Parquet values.
Note: As the datetime values contain the Parquet separator, we instruct the writer to quote all fields.
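The difference between the standard 'd' specifier and a custom format string can be tried out with plain .NET, independent of ChoETL. In the sketch below, DateFormatDemo and Format are hypothetical names; the result of 'd' depends on the culture supplied, so no fixed output is claimed for it.

```csharp
using System;
using System.Globalization;

public static class DateFormatDemo
{
    // Formats a date with a standard or custom .NET format string
    // under the given culture.
    public static string Format(DateTime value, string format, CultureInfo culture)
    {
        return value.ToString(format, culture);
    }

    public static void Main()
    {
        var joined = new DateTime(2020, 6, 19);

        // 'd' is the standard short date pattern; its output varies by culture.
        Console.WriteLine(Format(joined, "d", CultureInfo.GetCultureInfo("en-US")));

        // A custom format string pins the layout regardless of culture defaults.
        Console.WriteLine(Format(joined, "yyyy-MM-dd", CultureInfo.InvariantCulture)); // prints 2020-06-19
    }
}
```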
21.1. NullValueHandling
Specifies null value handling options for the ChoParquetWriter
21.2. Formatting
Specifies formatting options for the ChoParquetWriter
21.3 WithFields
This API method specifies the list of Parquet fields to be considered when writing the Parquet file. Other fields will be discarded. Field
names are case-insensitive.
21.4 WithField
This API method is used to add a Parquet column with a specific data type, quote flag, and/or quote character. This method is helpful in
the dynamic object model, where it lets you specify each individual Parquet column with the appropriate data type.
.WithField("Id", typeof(int))
.WithField("Name"))
)
{
parser.Write(objs);
}
}
21.5. IgnoreFieldValueMode
Specifies ignore field value for the ChoParquetWriter
21.6 ColumnCountStrict
This API method is used to make the ParquetWriter check the column count before writing the Parquet file.
21.7. Configure
This API method is used to configure all configuration parameters which are not exposed via the fluent API.
rec2.Id = 200;
rec2.Name = "Lou";
objs.Add(rec2);
21.8. Setup
This API method is used to set up the writer's parameters / events via the fluent API.
22. FAQ
}
});
Name = "Tom",
EmpType = EmployeeType.Permanent
});
}
});
}
});
}
}
}
}
License
This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)