
case class DeltaLakeTableDataObject(id: DataObjectId, path: Option[String] = None, partitions: Seq[String] = Seq(), options: Map[String, String] = Map(), schemaMin: Option[GenericSchema] = None, table: Table, constraints: Seq[Constraint] = Seq(), expectations: Seq[Expectation] = Seq(), preReadSql: Option[String] = None, postReadSql: Option[String] = None, preWriteSql: Option[String] = None, postWriteSql: Option[String] = None, saveMode: SDLSaveMode = SDLSaveMode.Overwrite, allowSchemaEvolution: Boolean = false, retentionPeriod: Option[Int] = None, acl: Option[AclDef] = None, connectionId: Option[ConnectionId] = None, expectedPartitionsCondition: Option[String] = None, housekeepingMode: Option[HousekeepingMode] = None, metadata: Option[DataObjectMetadata] = None)(implicit instanceRegistry: InstanceRegistry) extends TransactionalTableDataObject with CanMergeDataFrame with CanEvolveSchema with CanHandlePartitions with HasHadoopStandardFilestore with ExpectationValidation with CanCreateIncrementalOutput with Product with Serializable

DataObject of type DeltaLakeTableDataObject. Provides details for an Action to access tables in Delta format.

Delta format maintains a transaction log in a separate _delta_log subfolder. The schema is registered in the Metastore by DeltaLakeTableDataObject.

The following anomalies might occur:
  - Table is registered in the Metastore but the path does not exist -> the table is dropped from the Metastore.
  - Table is registered in the Metastore but the path is empty -> an error is thrown; delete the path to clean up.
  - Table is registered and the path contains Parquet files, but the _delta_log subfolder is missing -> the path is converted to Delta format.
  - Table is not registered but the path contains Parquet files and a _delta_log subfolder -> the table is registered.
  - Table is not registered but the path contains Parquet files without a _delta_log subfolder -> the path is converted to Delta format and the table is registered.
  - Table is not registered and the path does not exist -> the table is created on write.

DeltaLakeTableDataObject implements:
  - CanMergeDataFrame by using the DeltaTable.merge API.
  - CanEvolveSchema by using the mergeSchema option.
  - Overwriting partitions via the replaceWhere option, in one transaction.
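For illustration, such a DataObject is typically declared in the SDL HOCON configuration. The following is a minimal sketch; the DataObject id, path, and column names are hypothetical:

```hocon
dataObjects {
  my-delta-table {                      # hypothetical DataObject id
    type = DeltaLakeTableDataObject
    path = "hdfs:///data/my_table"      # optional; omit for a managed table
    partitions = [dt]
    table {
      db = default
      name = my_table
      primaryKey = [id]                 # needed for saveMode = Merge
    }
    saveMode = Merge
    allowSchemaEvolution = true
  }
}
```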

id

unique name of this data object

path

Optional Hadoop directory for this table. If path is not defined, the table is handled as a managed table. If the path does not contain a scheme and authority, the connection's pathPrefix is applied. If pathPrefix is not defined or does not define a scheme and authority, the default scheme and authority are applied.

partitions

partition columns for this data object

options

Options for Delta Lake tables. See https://docs.delta.io/latest/delta-batch.html and org.apache.spark.sql.delta.DeltaOptions.

schemaMin

An optional, minimal schema that this DataObject must have to pass schema validation on reading and writing. Define the schema using a DDL-formatted string, i.e. a comma-separated list of field definitions, e.g. a INT, b STRING.

table

DeltaLake table to be written by this output

constraints

List of row-level Constraints to enforce when writing to this data object.

expectations

List of Expectations to enforce when writing to this data object. Expectations are checks based on aggregates over all rows of a dataset.
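As an illustrative sketch of such an aggregate-based check (the expectation name and values are hypothetical; field names follow the SDL expectation configuration):

```hocon
# Hypothetical example: fail the write if the dataset is empty.
expectations = [{
  type = SQLExpectation
  name = countAll
  description = "dataset must not be empty"
  aggExpression = "count(*)"
  expectation = "> 0"
}]
```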

preReadSql

SQL statement to be executed in the exec phase before reading the input table. If the catalog and/or schema are not explicitly defined, the ones present in the configured "table" object are used.

postReadSql

SQL statement to be executed in the exec phase after reading the input table and before the action is finished. If the catalog and/or schema are not explicitly defined, the ones present in the configured "table" object are used.

preWriteSql

SQL statement to be executed in the exec phase before writing the output table. If the catalog and/or schema are not explicitly defined, the ones present in the configured "table" object are used.

postWriteSql

SQL statement to be executed in the exec phase after writing the output table. If the catalog and/or schema are not explicitly defined, the ones present in the configured "table" object are used.

saveMode

SDLSaveMode to use when writing files; default is Overwrite. Currently Overwrite, Append and Merge are supported.

allowSchemaEvolution

If set to true, the schema is automatically evolved when writing to this DataObject with a different schema; otherwise SDL stops with an error.

retentionPeriod

Optional Delta Lake retention threshold in hours. Files required to read table versions younger than retentionPeriod are preserved; all other files are deleted.

acl

Override the connection permissions for files created in this table's Hadoop directory.

connectionId

Optional id of an io.smartdatalake.workflow.connection.HiveTableConnection.

expectedPartitionsCondition

Optional definition of partitions expected to exist. Define a Spark SQL expression that is evaluated against a PartitionValues instance and returns true or false. Default is to expect all partitions to exist.
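For illustration, assuming a partition column dt and that the PartitionValues entries are exposed through an elements map (as in SDL's documented condition syntax):

```hocon
# Hypothetical example: only partitions after a cutoff date are expected to exist.
expectedPartitionsCondition = "elements['dt'] > '2020-01-01'"
```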

housekeepingMode

Optional definition of a housekeeping mode applied after every write, e.g. to clean up, archive and compact partitions. See HousekeepingMode for available implementations. Default is None.

metadata

Metadata for this DataObject.

Annotations
@Scaladoc()
Linear Supertypes
Serializable, Product, Equals, CanCreateIncrementalOutput, ExpectationValidation, HasHadoopStandardFilestore, CanHandlePartitions, CanEvolveSchema, CanMergeDataFrame, TransactionalTableDataObject, CanWriteSparkDataFrame, CanWriteDataFrame, CanCreateSparkDataFrame, TableDataObject, SchemaValidation, CanCreateDataFrame, DataObject, AtlasExportable, SmartDataLakeLogger, ParsableFromConfig[DataObject], SdlConfigObject, ConfigHolder, AnyRef, Any

Instance Constructors

  1. new DeltaLakeTableDataObject(id: DataObjectId, path: Option[String] = None, partitions: Seq[String] = Seq(), options: Map[String, String] = Map(), schemaMin: Option[GenericSchema] = None, table: Table, constraints: Seq[Constraint] = Seq(), expectations: Seq[Expectation] = Seq(), preReadSql: Option[String] = None, postReadSql: Option[String] = None, preWriteSql: Option[String] = None, postWriteSql: Option[String] = None, saveMode: SDLSaveMode = SDLSaveMode.Overwrite, allowSchemaEvolution: Boolean = false, retentionPeriod: Option[Int] = None, acl: Option[AclDef] = None, connectionId: Option[ConnectionId] = None, expectedPartitionsCondition: Option[String] = None, housekeepingMode: Option[HousekeepingMode] = None, metadata: Option[DataObjectMetadata] = None)(implicit instanceRegistry: InstanceRegistry)


Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. val acl: Option[AclDef]
  5. def addFieldIfNotExisting(writeSchema: GenericSchema, colName: String, dataType: GenericDataType): GenericSchema
    Attributes
    protected
    Definition Classes
    CanCreateDataFrame
  6. val allowSchemaEvolution: Boolean
    Definition Classes
    DeltaLakeTableDataObject → CanEvolveSchema
  7. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  8. def atlasName: String
    Definition Classes
    TableDataObject → DataObject → AtlasExportable
  9. def atlasQualifiedName(prefix: String): String
    Definition Classes
    TableDataObject → AtlasExportable
  10. def calculateMetrics(df: GenericDataFrame, aggExpressions: Seq[GenericColumn], scope: ExpectationScope): Map[String, _]
    Definition Classes
    ExpectationValidation
  11. def checkFilesExisting(implicit context: ActionPipelineContext): Boolean

Check if the input files exist.

    Attributes
    protected
    Annotations
    @Scaladoc()
    Exceptions thrown

    IllegalArgumentException if failIfFilesMissing = true and no files found at path.

  12. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
  13. val connectionId: Option[ConnectionId]
  14. val constraints: Seq[Constraint]
    Definition Classes
    DeltaLakeTableDataObject → ExpectationValidation
  15. def createReadSchema(writeSchema: GenericSchema)(implicit context: ActionPipelineContext): GenericSchema
    Definition Classes
    CanCreateDataFrame
    Annotations
    @Scaladoc()
  16. def deduplicate(aggExpressions: Seq[GenericColumn]): Seq[GenericColumn]
    Definition Classes
    ExpectationValidation
  17. def deletePartitions(partitionValues: Seq[PartitionValues])(implicit context: ActionPipelineContext): Unit

Note that we do not delete the whole partition but only the data of the partition, because Delta Lake keeps history.

    Definition Classes
    DeltaLakeTableDataObject → CanHandlePartitions
    Annotations
    @Scaladoc()
  18. def deltaTable(implicit session: SparkSession): DeltaTable
  19. def dropTable(implicit context: ActionPipelineContext): Unit
    Definition Classes
    DeltaLakeTableDataObject → TableDataObject
  20. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  21. val expectations: Seq[Expectation]
    Definition Classes
    DeltaLakeTableDataObject → ExpectationValidation
  22. val expectedPartitionsCondition: Option[String]
    Definition Classes
    DeltaLakeTableDataObject → CanHandlePartitions
  23. def factory: FromConfigFactory[DataObject]
    Definition Classes
    DeltaLakeTableDataObject → ParsableFromConfig
  24. def failIfFilesMissing: Boolean

Configure whether io.smartdatalake.workflow.action.Actions should fail if the input file(s) are missing on the file system. Default is false.

    Annotations
    @Scaladoc()
  25. val filetype: String
  26. def forceGenericObservation: Boolean
    Attributes
    protected
    Definition Classes
    ExpectationValidation
  27. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @IntrinsicCandidate() @native()
  28. def getColumnStats(update: Boolean, lastModifiedAt: Option[Long])(implicit context: ActionPipelineContext): Map[String, Map[String, Any]]
    Definition Classes
    DeltaLakeTableDataObject → TableDataObject
  29. def getConnection[T <: Connection](connectionId: ConnectionId)(implicit registry: InstanceRegistry, ct: ClassTag[T], tt: scala.reflect.api.JavaUniverse.TypeTag[T]): T
    Attributes
    protected
    Definition Classes
    DataObject
    Annotations
    @Scaladoc()
  30. def getConnectionReg[T <: Connection](connectionId: ConnectionId, registry: InstanceRegistry)(implicit ct: ClassTag[T], tt: scala.reflect.api.JavaUniverse.TypeTag[T]): T
    Attributes
    protected
    Definition Classes
    DataObject
  31. def getDataFrame(partitionValues: Seq[PartitionValues], subFeedType: scala.reflect.api.JavaUniverse.Type)(implicit context: ActionPipelineContext): GenericDataFrame
    Definition Classes
    CanCreateSparkDataFrame → CanCreateDataFrame
  32. def getDetails(implicit session: SparkSession): DataFrame
  33. def getPKduplicates(subFeedType: scala.reflect.api.JavaUniverse.Type)(implicit context: ActionPipelineContext): GenericDataFrame
    Definition Classes
    TableDataObject
  34. def getPKnulls(subFeedType: scala.reflect.api.JavaUniverse.Type)(implicit context: ActionPipelineContext): GenericDataFrame
    Definition Classes
    TableDataObject
  35. def getPKviolators(subFeedType: scala.reflect.api.JavaUniverse.Type)(implicit context: ActionPipelineContext): GenericDataFrame
    Definition Classes
    TableDataObject
  36. def getScopeAllAggMetrics(dfAll: GenericDataFrame, expectationsToValidate: Seq[BaseExpectation])(implicit context: ActionPipelineContext): Map[String, _]
    Definition Classes
    ExpectationValidation
    Annotations
    @Scaladoc()
  37. def getScopeJobPartitionAggMetrics(subFeedType: scala.reflect.api.JavaUniverse.Type, dfJob: Option[GenericDataFrame], partitionValues: Seq[PartitionValues], expectationsToValidate: Seq[BaseExpectation])(implicit context: ActionPipelineContext): Map[String, _]
    Definition Classes
    ExpectationValidation
    Annotations
    @Scaladoc()
  38. def getSparkDataFrame(partitionValues: Seq[PartitionValues] = Seq())(implicit context: ActionPipelineContext): DataFrame
    Definition Classes
    DeltaLakeTableDataObject → CanCreateSparkDataFrame
  39. def getState: Option[String]

Return the last table version.

    Definition Classes
    DeltaLakeTableDataObject → CanCreateIncrementalOutput
    Annotations
    @Scaladoc()
  40. def getStats(update: Boolean = false)(implicit context: ActionPipelineContext): Map[String, Any]
    Definition Classes
    DeltaLakeTableDataObject → DataObject
  41. def hadoopPath(implicit context: ActionPipelineContext): Path
    Definition Classes
    DeltaLakeTableDataObject → HasHadoopStandardFilestore
  42. val housekeepingMode: Option[HousekeepingMode]
    Definition Classes
    DeltaLakeTableDataObject → DataObject
  43. val id: DataObjectId
    Definition Classes
    DeltaLakeTableDataObject → DataObject → SdlConfigObject
  44. def init(df: GenericDataFrame, partitionValues: Seq[PartitionValues], saveModeOptions: Option[SaveModeOptions])(implicit context: ActionPipelineContext): Unit
    Definition Classes
    CanWriteSparkDataFrame → CanWriteDataFrame
  45. def initSparkDataFrame(df: DataFrame, partitionValues: Seq[PartitionValues], saveModeOptions: Option[SaveModeOptions] = None)(implicit context: ActionPipelineContext): Unit
    Definition Classes
    DeltaLakeTableDataObject → CanWriteSparkDataFrame
  46. implicit val instanceRegistry: InstanceRegistry
  47. def isDbExisting(implicit context: ActionPipelineContext): Boolean
    Definition Classes
    DeltaLakeTableDataObject → TableDataObject
  48. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  49. def isTableExisting(implicit context: ActionPipelineContext): Boolean
    Definition Classes
    DeltaLakeTableDataObject → TableDataObject
  50. def listPartitions(implicit context: ActionPipelineContext): Seq[PartitionValues]

List partitions. Note that a Spark SQL statement is needed, as there might be partition directories with no current data inside.

    Definition Classes
    DeltaLakeTableDataObject → CanHandlePartitions
    Annotations
    @Scaladoc()
  51. lazy val logger: Logger
    Attributes
    protected
    Definition Classes
    SmartDataLakeLogger
    Annotations
    @transient()
  52. def mergeDataFrameByPrimaryKey(df: DataFrame, saveModeOptions: SaveModeMergeOptions)(implicit context: ActionPipelineContext): MetricsMap

Merges the DataFrame with existing table data by using a Delta Lake upsert (MERGE) statement.

Table.primaryKey is used as the condition to check whether a record is matched. Matched records are updated (or deleted); all others are inserted.

This is all done in one transaction.

    Annotations
    @Scaladoc()
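The underlying Delta Lake merge can be sketched as follows. This is a simplified illustration of the DeltaTable.merge API, not SDL's actual implementation; the table name and the primary-key column id are hypothetical:

```scala
import io.delta.tables.DeltaTable
import org.apache.spark.sql.{DataFrame, SparkSession}

// Simplified sketch of a primary-key merge using the DeltaTable API.
// "id" stands in for Table.primaryKey; df holds the new data to upsert.
def mergeByPrimaryKey(spark: SparkSession, df: DataFrame): Unit = {
  DeltaTable.forName(spark, "default.my_table")   // hypothetical table name
    .as("existing")
    .merge(df.as("new"), "existing.id = new.id")  // primary-key match condition
    .whenMatched().updateAll()                    // matched records are updated
    .whenNotMatched().insertAll()                 // unmatched records are inserted
    .execute()                                    // executed as one transaction
}
```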
  53. val metadata: Option[DataObjectMetadata]
    Definition Classes
    DeltaLakeTableDataObject → DataObject
  54. def movePartitions(partitionValues: Seq[(PartitionValues, PartitionValues)])(implicit context: ActionPipelineContext): Unit
    Definition Classes
    DeltaLakeTableDataObject → CanHandlePartitions
  55. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  56. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @IntrinsicCandidate() @native()
  57. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @IntrinsicCandidate() @native()
  58. val options: Map[String, String]
    Definition Classes
    DeltaLakeTableDataObject → TransactionalTableDataObject → CanWriteSparkDataFrame → CanCreateSparkDataFrame
  59. def partitionLayout(): Option[String]
    Definition Classes
    HasHadoopStandardFilestore
    Annotations
    @Scaladoc()
  60. val partitions: Seq[String]
    Definition Classes
    DeltaLakeTableDataObject → CanHandlePartitions
  61. val path: Option[String]
  62. def postRead(partitionValues: Seq[PartitionValues])(implicit context: ActionPipelineContext): Unit
    Definition Classes
    TransactionalTableDataObject → DataObject
  63. val postReadSql: Option[String]
    Definition Classes
    DeltaLakeTableDataObject → TransactionalTableDataObject
  64. def postWrite(partitionValues: Seq[PartitionValues])(implicit context: ActionPipelineContext): Unit
    Definition Classes
    TransactionalTableDataObject → DataObject
  65. val postWriteSql: Option[String]
    Definition Classes
    DeltaLakeTableDataObject → TransactionalTableDataObject
  66. def preRead(partitionValues: Seq[PartitionValues])(implicit context: ActionPipelineContext): Unit
    Definition Classes
    TransactionalTableDataObject → DataObject
  67. val preReadSql: Option[String]
    Definition Classes
    DeltaLakeTableDataObject → TransactionalTableDataObject
  68. def preWrite(implicit context: ActionPipelineContext): Unit
    Definition Classes
    DeltaLakeTableDataObject → TransactionalTableDataObject → DataObject
  69. val preWriteSql: Option[String]
    Definition Classes
    DeltaLakeTableDataObject → TransactionalTableDataObject
  70. def prepare(implicit context: ActionPipelineContext): Unit
    Definition Classes
    DeltaLakeTableDataObject → DataObject
  71. def prepareAndExecSql(sqlOpt: Option[String], configName: Option[String], partitionValues: Seq[PartitionValues])(implicit context: ActionPipelineContext): Unit
    Definition Classes
    DeltaLakeTableDataObject → TransactionalTableDataObject
  72. def productElementNames: Iterator[String]
    Definition Classes
    Product
  73. val retentionPeriod: Option[Int]
  74. val saveMode: SDLSaveMode
  75. val schemaMin: Option[GenericSchema]
    Definition Classes
    DeltaLakeTableDataObject → SchemaValidation
  76. val separator: Char
    Attributes
    protected
  77. def setState(state: Option[String])(implicit context: ActionPipelineContext): Unit

To implement incremental processing, this function is called to initialize the DataObject with its state from the last increment. The state is just a string; its semantics are internal to the DataObject. Note that this method is called on initialization of the SmartDataLakeBuilder job (init phase) and, for streaming execution, after every execution of an Action involving this DataObject (postExec).

    state

    Internal state of the last increment. If None, the first increment (which may be a full increment) is delivered.

    Definition Classes
    DeltaLakeTableDataObject → CanCreateIncrementalOutput
    Annotations
    @Scaladoc()
  78. def setupConstraintsAndJobExpectations(df: GenericDataFrame, defaultExpectationsOnly: Boolean, pushDownTolerant: Boolean, additionalJobAggExpressionColumns: Seq[GenericColumn], forceGenericObservation: Boolean)(implicit context: ActionPipelineContext): (GenericDataFrame, Seq[DataFrameObservation])
    Definition Classes
    ExpectationValidation
    Annotations
    @Scaladoc()
  79. def streamingOptions: Map[String, String]
    Definition Classes
    CanWriteDataFrame
  80. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  81. var table: Table
    Definition Classes
    DeltaLakeTableDataObject → TableDataObject
  82. def toStringShort: String
    Definition Classes
    DataObject
  83. def vacuum(implicit context: ActionPipelineContext): Unit
  84. def validateExpectations(subFeedType: scala.reflect.api.JavaUniverse.Type, dfJob: Option[GenericDataFrame], dfAll: GenericDataFrame, partitionValues: Seq[PartitionValues], scopeJobAndInputMetrics: Map[String, _], additionalExpectations: Seq[BaseExpectation], enrichmentFunc: (Map[String, _]) => Map[String, _])(implicit context: ActionPipelineContext): (Map[String, _], Seq[ExpectationValidationException])
    Definition Classes
    ExpectationValidation
  85. def validateSchema(schema: GenericSchema, schemaExpected: GenericSchema, role: String): Unit
    Definition Classes
    SchemaValidation
    Annotations
    @Scaladoc()
  86. def validateSchemaHasPartitionCols(df: DataFrame, role: String): Unit
    Definition Classes
    CanHandlePartitions
    Annotations
    @Scaladoc()
  87. def validateSchemaHasPrimaryKeyCols(df: DataFrame, primaryKeyCols: Seq[String], role: String): Unit
    Definition Classes
    CanHandlePartitions
    Annotations
    @Scaladoc()
  88. def validateSchemaMin(schema: GenericSchema, role: String): Unit
    Definition Classes
    SchemaValidation
    Annotations
    @Scaladoc()
  89. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  90. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
  91. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  92. def writeDataFrame(df: DataFrame, createTableOnly: Boolean, partitionValues: Seq[PartitionValues], saveModeOptions: Option[SaveModeOptions])(implicit context: ActionPipelineContext): MetricsMap

Writes the DataFrame to HDFS/Parquet and creates the Delta Lake table. DataFrames are repartitioned in order not to write too many small files or only a few HDFS files that are too large.

    Annotations
    @Scaladoc()
  93. def writeDataFrame(df: GenericDataFrame, partitionValues: Seq[PartitionValues], isRecursiveInput: Boolean, saveModeOptions: Option[SaveModeOptions])(implicit context: ActionPipelineContext): MetricsMap
    Definition Classes
    CanWriteSparkDataFrame → CanWriteDataFrame
  94. def writeSparkDataFrame(df: DataFrame, partitionValues: Seq[PartitionValues] = Seq(), isRecursiveInput: Boolean = false, saveModeOptions: Option[SaveModeOptions] = None)(implicit context: ActionPipelineContext): MetricsMap
    Definition Classes
    DeltaLakeTableDataObject → CanWriteSparkDataFrame
  95. def writeStreamingDataFrame(df: GenericDataFrame, trigger: Trigger, options: Map[String, String], checkpointLocation: String, queryName: String, outputMode: OutputMode, saveModeOptions: Option[SaveModeOptions])(implicit context: ActionPipelineContext): StreamingQuery
    Definition Classes
    CanWriteSparkDataFrame → CanWriteDataFrame

Deprecated Value Members

  1. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable]) @Deprecated
    Deprecated

    (Since version 9)
