Deploy a Custom Data Lake Solution with ConvergDB

3 Days - $8,000.00 USD

This Quick Start is intended to demonstrate the combined strength of ConvergDB and the Amazon Web Services portfolio of analytic products to easily deploy and manage a foundational Serverless Data Lake.
Our Objective
  • Prove the value and benefits of deploying and managing a Serverless Data Lake in AWS
  • Demonstrate the ease of integrating 1 to 5 tables of real data and building a sophisticated Data Lake on Amazon S3
  • Demonstrate how ConvergDB is used to manage changes to your S3 Data Lake
  • Demonstrate how quickly ConvergDB makes the Data Lake usable from technologies such as Redshift Spectrum, Athena, Glue, and QuickSight
What we will do
  • Real-world consulting to understand your current business scenario and near-term business goals
  • Support to identify a subset of your data that will be used in this Quick Start
  • If necessary, extract source data from a JDBC-compatible relational database
  • Support to identify or create 3 to 5 real-world SQL queries that will test the Data Lake performance
  • Establish a source-control-based workflow for managing your Data Lake
  • Demonstrate how ConvergDB manages the Data Lake infrastructure in a single configuration file
What you will own
  • A high-performance data lake that is completely serverless
  • A repeatable data process and workflow
  • A single configuration that orchestrates:
    • Amazon S3
    • Amazon Athena
    • AWS Glue
  • Data Lake Attributes:
    • Apache Parquet columnar storage file format, which provides higher performance at query time
    • An AWS Glue data transformation job that will load your data from source files into an S3 Data Lake
    • An AWS Glue Data Catalog, which allows for easier integration with analytic tools
    • A data dictionary, which provides the same benefit as traditional documentation but for your data
  • A Data Lake that is immediately accessible from SQL engines and analytics tools such as:
    • Amazon Athena
    • Amazon Redshift Spectrum
    • Amazon QuickSight
    • Presto
    • Apache Spark
    • Apache Hive
    • Columnar Databases
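
The query-time benefit of a columnar format such as Apache Parquet, listed under Data Lake Attributes above, can be illustrated with a small sketch. This is a simplified model in plain Python, not Parquet itself: the sample records, the field names, and the helper functions are all hypothetical, and "values read" is only a stand-in for the I/O a real engine would perform.

```python
from typing import Dict, List, Tuple

# Hypothetical sample records; field names are illustrative only.
rows: List[Dict[str, object]] = [
    {"order_id": 1, "customer": "ann", "amount": 20.0},
    {"order_id": 2, "customer": "bob", "amount": 35.5},
    {"order_id": 3, "customer": "cy", "amount": 12.25},
]


def to_columns(records: List[Dict[str, object]]) -> Dict[str, list]:
    """Pivot row-oriented records into one list of values per column."""
    columns: Dict[str, list] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_from_rows(records: List[Dict[str, object]], column: str) -> Tuple[float, int]:
    """Row layout: every field of every record is touched to total one column."""
    total = 0.0
    values_read = 0
    for record in records:
        values_read += len(record)  # the whole row is read
        total += record[column]
    return total, values_read


def total_from_columns(columns: Dict[str, list], column: str) -> Tuple[float, int]:
    """Columnar layout: only the requested column is touched."""
    values = columns[column]
    return sum(values), len(values)


columns = to_columns(rows)
row_total, row_reads = total_from_rows(rows, "amount")
col_total, col_reads = total_from_columns(columns, "amount")
```

Both layouts return the same total, but the columnar version touches one value per record instead of every field. Real Parquet files add compression and per-column statistics on top of this basic idea.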

Amazon S3 Data Lakes using ConvergDB on AWS

What is ConvergDB
ConvergDB is open-source software for creating and managing serverless data lakes with a DevOps-friendly workflow. Users describe the structure and behavior of their data, then ConvergDB creates the infrastructure and scripts to do the heavy lifting of optimization and transformation.
The Cloud Advantage: Decoupling Data from Compute
Cloud infrastructure allows data storage and compute to be separated. This frees you from dependence on a single data warehousing technology and opens up the full portfolio of analytics tools provided by AWS. A decoupled data architecture also helps to reduce TCO through lower compute costs. To deliver lasting value, your data lake must be structured using best practices. ConvergDB provides these best practices for you, allowing your organization to focus on extracting insights from your data.
Overcoming Challenges
Organizations that rely on a legacy data warehouse to manage large datasets will run into serious performance challenges as they try to access and analyze their growing business and customer data. Even though a data lake may be a simpler solution, many organizations continue to operate traditional, costly data solutions. The perception is that data lakes are complex to develop and deploy, but this is not the case. ConvergDB has proven able to circumvent the most common obstacles to deploying and managing data lakes.
ConvergDB for Data Warehouse Modernization
Serverless
  • Zero cost to run ConvergDB
  • AWS charges apply only when data is being processed
  • Inherent scaling to match data volume requirements
Auto Batching Large Data
  • Automatically batches large data sets, which is ideal for the initial conversion of historic data
  • Limits the redundant compute costs associated with job failures
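
The general idea behind batching a large load can be sketched as follows. This is a minimal illustration under stated assumptions, not ConvergDB's actual implementation: the function names, the (key, size) file representation, and the size cap are all hypothetical.

```python
from typing import Callable, Iterable, Iterator, List, Set, Tuple

# Hypothetical representation of a source file: (object key, size in bytes).
SourceFile = Tuple[str, int]


def batch_by_size(files: Iterable[SourceFile], max_batch_bytes: int) -> Iterator[List[SourceFile]]:
    """Group files into consecutive batches whose total size does not exceed the cap.

    A single file larger than the cap is placed in a batch by itself.
    """
    batch: List[SourceFile] = []
    batch_bytes = 0
    for key, size in files:
        if batch and batch_bytes + size > max_batch_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append((key, size))
        batch_bytes += size
    if batch:
        yield batch


def load_in_batches(
    files: Iterable[SourceFile],
    max_batch_bytes: int,
    load_batch: Callable[[List[SourceFile]], None],
    completed: Set[int],
) -> None:
    """Load batches in order, skipping any batch index already in `completed`.

    `completed` is updated in place, so if `load_batch` raises, the caller
    still knows which batches finished and a rerun resumes from the failure.
    """
    for index, batch in enumerate(batch_by_size(files, max_batch_bytes)):
        if index in completed:
            continue
        load_batch(batch)  # may raise; earlier batches remain recorded
        completed.add(index)
```

With this shape, a failure partway through a historic conversion costs only the compute for the batch that failed, because a rerun with the same `completed` set skips the batches that already succeeded. The sketch assumes the file list is presented in the same order on each run.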
Future Proof
  • Sustainable software development and practices
  • As AWS adds new analytic tools to its portfolio, ConvergDB users can quickly introduce them to their cloud environment
  • Safeguards cloud investments by virtually preventing costly migrations from one technology to another

How ConvergDB Works

Features
  • Open source software - Free to use
  • Leverage cloud services such as AWS Glue, Amazon Athena, and Amazon Redshift Spectrum
  • Retain total control and ownership of data in your own cloud environment
  • Securing a data lake is easier with a serverless architecture
  • Automatic batching of large data sets to mitigate the cost impact of failures
  • Amazon CloudWatch metrics, alerts, and monitoring dashboards are created automatically
