IST722 Data Warehousing - Syracuse University

IST722 Data Warehousing - Syracuse University

IST722 Data Warehousing Components of the Data Warehouse Michael A. Fudge, Jr. Project: NopCommerce Discuss NopCommerce and Project Teams

Recall: Inmons CIF The CIF is a reference architecture Understanding the Diagram Data Stores

Components Processes The CIF is a reference architecture Applications CIF Components

External World & Applications The CIF is a reference architecture External World & Applications External World the people and systems that generate operational data.

Applications the systems which provide the source for the operational data. Examples: ERPs, Business Applications, Internet data, external data streams. These are the inputs and data sources for the CIF. OLTP Systems Operational data, transaction-oriented. Integration & Transformation Layer

The CIF is a reference architecture Integration & Transformation Layer I&T layer takes un-integrated data from multiple sources and integrates and consolidates it. Computer programs are written to transform data from the external world into corporate data. The data come from a variety of sources and in both structured and unstructured formats. Todays Database Management Systems provide tooling to assist with this

process. This is the most difficult and time-consuming component of the CIF. Two approaches: ETL and ELT ETL Extract Transform Load The data transformation occurs over staged data. The source data is not stored in the warehouse. ELT Extract Load Transform

The data transformation occurs over warehoused data. The staged data is stored in the warehouse. Operational Data Store The CIF is a reference architecture Operational Data Store

Integrated, detailed, and current data from the External World and Applications. Consolidated from disparate sources. Does not grow over time. Performs similarly to a transactional database. Structured differently than a data warehouse, and therefore should be stored as a separate database. Receives data from I&T layer sends data to the data warehouse. The data warehouse can populate it, too. Think of it as a consolidated operational database.

Enterprise Data Warehouse The CIF is a reference architecture Enterprise Data Warehouse Subject-oriented, integrated, summarized, and current data from the External World and Applications.

Optimized for query performance. Structured differently than operational data, typically in a dimensional model. Receives data from I&T layer and the ODS. Use as a source for data marts and decision support systems. Grows in size over time due to historical data. The heart of the CIF. ODS vs. EDW Characteristic

Primary Purpose Design Goal Primary Users Subject-Oriented Integrated Detailed Data Summary Data Time of Data Updates Queries

Operational Data Store Run the business on a current basis Performance throughput, availability Clerks, salespersons, administrators Yes Yes

Yes No Current data Frequent small updates Simple queries on a few rows Data Warehouse Support managerial decision making Easy reporting and analytics

Managers, business analysis, customers Yes Yes Yes Yes Historical snapshots Periodic batch updates Complex queries on several rows

Why No ODS in the EDW? I need fast updates! I need query performance! You cant have both! (Think of the Index!)

Data Marts The CIF is a reference architecture Data Marts A collection of data tailored to the informational needs of a department or business process. Easy to control, low cost, and customizable due to their limited scope. Receive their inputs from the Enterprise Data Warehouse.

Are source data for Online Analytical Processing (OLAP) engines. OLAP ROLAP Uses a Relational Database Management System Data design is the Star Schema Built on well-known relational concepts In the EDW.

MOLAP Uses a Multi-Dimensional Database Management System Data design is the Cube Highly flexible, includes Metadata. Data Marts Typical implementations have the

ROLAP star schema feed the MOLAP cube ROLAP Star Schema Stored in a relational DBMS Fact table is M-M relationship among dimensions. We saw this last week!

MOLAP - Cube Stored in a MultiDimensional DBMS Facts are preaggregated across all dimensions for improved performance. Metadata: Drill down hierarchy and Identified Facts

DSS Applications The CIF is a reference architecture Decision-Support Systems Business Intelligence. Front-ends to ROLAP and MOLAP Engines. Help us explore and visualize information at a high

level Cross-Media Storage The CIF is a reference architecture Cross-Media Storage Manager Stores historical data which is infrequently accessed.

Moved out of the EDW, which has high-end, performant storage into more affordable storage with less performant access times. A process exists to enable some transparency in the retrieval process. Group Activity Please assemble into your project groups A through H. You will work in your teams on a group activity involving product evaluation.

Skill: Evaluating CIF Components Activity: Research the following products. Match each to the CIF components it was designed to support. Justify your reasoning with sources. Groups will be called upon to present their findings. Name of Product A. B. C. D. E.

F. G. H. Informatica ILM PostgreSQL Pentaho Data Integration Birst Tableau Server Oracle Essbase

Microsoft Dynamics GP IBM Informix 1. 2. 3. 4. 5. 6. 7.

CIF Components Corporate / External World Application ETL System Data Mart / MOLAP Decision Support System Enterprise Data Warehouse Operational Data Store Cross-Media Storage

In Summary The CIF is a reference architecture for building out an information ecosystem. Applications from the external world are inputs into the CIF. The Integration & Transformation Layer transforms transactional data into corporate data. The Operational Data Store contains consolidated, non-historical data. The Enterprise Data Warehouse contains consolidated historical data. Data marts are tailored to the informational needs of a department or business process.

IST722 Data Warehousing Components of the Data Warehouse Michael A. Fudge, Jr.

Recently Viewed Presentations

  • Proving the benefits of installing LeakSafe A property

    Proving the benefits of installing LeakSafe A property

    A LeakSafe system was installed post-completion of the main contract, delayed by two weeks, by LeakSafe's Approved Installers who never the less completed the works on time and on budget so there were no delays to students arriving.
  • Writing in Key Stage One How is writing

    Writing in Key Stage One How is writing

    They learn what a sentence is using 'Rainbow Grammar.' They learn common word types such as nouns, pronouns, adjectives and verbs. They use the connectives and, so and but to make compound sentences. They use adjectives to create noun phrases....
  • MKTG 4.12 & 4.13 - Quia

    MKTG 4.12 & 4.13 - Quia

    MKTG 4.12 - 4.16. Data Collection Methods. Foundational Knowledge of Selling. SURVEYS. Marketing researchers can use many types of itemized rating scales on a survey. An Itemized scale provides respondents with a set of options from which they must choose...
  • What are the "entities that matter?" And how much should we ...

    What are the "entities that matter?" And how much should we ...

    software platform, users could interact with a richly featured environment for viewing, editing, and creating linked data descriptions. Users also expressed appreciation for the close interaction with OCLC staff. ... Lakeside Ballroom, Glenwood, Minn., July 27, 1965.
  • Human Genetics - Chapter 2

    Human Genetics - Chapter 2

    Define stem and progenitor cell. How can the bacteria that live in and on our bodies affect our health? ... Cytoskeleton Functions. Maintain cell shape. Connect cells to each other. Transport organelles, cellular contents, and motor molecules. Cytoskeleton.
  • A simplified TCA cycle Pyruvate (3C; from glycolysis)

    A simplified TCA cycle Pyruvate (3C; from glycolysis)

    ½ teaspoon baking soda. ½ teaspoon salt. 1 cup plus 2 tablespoons (145 g) all-purpose flour. 1 teaspoon vanilla extract. 6 oz (170 g) semisweet chocolate chips. Method. 1. Preheat oven to 375 deg F (190 deg C). 2. Cream...
  • Renaissance: Beginning of the Modern Age

    Renaissance: Beginning of the Modern Age

    Political Party Eras: New Deal Democrats 1932 - 1968. The Progressive Party helped bring focus back to the Democratic Party after the stock market crash of 1929/Depression. President Franklin D Roosevelt passed "New Deal" legislation uniting laborers, minorities, southerners, an...
  • Phase 1 Tollgate Review Discussion Template - ussm.gsa.gov

    Phase 1 Tollgate Review Discussion Template - ussm.gsa.gov

    GSA, Unified Shared Services Management. Month, Year. ... [email protected]) to schedule a Tollgate review meeting ... and how IV&V support will interact with the provider and USSM) Source Documentation from Playbook: IV&V Plan. Describe the program's governance model. Program Governance...