Data Architecture: A Primer for the Data Scientist

Ebook, 869 pages, 8 hours


About this ebook

Over the past 5 years, the concept of big data has matured, data science has grown exponentially, and data architecture has become a standard part of organizational decision-making. Throughout all this change, the basic principles that shape the architecture of data have remained the same. There remains a need for people to take a look at the "bigger picture" and to understand where their data fit into the grand scheme of things.

Data Architecture: A Primer for the Data Scientist, Second Edition addresses the larger architectural picture of how big data fits within the existing information infrastructure or data warehousing systems. This is an essential topic not only for data scientists, analysts, and managers but also for researchers and engineers who increasingly need to deal with large and complex sets of data. Until data are gathered and can be placed into an existing framework or architecture, they cannot be used to their full potential. Drawing upon years of practical experience and using numerous examples and case studies from across various industries, the authors seek to explain this larger picture into which big data fits, giving data scientists the necessary context for how pieces of the puzzle should fit together.

  • New case studies include expanded coverage of textual management and analytics
  • New chapters on visualization and big data
  • Discussion of new visualizations of the end-state architecture
Language: English
Release date: Apr 30, 2019
ISBN: 9780128169179
Author

W.H. Inmon

Best known as the “Father of Data Warehousing,” Bill Inmon has become the most prolific and well-known author worldwide in the big data analysis, data warehousing, and business intelligence arena. In addition to authoring more than 50 books and 650 articles, Bill has been a monthly columnist with the Business Intelligence Network, EIM Institute, and Data Management Review. In 2007, Bill was named by Computerworld as one of the “Ten IT People Who Mattered in the Last 40 Years” of the computer profession. With 35 years of experience in database technology and data warehouse design, he is known globally for his seminars on developing data warehouses and information architectures. Bill has been an in-demand keynote speaker for numerous computing associations, industry conferences, and trade shows. Bill Inmon also has an extensive entrepreneurial background: he founded Pine Cone Systems (later renamed Ambeo) in 1995, and founded, and took public, Prism Solutions in 1991. Bill consults with a large number of Fortune 1000 clients and leading IT executives on data warehousing, business intelligence, and database management, offering data warehouse design and database management services, as well as producing methodologies and technologies that advance the enterprise architectures of large and small organizations worldwide. He has worked for American Management Systems and Coopers & Lybrand. Bill received his Bachelor of Science degree in Mathematics from Yale University and his Master of Science degree in Computer Science from New Mexico State University.


    Book preview

    Data Architecture - W.H. Inmon

    Chapter 1.2

    The Data Infrastructure

    Abstract

    Corporate data include everything found in the corporation in the way of data. The most basic division of corporate data is into structured data and unstructured data. As a rule, there are much more unstructured data than structured data. Unstructured data have two basic divisions—repetitive data and nonrepetitive data. Big data is made up of unstructured data. Nonrepetitive big data has a fundamentally different form than repetitive big data. In fact, the differences between nonrepetitive big data and repetitive big data are so large that they can be called the boundaries of the great divide. The divide is so large that many professionals are not even aware that it exists. As a rule, nonrepetitive big data has MUCH greater business value than repetitive big data.

    Keywords

    Structured data; Unstructured data; Corporate data; Repetitive data; Nonrepetitive data; Business value; The great divide of data; Big data

    If there is any secret to data management and data architecture, it is understanding data in terms of its infrastructure. Stated differently, it is almost impossible to understand the larger architecture under which data are managed and operate without understanding the underlying infrastructure that surrounds the data. Therefore, we shall spend some time understanding infrastructure.

    Two Types of Repetitive Data

    A good starting point for understanding infrastructure is the observation that repetitive data are found in two places in corporate data: in the structured side and in the unstructured big data side. Although the two sound the same, there are significant differences between them. When it comes to structured repetitive data, it is normal to have transactions as part of the repetitive data. There are sales transactions, SKU stocking transactions, inventory replenishment transactions, payment transactions, and so forth. In the structured world, many such transactions find their way into repetitive structured data.

    The other kind of repetitive data is the repetitive data found in the unstructured big data world. In the unstructured big data world, we might have metering data, analog data, manufacturing data, clickstream data, and so forth.

    There is the question then—are these types of repetitive data the same? They certainly are repetitive. But these different types of repetitive data are not the same. What is the difference then between these two types of repetitive data? Fig. 1.2.1 shows (symbolically) these two types of repetitive data.

    Fig. 1.2.1 Two types of repetitive data.

    Repetitive Structured Data

    In order to understand the differences between these two types of repetitive data, it is necessary to understand each type of data individually. Let's start with repetitive structured data. Fig. 1.2.2 shows that repetitive structured data are broken into records and blocks.

    Fig. 1.2.2 Repetitive data broken into blocks.

    The most basic unit of information in the repetitive structured environment is a block of data. Inside each block of data are records of data.

    Fig. 1.2.3 shows a simple record of data.

    Fig. 1.2.3 Records inside a block.

    Each record of data is (normally!) representative of a transaction. For example, there are records of data representing the sale of a product. Each record is representative of a single sale.

    Inside each record are keys, attributes, and indexes. Fig. 1.2.4 shows the anatomy of a record.

    Fig. 1.2.4 Attributes, keys, and indexes.

    If a record is representative of a sale, the attributes might be information about the date of the sale, the item sold, the cost of the item, any tax on the item, who bought the item, and so forth. The key of the record is one or more attributes that uniquely define the record. The key for a sale might be the date of sale, item sold, and location of the sale.

    The indexes that are attached to the record are on the attributes that are needed when there is a desire to have quick access to the record.
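
    To make the anatomy concrete, the following is a minimal sketch of a record with attributes, a key, and an index, using an in-memory SQLite database as a stand-in for a full structured DBMS. The table name, column names, and index are illustrative assumptions, not taken from the book:

```python
import sqlite3

# In-memory SQLite database standing in for a structured DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sale (
        sale_date TEXT,   -- attribute; part of the key
        item      TEXT,   -- attribute; part of the key
        location  TEXT,   -- attribute; part of the key
        cost      REAL,   -- attribute
        tax       REAL,   -- attribute
        buyer     TEXT,   -- attribute
        PRIMARY KEY (sale_date, item, location)  -- uniquely defines the record
    )
""")
# An index on an attribute that needs quick access.
conn.execute("CREATE INDEX idx_sale_buyer ON sale (buyer)")
conn.execute(
    "INSERT INTO sale VALUES ('2019-04-30', 'SKU-100', 'Denver', 9.99, 0.80, 'B. Jones')"
)
# The index lets the DBMS reach this record without scanning the table.
row = conn.execute("SELECT cost FROM sale WHERE buyer = 'B. Jones'").fetchone()
print(row[0])  # 9.99
```

    The key here is composite (date, item, and location of the sale, as in the text), while the index serves a different purpose: fast access on an attribute that is not part of the key.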

    The infrastructure that is attached to structured repetitive data managed under a DBMS is seen in Fig. 1.2.5.

    Fig. 1.2.5 A standard DBMS.

    Repetitive Big Data

    The other type of repetitive data is repetitive data found in big data. Fig. 1.2.6 depicts the repetitive data found in big data.

    Fig. 1.2.6 Repetitive big data.

    At first glance, there are just a lot of repetitive records seen in Fig. 1.2.6. But upon closer examination, it is seen that all of those repetitive big data records are packed away into a string of data and that string of data is stored inside a block of data, as seen in Fig. 1.2.7.

    Fig. 1.2.7 A block of data.

    The structured infrastructure seen in Fig. 1.2.5 is typical of an infrastructure managed under one of several DBMSs, such as Oracle, SQL Server, and DB2.

    The infrastructure for big data is quite different from the infrastructure found in a standard DBMS. In the infrastructure for big data, there is a block, and in the block are found many repetitive records. Each record is merely concatenated to the next. Fig. 1.2.8 is representative of a record that might be found in big data.

    Fig. 1.2.8 Records inside the block.

    In Fig. 1.2.8, it is seen that there is merely a long string of data, with records stacked one against the other. The system only sees the block and the long string of data. In order to find a record, the system needs to parse the string, as seen in Fig. 1.2.9.

    Fig. 1.2.9 Parsing records inside the block.

    Suppose the system wants to find record B. The system needs to sequentially read the string of data until it recognizes that a record is present. Then, the system needs to go into the record and determine whether it is record B. This is how a search is conducted, in its most primitive form, in big data.
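
    The sequential search just described can be sketched in a few lines. The record layout, an id and a payload separated by a colon with records delimited by semicolons, is an assumed format for illustration; real big data systems use their own layouts:

```python
# A big-data block: one long string of records concatenated end to end.
block = "A:first;B:second;C:third;"

def find_record(block, wanted_id):
    """Sequentially parse the block, one record at a time, until a match."""
    records_read = 0
    for raw in block.rstrip(";").split(";"):
        records_read += 1                # the system examines one more record
        rec_id, payload = raw.split(":", 1)
        if rec_id == wanted_id:          # is this the record being sought?
            return payload, records_read
    return None, records_read            # scanned the whole block, no match

payload, records_read = find_record(block, "B")
print(payload, records_read)  # second 2
```

    Note that even a miss is expensive: if the record is not in the block, every record must still be parsed before the system can say so.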

    It doesn’t take much of an imagination to see that a lot of machine cycles are chewed up looking for data in big data. To this end, the big data environment employs a means of processing referred to as the Roman census approach. More will be described about the Roman census approach in the chapter on big data.

    The Two Infrastructures

    The two different infrastructures are contrasted in Fig. 1.2.10.

    Fig. 1.2.10 Two different infrastructures.

    Without much effort, it is seen that the infrastructures surrounding big data and structured data are quite different. The infrastructure surrounding big data is quite simple and streamlined. The infrastructure surrounding structured DBMS data is elaborate and anything but streamlined.

    There is no argument, then, that significant differences exist between the infrastructure of repetitive structured data and that of repetitive big data.

    What's Being Optimized?

    When looking at the two infrastructures, it is natural to ask—what is being optimized by each infrastructure? In the case of big data, the infrastructure is optimized for the ability of the system to manage almost unlimited amounts of data. Fig. 1.2.11 shows that with the infrastructure of big data, adding new data is very easy and streamlined.

    Fig. 1.2.11 Optimal for storing massive amounts of data.
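
    As a sketch of why ingestion is so streamlined, the following appends records into fixed-size blocks; the block size and record format are assumptions for illustration. There are no keys or indexes to maintain, so adding data is a simple append:

```python
# Ingesting into a big-data store: each new record is appended to the
# current block; when a block fills, a new block is started.
BLOCK_SIZE = 32  # bytes per block (tiny, for illustration only)

def ingest(blocks, record):
    if not blocks or len(blocks[-1]) + len(record) > BLOCK_SIZE:
        blocks.append("")      # current block is full: start a new one
    blocks[-1] += record       # appending is the only work required
    return blocks

blocks = []
for i in range(6):
    ingest(blocks, f"rec{i}:payload;")
print(len(blocks))  # 3
```

    Because nothing but the block itself is updated, ingestion scales to almost unlimited amounts of data; contrast this with a DBMS insert, which must also update keys and indexes.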

    But the infrastructure behind a structured DBMS is optimized for something quite different from managing huge amounts of data. In the case of the structured DBMS environment, the optimization is on the ability to find any one given unit of data quickly and efficiently.

    Fig. 1.2.12 shows the optimization of the infrastructure of a standard structured DBMS.

    Fig. 1.2.12 Optimal for direct online access of data.

    Comparing the Two Infrastructures

    Another way to think of the different infrastructures is in terms of the amount of data and overhead required to find a given unit of data. In order to find a given unit of data, the big data environment has to search through a whole host of data. Many input/output operations (I/Os) must be performed to find a given item. To find that same item in a structured DBMS environment, only a few I/Os need to be done. So if you want to optimize for speed of access to data, the standard structured DBMS is the way to go.
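
    The I/O contrast can be sketched with a toy cost model: one "I/O" per record examined in a sequential scan, versus one probe for an index lookup. The data and the cost model are illustrative assumptions:

```python
records = [(f"key{i}", f"value{i}") for i in range(1000)]

# Big-data style: scan the block until the wanted record is found.
scan_ios = 0
found_by_scan = None
for key, value in records:
    scan_ios += 1                 # one record examined = one "I/O"
    if key == "key900":
        found_by_scan = value
        break

# Structured DBMS style: an index locates the record in a single probe
# (a hash index here; a real B-tree takes a few probes, still far fewer).
index = dict(records)
index_ios = 1
found_by_index = index["key900"]

print(scan_ios, index_ios)  # 901 1
```

    The gap widens with the data: doubling the number of records roughly doubles the scan cost, while the indexed lookup stays nearly constant.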

    On the other hand, in order to achieve that speed of access, the standard structured DBMS requires an elaborate infrastructure for data. That infrastructure must be both built and maintained over time, as data change. A considerable amount of system resources is required to build and maintain this infrastructure. But when it comes to big data, the infrastructure that must be built and maintained is minimal, and it is built and maintained very easily.

    This section began with the proposition that repetitive data can be found in both the structured and big data environment. At first glance, the repetitive data are the same or are very similar. But when you look at the infrastructure and the mechanics implied in the infrastructure, it is seen that the repetitive data in each of the environments are indeed very different.

    Chapter 1.3

    The Great Divide

    Abstract

    Corporate data include everything found in the corporation in the way of data. The most basic division of corporate data is by structured data and unstructured data. As a rule, there are much more unstructured data than structured data. Unstructured data have two basic divisions—repetitive data and nonrepetitive data. Big data is made up of unstructured data. Nonrepetitive big data has a fundamentally different form than repetitive unstructured big data. In fact, the differences between nonrepetitive big data and repetitive big data are so large that they can be called the boundaries of the great divide. The divide is so large that many professionals are not even aware that there is this divide. As a rule, nonrepetitive big data has MUCH greater business value than repetitive big data.

    Keywords

    Structured data; Unstructured data; Corporate data; Repetitive data; Nonrepetitive data; Business value; The great divide of data; Big data

    Classifying Corporate Data

    Corporate data can be classified in many different ways. One of the major classifications is by structured versus unstructured data. And unstructured data can be further broken into two categories—repetitive unstructured data and nonrepetitive unstructured data. This division of data is shown in Fig.
