PCDE Module 4 Discussions & Quizzes

Discussion 4.1 - Examples of Big Data Applications

Prompt

Do some research & identify a big data application that you find interesting. First, share your understanding of the application. Describe why you have chosen that particular application and make sure to include a link or citation to the application.

  1. How big is the data?
  2. What are the application objectives?
  3. What technologies are being used in the application?
  4. Can you brainstorm any potential ideas for further enhancements for the application or analytics?

Read the statements posted by your peers. Engage with them by responding with thoughtful comments and questions to deepen the discussion.

Answer

I've worked on software for retail companies before, mainly on the frontend. The data collected by retailers can be huge. You need the order history of customers. The supplier history of pricing. The customer habits on the store. There's many other sources but I'll focus on these.

The primary objectives of Big Data in retail is to analyze customer behavior and improves sales with that information. This data can then be used to gain insights in customer preferences, buying patterns, supplier pricing, supplier inventory and market trends. This data then gets used to: personalize marketing efforts, improve inventory management, and optimize pricing and sales strategies.

The store site, customer order system and inventory management side of the system will typically consist of some relational database. These datasets are usually highly related and need quick access via software. They might even include in memory key value databases to manage user sessions.

The Big Data will typically get warehoused by the likes of Hadoop, MongoDB, or Spark. This data is less live like the kinds stored in relational databases. Typically the data warehouses are used to periodically perform analytics whose results end up in the faster databases.

Since the lectures have mentioned the use of IoT for data collection, I personally brainstormed an idea of using more sensors in retail stores to get better insights into customer behavior to better optimize the physical stores. Does this new layout hurt the likelihood that someone will find a product they want? Do these displays improve the likelihood of a purchase? These are the kinds of retail applications that could be improved by employing Big Data analytics with data from new sources like IoT.

Discussion 4.2: Entities & Properties: Applications

Prompt

At the end of Video 4.5, Dr. Sanchez challenges you to come up with an example of a simple database with entities and properties that suits an area of your interest. Following the example presented in the video about colleges and students, find an application that you are interested in and come up with at least two entities, each with at least three corresponding properties. In your post, include:

  1. why the application interests you,
  2. a detailed description of the entities and properties, and
  3. a description of the potential relationships between these entities.

For example, you may want to define a Houses table with properties SquareFootage, NumOfBedrooms, and Basement that describe, respectively, the square footage of the house, how many bedrooms it has, and whether it has a basement or not. Next, you could define another table, Neighborhoods, with properties Houses, Schools, and DistanceFromDowntown. In this second table, each property describes a particular house that belongs to that neighborhood, how many schools are in the neighborhood, and the average distance of the house in the neighborhood to downtown. A potential application for this example would be to help a real estate agent keep track of the types of listings they have available.

Your initial response should be 100 to 125 words.

Read the statements posted by your peers. Engage with them by responding with thoughtful comments and questions to deepen the discussion.

Suggested Time: 45 minutes

Suggested Length: 100-125 words

This is a required activity and will count toward course completion.

Response

I have worked a lot on frontends in the e-commerce sector so my thoughts were on a set of database entities and properties that often get used there. So it's interesting to me to potentially get into the data layer of all the front end code I've written in this field over the years. Below is a nested list version of them for easier reading.

The three most important entities I thought of are: Customers, Products, and Orders. These three together can represent a lot about an online store.

Customers will hold information about the customer. Information like: Name, Address, and Email would be included as properties. This data would depend on which Customer is on that record.

Products would represent all the products in the store. Each product would have: an SKU, ProductName, Price, and Stock. An SKU is a Stock Keeping Unit and it is a common numbering system which gets used in retail. Almost any physical product in a store will have one or something like it. This will be important to track each individual product. The ProductName should be seperate from the SKU since it has a tendency to change, believe me it can be a pain when it does, so it should be a seperate property from SKU. The Price & Stock fields are fairly simple. Price is there to track a product's ever changing price. The Stock is there to track how many of them are available for order.

Quiz 4.1 - NoSQL vs SQL

Q1

Which industry in Video 4.1 proves to be data-rich but struggles to obtain good predictions based on data?

Q2

Which industry or domain is not mentioned in Video 4.1 that has many data applications?

Q3

If a website generates data in the form of JSON documents, what type of database would you use to store these JSON documents?

Q4

What type of database would you use to store airline data containing passangers' travel history information with columns showing date of travel origin, destination, etc...?

Q5

Which database can be used to store data with key-value pair format?

Quiz 4.2 - Entities, Properties, Primary Keys, & Relationships Between Tables

Q1

What are entities in a data model?

Q2

What are referred to as properties?

Q3

What is the purpose of a unique identifier?

Q4

What should you do if a table doesn't contain a unique identifier?

Q5

Suppose there are two entities called Employees and Organizations. The entity Employees has a primary key called OrganizationId. The OrganizationId column is also present in the entity Employees.

In the entity Employees, the OrganizationId column is treated as what?

Q6

What is the meaning of cardinality?

Q7

What are the possible relationships between two tables?

Q8

Suppose there are two tables called Employee and Organization. Suppose further that one employee can only work for a single company at a time. Employee has a primary key of EmployeeId and Organization has a primary key of OrganizationId. Additionally, OrganizationId is a foreign key in the Employee table.

What kind of relationship exists between the Organization and Employee tables?

Q9

Suppose you have two tables: Students and Results. Students has a primary key of roll_num and Results has a primary key of result_id. Additionally, result_id is a foreign key in the Students table. The result_id is unique for ea. student. No two students can have the same result_id.

Self Study Drag & Drop Activity 4.1: Database Design

Initial Statements to Be Filled in

One fundamental step when designing a database is to identify its entities and (?).

An entity can be thought of as (?) defined within a database. On the other hand, properties are simply defined as the (?), or attributes, that each table (?).

When defining the entities and properties of a database, it’s important to check if any table has been (?) in the design process and if all the properties are (?) to the correct table.

The next step in database design is to identify a (?) for each table. Primary keys are also called (?) and they are useful to access any (?) record in your table.

Defining a primary key is of paramount importance. In fact, failing to define one can cause problems in your database (?).

A strategy to ensure that a unique identifier is present is to add an extra (?) to your table that will act as a unique identifier.

Finally, before physically designing a database, it’s important to understand the (?) between the different entities in your database. This is particularly useful because it allows you to establish how (?) between the tables can be (?).

There are three types of relationships: 1-to-1 (one-to-one), (?) (one-to-many), and N-to-N (?).

Complete Statement with all Blanks Filled in

One fundamental step when designing a database is to identify its entities and properties.

An entity can be thought of as a table defined within a database. On the other hand, properties are simply defined as the columns, or attributes, that each table captures.

When defining the entities and properties of a database, it’s important to check if any table has been forgotten in the design process and if all the properties are attached to the correct table.

The next step in database design is to identify a primary key for each table. Primary keys are also called unique identifiers and they are useful to access any unique record in your table.

Defining a primary key is of paramount importance. In fact, failing to define one can cause problems in your database manipulation.

A strategy to ensure that a unique identifier is present is to add an extra column to your table that will act as a unique identifier.

Finally, before physically designing a database, it’s important to understand the relationships between the different entities in your database. This is particularly useful because it allows you to establish how records between the tables can be accessed.

There are three types of relationships: 1-to-1 (one-to-one), 1-to-N (one-to-many), and N-to-N (many-to-many).

Responses

Quiz 4.3: Naming Practices

Q1

In Video 4.10, what noun form is recommended for use in table names?

Q2

In Video 4.10, the professor suggested using what type of form for column names?

Q3

One of the table names in a database is Books & the db designer would like to have a book ID as the primary key of this table. Which of these names follow the naming convention that was reviewed in Video 4.10?

Q4

There are two tables in a database: Books & Authors. Which name is most likely to be a foreign key name in the Books table?

Mini Lesson 4.1: Normalization

As part of a self learning exercise from the subject of database normalization, Follow along this Mini Lesson.

What is Normalization?

Normalizing data involves a set of practices that primarily aim to avoid redundancy in the data, improve integrity, and eliminate duplicate records.

When building out the details of a table, you may run into anomaly issues whenever adding, updating, or deleting data. Normalization, with its set of database organization practices, can help prevent that.

A database that has been normalized is considered a Normalized Form.

What Types of Normalized Forms Exist?

There are three types of normalized forms: First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF). Let's review ea. of them in more detail.

The 1NF ensures that any cell in your table contains only a single value. This can also be referred to as atomicity and it ensures that records can be accessed individually without the risk of unintentionally retrieving multiple values.

The 2NF builds on the first form. The 2NF can't be enforced w/o having implemented the 1NF. The goal of the 2NF is to guarantee that every non-key depends on the entire PK. In certain cases, one may want to separate a single table into smaller tables and assign a PK to ea. table. Doing so guarantees the dependence of ea. value on a key.

Like the relation between the 2NF & the 1NF, the 3NF builds on the 2NF. In other words, if a column, say A, depends on the PK column, then you cannot have another column, say B, also depend on column A & indirectly on the PK column. As with 2NF, a popular strategy to avoid this issue is to separate the original table into smaller ones and assign new PKs to guarantee independence.

Further Reading

More can be learned about the nuances of normalization in SQL and its many types, by reading this edureka article on the subject.

Quiz 4.4: Cardinality Between Tables and Normalization

Q1

Suppose that you have two tables called Students & Subjects, where Students has a primary key of studentId and Subjects has a primary key of subjectId. Each student can opt for multiple subjects and each subject can be opted for by multiple students.

What kind of relationship can be established between the tables Students and Subjects?

Q2

Suppose that you have two tables called Employees & Vehicles, where Employees has a PK of employeeId and Vehicles has a primary key of vehicleId. Each employee can have multiple vehicles & each vehicle belongs to only one employee. What kind of relationship can be established between the tables Employess & Vehicles?

Q3

Suppose you have two tables called Publishers & Books, with PK set as pubId & bookId respectively. Also, assume that a 1-to-N relationship exists between the two tables. How would you map and represent the data in these tables to reduce redundancy in the data?

Q4

What is a foreign key?

Q5

What is the goal of normalization?

Q6

Which normal form requires that each cell in the table should have an atomic value?

Q7

Which of the following is related to the Second Normal Form?

Q8

What does the Third Normal Form require?

Example Education Database

For the rest of the course, a simple example database for tracking students and colleges will be used.

Discussion 4.3: Database Application Design

Prompt

For this discussion, you will continue to add more complex design aspects to the database that you started in Discussion 4.2.

In your discussion post, identify and describe the relationship between the entities and properties in your database. Then define and include details about the primary keys and foreign keys of all of your tables.

Finally, identify and discuss three potential use cases of the database that you designed.

Read the statements posted by your peers. Engage with them by responding with thoughtful comments and questions to deepen the discussion.

Suggested Time: 60 minutes

Suggested Length: 150 words

This is a required activity and will count toward course completion.

Answer

I wrote about retail being a common use of big data, so I'll design a database around a theoretical online retailer. To me the most important tables or entities sounds like:

A quick summary about the entities, properties and relations. In the Customer entity, which represents each customer with an account. Its primary key is the CustomerID. Although the Email property is unique for every Customer, it makes sense to have a separate primary key in case they want to change email. Online retailers also often allow associating via multiple email addresses. This means the second normal form is broken because each primary key needs to be unique entirely to one entity. The rest of the properties are non-keys describing the Customer, that are entirely dependent on the primary key.

Next the Proucts entity, The first property is the SKU. In retail, almost everyone uses some SKU or Stock Keeping Unit system. Each SKU is unique by design so it only makes sense to make this the primary key for Products. The rest of the properties are non-key properties describing the product. There are no foreign keys because Products are many-to-many related to Orders which necessitates a join table ProductOrders. The relationship for a many-to-many entity and its join table is 1-to-many. Meaning the foreign key should reside in the join table.

Now for the Order entity, which describes any given order a customer has made. Its primary key is an OrderID. It has a 1-to-many relationship with customers. One CustomerID can have many orders, but single Orders don't make sense being related to multiple Customers. Because Orders is the many in 1-to-many it makes most sense to put the foreign key to Customers, CustomerID inside Orders to avoid duplication of data. Just like with Products which Orders has a many-to-many relation with, the relationship for an entity to its join table is 1-to-many. This means Orders only duplicates data by putting a Product related foreign key within it.

Finally, the ProductOrder join table. This is the join table that maps Products & Orders together. Because ProductID & OrderID are each unique within their tables, compositing their values by concatenation guarantees ProductOrderID, the primary key, is also unique. And since the relation between a join table and its joining entities is 1-to-many with the many on the join table side, this means the foreign keys in the relationship should reside here. So this table will have two foreign keys; the ProductID and OrderID being mapped together. There's also another piece of information that depends on both Products & Orders, Quantity. The Quantity of Products in Orders depends on both Entities. This means in the third normal form it makes most sense to place that property here.

There's tons more complexity involved in any e-commerce site. However, I thought these most simply describe some of the database modeling topics we've encountered so far.

Quiz 4.6: Datatypes

Q1

While creating a table in a database, which of the following needs to be defined?

Q2

What can happen if a field has the wrong datatype in a table?

Q3

Which datatype is the best when there are leading zeroes in the data, such as ZIP codes used in U.S. addresses?

Q4

Which keyword is used to order the dataset in ascending order in a MySQL query?

References

Web Links

Note Links

Referenced By