Developing data models from scenarios involves interpreting narrative descriptions to build accurate representations of data structures, relationships, and rules within a database system.
Understanding the process of data modelling from scenarios
To develop a reliable and robust data model from a scenario, it is essential to follow a clear, logical process. This ensures that the final data model accurately reflects the requirements of the system and can be used to store and manipulate data effectively. The process involves several stages: analysing the scenario, identifying entities and their attributes, determining relationships, constructing an Entity Relationship Diagram (ERD), writing textual descriptions, validating the model, and refining it as necessary.
Step 1: Reading and interpreting the scenario
Careful scenario analysis
The first step in developing a data model is to carefully read and analyse the scenario. This is more than just skimming through the text — it requires active engagement with the information presented. The scenario provides a description of a real-world situation or system, often in narrative form, and your job is to extract the essential components needed to construct a data model.
Key techniques:
Highlight nouns and noun phrases: These usually indicate possible entities. For example, in a sentence like “Each customer places several orders,” the words “customer” and “orders” may represent entities.
Practice Questions
FAQ
When a scenario presents something that could be interpreted as either an entity or an attribute, the decision depends on its significance, complexity, and relationships. Ask yourself: does this element have multiple attributes or participate in relationships with other entities? If yes, it should likely be modelled as an entity. For instance, if “Address” is just a single line, it could be an attribute. But if the scenario discusses the address in detail (Street, City, Postcode) or if different people/entities can share the same address, then model it as a separate entity. Also, consider reusability and data redundancy. If the same data appears across different entities, modelling it as a standalone entity helps with normalisation and avoids duplication. Attributes should be reserved for simple, descriptive data that doesn’t require further breakdown or connections. Make decisions based on the role the information plays in the system, not just how it's worded in the scenario.
A many-to-many relationship always requires a linking (or associative) entity to be properly implemented in a relational database. While ERDs can show a many-to-many relationship directly, databases cannot handle them without an intermediary table. The linking entity acts as a bridge between the two main entities and may also store data about the interaction. You recognise the need for one when both sides of the relationship allow multiple connections (e.g. students enrol in many courses, and each course has many students). The linking entity typically includes the primary keys of both related entities, often as a composite primary key, and may include additional attributes about the relationship itself. For example, in a Student–Course relationship, the linking entity Enrolment may store StudentID, CourseID, and extra attributes like EnrolmentDate or Grade. Always include the linking entity in your ERD and clearly describe its purpose in your textual descriptions.
Optional data in a scenario should be reflected by using optional participation in relationships. In ERDs, this is indicated through cardinality notation. For instance, if not every student has a locker, but each locker must be assigned to a student, you would model a one-to-one relationship where the Student side is optional, and the Locker side is mandatory. In crow’s foot notation, this would involve a zero or one symbol (often a circle and a line) near the optional side. Additionally, if an attribute is optional within an entity (e.g. “PhoneNumber” might be optional for Student), then it can simply be included in the entity without constraints, as long as the database allows null values. The key point is to ensure that optional elements are clearly represented and that the model doesn’t falsely imply that all instances must participate in that relationship or possess that attribute.
The key to avoiding overcomplication is to focus strictly on the data requirements, not every detail of the narrative. First, identify only the essential entities that represent core objects with multiple instances and interactions. Avoid turning every noun into an entity—many can be handled as attributes or even ignored if they aren’t part of the required data. Stick to the minimum number of entities and relationships needed to meet the scenario's functional requirements. Group related attributes logically and avoid excessive use of composite or derived fields unless the scenario demands it. If the scenario contains unnecessary background or information not related to data storage, ignore it for modelling purposes. Keep the ERD readable: use consistent naming, avoid duplicate relationships, and only include linking entities where necessary. The aim is to model the minimum viable structure that captures the scenario’s requirements accurately, without adding complexity that doesn’t provide value.
This is known as a recursive relationship, where an entity has a relationship with another instance of the same entity. In this case, you model the entity as normal, but the relationship line connects the entity to itself. For example, in an Employee entity where employees can manage other employees, you would draw a relationship from Employee back to itself. In the relationship, specify roles to distinguish the two sides — such as “Manager” and “Subordinate”. In the attributes, you might include a “ManagerID” that refers back to the EmployeeID. The cardinality would reflect how many employees one manager can manage (typically one-to-many) and whether every employee must have a manager (sometimes optional, for example if the employee is at the top of the hierarchy). Recursive relationships can be tricky to visualise but are handled like any other relationship in terms of structure, and they are essential when the scenario implies internal hierarchy or self-reference.
