Data Sources
To gather large amounts of data for sustainability work, it is important to consider the various sources of data and to understand where that data comes from. Data can be classified into two categories: primary and secondary.
When extracting information from data, the first, and arguably a very important, step is to look at where the data comes from. Primary data is data that we have generated ourselves (for example, by conducting interviews or taking measurements), while secondary data is data that we found elsewhere and can reuse. Secondary data can in turn be internal or external, depending on where we obtained it.
Whenever we analyze primary data, it is very important to consider how we collected it and whether our setup could introduce unwanted biases. This is even more important for secondary data, especially external data, because we did not control how it was collected.
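The classification above can be sketched as a small data structure. This is only an illustrative sketch: the names `Origin`, `Location`, `DataSource`, and `review_priority`, the example source names, and the numeric ranking are all assumptions made for this example and do not come from any library or standard. The ranking simply encodes the ordering stated above (external secondary data deserves the most scrutiny); it is not a measure of actual bias.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Origin(Enum):
    PRIMARY = "primary"      # data we generated ourselves
    SECONDARY = "secondary"  # data obtained from somewhere else


class Location(Enum):
    INTERNAL = "internal"    # from inside the organization
    EXTERNAL = "external"    # from outside the organization


@dataclass(frozen=True)
class DataSource:
    """Hypothetical record describing where a dataset comes from."""
    name: str
    origin: Origin
    location: Optional[Location] = None  # only used for secondary data

    def __post_init__(self) -> None:
        # In this sketch, internal/external applies to secondary data only.
        if self.origin is Origin.SECONDARY and self.location is None:
            raise ValueError("secondary data must be internal or external")
        if self.origin is Origin.PRIMARY and self.location is not None:
            raise ValueError("primary data has no internal/external label here")


def review_priority(source: DataSource) -> int:
    """Illustrative ranking: a higher value means more scrutiny for bias."""
    if source.origin is Origin.PRIMARY:
        return 1
    if source.location is Location.INTERNAL:
        return 2
    return 3  # secondary and external


# Hypothetical example sources
interviews = DataSource("field_interviews", Origin.PRIMARY)
crm_export = DataSource("crm_export", Origin.SECONDARY, Location.INTERNAL)
open_data = DataSource("government_open_data", Origin.SECONDARY, Location.EXTERNAL)

ranked = sorted([interviews, crm_export, open_data],
                key=review_priority, reverse=True)
print([s.name for s in ranked])
# ['government_open_data', 'crm_export', 'field_interviews']
```

Keeping the origin and the internal/external label as separate fields mirrors the text: the second label is a refinement of secondary data rather than a third top-level category.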
- Data sources for data science are the various origins from which data scientists obtain data for analysis and modeling. These sources can be broadly classified into two categories: internal and external.
- Internal data sources refer to the data generated and stored within an organization.
- External data sources, on the other hand, provide data that is acquired from outside the organization. These sources can include public datasets available from government agencies, research organizations, and open data initiatives.
- Data science for sustainability relies on diverse data sources to analyze and address environmental challenges. Key sources include:
- Environmental Monitoring Data: Data collected from monitoring systems, such as air quality sensors, water quality sensors, and climate monitoring stations, provides information on environmental parameters crucial for assessing sustainability efforts.
- Energy Consumption Data: Data on energy usage patterns, electricity grids, and renewable energy sources help in optimizing energy management, identifying areas for improvement, and promoting sustainable energy practices.
- Satellite and Remote Sensing Data: Satellite imagery and remote sensing data provide valuable insights into land use, deforestation, urban growth, and changes in ecosystems, supporting conservation efforts and sustainable land management.
- Social Media and Online Platforms: Data from social media platforms and online sources can be analyzed to understand public sentiment, consumer behavior, and attitudes towards sustainability. This helps shape sustainability messaging and behavior change campaigns.
- Supply Chain Data: Information related to supply chains, including product lifecycles, materials sourcing, transportation routes, and waste management, offers opportunities for optimizing resource use, reducing waste, and promoting sustainable practices.
- Geospatial Data: Geographic data, maps, and spatial datasets support sustainability analysis, including the assessment of urban planning, land conservation, biodiversity hotspots, and the impact of infrastructure projects on the environment.
- Government and Open Data: Publicly available data from government agencies and open data initiatives offer valuable insights into demographics, environmental policies, emissions data, and other sustainability-related information.
- Research Studies and Reports: Scientific studies, research papers, and reports provide data and findings on specific sustainability topics, enabling data scientists to build on existing knowledge and contribute to sustainable solutions.
https://www.analyticsvidhya.com/blog/2022/03/an-overview-of-data-collection-data-sources-and-data-mining/