Data Management

From acgg ilriwikis

ACGG Data Management Plan

ACGG Overview

The ACGG project is being implemented across three countries (Ethiopia, Nigeria and Tanzania), with a projected participation of 7500 farmers (2500 in each country) spread across a total of 30 sites (10 in each country). Data will be collected periodically during the project, with the major points of data collection being the baseline and cross sectional data collections. There will also be intensive longitudinal data collection throughout the project to help test the performance potential of, and preference for, different chicken strains under on-station and on-farm management conditions at different sites in the three countries. The large number of targeted farmers across different countries, together with the different types and amounts of data collected at different points, makes data collection and management both challenging and a very important task in the project. In light of these numbers, it is imperative to develop data collection workflows and platforms that are uniform and robust enough to manage the large amounts of data that will be collected, yet flexible enough to accommodate the different scenarios which might arise in the countries and sites where the project will be implemented. We attempt to define these workflows and platforms here.

Data Management Overview

The ACGG project shall strive to employ only electronic means of data collection at all times. Any use of manual systems (pen and paper) MUST be expressly approved by the program management committee and will only be permitted where electronic data collection is impossible. Any manual system will be backed by a plan for digitizing the manual records within 2 weeks after the end of the manual data collection exercise, to avoid accumulation of data on paper, as such data tend to lose integrity over time. In the ACGG program, data shall be extensively collected using the Open Data Kit (ODK) platform, which may be supplemented by in-house or commercial systems whenever necessary. The collected data shall be managed using the Azizi data management platform (DMP), which will provide a centralized system for all the data in the project as well as a single point of reference for all project stakeholders in terms of data management. In addition, the project will use the DMP to manage the creation of ODK forms, the curation of data and the visualization of the collected data.
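The two-week digitization window for any approved manual collection can be tracked with a simple date calculation. The following is only an illustrative sketch: the function names and the example date are assumptions and are not part of the plan or of the DMP.

```python
from datetime import date, timedelta

# The plan allows 2 weeks after the end of a manual data collection exercise.
DIGITIZATION_WINDOW = timedelta(weeks=2)


def digitization_deadline(collection_end: date) -> date:
    """Return the last day by which the paper records should be digitized."""
    return collection_end + DIGITIZATION_WINDOW


def is_overdue(collection_end: date, today: date) -> bool:
    """Return True if undigitized paper records are past the deadline."""
    return today > digitization_deadline(collection_end)


if __name__ == "__main__":
    end = date(2015, 3, 2)  # hypothetical end date of a paper-based exercise
    print("Digitize by:", digitization_deadline(end))
```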

Data collection

In the project, farm level data will be collected for a total of three years out of the five years during which the project will be implemented. The data collection process will be broken down into the following smaller components:

  1. Baseline data collection
  2. Cross sectional data collection
  3. Longitudinal data collection
  4. …..

Different data points and samples will be collected during each component. It is expected that the following data points will be collected:

  a) Baseline data collection
     i. Socio-economic data including:
        1. Weight
        2. Pictures
        3. Samples
  b) Cross sectional data collection
     i. Socio-economic data including:
        1. Farmer preferences
        2. ….
  c) Longitudinal data collection
     i. Production data including:
        1. Number of eggs
        2. Farmer preference
        3. ........

It is envisaged that for each data component, a data team will be set up composed of:

  - A focal person (component leader) who will lead the specific component
  - Country specific focal persons (country component leaders), who will assist the component leader in implementing the exercise at the country level
  - An informatician who will help in encoding the survey tools as well as in downstream data management, e.g. Absolomon will be the informatician who will be assigned to this component.

The data collection process can be broken down into:

Design of the survey tools

The survey tools shall be designed in the usual way (as Word documents containing all the questions and options as well as the constraints to be used). This is important since these are the documents that will be used in obtaining the necessary approvals, e.g. IACUC. The component leader for each component will be in charge of ensuring that the survey tools are well designed, are relevant to the project and capture the necessary data. This will be done well in advance of the commencement of the survey. In addition, the country component leaders will assist in any translations which might be required.

Encoding of the survey tools

After the survey tool(s) have been designed, they will be assigned to an informatician from the data management team, who will encode them as ODK forms. The encoding shall follow Azizi’s guidelines for encoding ODK forms. The encoded tools will be uploaded to the Azizi DMP for all the component leaders to verify.

Verification of the ODK forms

The verification of the ODK forms shall be done by all the component leaders and will be led by the respective component leader and the informatician. The aim of this step is to ensure that the ODK forms are properly designed, are usable in the field and properly capture all the necessary pieces of data.
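The encoding and verification steps above can be illustrated with a small sketch. It assumes the forms are authored in the XLSForm spreadsheet convention commonly used with ODK (a "survey" sheet and a "choices" sheet); the plan itself does not say so, and Azizi's encoding guidelines are not reproduced here. All question names, labels and helper functions are hypothetical.

```python
import csv
import io

# Column names follow the XLSForm "survey" and "choices" sheet conventions.
SURVEY_COLUMNS = ["type", "name", "label", "constraint", "required"]
CHOICES_COLUMNS = ["list_name", "name", "label"]

# Hypothetical questions; the real ones come from the approved survey tool.
survey_rows = [
    {"type": "text", "name": "farmer_id", "label": "Farmer ID",
     "constraint": "", "required": "yes"},
    {"type": "integer", "name": "eggs_collected",
     "label": "Number of eggs collected",
     "constraint": ". >= 0", "required": "yes"},
    {"type": "select_one yes_no", "name": "prefers_strain",
     "label": "Does the farmer prefer this strain?",
     "constraint": "", "required": "yes"},
]

choices_rows = [
    {"list_name": "yes_no", "name": "yes", "label": "Yes"},
    {"list_name": "yes_no", "name": "no", "label": "No"},
]


def to_csv(columns, rows):
    """Serialize one sheet as CSV text, ready to paste into a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def check_survey(survey, choices):
    """Return a list of problems found before the form goes for verification."""
    problems = []
    seen_names = set()
    defined_lists = {row["list_name"] for row in choices}
    for row in survey:
        # Question names must be unique within a form.
        if row["name"] in seen_names:
            problems.append(f"duplicate question name: {row['name']}")
        seen_names.add(row["name"])
        # Every select question must point at a defined choice list.
        parts = row["type"].split()
        if parts[0] in ("select_one", "select_multiple"):
            if len(parts) < 2 or parts[1] not in defined_lists:
                problems.append(f"no choices for question: {row['name']}")
    return problems


if __name__ == "__main__":
    print(to_csv(SURVEY_COLUMNS, survey_rows))
    print(check_survey(survey_rows, choices_rows))
```

A check of this kind only catches mechanical errors; whether a form is usable in the field and captures the intended data still requires review by the component leaders.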

Training and pre-testing

Training of enumerators is a critical component of the data collection process and must be allocated sufficient time. Training of enumerators will be country specific and will be led by the component leader, with substantial assistance from the country component leaders. The country component leaders will assist in the logistics of the training, from identifying the enumerators to be trained to identifying the farmers who will take part in the training and pre-testing of the forms. The training will be divided into three (3) components:

  1. Classroom training. This will be the first step where the enumerators will be taken through the use of the ODK system and administering the ODK form. This will be an intensive training conducted in a classroom setting and will be conducted by the component leaders (component lead and country component lead). The classroom training will culminate in role playing by the enumerators to ensure that they fully understand the essence of the survey and how to administer the ODK forms.
  2. Field pre-testing. Immediately after the classroom training, there will be a field pre-testing where the enumerators will go out to selected farmers and test/apply the knowledge acquired from the classroom training. These farmers will not necessarily be part of the project but will be used to gauge the success of the classroom training as well as identify any issue which might have been overlooked.
  3. Review. Time will be allocated for a review of the training process (classroom training and field pre-testing) and this will be done immediately after the field pre-testing. At this stage final changes will be deliberated and adopted before the commencement of the study.

Deployment

Deployment of the survey will take place immediately after the training and pre-testing to ensure that the knowledge acquired from the training is applied immediately. The informatician assigned to the component being deployed will participate in the first few days of the deployment to ensure that all systems are running properly and will be on hand to fix any issues which might crop up while using the system. After these few days, the informatician will return to his duty station and continue to monitor the data as it comes in from the field, as well as resolve any challenges. The informatician, in consultation with the component leader, will prepare interim reports on the received data throughout the data collection exercise.
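One simple form an interim report could take is a tally of received submissions per site, flagging sites that are behind. The sketch below is an assumption about how such a report might be produced; the field names `country` and `site` and both helper functions are hypothetical and do not describe the DMP's actual reporting.

```python
from collections import Counter


def submissions_per_site(submissions):
    """Count received submissions per (country, site) pair."""
    return Counter((s["country"], s["site"]) for s in submissions)


def sites_below_target(counts, expected_sites, minimum):
    """List expected sites with fewer than `minimum` submissions.

    Sites that have sent nothing yet are included, since they do not
    appear in the counts at all.
    """
    return sorted(site for site in expected_sites if counts.get(site, 0) < minimum)


if __name__ == "__main__":
    # Hypothetical submissions as they might arrive from the field.
    received = [
        {"country": "Ethiopia", "site": "S1"},
        {"country": "Ethiopia", "site": "S1"},
        {"country": "Nigeria", "site": "S2"},
    ]
    expected = [("Ethiopia", "S1"), ("Nigeria", "S2"), ("Tanzania", "S3")]
    counts = submissions_per_site(received)
    print(sites_below_target(counts, expected, minimum=2))
```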

Data curation

After the close of the data collection exercise, data curation will be carried out on the Azizi data management platform. This process will be carried out by the informatician in collaboration with the component leaders. The raw data will be cleaned and curated to generate a final clean dataset, which will be available on the DMP. Access to the different datasets on the DMP will follow the rules and conditions set by the program management committee.
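As a rough illustration of what cleaning raw data into a final dataset can involve, the sketch below removes duplicate submissions and rejects out-of-range values while leaving the raw records untouched. The specific rules, the field names (`farmer_id`, `visit_date`, `eggs_collected`) and the `curate` function are assumptions made for the example; the plan does not specify the curation rules.

```python
def curate(raw_records):
    """Split raw records into clean records and rejected (record, reason) pairs.

    The input list is not modified, so the raw data stays available.
    """
    clean, rejected = [], []
    seen = set()
    for record in raw_records:
        # Assumed rule: one submission per farmer per visit date.
        key = (record["farmer_id"], record["visit_date"])
        eggs = record.get("eggs_collected")
        if key in seen:
            rejected.append((record, "duplicate submission"))
        elif not isinstance(eggs, int) or eggs < 0:
            # Assumed rule: egg counts must be non-negative integers.
            rejected.append((record, "invalid egg count"))
        else:
            seen.add(key)
            clean.append(record)
    return clean, rejected


if __name__ == "__main__":
    raw = [
        {"farmer_id": "F1", "visit_date": "2015-03-02", "eggs_collected": 4},
        {"farmer_id": "F1", "visit_date": "2015-03-02", "eggs_collected": 4},
        {"farmer_id": "F2", "visit_date": "2015-03-02", "eggs_collected": -1},
    ]
    clean, rejected = curate(raw)
    print(len(clean), "clean;", len(rejected), "rejected")
```

Keeping the rejected records together with a reason, rather than silently dropping them, lets the component leaders review each decision before the final dataset is published.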

Data analysis and visualization

The DMP will have a web based analysis section which will perform a shallow, automatic analysis of the clean and curated datasets generated by the data cleaning process. The output of this analysis will form the input to the visualization section of the DMP. While there is no silver bullet for analysis and visualization, we shall have a team of system developers and user interface designers who will design and develop additional analysis and visualization systems as the need arises.

Data security

The main DMP is a secure system hosted on servers physically located at the ILRI Nairobi campus, with mirror servers in Tanzania, Ethiopia and Nigeria, and in Addis with the project management team. The servers are behind a secure firewall which is maintained and tightly controlled by the ILRI ICT team. The servers are part of the research computing infrastructure managed by the Research Methods Group (RMG), which ensures that the infrastructure is secure and that all systems and software are up to date.

Data sharing and access

Access to the data in the DMP will be highly controlled by the project management team, which will vet all users and grant access to the necessary people and to the necessary datasets. The following principles will be considered in granting access to the different datasets:

  1. All members must fully understand their responsibilities in terms of ethical handling of data
  2. Every country’s data will be encapsulated
  3. All country members will have free access to their respective country datasets, which will not be anonymised to them
  4. The members of the project management will have full access to all the datasets in the DMP
  5. There will be limited public access to the visualization section of the DMP
  6. Any and all data which is to be shared with other members outside the project will be sufficiently anonymised to protect the identity of farmers
  7. The project will adhere to the [http://www.gatesfoundation.org/how-we-work/general-information/open-access-policy| foundation] and [http://library.cgiar.org/bitstream/handle/10947/2875/CGIAR%20OA%20Policy%20-%20October%202%202013%20-%20Approved%20by%20Consortium%20Board.pdf?sequence=1| CGIAR] policies of open access
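The access principles above can be sketched as a small set of rules. This is only an illustration under stated assumptions: the role names, the record fields and the list of identifying fields are hypothetical, and replacing the farmer ID with a keyed hash is one possible pseudonymisation technique that the plan does not prescribe. On its own it may not be sufficient to protect farmers' identities if other fields can identify them.

```python
import hashlib
import hmac

# Hypothetical role names used only in this example.
PUBLIC = "public"
COUNTRY_MEMBER = "country_member"
PROJECT_MANAGEMENT = "project_management"


def can_view_raw(role, user_country, dataset_country):
    """Decide whether a user may see a country's non-anonymised dataset."""
    if role == PROJECT_MANAGEMENT:
        return True  # principle 4: full access to all datasets
    if role == COUNTRY_MEMBER:
        return user_country == dataset_country  # principles 2 and 3
    return False  # everyone else sees only anonymised or public views


def pseudonymise(farmer_id, secret_key):
    """Replace a farmer ID with a stable keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, farmer_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def anonymise_record(record, secret_key,
                     identifying_fields=("farmer_name", "phone", "gps")):
    """Return a copy of a record that is safer to share outside the project."""
    # Drop the assumed direct identifiers, then replace the ID itself.
    shared = {k: v for k, v in record.items() if k not in identifying_fields}
    shared["farmer_id"] = pseudonymise(record["farmer_id"], secret_key)
    return shared


if __name__ == "__main__":
    record = {"farmer_id": "F1", "farmer_name": "A. Farmer", "eggs_collected": 3}
    print(can_view_raw(COUNTRY_MEMBER, "Ethiopia", "Nigeria"))
    print(anonymise_record(record, secret_key=b"example-key-not-for-real-use"))
```

Using a keyed hash rather than a plain hash means the same farmer maps to the same pseudonym across datasets, while someone without the key cannot rebuild the mapping by hashing known IDs.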