A major challenge facing website operators is extracting structured information from natural-language text. To overcome this hurdle, they have applied a combination of manual and automated strategies, with mixed success; the most effective approach has proven to be crowdsourcing. Among the other approaches, manual methods rely on simple, repeatable steps together with a coding frame used to classify texts. Their advantage is that complex nuances and boundary cases can be addressed with human-interpretable heuristics, which automated approaches do not handle as well. Manual approaches, however, suffer from poor scalability: they demand considerable effort from a small number of expert analysts (Breaux & Schaub, 2014). The central question, then, is what the key components of an effective information-extraction framework are for website operators, and crowdsourcing has provided a viable answer in this regard.
In particular, crowdsourcing is effective where automated methods fail. For example, automated methods perform poorly at extracting information from noisy images, analyzing political sentiment, and translating text, all areas in which crowdsourcing has proved efficient. Typically, a crowdsourcing task is divided into smaller, manageable units called microtasks. These microtasks are distributed to a crowd, i.e., a large number of autonomous workers who offer their services via crowdsourcing platforms such as CrowdFlower and Amazon Mechanical Turk. The workers' results are then combined into a solution for the larger task. Despite these advantages, complex crowdsourcing tasks present challenges, such as designing a task workflow that is cost-effective and produces high-quality results (Breaux & Schaub, 2014).
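The decompose-distribute-aggregate pattern described above can be sketched in a few lines of Python. This is a minimal illustration, not code from any of the cited systems; the function names, the two-sentence microtask size, and the majority-vote aggregation rule are all illustrative assumptions.

```python
from collections import Counter

def split_into_microtasks(policy_text, max_sentences=2):
    """Decompose a long policy into small, manageable units (microtasks)."""
    sentences = [s.strip() for s in policy_text.split(".") if s.strip()]
    return [". ".join(sentences[i:i + max_sentences])
            for i in range(0, len(sentences), max_sentences)]

def aggregate_votes(worker_answers):
    """Combine independent worker answers by simple majority vote."""
    label, _ = Counter(worker_answers).most_common(1)[0]
    return label

policy = ("We collect your email address. We share data with partners. "
          "You may opt out at any time. Cookies are used for analytics.")
tasks = split_into_microtasks(policy)
print(len(tasks))  # four sentences, two per microtask -> 2 microtasks

# Hypothetical answers from three workers on one microtask:
print(aggregate_votes(["collection", "collection", "sharing"]))  # collection
```

Real platforms use more sophisticated aggregation (e.g., weighting workers by past accuracy), but the shape of the workflow is the same: split, distribute, recombine.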
Zimmeck and Bellovin (2014), for their part, focus on a concrete solution: an automatic system for the analysis of web privacy policies. Their motivation is that the solutions proposed thus far have not gained widespread acceptance among practitioners or users. Their system, Privee, addresses the limitations of earlier systems by combining rule-based classification and machine learning with crowdsourced privacy-policy analysis, allowing it to integrate seamlessly into the existing web privacy regime; it is implemented as a Google Chrome browser extension that retrieves crowdsourced privacy-policy results. The analysis of Privee's experimental results also reveals ambiguities in privacy policies that make them hard to understand.
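The hybrid idea behind Privee, hand-written rules where they apply and a learned classifier as a fallback, can be sketched as follows. This is a toy illustration under stated assumptions, not Privee's actual rules or classifier: the patterns, labels, and keyword weights are invented for the example, and the "ML" step is a stand-in scoring function rather than a trained model.

```python
# Hand-written patterns (illustrative, not Privee's actual rule set).
RULES = {
    "opt out": "choice",
    "third part": "sharing",
}

def rule_classify(segment):
    """Apply hand-written rules; return None if no rule fires."""
    text = segment.lower()
    for pattern, label in RULES.items():
        if pattern in text:
            return label
    return None  # defer to the learned classifier

def ml_classify(segment, weights):
    """Stand-in for a trained classifier: score labels by keyword weights."""
    text = segment.lower()
    scores = {label: sum(w for kw, w in kws.items() if kw in text)
              for label, kws in weights.items()}
    return max(scores, key=scores.get)

def hybrid_classify(segment, weights):
    """Rules first; fall back to the statistical classifier."""
    return rule_classify(segment) or ml_classify(segment, weights)

weights = {"collection": {"collect": 2.0, "gather": 1.0},
           "retention": {"retain": 2.0, "store": 1.0}}
print(hybrid_classify("You may opt out of marketing emails.", weights))  # choice
print(hybrid_classify("We collect usage data.", weights))                # collection
```

The design choice is that rules capture high-precision, human-interpretable cases, while the learned component covers the long tail of phrasings the rules miss.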
Sadeh et al. (2013), for their part, propose a system that combines several approaches, specifically "crowdsourcing, machine learning and natural language processing" (p. 3). As Zimmeck and Bellovin (2014) observe, one reason no system has yet won the approval of both practitioners and users is that previous systems have been largely one-dimensional, addressing only one or a few issues. Sadeh et al.'s (2013) system therefore combines approaches whose strengths complement one another and whose weaknesses offset each other. Their aim is a semi-automated understanding of privacy policies: the approach combines the linguistic representations produced by natural-language semantic analyzers with statistical learning algorithms that generalize from human-labeled examples supplied by crowdworkers.
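The last step, a statistical learner generalizing from crowd-labeled examples, can be illustrated with a deliberately simple bag-of-words model. This sketch is not Sadeh et al.'s method; the training data, labels, and scoring-by-word-overlap scheme are assumptions made for the example.

```python
from collections import Counter, defaultdict

def train(labeled_examples):
    """Learn per-label word counts from crowd-labeled policy segments."""
    model = defaultdict(Counter)
    for text, label in labeled_examples:
        model[label].update(text.lower().split())
    return model

def classify(model, text):
    """Pick the label whose training vocabulary best overlaps the text."""
    words = text.lower().split()
    scores = {label: sum(counts[w] for w in words)
              for label, counts in model.items()}
    return max(scores, key=scores.get)

# Hypothetical segments labeled by crowdworkers:
crowd_labels = [
    ("we collect your email address", "collection"),
    ("we collect device identifiers", "collection"),
    ("data is shared with advertising partners", "sharing"),
]
model = train(crowd_labels)
print(classify(model, "we collect your location"))  # collection
```

The point of the sketch is the data flow, crowd labels in, a generalizing classifier out; a real system would use richer linguistic features than raw word counts.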
- Breaux, T.D., & Schaub, F. (2014). Scaling requirements extraction to the crowd: experiments with privacy policies. IEEE.
- Sadeh, N., Acquisti, A., Breaux, T.D., Cranor, L.F., McDonald, A.M., Reidenberg, J.R., Smith, N.A., Liu, F., Russell, N.C., Schaub, F., & Wilson, S. (2013). The usable privacy policy project: combining crowdsourcing, machine learning and natural language processing to semi-automatically answer those privacy questions users care about. School of Computer Science, Carnegie Mellon University, CMU-ISR-13-119.
- Zimmeck, S., & Bellovin, S.M. (2014). Privee: an architecture for automatically analyzing web privacy policies. Proceedings of the 23rd USENIX Security Symposium, San Diego, CA, August 20-22.