CAREER ASPIRATIONS
I am an information science (IS) major with a concentration in health informatics, and I plan to complete an Au.D./Ph.D. program if accepted after graduation. My major may seem disconnected from my future plans at first glance; however, I believe it will be incredibly beneficial, even compared to other health degrees. For example, IS focuses on data analytics and on the conceptualization, organization/management, and synthesis of information. Audiology is a complicated field with many unique variables that must be tracked; the ability to manage information properly will not only help in research but also ensure that every unique aspect of a person's situation gets addressed, leading to higher satisfaction with diagnosis. Furthermore, through my involvement in auditory neuroscience (ANS) research, I am well versed in the mechanisms and tests involved, and I hope to use this knowledge to standardize hearing examinations and ensure proper frequency testing for children, notably for low-frequency hearing loss, which is overlooked compared to high-frequency hearing loss (often seen as part of age-related hearing loss). On a personal level, this field is very important to me: I require hearing aids for low-frequency hearing loss and was misdiagnosed until I was 15. I want to give children the benefits I received at a time when they would be more beneficial, notably before the transition to middle school, when class sizes and the number of instructors increase and the benefits of hearing aids become more apparent.
To ensure I was well versed and prepared, I began accessibility research in high school, though it was primarily focused on oncology. In university, I was part of an ANS lab for two years and became a research assistant for the Risk to Resilience Trafficking in Persons (TiPs) Lab; I am now conducting my own independent research in audiology and IS. Because I approached the labs and entities I worked for directly, I have not used any career services except the career fairs. I do not have internship experience, as I prioritized research and building my skills through active involvement in my field of interest. Through my extensive research experience and my personal attachment, audiology became the field closest to my heart.
To conclude, I hope to become someone capable of helping people with hearing loss receive the benefits of hearing aids in a timely manner, to improve the standardization of testing methodology to improve diagnoses, and to continue research in multiple disciplines.
PROJECT MANAGEMENT:
Initiation:
Our team aims to implement a university-wide software catalogue, managed by the university IT department, to track software installations. To do so, we would need the following information for each software title: type, developer, version(s), licensing agreement types for each version, which departments use the software and on which computers it is installed, and installation dates.
This information would be useful because we could better prioritize where to allocate funding for students and staff; e.g., if every engineering student needs SolidWorks or an equivalent CAD package, it would be useful to have such a subscription available through the university. Furthermore, as each device is connected to the university network on campus and would access university servers off campus, all relevant details would be available through the request and would require minimal student/staff interaction; however, to verify that the service is not being abused, students/staff must log in/authenticate to access the catalogue. To verify accurate and useful spending on software, a satisfaction survey at the start and end of each semester would be essential, along with an optional survey on the catalogue homepage where students/staff can suggest software that would be beneficial for the university to provide. However, not all requests can be met, and some software will require Role-Based Access Control (RBAC); for example, expensive software such as SolidWorks should only be provided if the student has a COENG attribute or, if outside the COENG program, is registered in a class that requires it. This will minimize spending, ensure availability of all other software, and increase the budget for software that would benefit other majors (Creative Cloud, Tableau, etc.). A brief sketch of this access check follows.
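As a minimal sketch (my own illustration, not a specification), the check below shows how the RBAC rule above could work. The COENG attribute comes from the scenario; the course codes and data layout are hypothetical placeholders for whatever the university's identity provider would actually expose.

```python
# Minimal sketch of the RBAC rule described above. Attribute names ("COENG")
# and course codes are hypothetical; a real deployment would query the
# university's identity provider rather than hard-code rules.
RESTRICTED_SOFTWARE = {
    # software -> (required college attribute, courses that also grant access)
    "SolidWorks": ("COENG", {"EGN3000", "EML4501"}),
}

def can_access(software: str, user_attributes: set[str], enrolled_courses: set[str]) -> bool:
    if software not in RESTRICTED_SOFTWARE:
        return True  # unrestricted titles are available to all authenticated users
    required_attr, qualifying_courses = RESTRICTED_SOFTWARE[software]
    # Grant access if the student has the college attribute, or is registered
    # in a course that requires the software despite being outside the program.
    return required_attr in user_attributes or bool(enrolled_courses & qualifying_courses)

print(can_access("SolidWorks", {"COENG"}, set()))       # True: engineering student
print(can_access("SolidWorks", {"CAS"}, {"EML4501"}))   # True: enrolled in a qualifying course
print(can_access("SolidWorks", {"CAS"}, {"ENC3250"}))   # False: no attribute, no qualifying course
```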
Definition:
Preconditions: Users must use a Chromium-based browser (Chrome, Arc, etc.) or Firefox to ensure consistent access.
Functional Requirements: Software must be formally acquired and transacted through the university before release to students/staff; for example, IT must have receipts and verification that an enterprise software license has been acquired PRIOR TO posting it to the catalogue.
Operational Requirements: The software catalogue should be kept as simple as possible to reduce errors, using icons and text labels only; labels ensure that all versions remain identifiable, since iconography/logos can change and cause confusion.
Design Limitations: Fonts should be available by default on most machines (e.g., Calibri/Times), and colors should be in the sRGB color space only to prevent display issues (crashing, improper branding, etc.).
Due to the number of working parts and a participant count that is expected to increase during development, the following goals have been set:
All current students not expected to graduate within a semester of the expected implementation date are the target demographic of this project; any student who graduates before this deadline should not be counted, to avoid overspending.
All students included in this count should be able to use these services for at least one semester and up to a total of 10 full semesters, or 8 full semesters and two summer terms; these sub-goals correspond to a full four-year degree or a five-year degree. (A sketch of this eligibility rule appears below.)
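A minimal sketch of the eligibility rule above, under assumptions of my own: time is counted in whole "semester units," and the ten-full-semester cap is treated as equivalent to the eight-semesters-plus-two-summers variant.

```python
# Sketch of the target-demographic rule above. The semester-unit simplification
# and function names are my own assumptions, not part of the original plan.
MAX_SEMESTER_UNITS = 10  # 10 full semesters, or 8 full semesters + 2 summer terms

def is_eligible(semesters_until_graduation: int) -> bool:
    # Students graduating within one semester of implementation are excluded
    # to avoid purchasing licenses they would barely use.
    return semesters_until_graduation > 1

def service_window(semesters_until_graduation: int) -> int:
    # Eligible students get at least one semester, capped at the program maximum.
    return min(max(semesters_until_graduation, 1), MAX_SEMESTER_UNITS)

print(is_eligible(1), is_eligible(4))  # False True
print(service_window(12))              # 10: capped at the five-year-degree maximum
```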
Design:
The software catalogue is to be accessed through the university website, which will require university authentication; this provides the necessary information to service providers to authenticate the software and grant proper licensing and download privileges.
Development:
To develop this catalogue, it is imperative to maintain a good relationship between the university and service providers; thus, all licenses must be accurately purchased, and all NEW licenses should be purchased during the second week of each semester (post add-drop week) to ensure that all RBAC is properly accounted for, minimize spending, and ensure that transfer students have enough time to receive the services. Additionally, it is equally if not more important to have a good relationship between IT and students/staff; thus, 5% of current IT staff, or all new IT staff (whichever is greater, or a mixture of both), should receive training on basic questions a catalogue user could have. Some questions include: how do I access the catalogue (the university website), how do I download the software (receive authentication and follow the link to the service provider), and what criteria do I need to meet to download software (be a student/staff member, or be registered in an affiliated course)?
Implementation:
As mentioned in the Design phase, there are many moving parts; to limit scope during the pilot run, a split sample of 50% of the total audience should be set. This means a target of 10% of the total audience per year-cohort (10% per year across up to five-year-degree students, totaling 50%) should be achieved by the end of the first semester of implementation. All licenses should be acquired within a month of the start date of a software offering to ensure all licenses have successfully transacted and registered on the provider's end.
During the first full year of implementation, a survey should be conducted four times: two weeks after the catalogue's start date, a week before the end of the first semester of implementation, the second week of the second semester, and the end of the second semester. The questions will vary by survey; the first survey asks: did you know of the catalogue (Y/N), is the software you need for class present in the catalogue (Y/N), was the catalogue easy to find (Y/N), where did you find out about it (free response), was it easy to use (Y/N), and any feedback or notes a participant may have. This survey is focused on gaining an audience and increasing participation.
The second survey should repeat questions 1-2 from the first survey; then ask whether all services from the previous semester were still available this semester (Y/N), whether you were satisfied with the functionality and accessibility of the software (Y/N), and which software you used most from the catalogue (free response, or N/A); and finally repeat questions 3-5 from the first survey. These questions test student/staff satisfaction and inform software purchasing for the university.
The third survey will repeat the first, and the fourth will repeat the second, to minimize participation fatigue; this rotation is sketched below.
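To make the rotation easier to audit, here is a small sketch of the four survey waves as a lookup table; the wave labels are my own, and the question wording is abbreviated from the descriptions above.

```python
# Sketch of the four-wave survey rotation described above. Wave labels are my
# own; question text is abbreviated from this section's descriptions.
AWARENESS_QUESTIONS = [
    "Did you know of the catalogue? (Y/N)",
    "Is the software you need for class present in the catalogue? (Y/N)",
    "Was the catalogue easy to find? (Y/N)",
    "Where did you find out about it? (free response)",
    "Was it easy to use? (Y/N)",
    "Any feedback or notes? (free response)",
]
SATISFACTION_QUESTIONS = AWARENESS_QUESTIONS[:2] + [
    "Were all services from the previous semester still available? (Y/N)",
    "Were you satisfied with the functionality and accessibility of software? (Y/N)",
    "Which software did you use the most from the catalogue? (free response / N/A)",
] + AWARENESS_QUESTIONS[2:5]

# Waves 3 and 4 repeat waves 1 and 2 to minimize participation fatigue.
SURVEY_SCHEDULE = {
    "two_weeks_after_launch": AWARENESS_QUESTIONS,
    "week_before_end_of_sem1": SATISFACTION_QUESTIONS,
    "second_week_of_sem2": AWARENESS_QUESTIONS,
    "end_of_sem2": SATISFACTION_QUESTIONS,
}

for wave, questions in SURVEY_SCHEDULE.items():
    print(wave, f"({len(questions)} questions)")
```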
Follow-up:
The training mentioned in the Development phase should occur each semester of implementation, and the trained staff should be called the "Software Helpdesk". The surveys should occur in the order given in the Implementation phase for proper purchasing and budgeting of software, and be repeated as needed if the project is successful.
At the beginning and end of each semester, the IT website should feature a section highlighting staff, showcasing special talents and recognizing those who worked diligently to make the catalogue successful.
EVENT REPORT:
Conference title: Gartner Data & Analytics Summit Lecture: Top trends in data and analytics in 2024
I chose to find a conference on data analytics (D&A) because it is one of the fundamental skills required to accurately conduct research, read charts, and analyze trends to improve quality for clients and the business; as I am looking to pursue audiology, this aspect will be particularly important in improving the quality of patient care. One notable topic across almost every D&A trend was AI: its effects, people's fears and criticisms, and how to address them.
As mentioned before, it is important to measure trends, and the four major trends in D&A that were presented at the conference are as follows:
1. D&A has gone from being nice to have to a mandatory requirement for businesses. As the needs of businesses and clients become more specific, organizations' data needs become more complex; thus, analytics on this influx of data is imperative to keeping businesses afloat.
2. Organization management was initially chaotic, but with data analytics it can become managed complexity; this means improving understanding of the chaos in the workplace and accounting for international conflicts to better manage and mitigate consequences at an organizational level.
3. We have moved from a single source of truth (blind trust or faith in the data) to a "deluge of distrust"; this is not a good thing, as the speaker pointed out, but a fact of society that must be accepted. With rising privacy, societal, and political concerns, among other tensions, people have begun to trust institutions less than before. Additionally, as the ways to maliciously alter how data is used or presented improve (such as deepfakes/AI-generated content in the voice and mannerisms of individuals), even legitimate information must be evaluated with the same level of scrutiny as illegitimate information.
4. From overloaded to empowered: the pandemic was catastrophic in multiple ways, including the shattered morale of workers, fear of unemployment, and factors that lowered productivity, such as greatly reduced opportunities for interpersonal communication; this created the need for employers to empower their workers so that a state of normalcy and increased satisfaction could be achieved.
Evidence for these trends came from a Gartner survey of 61 businesses. For the first trend, senior leaders were asked about the expected impact of GenAI over the next three years, with 74% expecting a high impact. Five areas must be addressed to mitigate its effects: strategy, finances, technology, the organization, and the employees themselves. Issues across these five categories include a lack of data-driven innovation; misallocation of resources and underused investments on the financial side; multiple points of failure in technology; loss of influence of the D&A team and incorrect use of data for decisions at the organizational level; and the need to address employee burnout and difficulty sourcing talent.
To address the necessity of D&A to businesses, it is vital to prove your value to the organization by establishing a clear link between D&A capabilities and what the organization values, optimizing resource allocation to improve investments, and including intangible goals such as sustainability and diversity. Second, use financial operations (FinOps) to set and enforce trends, track and manage costs, conduct regular vendor price/performance reviews, and approach vendors that can automate FinOps to allow responsible business spending. Third, establish D&A franchises across multiple areas to develop best practices, skills, and technology for the organization.
From the same survey, 71% of responses indicated the belief that tasks will only become more complicated; to minimize points of failure, D&A must be prioritized to reduce complexity and produce timely solutions for these vulnerabilities. However, not all complexity is bad, and it should be balanced: simplify governance through policies rather than elaborate guidebooks, and simplify technological requirements. With this, you can understand the context of your organization and services; using AI and data mining processes, it is possible to develop responses to complex situations. One thing to emphasize is that organizations should not aim to replace people with AI, but rather to improve business efficiency so that people can do the work they're best at.
Even with these benefits, maintaining trust between the business, its employees, and its customers is vital; a majority of responses cited privacy concerns, worry about misuse of data, employees fearing for their jobs, and the potential to generate harmful content through AI implementation. To mitigate these worries, organizations must focus on transparency and let people know how their data is being used.
As mentioned before, various concerns have led to decreased employee satisfaction and productivity, creating a volatile business-employee relationship; once again, AI is a major concern, with the second most popular opinion in the survey being that employees found AI threatening. This is exacerbated by the effects of the pandemic: burned-out employees would rather have better work-life balance and feasible working conditions, and from employees' perspective, AI threatens job security and, with it, those benefits. To mitigate these challenges and fears, policies around privacy concerns and AI best practices need to be written, with governance based on user empowerment: educating employees in the proper use of new technologies, adopting change-management strategies, and decreasing the need for personal information by using public information to reduce privacy risks. Additionally, to make people more valuable to companies, time should be regularly allocated for employees to pursue side projects aligned with the business out of their own interest, increasing employee value and satisfaction.
To conclude, AI is a challenge that will percolate into every aspect of a business, but through data analytics a business can gain a better understanding of its needs and its employees' needs, improve job satisfaction, and decrease the fears associated with rapidly evolving technologies, AI or otherwise.
References:
Herschel, G. (2024, March 13). Top trends in data and analytics 2024 | Gartner Data & Analytics Summit. YouTube. https://www.youtube.com/live/uLWtnaFFhzQ?si=f4HO2cDO-jLyE-U
ETHICAL CASE STUDY:
Disclaimer:
Before I begin writing about any physical, ethical, or other concerns of IoT, I want to outline the experiences that could bias my viewpoint: I have worn a smartwatch connected to an Android phone that is closely tied to a Google account since high school, and my brother has an Alexa at home. As such, I have become well acquainted with IoT devices, and I am not as fearful of them as the articles I discuss later may try to convey.
The TedED video on IoT (TedED & Fw:Thinking, 2013) describes a smart home environment that has an "understanding" of its occupants, including their emotional and physical desires, allowing for decision-making prior to human interaction; for example, the thermostat detects cold weather outside and raises the temperature inside when it detects a person, and a fridge tells someone what it contains and what can be made with it based on that person's diet (collected from a fitness tracker). It's important to mention that this video is 11 years old and, with hindsight, could be considered a "worst-case scenario" to an extent. The video discussed possibilities that are innocuous in a vacuum yet terrifying in full effect; however, it's unlikely someone would currently own everything required for that level of fear. To be as objective and fair as I can, let's look at a semi-open ecosystem, since brands differ in restrictions. I am choosing: a Samsung A35 phone and watch per person (a midrange phone, and a more open ecosystem than Apple's); a Google Nest Mini in every bedroom of a four-bedroom house plus the living room, as it is currently slightly more popular than the Amazon Echo Dot globally (Google Trends, 2024); the Ring wired doorbell (the most affordable model); two Nest cameras (bought as a pack) monitoring ONLY the front and backyard; and a Nest thermostat. This should be the bare minimum for this extreme use case.
Here is a base price breakdown if you were to install it yourself (at full retail, pre-tax and fees):
Product | Price | Quantity | Total
---|---|---|---
Samsung A35 | 399.99 | 4 | 1,599.96
Samsung Watch | 220.00 | 4 | 880.00
Google Nest Mini | 49.00 | 5 | 245.00
Ring Wired Doorbell | 50.00 | 1 | 50.00
Nest Camera (2-Pack) | 330.00 | 1 | 330.00
Nest Thermostat | 280.00 | 1 | 280.00

Total Unit Price (1 each) | Total Price | # of Devices
---|---|---
1,328.99 | 3,384.96 | 16
At a minimum (one-person household) investment of $1,328.99 and a preferred amount of $3,384.96, ignoring the cost of Wi-Fi/networking and storage for the cameras, and assuming the network wiring is feasible enough for a person to wire the 16 devices themselves, this setup requires considerable effort and knowledge on a person's end. Thus, until home/apartment developers start making all of these devices standard and bundled into the price of construction, it would be difficult to have such an integrated lifestyle.
Let's continue to assume this scenario for the analysis of the next articles; all of these devices would have to be registered to some account, thus tracking some information, and would need to be set to a standard of privacy to limit invasive corporate involvement (selling data to marketers, service providers, etc.). To combat this, Singer and Perry stated that policies would need to specifically mention what data is shared, collected, and stored (Singer & Perry, 2015); various federal laws, such as the Privacy Act, GLBA, and COPPA, require these concerns to be defined in privacy policies. To mitigate damage, devices should offer the option to save all collected data locally and share only minimal information with third-party services. For example, Nest devices can save locally only (Google Support, n.d.); this comes at the user's expense for storage, but allows for personal verification, information management, and containment.
However, even when locally hosting and maintaining most personal info, it is possible for malicious users to gain access. One example given by Karsten is that a Fitbit could be hacked to give wrong information to emergency responders (Karsten, 2016), but this would require several steps. First, the user would need to have provided complete and up-to-date information to the Fitbit or to an insecure location on the mobile device; second, the user would need either a cellular (on the phone or the Fitbit) or Bluetooth connection to the affiliated device; third, the malicious user would need to access the affiliated mobile device for full information and then make a convincing call to first responders claiming the user is in imminent danger and in need of services. This requires not only the skills of the hacker, but also the user recklessly or negligently providing information in multiple places. One possible mitigation is for users to provide the bare minimum of information to the fitness tracker and store personal info in the emergency contact page, which is stored locally only. Furthermore, several states, such as California and Virginia (under the CCPA and VCDPA, respectively), have legislation allowing consumers to request that companies delete or provide access to their information (Brown, 2021); done routinely, this can minimize the amount of data companies store on them.
To conclude, the IoT is a technology that provides great benefits but equal amounts of risk; however, given the level of investment required for the worst case, proper risk-management strategies and providing the least acceptable amount of information can protect privacy while retaining functionality.
References:
Brown, G. A. (2021, March 3). Consumers' "Right to Delete" under US state privacy laws. PrivacyWorld. https://www.privacyworld.blog/2021/03/consumers-right-to-delete-under-us-state-privacy-laws/
Google Support. (n.d.). How Nest cameras store recorded video. https://support.google.com/googlenest/answer/9242083?hl=en
Google Trends. (2024, October 13). Google Trends. Retrieved October 13, 2024, from https://trends.google.com/trends/explore?q=amazon%20echo%20dot,google%20nest%20mini&hl=en
Karsten, J. (2016, July 29). Alternative perspectives on the Internet of Things. Brookings Institution. https://www.brookings.edu/articles/alternative-perspectives-on-the-internet-of-things/
Singer, R. W., & Perry, A. J. (2015). Wearables: The well-dressed privacy policy. Intellectual Property & Technology Law Journal, 27(7). https://link.gale.com/apps/doc/A420929651/AONE?u=tamp44898&sid=bookmark-AONE&xid=74b7983c
TedED, & Fw:Thinking. (2013, March 1). What is the Internet of Things? TED-Ed. https://ed.ted.com/on/VGdKwYzz#watch
IOT NEWS:
Recently, AI has been evolving rapidly and becoming more sophisticated in areas such as video generation and voice synthesis; this has led to ethical and privacy concerns due to misuse, such as deepfakes (including synthetic pornography, mentioned later) and the potential to spread misinformation (Shamsi, 2024). To help combat this, the Coalition for Content Provenance and Authenticity (C2PA) was formed to establish standards for verifying the legitimacy of content and preventing mass scams, such as when a finance worker paid $25 million to a person using deepfakes of the CFO and several close colleagues (Chen & Magramo, 2024). However, because different models are trained in different ways and on different datasets (in both size and scope), making a universal detector is difficult; furthermore, the more advanced models get, the easier it becomes for users to evade detection. Some generative models leave unique noise (randomness and unpredictability) or fingerprints (unique identifiers), which can be useful for still images, but due to the copious number of models, recognizing these takes time. Other measures, such as physiological oddities (e.g., unusual eye-blinking movements), can occasionally be used; however, as AI models can produce a wide variety of content, it is possible for no physiological qualities to be present (e.g., an inanimate object). For such cases, individual frame analysis or audio-visual inconsistencies can be useful (e.g., a person is speaking and their words are out of sync), but these strategies cannot accurately detect AI use in still images. A newer strategy for still images is to embed patterns in authentic content, such as inserting unique metadata or credentials into the image, which can be as small as a single bit or a short identifier (e.g., adding 45600123 at the beginning of the metadata), or adding a "fragile watermark": a watermark that becomes irrecoverable if a single bit is changed (Shamsi, 2024). Additionally, tools such as Nightshade (UCCS, 2024) and PhotoGuard (MIT CSAIL, 2023) act as "poisons" to models, protecting images and decreasing model reliability by inducing "hallucinations" (AI creating something it believes to be real that is not).
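Here is a minimal sketch of the "fragile watermark" idea, using only the Python standard library: pair content with a digest of its exact bytes, so that changing even one bit invalidates the mark. Real C2PA content credentials are cryptographically signed manifests; this is illustrative only, and reuses the hypothetical identifier from the example above.

```python
# Sketch of a "fragile" integrity mark: any single-bit change to the content
# produces a different digest, so the mark is irrecoverable after tampering,
# as described above. Not a C2PA implementation.
import hashlib

def make_fragile_mark(content: bytes, identifier: str = "45600123") -> str:
    # Bind a (hypothetical) provenance identifier to the exact bytes of the content.
    return hashlib.sha256(identifier.encode() + content).hexdigest()

def verify_fragile_mark(content: bytes, mark: str, identifier: str = "45600123") -> bool:
    return make_fragile_mark(content, identifier) == mark

original = b"example image bytes"
mark = make_fragile_mark(original)
print(verify_fragile_mark(original, mark))                # True
print(verify_fragile_mark(b"example image bytez", mark))  # False: one byte changed
```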
Furthermore, with rising privacy concerns and a number of tools being built to prevent malicious AI use, legislation has begun to pass. Recently, NY Governor Kathy Hochul signed a law, preceded by legislation in 45 other states, banning deepfake pornography (notably of young women), with sentences of up to a year of jail time and the right for victims to sue the perpetrator (Williams, 2023). However, there are currently only state-level laws and punishments, with no federal law banning/criminalizing deepfakes; on September 20, 2023, bill H.R.5586 was sponsored by Representative Yvette Clarke to hold persons accountable for deepfakes, but it is currently under review in the House (US Congress, 2023).
Without federal-level law enforcement, or at least criminal punishment in every state, this crime will only fester and become more advanced; for example, if an employee of an international bank can transfer $25 million to a convincing panel of deepfaked colleagues they frequently work with, imagine the chaos that could result from impersonating someone as public and influential as the president of a country. The more public and available a person's personality/likeness is, the more effectively a model can be trained to replicate their mannerisms. However, there are steps you can take to limit or prevent becoming a victim, including managing who can see your content (e.g., friends only, select people only, a private account), watermarking photos, staying up to date on what kind of technology is being used, and, of significant importance, using MFA on your devices and individual accounts with strong, unique passwords (NCA, 2023). Additionally, it is useful to combine these tips with traditional security advice:
1. Keep your devices' security up to date with software and BIOS updates (if necessary).
2. Be analytical with emails and messages from unknown senders to avoid phishing attacks: check for strange fonts/symbols, semantic errors, garbled or bizarre email addresses, obvious grammatical issues, low-resolution logos, and demands that would be unusual for an organization to make (e.g., a "Microsoft Support" email asking for your banking details to fix an issue).
3. If something seems too good to be true, approach it as if it is; if you get an email claiming to be from a high-ranking official, representative, or well-known individual, check their website for their email structure, business email, and customer support addresses. This is especially true for financial institutions.
Furthermore, due to the rapid rate of improvement, it is imperative for information professionals, IT providers, and privacy agencies/organizations to expand educational materials related to generative AI, privacy management, and proper email/computer analysis (such as point two above).
In conclusion, GenAI has spurred a movement of deceptive practices that can be destructive to a person, an organization, or more; however, means are being developed to combat the negative effects. Currently, the rate of evolution is significantly higher than the rate at which it can be combatted, but with new legislation being passed to criminalize deepfakes, even if just at the state level, there is hope for the future.
References:
Chen, H., & Magramo, K. (2024, February 4). Finance worker pays out $25 million after video call with deepfake 'chief financial officer'. CNN. https://www.cnn.com/2024/02/04/asia/deepfake-cfo-scam-hong-kong-intl-hnk/index.html
Gordon, R. (2023, July 31). Using AI to protect against AI image manipulation. MIT News. https://news.mit.edu/2023/using-ai-protect-against-ai-image-manipulation-0731
National Cybersecurity Alliance (NCA). (2023, December 22). How to protect yourself against deepfakes. StaySafeOnline. https://staysafeonline.org/resources/how-to-protect-yourself-against-deepfakes/
Shamsi, N. (2024, May 10). Technologies for authenticating media in the context of deepfakes. IEEE-USA InSight.
University of Chicago Computer Science (UCCS). (2024). TheGlazeProject - Our mission and vision. Nightshade: Protecting Copyright. https://nightshade.cs.uchicago.edu/aboutus.html
US Congress. (2023, September 20). H.R.5586 - DEEPFAKES Accountability Act. https://www.congress.gov/bill/118th-congress/house-bill/5586/text
Williams, Z. (2023, October 2). New York bans deepfake revenge porn distribution as AI use grows. Bloomberg Law.
EVENT REPORT 2:
I selected the 2019 Rice Ken Kennedy Institute (RKKI) Data Science (DS) conference because DS is a field directly within Information Science (IS), and the speaker was Dr. Jaffray, the CTO of MD Anderson Cancer Center. What was presented thus sits at the intersection of information science and healthcare, both fields I am interested in joining. In this conference, Dr. Jaffray discusses the need for a Learning Health System (LHS) and for moving away from the conventional definition of big data, as the base definition treats big data as flat as an Excel spreadsheet (Jaffray, 2019, 1:20); however, healthcare is not flat, and the solutions need to be much more dynamic than a spreadsheet's functionality. To become more dynamic, several challenges must be addressed: the transition to an LHS from clinicians' current systems and mindset, adding context to analytics, addressing bias and the data collection process through the use of ontology, and regulation for AI and machine learning (ML). There are a myriad of challenges not covered here, but these are the ones I will focus on to keep it brief.
Currently, AIs are trained on a plan-do-study-act paradigm, meaning the AI would "plan" (algorithmically, heuristically, etc.) how to get information, act (compose an output), study the results, and then learn based on insight such as positive reinforcement or expert verification (Jaffray, 2019, 10:11). This process is similar to natural learning; however, because the final step requires human verification, consistent application can be challenging, as some clinicians are independent and may struggle to collaborate on verifying such a large volume of trials/information. To account for this, MD Anderson Cancer Center used the IOM paradigm to develop a system that accounts for the diversity of people by aligning the data, digital teams, IT, and other departments with the organization's goals, allowing seamless integration of the system into clinician and administrative staff workflows (Jaffray, 2019, 13:56-14:44). Additionally, the digital architecture/infrastructure should be built to work with current systems and optimize their performance rather than replace them; this can be done with the Big Machine concept, discussed next.
The Big Machine concept focuses on collecting a large amount of up-to-date data responsibly and contextually, whereas conventional methods can be limited by old data. To achieve this, the four Vs of data can be applied: Volume (quantity); Velocity, or how quickly data is created, processed, and analyzed; Variety, or the range of data types; and Veracity, or how reliable and accurate the data is. At the scale of a healthcare system, humans are restricted in how much information they can take in; understanding the entire healthcare system unaided is impossible. Some solutions are to use a closed feedback loop to aggregate as much accurate data as possible through ML, and to use more efficient and scalable data storage systems, such as data lakes, which are much more flexible/"viscous" than a conventionally rigid file structure (Jaffray, 2019, 15:29-17:59). This storage is imperative, as a new data type called phenomics, or all the data about an individual's characteristics and traits in one file, is massive in size and equally massive in value; this data would add the necessary context to make informed decisions. However, as the data is unique and varies from person to person, it is expensive and difficult to source, and computer analysis of such data is difficult due to the complexity of individuals. To decrease the complexity, it is possible to use ontology, a branch of philosophy for determining the reality/truthfulness of situations; in ML, an ontology is a framework that lets machines share meaning (Jaffray, 2019, 22:00-23:19).
Applying the concepts of web development and ontology to healthcare, it is possible to semantically organize every measurement and quantity into linked subject-predicate statements (with predicates enabling analysis of historical data) under consistent revision of the LHS; this makes collecting context more efficient while keeping the data in a format suited for analysis/aggregation by the LHS (Jaffray, 2019, 23:26-25:05).
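As an illustration of that subject-predicate pattern, here is a small sketch of my own (not Dr. Jaffray's system); the entity names, predicates, and values are hypothetical.

```python
# Sketch of the subject-predicate-object pattern used in semantic web
# ontologies, applied to clinical measurements. All names here are made up.
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str    # the entity being described
    predicate: str  # the relationship or measurement type
    obj: str        # the value or linked entity

record = [
    Triple("patient:0042", "hasCondition", "condition:low_frequency_hearing_loss"),
    Triple("patient:0042", "hasMeasurement", "audiogram:2024-10-01"),
    Triple("audiogram:2024-10-01", "thresholdAt250Hz", "55 dB HL"),
    Triple("audiogram:2024-10-01", "recordedBy", "device:clinic_audiometer"),
]

# Because every fact shares one shape, an LHS can aggregate and query facts
# uniformly, e.g. find all measurements attached to a patient:
for t in record:
    if t.subject == "patient:0042" and t.predicate == "hasMeasurement":
        print(t.obj)
```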
However, as aggregation and training on patient data is necessary, the proper legal and data frameworks must be in place to benefit the patient while maintaining the integrity of the healthcare system. For example, humans inevitably have biases, such as gender and racial disparities; if an LHS were trained on skewed/biased training data, it would benefit certain groups at the expense of increased health disparities for others. As this topic is important enough to be its own paper, I will focus on three main aspects to keep this section brief: privacy, data ownership, and ethical data use.

Currently, a large amount of data is needed to train LHSs, but getting this data requires maintaining the privacy and security granted by laws such as HIPAA. An additional layer of complexity is that de-identification is complicated: redacting unique identifiers makes records harder to link and classify, and removing indirect identifiers can hinder the quality of analysis, if analysis can be performed at all. For example, say you were to train an LHS to predict the probability that a person develops the type of low-frequency hearing loss I have; what would you need for quality analysis? You'd likely need some identifiers as predictors: the unique frequencies a person has deficiencies in, preexisting conditions (diabetes, cancer, infections, etc.), income level, the industry they work in (if applicable), and the state they reside in. Suppose the data is completely redacted, with no name, patient number, or any direct identifiers: could these data points still identify someone with due diligence? It is possible. Let's apply this to a fictitious redacted record of myself combined with public data: I have low-frequency hearing loss, I was born prematurely, my dad is a doctor and photographer, I am currently a researcher, and I live in Florida. Combining these indirect identifiers, someone could potentially identify me if (1) they knew I was included in the training data, (2) they knew me personally or read my introductory posts on my Substack/website, or (3) they had a LinkedIn account, which would let them find almost all of this data linked to my name.

With this perspective, would you redact all of this information as well? Most likely not, as you would then have no consistent demographic information, nor information vital to understanding the correlation of certain variables with the diagnosis. Quality data requires sacrificing total anonymity, because otherwise each record is nothing more than a data point on a graph, which is useless on its own. Finding the ideal opportunity cost of identifiers is difficult because certain indirect identifiers can become direct identifiers in context; for example, being from Timbuktu, normally an indirect identifier, becomes a direct identifier if the person is one of a very select few in the dataset from there. Such extreme disparities can be reduced by normalizing values, such as setting a threshold for how many times a quality must be present before it is reported; even then, the biggest challenge is identifying the bare minimum of information necessary for beneficial LHS training while protecting patient privacy. (A sketch of this thresholding idea follows.)
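Here is a minimal sketch of that thresholding idea, under my own assumptions: suppress any quasi-identifier value shared by fewer than k records, in the spirit of k-anonymity. The field names and value of k are hypothetical, and this is a simplification, not a full de-identification pipeline.

```python
# Suppress rare quasi-identifier values so no one is identifiable by a rare
# attribute (e.g., the lone patient from Timbuktu). A simplified,
# k-anonymity-style suppression sketch.
from collections import Counter

K = 5  # minimum number of records that must share a value for it to be kept

def suppress_rare_values(records: list[dict], field: str, k: int = K) -> list[dict]:
    counts = Counter(r[field] for r in records)
    return [
        {**r, field: r[field] if counts[r[field]] >= k else "SUPPRESSED"}
        for r in records
    ]

patients = [{"id": i, "hometown": "Tampa"} for i in range(6)]
patients.append({"id": 99, "hometown": "Timbuktu"})  # a unique, hence identifying, value

for row in suppress_rare_values(patients, "hometown"):
    print(row)
# The six Tampa rows keep their value; the single Timbuktu row is suppressed.
```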
To conclude, data science and LHSs in healthcare have benefits, but ample challenges must be cleared up before widespread implementation is possible.
References:
Jaffray, D., & Rice Ken Kennedy Institute. (2019, October 18). 2019 Data Science Conference - David Jaffray. YouTube.
UPPER LEVEL COURSES:
List Of Upper-Level Information Science Courses:
All of these courses were taken between Fall 2023 and Spring 2024.
LIS3261 and LIS4204: Intro to Information Science (IS) and Information Behaviors (IB):
These courses involved learning about different information models, such as Dervin's Sense-Making Theory, Wilson's Model, and Kuhlthau's Model. In LIS3261, these models allowed for a better understanding of the way people filter and sort information, addressing the What and Where of information needs; for example, we were asked to identify which model was used to collect information (What) and where the information came from (online, library, etc.). Conversely, LIS4204 used these models to address the Why and How of information seeking, namely the behaviors of people collecting information, such as whether they prefer formal sources (libraries, peer-reviewed articles) or informal sources (friends, family, YouTube, Wikipedia), and when the optimal time to use each is. These classes in tandem made me more interested in the way people interact with information and the process they go through to optimize the benefit of information seeking; however, I felt the current models were too rigid and unnatural, leading to the creation of my Helmet Information Seeking Behavior model. Taken in Fall 2023 and Spring 2024 respectively, these were arguably the most influential courses of my upper level, as I plan to continue research in the field to understand the psychological/social aspects of information seeking and cognition broadly.
IT or Cybersecurity-related courses in healthcare:
LIS3353 and LIS4776: IT Concepts and Health IT (HIT)
Both of these courses covered common technologies and how they operate; however, LIS4776 was specific to technologies in healthcare, such as EHRs and CDSSs, and had a group project related to CVD prevention in the elderly (65+), for which I was the group leader.
LIS4930 and LIS4477: Electronic Health Records (EHR) and Clinical Decision Support
As mentioned in the previous section, EHRs and CDSSs were touched on in LIS4776, but these courses went into depth on how they function, using real-life example data. In LIS4477, we were tasked with learning about neural networks, making basic NNs, and building decision trees in See5 to understand data scraping and analysis in healthcare. These tools transferred to LIS4930, where you actually work in an EHR, entering fictitious patient info and using a basic CDSS in what would be a routine workflow for a clinician. This is important, as I would like to work for Mayo Clinic or a similar organization at some point, and learning how to enter and manage patient data will be critical.
LIS4482, LIS4779, and CIS4365: Networks and Communication, Health Information Security (HIS), and Computer Security Policies, respectively
These three courses all relate to the security technologies used to protect health information (HI) or personal/identifying information (PI, PII) in some form. For example, in LIS4482 we learned how to use Wireshark, a packet-capture tool, to analyze network traffic (on private networks), which is useful for detecting strange/malicious activity on your network; it can show the origin of an attack in very specific circumstances, or support general monitoring of network health and effectiveness. Learning about these technologies, internet standards, and security mechanisms helped me in LIS4779 to understand the concepts better and to develop more comprehensive security plans in case studies. In CIS4365, we analyzed the laws used for privacy, piracy, and copyright in domestic and international scenarios to better understand how to uphold security from the perspective of IT, systems integrators, or anyone involved in developing security policies.
Data Analytics:
LIS4785 and LIS4800: Intro to Health Informatics and Introduction to Data Science (DS)
These courses were both related to data analytics: LIS4785 focused conceptually on the different fields of data analytics in healthcare, their use cases, and how they work at the basic level of organizations and hospitals, while LIS4800 focused on programming in R to conduct statistical analysis on data sets. The information in these courses is extremely useful for my future as a researcher, as it provides necessary context, adds legitimacy/competency to my claims, and offers a practical use case for data analytics. Some additional courses that helped in this area are the lower-division courses I took as a cybersecurity major, CGS1540 and COP2513 (Intro to Databases and Object-Oriented Programming for IT); these allowed me to learn how to properly manage databases in SQL for analysis from a storage perspective, and how to scrape websites to gain data for web applications, such as an up-to-date stock chart using data from Yahoo Finance in C. One thing to note is that I was already certified in SQL in high school prior to taking CGS1540 and had developed the code for an autonomous robot in C (which placed 27th at the state level), so these classes served to add experience and legitimacy to my skills.
LIS4934: Senior Capstone
This course allowed me to conceptualize my nearly four years of college and to make a professional website, which has been a long-term goal of mine; I was a freelance web developer for several years but never had a site of my own, because I wanted to unify all of my projects into one, and I now have the proper skill set to do so.
IDS4914: Advanced Undergraduate Research Experience
This course gives credit for my research in the Risk to Resilience Trafficking in Persons (TiPs) Lab at USF under Dr. Reid's guidance. I have written summaries for publications; transcribed and analyzed transcripts of meetings from 2021-24 to extract meaningful information; worked on the sampling frame for BRIGHT, a project that aims to unify services for trafficking victims in Florida; and managed sampling data and a small team collecting service provider information across Florida.
List of Upper-Level External Courses:
ENC3250: Professional Writing
This course, alongside LIS2005, has been vital, even though ENC3250 is not directly connected to the flowchart of my degree. LIS2005 emphasized proper research methodology, how to use research databases, and how to manually create APA 7 citations, all of which greatly increased my competency and efficiency as a writer and researcher. However, I was not yet satisfied with my knowledge of writing and the mechanisms that add credibility; to account for this, I took ENC3250, which involved writing professionally in morally ambiguous or complicated scenarios, such as building a property that neighbors a Native American reservation, or an original group project on food scarcity among USF students, for which I was the group leader.
SPA3030: Intro to Hearing Science
This is the first formal course I've taken to gain a better understanding of how humans perceive speech, the function and anatomy of the ear, and the physics behind them; previously, I was involved in audiology research for a number of years, providing data analysis. Additionally, I want to apply to audiology school and become an audiologist in the future; as such, I want to gain as much knowledge as I can on these topics.
CCJ3024: Survey of the Crim Just System (CJS):
Every semester I do my best to take at least one class related to the research I am doing at the time. I am currently taking this course because I work in the TiPs lab and wanted to gain context and apply my work externally; furthermore, this course has given me much more empathy for victims of the CJS and has made my research at the TiPs lab much more fulfilling. I plan to continue this research into graduate school.
Part Two: Most Influential Courses
During my almost four years of college, I've taken several influential courses that stood out for their intriguing concepts or for lessons I still use in my daily life, including their assignments and required readings. Three main assignments have had a long-term effect or have benefited my day-to-day operations: the final essay in Information Literacy, the final paper for Introduction to Philosophy, and the presentation for Intro to Health Informatics (HI) together with the general coursework of Introduction to Information Science (IS).
First, from a rather understated class, Information Literacy, is the final GEA essay. This course focused on research methods, information collection, and the filtration of online info, such as filtering through academic sources and the library database, and most importantly, how to cite sources in APA 7th format, either manually or using citation managers. The way I was taught to manage references in this class just over a year ago is still my preferred way and has improved the quality of not only my research but my personal work as well; for example, I am a writer in my spare time, where establishing credibility is vital, and these lessons made that possible. In my research, I have to look through multiple sources of information, including manufacturers' pages and the individual studies on them, to verify that all of the information collected is accurate, reliable, and timely; this leads to having dozens of tabs open at the same time, which may all be relevant in the future but not necessarily for publication. To manage these sources, I use the citation manager/generator CiteFast, as it is the most accurate APA 7th generator I've found, and since it's on the web, I can get my citations conveniently on every device while typing significantly less than with popular managers such as EndNote. As I write several essays a month, alongside research and personal work, I use these techniques almost every day and have felt my work improve in quality and reliability while being faster than my old methods.
Second is a class outside the IS major that has been extremely influential in my personal life: the final paper in my Introduction to Philosophy class in my first semester of college. In the final paper, we had to summarize the different philosophy books we read, including works from Plato, Aristotle, Mill, and Berkeley (my personal favorite), among others, and explain how we would apply the philosophies to our lives. Ever since writing that paper, my perception has changed, and I have become more analytical in my daily life. For example, Berkeley's theory of ideas describes how the individual components of every object form your perception of the object itself, and this process is how I compose my photographs and determine what I want to include in them. Additionally, Mill's utilitarianism describes the use of hedonic calculus, assigning values of pain and pleasure to different actions to determine whether an action is moral, or in this case, worth doing; conceptualizing the different factors that affect an important decision has allowed me to prioritize and optimize my work, personal or professional, by making me more aware of what is necessary.
Finally, returning to the IS major, the third most influential experience comes from two classes that filled the same purpose for me: the group presentation for Introduction to HI, where the group I led researched and presented the benefits of EKGs/ECGs for cardiovascular disease prevention in people 65+, and the Intro to IS course as a whole. Intro to HI was the first time I actually met peers within my major, as discussion posts became the norm when the IS major went online at the end of my sophomore year; thus, this class was my first impression of the students pursuing a career in health informatics. After taking this course, alongside Intro to IS, I finalized my intent to pursue the IS degree and to work toward a career in the field, as I would like to work as a clinician during graduate school. Without these classes, I would not feel as confident in my decision to become an IS major; they taught me how to work with my peers and to manage and organize information, and they helped me find the side of cognition research I am now most interested in: how people think and process information.
INFORMATION SCIENCE MAJOR REFLECTION
Overall, during the coursework for this major, I felt I improved my writing, my project and time management, and my confidence in my skills for a clinical position. The assignments had fairly strict but manageable deadlines, and often had varied minimums and maximums, allowing you to do as much or as little work as necessary to achieve a goal to your satisfaction rather than meeting word counts; thus, the work was often of the quality I wanted, allowing me to feel pride not only in the major but in the individual courses as well. Additionally, rather than just viewing a PowerPoint on an important technology and moving on to the next topic, multiple units would focus on operating these technologies in a well-controlled but flexible environment, to truly understand how to use them for their intended purpose and for personal use. For example, we learned Wireshark, a packet-capture tool used in cybersecurity for network traffic monitoring, and I use it monthly to check for anything wrong or strange on my home and apartment networks. However, I didn't enjoy the entirety of the IS major: going online while paying rent with a roommate I didn't get along with, and having no in-person classes for a year, was draining, and I felt I had regressed by the end of the year, since I had minimal interaction with others after switching out of engineering, where the friends I had made had classes and lived off campus. This pushed me to seek out new clubs, research, and classes for personal enrichment rather than for credit, since I was doing significantly better in these classes than in engineering, enjoyed the work much more, was more capable in the coursework, and no longer needed "buffer classes." Since switching into IS, over two and a half years I have joined three labs, performed or contributed to research in five disciplines, become an e-board member of a club I founded with my brother, and started photography, as I had fewer in-person classes due to the major being online. However, this double-edged sword taught an equally important lesson: having a lot of time means you have to use it properly. At first, I couldn't focus on online classes and would do self-review to compensate, feeling I was wasting time, which led me to look for the opportunities I am now so proud of; but that doesn't mean I would like more online classes. In fact, it would be beneficial to have the entry-level IS classes, such as Intro to IS, Information Literacy, and Information Seeking/Behaviors, in person so that the major would properly connect you with your peers; I didn't feel that connection until I engaged with my team for the PowerPoint in Intro to Health Informatics and the senior capstone.
I love the work in IS, but my true passion lies in audiology. To merge these fields, I hope to become an audiologist who uses IS skills to better convey diagnoses, to smooth the process/transition between fitting and first-time hearing aid use in daily life, and to improve quality of care (QOC) for my patients. Furthermore, on a conceptual level, information seeking is one of the most vital steps in our cognition, and using IS strategies can improve how we understand individual thought processes; this has improved my empathy and my desire to understand a person, with the aim of improving QOC by properly conveying information to my patients. If there is nothing else I could say about IS, it is that it is an incredibly practical and useful major: the focus is on information and its proper, thoughtful utilization at a macro level, which is necessary for every discipline, and I strongly recommend it to new freshmen whenever I talk to undecided majors.
To conclude, Information Science is a major that should not be overlooked, given its personal and professional flexibility, but it comes with unique challenges, such as the intrinsic motivation required to use your time effectively and the lack of connection during the latter half of the major if you plan improperly, as I did when I switched in at the start of my junior year.